Friday, February 21, 2014

Java Threads by Scott Oaks

This was a great book, and I highly recommend it for anyone who wants a thorough introduction to threads, how to use them, and the terminology. If you don't have time to read the whole book, here is my summary of the main points made in each chapter:

Chapter 1 Intro:

Chapter 2:
Thread: a separate task within a running application. Every app has at least the main thread, which is where most or all of the code runs. Many applications also have several threads that the dev might not know about, for things like a GUI.
Lifecycle: you call start(), which creates a new thread that executes the run() method. When run() finishes the thread shuts down, leaving only the threads that were running before start() was called. The most common ways to stop a thread are a flag and the interrupt() method.
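Here is a rough sketch of that lifecycle using interrupt() as the stop signal. The class and method names are my own for illustration, not the book's, and the latch is only there so the demo interrupts the worker after run() has really begun.

```java
import java.util.concurrent.CountDownLatch;

class LifecycleSketch {
    // Starts a worker, interrupts it, waits for it, and reports whether it is gone.
    static boolean startInterruptJoin() {
        CountDownLatch running = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            running.countDown();                 // signal that run() has begun
            while (!Thread.currentThread().isInterrupted()) {
                Thread.yield();                  // pretend to work until interrupted
            }
            // run() returns here, which ends the thread
        });
        worker.start();                          // the new thread begins executing run()
        try {
            running.await();
            worker.interrupt();                  // ask the worker to stop
            worker.join();                       // wait for the thread to shut down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```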

Chapter 3 Data Synchronization:
When two different threads could call the same method on an object, declare the method synchronized and the JVM will manage it for you by allowing only one thread at a time into that object's synchronized methods.
But what if you have a flag that stops the run method? If you declared both the run method and the method that sets the flag as synchronized, you could never set the flag, because run holds the lock for as long as it is running. There are other ways: you could synchronize just the section of code that checks the flag, or you could declare the flag volatile. A volatile variable is always read from and written to main memory instead of a cached copy. Synchronized getters and setters have the same effect, but volatile removes the boilerplate. Also, in Java reading and writing a variable is atomic, except for long and double variables that aren't volatile.
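A minimal sketch of the volatile flag approach, with names I made up (StoppableTask, done). Nothing is synchronized here; the volatile keyword is what makes the write in stop() visible to the loop in run().

```java
class StoppableTask implements Runnable {
    private volatile boolean done = false;   // volatile: the loop always sees the latest value
    private long iterations = 0;             // only touched by the thread executing run()

    @Override
    public void run() {
        while (!done) {
            iterations++;                    // stand-in for real work
        }
    }

    public void stop() {
        done = true;                         // no lock needed, so this can never be blocked by run()
    }

    // Starts the task, asks it to stop, and reports whether the thread finished.
    static boolean demo() {
        StoppableTask task = new StoppableTask();
        Thread t = new Thread(task);
        t.start();
        task.stop();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }
}
```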
JVM scheduling changes can occur at any time. Atomic operations are the only thing guaranteed to finish before the scheduler swaps out one thread for another.
There are a few ways to synchronize a block of code. You can use a class that implements the Lock interface, calling lock() and unlock() in a try/finally. Or you can use the synchronized keyword on a block, in which case you need to provide an object to synchronize on. Usually the current object (this) is fine, unless there is a specific object the block uses, in which case use that object. Make the scope as small as possible.
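To make the two styles concrete, here is a small sketch that guards one counter with a ReentrantLock and another with a synchronized block. TwoCounters and its methods are hypothetical names of mine.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class TwoCounters {
    private final Lock lock = new ReentrantLock();
    private int viaLock = 0;
    private int viaBlock = 0;

    void incrementWithLock() {
        lock.lock();
        try {
            viaLock++;
        } finally {
            lock.unlock();              // always released, even if the body throws
        }
    }

    void incrementWithBlock() {
        synchronized (this) {           // keep the synchronized scope as small as possible
            viaBlock++;
        }
    }

    // Runs both increments from several threads and returns {viaLock, viaBlock}.
    static int[] demo(int threads, int perThread) {
        TwoCounters c = new TwoCounters();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    c.incrementWithLock();
                    c.incrementWithBlock();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { c.viaLock, c.viaBlock };
    }
}
```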
Deadlock: two or more threads each waiting for a lock that another one of them holds, in circumstances where none of those locks will ever be freed.

Chapter 4 Thread Notification:
Wait: must be called in a synchronized block. When called it releases the lock, and before it returns it reacquires the lock.
Notify: must be called in a synchronized block. It wakes up one of the threads waiting on the object to tell it the condition has occurred. There are no guarantees about which thread is notified; it could be any of them.
NotifyAll: makes sure all waiting threads get the notification. It wakes up all of the threads, but each one still has to reacquire the lock, so they won't run in parallel.
notify() and notifyAll() carry no specific condition. A thread that wakes up has to check for itself whether it can proceed, so each thread that calls wait() needs to call it in a loop so it can go back to waiting if the notification wasn't for it.
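The wait-in-a-loop rule is easier to see in code. This is a sketch of a one-slot mailbox (my own example, not from the book) where both sides re-check their condition after every wake-up.

```java
class Mailbox {
    private String message;                       // null means the slot is empty

    synchronized void put(String m) throws InterruptedException {
        while (message != null) {                 // loop: the wake-up may not have been for us
            wait();                               // releases the lock while waiting
        }
        message = m;
        notifyAll();                              // wake waiters so they can re-check
    }

    synchronized String take() throws InterruptedException {
        while (message == null) {
            wait();
        }
        String m = message;
        message = null;
        notifyAll();
        return m;
    }

    // One thread puts a message, the calling thread takes it.
    static String demo() {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                box.put("hello");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            return box.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```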
You can create a lock object, Object someObj = new Object(), and use that for synchronization, waiting, and notifications. This allows code to be more parallel: separate lock objects let unrelated parts of an object synchronize and wait independently instead of all sharing the object's single lock.
If you use Lock objects for synchronization then you have to use Condition objects (created with lock.newCondition()) in place of wait and notify, because wait() and notify() work with an object's built-in monitor, not with the Lock. The Condition equivalents are await(), signal(), and signalAll().
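As a sketch of the Lock plus Condition combination, here is a one-slot buffer with separate "not empty" and "not full" conditions. ConditionSlot is a name I invented for the example.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class ConditionSlot {
    private final Lock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();
    private Integer value;                        // null means the slot is empty

    void put(int v) throws InterruptedException {
        lock.lock();
        try {
            while (value != null) {
                notFull.await();                  // like wait(), but tied to this Lock
            }
            value = v;
            notEmpty.signal();                    // like notify(), only for takers
        } finally {
            lock.unlock();
        }
    }

    int take() throws InterruptedException {
        lock.lock();
        try {
            while (value == null) {
                notEmpty.await();
            }
            int v = value;
            value = null;
            notFull.signal();
            return v;
        } finally {
            lock.unlock();
        }
    }

    // A producer thread puts 1, 2, 3 and the calling thread sums what it takes.
    static int demo() {
        ConditionSlot slot = new ConditionSlot();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) {
                    slot.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < 3; i++) {
                sum += slot.take();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }
}
```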

Chapter 5 Minimal Synchronization Techniques:
Not understanding synchronization can lead to code that is slower than single threaded code. A highly contended lock will slow down the app because many threads get stuck waiting at the same spot.
Try to minimize the scope of synchronization blocks.
Use atomic classes to reduce the need for synchronization: AtomicInteger, AtomicLong, AtomicBoolean, AtomicReference, AtomicIntegerArray, AtomicLongArray, AtomicReferenceArray, AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, AtomicReferenceFieldUpdater, AtomicMarkableReference, and AtomicStampedReference. The int, long, boolean and reference classes work just like regular int, long, boolean and reference variables do; they are just atomic. The array classes provide atomic access to one element at a time. The field updater classes let you atomically access a variable that wasn't declared atomic and that you can't change to an atomic type, although the field does have to be volatile. Create one with the newUpdater() method on the field updater class. The reference class lets you take some non-atomic object and update the reference to it atomically.
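A short sketch of two of these classes: an AtomicInteger counter and an AtomicIntegerFieldUpdater working on a plain volatile int field. The Legacy class stands in for code you can't change; it and the other names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

class AtomicSketch {
    // Pretend this class cannot be changed to use AtomicInteger.
    // The updater requires the field to be a volatile int.
    static class Legacy {
        volatile int hits;
    }

    static final AtomicIntegerFieldUpdater<Legacy> HITS =
            AtomicIntegerFieldUpdater.newUpdater(Legacy.class, "hits");

    // Increments both counters from several threads and returns {atomic, legacy}.
    static int[] demo(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Legacy legacy = new Legacy();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.incrementAndGet();        // atomic, no synchronized needed
                    HITS.incrementAndGet(legacy);     // atomic update of a plain field
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { counter.get(), legacy.hits };
    }
}
```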
The purpose of synchronization is not to prevent all race conditions; it is to prevent problem race conditions.
There are trade-offs with minimal synchronization: it might remove slowness caused by synchronization, but it makes the code harder to read and maintain.
The JVM is allowed to keep a variable in a register while a method runs. Declaring a variable volatile makes the JVM read and write the value in main memory every time instead of working from a register copy. This is important because if a thread works from its own copy in a register, the value isn't shared between threads and changes stay local to each thread.

Chapter 6 Advanced Synchronization Topics:
Semaphore: a lock with a counter. If the permit limit is set to 1, it's just like a lock; if its permit limit is set to more than one, it lets that number of threads in before blocking.
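A quick sketch of a Semaphore limiting how many threads are inside a section at once. The method tracks the highest number of threads it ever saw inside, which should never exceed the permit count. Names are mine.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreSketch {
    // Returns the largest number of threads observed inside the guarded section.
    static int maxConcurrent(int permits, int threads) {
        Semaphore sem = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                try {
                    sem.acquire();                        // blocks once all permits are taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = inside.incrementAndGet();
                    max.accumulateAndGet(now, Math::max); // remember the high-water mark
                    Thread.yield();
                    inside.decrementAndGet();
                } finally {
                    sem.release();                        // hand the permit back
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return max.get();
    }
}
```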
Barrier: a point where all threads must meet so results can be combined. Conditions, or wait and notify, can do almost the same thing.
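Here is a sketch using java.util.concurrent.CyclicBarrier, where each worker fills in a partial result and the barrier action combines them once everyone has arrived. The worker logic is a made-up placeholder.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class BarrierSketch {
    // Each worker writes (id + 1) into its slot; the barrier action adds the slots up.
    static int sumOfParts(int workers) {
        int[] partial = new int[workers];
        int[] total = new int[1];
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            for (int p : partial) {
                total[0] += p;                  // runs once, after all workers arrive
            }
        });
        Thread[] ts = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final int id = i;
            ts[i] = new Thread(() -> {
                partial[id] = id + 1;           // this worker's share of the work
                try {
                    barrier.await();            // wait here until every worker is done
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return total[0];
    }
}
```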
Countdown Latch: lets you set a countdown counter; when it reaches zero, all waiting threads are released.
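A small sketch with CountDownLatch: several threads wait on the latch, and they are all released once the count has been brought down to zero. The method name and parameters are just for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LatchSketch {
    // Starts `waiters` threads that block on a latch of size `count`,
    // counts the latch down to zero, and returns how many waiters got through.
    static int releaseWaiters(int waiters, int count) {
        CountDownLatch latch = new CountDownLatch(count);
        AtomicInteger released = new AtomicInteger();
        Thread[] ts = new Thread[waiters];
        for (int i = 0; i < waiters; i++) {
            ts[i] = new Thread(() -> {
                try {
                    latch.await();                  // blocks until the count reaches zero
                    released.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ts[i].start();
        }
        for (int i = 0; i < count; i++) {
            latch.countDown();                      // the last call releases every waiter
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return released.get();
    }
}
```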
Exchanger: a class that lets two threads meet and exchange data; it is more like a data structure.
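A sketch of two threads swapping strings through java.util.concurrent.Exchanger. Each call to exchange() blocks until the other thread shows up with its own item.

```java
import java.util.concurrent.Exchanger;

class ExchangerSketch {
    // Returns {what the calling thread received, what the worker received}.
    static String[] swap() {
        Exchanger<String> exchanger = new Exchanger<>();
        String[] got = new String[2];
        Thread worker = new Thread(() -> {
            try {
                got[1] = exchanger.exchange("from-worker");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            got[0] = exchanger.exchange("from-main");   // blocks until the worker arrives
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return got;
    }
}
```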
Reader/Writer Locks: a lock that allows multiple readers at once but only a single writer.
Deadlock: when two or more threads are waiting on conflicting conditions. The best defense is to create a lock hierarchy when designing the program. Deadlock is difficult to debug because there can be multiple layers of code holding or requesting a lock, and it's not always easy to know which one is causing it. The book has a class that can replace all uses of synchronized and any other Java lock, and it reports an error when a deadlock condition happens. It is slow, so it's not good for production, but it is good for testing.
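One common way to build a lock hierarchy is to always take locks in a fixed order. This sketch is my own example, not the book's deadlock-detecting class: two accounts are always locked in order of a unique id, so opposite-direction transfers can't deadlock.

```java
class OrderedAccount {
    private final int id;          // position in the lock hierarchy, assumed unique
    private long balance;

    OrderedAccount(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    synchronized long balance() {
        return balance;
    }

    // Always locks the account with the smaller id first, whatever the direction.
    static void transfer(OrderedAccount from, OrderedAccount to, long amount) {
        OrderedAccount first = from.id < to.id ? from : to;
        OrderedAccount second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Two threads transfer in opposite directions; returns the final balances.
    static long[] demo(int rounds) {
        OrderedAccount a = new OrderedAccount(1, 1000);
        OrderedAccount b = new OrderedAccount(2, 1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { a.balance(), b.balance() };
    }
}
```

Without the ordering, t1 could hold a while waiting for b and t2 could hold b while waiting for a, which is exactly the deadlock described above.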
Lock Starvation: this happens when multiple threads contend for one lock and one or more threads never get scheduled at a moment when they could acquire it. It can be fixed with a fair lock, which makes sure each thread gets the lock eventually.
Reader/Writer lock starvation: happens if the readers aren't prevented from acquiring the lock when a writer wants it. If readers just keep getting the lock even while a writer is waiting, the writer might never get its turn.
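In java.util.concurrent both ReentrantLock and ReentrantReadWriteLock take a fairness flag in their constructors. Here is a sketch of a tiny cache guarded by a fair reader/writer lock; RwCache is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RwCache {
    // true asks for a fair lock: waiting threads are served roughly in arrival order,
    // so a waiting writer is not starved by a steady stream of readers.
    private final ReadWriteLock rw = new ReentrantReadWriteLock(true);
    private final Map<String, Integer> data = new HashMap<>();

    Integer get(String key) {
        rw.readLock().lock();           // many readers may hold this at once
        try {
            return data.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    void put(String key, int value) {
        rw.writeLock().lock();          // exclusive: no readers or other writers
        try {
            data.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```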

Chapter 7:
Skipped because it only dealt with GUIs and Swing.

Chapter 8 Threads and Collection Classes:
Thread safe collections: Vector, Stack, Hashtable, ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet, ConcurrentLinkedQueue
Just because those classes are thread safe doesn't mean they can be used safely in any threaded application.
There are two places to manage synchronization with collection classes: in the collection or in your program.
The easy case is to let the collection manage it, either by using a thread safe collection, by creating a wrapper for the collection, or by using the Collections.synchronized<Collection Type> methods (for example Collections.synchronizedList).
The harder case is when you need to do more than one thing on the collection atomically, like a get followed by an update. In this case you have to use synchronized blocks.
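A sketch of the harder case: a check-then-add on a Collections.synchronizedList. Each call on the list is thread safe on its own, but the pair needs a synchronized block on the list to be atomic. addIfAbsent is a helper I made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompoundOps {
    // contains() and add() are each synchronized by the wrapper, but another thread
    // could slip in between them, so the pair is wrapped in a block on the list itself.
    static <T> boolean addIfAbsent(List<T> syncList, T item) {
        synchronized (syncList) {
            if (syncList.contains(item)) {
                return false;
            }
            return syncList.add(item);
        }
    }

    // Several threads all try to add 0..99; returns the final list size.
    static int demo(int threads) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int v = 0; v < 100; v++) {
                    addIfAbsent(list, v);
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return list.size();
    }
}
```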
Thread notification classes: these queue classes can be used with threads and simplify the usage of collections by providing methods that handle the out-of-space and out-of-data cases: ArrayBlockingQueue, LinkedBlockingQueue, SynchronousQueue, PriorityBlockingQueue, DelayQueue.
When using iterators you have to consider carefully what will happen. Either synchronize on the collection around the block that uses the iterator, or use a class that iterates over a copy of the collection. The copy approach can lead to race conditions, so only use it if you don't care that the data might be slightly out of date.
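CopyOnWriteArrayList is one of the copy-style classes. This sketch shows, in a single thread for simplicity, that its iterator works on a snapshot: elements added during the loop don't show up in that loop, which is the "slightly out of date" trade-off.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteration {
    // Adds to the list while iterating over it; returns {elements seen, final size}.
    static int[] iterateWhileAdding() {
        List<Integer> list = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        int seen = 0;
        for (Integer value : list) {     // iterates over a snapshot of the 3 original elements
            list.add(value + 10);        // no ConcurrentModificationException is thrown
            seen++;
        }
        return new int[] { seen, list.size() };
    }
}
```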
Producer/consumer pattern: specific threads create data and different threads use the data. More separation, fewer concerns about race conditions.
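A sketch of producer/consumer on top of ArrayBlockingQueue. put() blocks when the queue is full and take() blocks when it is empty, so there is no explicit wait/notify code. The -1 end marker is my own convention for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumerSketch {
    private static final int DONE = -1;   // hypothetical end-of-data marker

    // A producer thread emits 1..n; the calling thread consumes and sums them.
    static long sumUpTo(int n) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    queue.put(i);          // blocks while the queue is full
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int v = queue.take(); v != DONE; v = queue.take()) {
                sum += v;                  // take() blocks while the queue is empty
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }
}
```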

Chapter 9 Thread Scheduling:
The JVM or the operating system has to manage which threads run on the CPU when there are more threads than CPUs. There is no specification for how the JVM must manage threads, so different JVMs may manage them differently. The only requirement is that the JVM implement some kind of priority-based scheduling, which lets the developer give threads different priorities so higher priority threads get run more often.
Preemption: higher priority threads take control of the CPU from lower priority threads and get to do their work before the lower priority threads get scheduled again.
Time-slicing: Threads of the same priority get to run on the CPU for a short amount of time before another thread of the same priority gets to run. Kind of like kids in line for a slide, the kids get to ride the slide one at a time, but just because they went down once doesn’t mean they are done, they may get back in line to ride again.
Priority Inversion: when a low priority thread holds a lock that a high priority thread is waiting on, so the high priority thread is effectively stuck behind the low priority one. The usual fix is priority inheritance: the lower priority thread's priority is temporarily raised to match the high priority thread so that it can do its thing and release the lock it's holding.
Complex Priorities: OSes usually do something more to deal with thread priorities. For example, they may add the time spent waiting to a thread's priority so that low priority threads eventually get a turn to work even though higher priority threads are still waiting.
Thread priorities in Java are really just suggestions to the JVM. It doesn’t guarantee anything to set one thread as max priority and another as min. It will most of the time make a difference but there are no guarantees.
Java has various system-dependent threading models. In the green threads model everything is managed by the JVM; threads are an idea that doesn't extend past the single OS thread the JVM is running in, so effectively, even though threads are in use, everything is still single threaded. The Windows model is one to one: if you create a Java thread on a Windows JVM, the JVM passes that thread on to the OS, which creates a new thread, so this is true multithreading. Linux is similar to Windows. Solaris is quite different and too hard to explain here, so read up on it if you care.

Chapter 10 Thread Pools:
A pool can increase the throughput of an application by managing its threads more intelligently.
You need to have more threads available than CPUs so that if one thread blocks, another can work in its place.
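A sketch of a fixed-size pool from java.util.concurrent.Executors. Sizing it at CPU count plus one is only my reading of the "more threads than CPUs" advice, not a rule from the book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolSketch {
    // Submits one small task per number and adds up the results.
    static int sumOfSquares(int n) {
        int threads = Runtime.getRuntime().availableProcessors() + 1;  // assumed sizing
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                Future<Integer> f = pool.submit(() -> x * x);   // runs on a pooled thread
                results.add(f);
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get();                                  // waits for that task
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();                                     // let the pool threads exit
        }
    }
}
```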

Chapter 11 Task Scheduling:
The Timer class is like a single threaded pool for tasks that should be run after some amount of time.
The TimerTask class needs to be extended so that your code can be scheduled with the Timer class. A task should check whether it should still run, because it can be scheduled multiple times without getting a chance to run, so when it does run it might be running right after another execution of the same task.
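A sketch of a TimerTask that checks how late it is before doing its work, using scheduledExecutionTime(). The five second tolerance and the names are assumptions of mine.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class TimerSketch {
    private static final long MAX_LATENESS_MS = 5_000;   // assumed tolerance

    // Schedules a one-shot task and reports whether it actually did its work.
    static boolean runOnceAfter(long delayMs) {
        Timer timer = new Timer(true);                    // true: timer thread is a daemon
        CountDownLatch ran = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Skip the work if this execution is running far too late.
                if (System.currentTimeMillis() - scheduledExecutionTime() > MAX_LATENESS_MS) {
                    return;
                }
                ran.countDown();                          // stand-in for the real work
            }
        }, delayMs);
        try {
            return ran.await(delayMs + 10_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();                               // stop the timer thread
        }
    }
}
```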

Chapter 12 Threads and I/O:
In older versions of Java, I/O requests were always blocking, which hurts the ability of threads and the system to get work done because entire threads halt waiting for data. More recent versions of Java have a new I/O package (java.nio) which doesn't block on I/O requests. With the new I/O you use a thread to step through all the I/O connections and check for ones that are ready; it processes those and then goes back to looking for connections that are ready. The main difference between the two is this: when you do a read or write in blocking I/O, the system reads or writes everything before it returns from the call. In non-blocking I/O, a read or write transfers only as much as it can at that moment, and your program has to handle the cases where not all of the data was read or written.
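To show the "handle partial reads yourself" part, here is a sketch that uses a java.nio Pipe with a non-blocking source instead of a real network connection. It assumes a small payload that fits in the pipe's buffer, and a real program would normally use a Selector rather than polling in a loop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

class NonBlockingSketch {
    // Sends text through an in-process pipe and reads it back without blocking reads.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer received = ByteBuffer.allocate(bytes.length);
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel in = pipe.source();
                 Pipe.SinkChannel out = pipe.sink()) {
                in.configureBlocking(false);
                in.read(received);               // nothing sent yet: returns at once, no blocking
                ByteBuffer toSend = ByteBuffer.wrap(bytes);
                while (toSend.hasRemaining()) {
                    out.write(toSend);           // a write may accept only part of the buffer
                }
                while (received.hasRemaining()) {
                    if (in.read(received) == 0) {
                        Thread.yield();          // no data right now, try again later
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        received.flip();
        return StandardCharsets.UTF_8.decode(received).toString();
    }
}
```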

Chapter 13 Miscellaneous Thread Topics:
Thread Groups: every thread belongs to a group; the default is the main group. You can interrupt all the threads in a group by calling the interrupt method on the group.
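A sketch of interrupting a whole group at once. The group name and the sleeping workers are placeholders of mine.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class GroupSketch {
    // Starts n sleeping threads in one group, interrupts the group,
    // and returns how many threads saw the interrupt.
    static int interruptAll(int n) {
        ThreadGroup group = new ThreadGroup("workers");
        CountDownLatch started = new CountDownLatch(n);
        AtomicInteger interrupted = new AtomicInteger();
        Thread[] ts = new Thread[n];
        for (int i = 0; i < n; i++) {
            ts[i] = new Thread(group, () -> {
                started.countDown();
                try {
                    Thread.sleep(60_000);            // would sleep a long time if left alone
                } catch (InterruptedException e) {
                    interrupted.incrementAndGet();   // woken early by the group interrupt
                }
            });
            ts[i].start();
        }
        try {
            started.await();
            group.interrupt();                       // interrupts every thread in the group
            for (Thread t : ts) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return interrupted.get();
    }
}
```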
Daemon threads: threads that serve other threads, like the garbage collector. When all user threads have finished, the daemon threads are shut down and the JVM can exit.
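Marking a thread as a daemon is one call, but it has to happen before start(). A minimal sketch with a made-up helper thread:

```java
class DaemonSketch {
    // Starts a background helper as a daemon and reports its daemon status.
    static boolean startDaemon() {
        Thread helper = new Thread(() -> {
            try {
                Thread.sleep(60_000);      // stand-in for long-running background service work
            } catch (InterruptedException e) {
                // interrupted: just let the thread end
            }
        });
        helper.setDaemon(true);            // must be set before start()
        helper.start();
        boolean daemon = helper.isDaemon();
        helper.interrupt();                // tidy up; a daemon would not keep the JVM alive anyway
        return daemon;
    }
}
```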
Thread stacks aren't stored on the heap; they are stored in the JVM's general memory. So creating new threads could cause an out of memory error if there isn't enough memory available to the JVM.

Chapter 14 Thread Performance:
Java programs are optimized as they run, so before testing performance you need to run the code many times to warm it up.
Don't optimize early; you will make things overly complex and probably won't gain any performance. Wait until the application is in development and has regular benchmark tests, and only optimize when something isn't within the performance standards.
There is almost no performance difference between synchronized and unsynchronized collections when you are in a single threaded environment.
Switching from synchronized access to atomic variables gives a significant performance boost. Code complexity increases though.
Don't overuse thread pools. If the application design makes sense with a thread pool, use it; otherwise don't. The performance improvement easily gets lost in the time it takes to execute the thread's work.

Chapter 15 Parallelizing Loops for Multiprocessor Machines:
Parallelize the outer loop; rewrite the loop if needed so you can.
Do parallelization where CPU intensive work is happening. Don’t worry about other places. This is usually going to be in some type of loop.
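A sketch of splitting an outer loop across threads: each thread takes every Nth row of a matrix and runs the inner loop for its rows on its own. The row-sum task is just a stand-in for real CPU-heavy work.

```java
class ParallelLoop {
    // Computes the sum of each row, with the outer (row) loop divided among threads.
    static long[] rowSums(int[][] matrix, int threads) {
        long[] sums = new long[matrix.length];
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                // Outer loop: this thread handles rows id, id + threads, id + 2 * threads, ...
                for (int r = id; r < matrix.length; r += threads) {
                    long s = 0;
                    for (int v : matrix[r]) {     // inner loop stays sequential
                        s += v;
                    }
                    sums[r] = s;                  // each row is written by exactly one thread
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return sums;
    }
}
```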
 
