Friday, February 21, 2014

High Performance Web Sites by Steve Souders

This was another great book I read last year. It covers several ways to get better performance from your website without optimizing your code. Surprisingly, as Steve shows, these non-code tweaks often yield bigger performance gains than optimizing the code itself.

Below is my summary of the chapters and what I thought was important; for his own summaries, go to his blog at http://stevesouders.com/efws/blogposts.php


Chapter A:
80%-90% of response time is spent on the front end, not the back end.

Chapter B:

Chapter 1: Make fewer http requests
There are several things that can be done here.
-Combine images using image maps or CSS sprites so that a page downloads one image instead of many small ones.
-Combine JavaScript into one minified file.
-Combine CSS into one file.
The majority of a page's download time is spent downloading images, scripts, and stylesheets, so limiting the number of files a page has to download eliminates the expensive overhead of multiple HTTP requests.

Chapter 2: Use a CDN
A CDN is a set of servers dedicated to serving static content, located closer to users. Since they only host static content, fewer servers are required to serve the same load, and page load times decrease because the data is closer to the user.

Chapter 3: Add Expires Headers
Adding a max-age or Expires header tells the browser how long a file may be cached. On Apache (with mod_expires enabled) you can set a default for all files of a certain type:
<FilesMatch "\.(gif|jpg|js|css)$">
    ExpiresActive On
    ExpiresDefault "access plus 2 months"
</FilesMatch>

Chapter 4: Use gzip
Compress all HTML, scripts, and CSS. Don't compress images or PDFs because they are already compressed, and compressing them again can increase file size. Configure Apache to compress automatically when files are larger than 1-2 KB. The improvement depends on the size of the file, the user's connection speed, and the distance the packets have to travel. Gzipping does add load to the server.
Apache 2.x uses the mod_deflate module; to compress HTML, CSS, and JS files, use:
AddOutputFilterByType DEFLATE text/html text/css application/x-javascript

Chapter 5: Put stylesheets at the top
All stylesheets should go inside the HTML <head></head> tags. If they don't, you risk one of two problems: either the user sees nothing until everything is downloaded, which makes the page look frozen, or the user sees portions of the page unstyled and then watches them restyle once the CSS arrives. Both are bad user experiences. If stylesheets are kept in the <head> tag, the page can load progressively and be styled correctly.

Chapter 6: Put scripts at the bottom
Scripts block parallel downloading, so a script at the top or middle of the page makes the rest of the page wait for it to download. Putting scripts at the bottom lets everything that can be downloaded in parallel finish first, so most if not all of the content is displayed before the scripts start downloading. Sometimes you can't move a script because the layout of the page depends on it, but if you can move it, you should.

Chapter 7: Avoid CSS Expressions
If you have to use a CSS expression, have it call a JavaScript function that overwrites the expression so it is only evaluated once. Expressions that aren't overwritten can be evaluated thousands of times as the user interacts with the page.

Chapter 8: Make JavaScript and CSS external
Move JavaScript and CSS into external files so they can be cached, which reduces the number of HTTP requests when the user's cache is primed. Combining this with far-future Expires headers helps even more because the cache stays primed longer. Inlining JavaScript means fewer HTTP requests, but nothing can be cached and the page itself gets bigger.
If you have to inline, you can try dynamic inlining: in your PHP/JSP/.NET code you copy the JS/CSS file contents into a <script>/<style> tag when the page is requested, and you also emit a bit of JavaScript that downloads the external files after the page has loaded and sets a cookie. On later requests the server checks for that cookie and only inlines the content if the cookie is missing. The user downloads the files when they aren't doing anything else anyway, and on subsequent visits the files come from cache, resulting in faster page loads and fewer HTTP requests for repeat visitors.
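Here is a rough sketch of that cookie check as a Java servlet, since the book leaves the server-side language open; the cookie name, file path, and URLs are made up for illustration, and it assumes the standard javax.servlet API:

// Hypothetical sketch of the "dynamic inlining" idea; names and paths are illustrative.
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DynamicInlineServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();

        boolean cachePrimed = false;
        Cookie[] cookies = req.getCookies();
        if (cookies != null) {
            for (Cookie c : cookies) {
                if ("js_cached".equals(c.getName())) {
                    cachePrimed = true;
                }
            }
        }

        out.println("<html><head><title>Demo</title></head><body>");
        if (cachePrimed) {
            // Repeat visitor: the external file should already be in the browser cache.
            out.println("<script src=\"/static/app.js\"></script>");
        } else {
            // First visit: inline the script, then lazily fetch the external copy after
            // onload and set the cookie so later pages skip the inlining.
            String js = new String(Files.readAllBytes(Paths.get("/var/www/static/app.js")));
            out.println("<script>" + js + "</script>");
            out.println("<script>window.onload = function() {"
                    + " var s = document.createElement('script'); s.src = '/static/app.js';"
                    + " document.body.appendChild(s);"
                    + " document.cookie = 'js_cached=1; max-age=2592000; path=/'; };</script>");
        }
        out.println("...page content...</body></html>");
    }
}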

Chapter 9: Reduce DNS Lookups
Use fewer domains per page; try to serve everything from the same domain so the browser only has to do one DNS lookup.
Use Keep-Alive so more data can be retrieved over a single connection.

Chapter 10: Minify JavaScript
Gzip combined with JSMin can reduce JavaScript file sizes by almost 80%.
JSMin probably has a version that will work with JSPs so we can minify inline scripts too.

Chapter 11: Avoid Redirects
Instead of server-side redirects, Apache's Alias directive, mod_rewrite, and DirectorySlash can often accomplish the same task.
Linking to the root instead of the current folder helps because URLs like something.com/something don't have to be redirected to something.com/something/.
To track web page use, instead of redirects you can use referrer logging, where you log the referring site for all traffic.

Chapter 12: Remove duplicate scripts
If the same script is included twice, IE will download it twice and won't cache it, increasing page load time on every visit.
Functions in a duplicated script will also be executed as many times as the script is included, which means slower run times and potentially invalid results.

Chapter 13: ETags
Make sure ETags are set up right. ETags give content a unique identifier, but the default identifiers are server-specific, so they don't validate across multiple servers; multi-server setups should probably remove or reconfigure them.

Chapter 14: Make Ajax Cacheable
Use query strings on Ajax requests and give the responses far-future Expires headers, so that Ajax requests that return the same data every time don't have to go all the way back to the server. Some obviously can't be cached because they return different data each time.



Use the YSlow Firebug plug-in to get help improving page load times.

Java Threads by Scott Oaks

This was a great book, I highly recommend it for anyone who wants a thorough introduction to threads, how to use them and the terminology. If you don't have time to read the whole book then here is my summary of the main points made in each chapter:

Chapter 1 Intro:

Chapter 2:
Thread: a separate task within a running application. Every app has at least the main thread, which is where most or all of the code runs; many applications also have several threads the developer might not know about, for things like the GUI.
Lifecycle: you call start(), start() creates a new thread which executes the run() method, and when run() finishes the thread shuts down, leaving only the threads that were running before start() was called. The most common ways to stop a thread are a flag and the interrupt() method.
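A minimal sketch of the stop-with-a-flag approach mentioned above; the class name, the sleep, and the empty work loop are mine, not from the book:

// Sketch: stopping a thread with a volatile flag plus interrupt (names illustrative).
public class FlagStoppableTask implements Runnable {
    private volatile boolean done = false;

    public void shutdown() {
        done = true;                 // another thread flips the flag
    }

    @Override
    public void run() {
        while (!done) {
            // ... do one unit of work per pass ...
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FlagStoppableTask task = new FlagStoppableTask();
        Thread t = new Thread(task);
        t.start();                   // start() creates the new thread; run() executes in it
        Thread.sleep(1000);
        task.shutdown();             // ask the thread to stop
        t.interrupt();               // or interrupt it, in case it is blocked in sleep/wait
        t.join();                    // wait for it to finish
    }
}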

Chapter 3 Data Synchronization:
When two different threads could call the same method, declare the method synchronized and the JVM will manage it for you by only allowing one thread in at a time.
What if, however, you have a flag that stops the run() method? If you declared both run() and the method that sets the flag as synchronized, you could never set the flag. There are other options: synchronize just the section of code that checks the flag, or declare the flag volatile. A volatile variable is always read from memory instead of a cached copy; synchronized getters and setters have the same effect, but volatile removes the boilerplate. Also, in Java, reading and writing a variable is atomic except for long and double.
JVM scheduling changes can occur at any time; atomic operations are the only thing guaranteed to finish before the scheduler swaps one thread out for another.
There are a few ways to synchronize a block of code: use a class that implements the Lock interface, calling lock() and unlock() in a try/finally, or use the synchronized keyword on the block and provide an object to synchronize on. Usually the enclosing object is fine, unless the block works with one specific object, in which case synchronize on that. Keep the scope as small as possible.
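A quick sketch of both styles; the counter field and class name are just for illustration:

// Two ways to protect the same small critical section (the counter is illustrative).
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class CounterExamples {
    private final Object countLock = new Object();
    private final Lock lock = new ReentrantLock();
    private int count;

    // 1. synchronized block on a specific lock object, with the scope kept small
    public void incrementWithSynchronized() {
        synchronized (countLock) {
            count++;
        }
    }

    // 2. explicit Lock, always released in a finally block
    public void incrementWithLock() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }
}
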
Deadlock: two or more threads are each waiting on locks to be freed, and the circumstances of the program are such that the locks will never be freed.

Chapter 4 Thread Notification:
wait(): must be called in a synchronized block; when called it releases the lock, and when it returns it has reacquired the lock.
notify(): must be called in a synchronized block; when called it wakes up one of the threads waiting on that object. There are no guarantees about which waiting thread is chosen, it could be any of them.
notifyAll(): makes sure every waiting thread gets the notification. It wakes them all up, but they still have to reacquire the lock one at a time, so they won't run in parallel.
notify() and notifyAll() carry no information about which specific condition occurred; when one is called, woken threads simply check whether they can proceed. Because of this, every call to wait() needs to be inside a loop that rechecks the condition, so a thread can keep waiting if the notification wasn't for it.
You can create a lock object, Object someObj = new Object(), and use it for synchronization, waiting, and notification. This allows more parallelism because other parts of an object can be used while one part waits for a notification; without this, the whole object is shut down while one part waits.
If you use Lock objects for synchronization, you have to use Condition objects (await()/signal()) for waiting and notification, because wait() and notify() depend on holding an object's monitor, which a Lock does not use.
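A rough sketch of the wait-in-a-loop pattern using a Lock and a Condition; the "ready" flag and class name are illustrative:

// Guarded wait with Lock/Condition; note the while loop around await() (names illustrative).
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class ReadySignal {
    private final Lock lock = new ReentrantLock();
    private final Condition readyCondition = lock.newCondition();
    private boolean ready = false;

    public void awaitReady() throws InterruptedException {
        lock.lock();
        try {
            while (!ready) {            // loop: the wakeup might not be for us
                readyCondition.await(); // releases the lock while waiting
            }
        } finally {
            lock.unlock();
        }
    }

    public void markReady() {
        lock.lock();
        try {
            ready = true;
            readyCondition.signalAll(); // like notifyAll(): wake every waiter
        } finally {
            lock.unlock();
        }
    }
}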

Chapter 5 Minimal Synchronization Techniques:
Not understanding synchronization can make threaded code slower than single-threaded code. A highly contended lock slows the app down because many threads get stuck waiting at the same spot.
Try to minimize the scope of synchronization blocks.
Use the atomic classes to reduce the need for synchronization: AtomicInteger, AtomicLong, AtomicBoolean, AtomicReference, AtomicIntegerArray, AtomicLongArray, AtomicReferenceArray, AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, AtomicReferenceFieldUpdater, AtomicMarkableReference, and AtomicStampedReference. The int, long, boolean, and reference classes work just like the regular types, only atomically. The array classes provide atomic access to individual elements. The field updater classes (created with their newUpdater() methods) give atomic access to a field that wasn't declared as an atomic class and that you can't change to one. The reference classes let you treat a reference to a non-atomic object atomically.
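For example, a counter shared between threads can drop synchronization entirely; this sketch and its class name are mine, not from the book:

// Sketch: AtomicInteger replaces a synchronized counter (class name is illustrative).
import java.util.concurrent.atomic.AtomicInteger;

public class HitCounter {
    private final AtomicInteger hits = new AtomicInteger(0);

    public void record() {
        hits.incrementAndGet();   // atomic read-modify-write, no lock needed
    }

    public int total() {
        return hits.get();
    }
}
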
The purpose of synchronization is not to prevent all race conditions; it is to prevent problem race conditions.
There are trade-offs with minimal synchronization: it might remove slowness caused by synchronization, but it makes the code harder to read and maintain.
The Java memory model usually lets a running method work with a variable in a register or other local copy; declaring the variable volatile makes the JVM read and write the value directly instead. This is important because if the code works from a local copy, the value isn't shared between threads and changes stay local to each thread.

Chapter 6 Advanced Synchronization Topics:
Semaphore: a lock with a counter. With a permit limit of 1 it behaves just like a lock; with a higher limit it lets that many threads in before blocking.
Barrier: a point where all threads must meet so results can be combined. Conditions, or wait and notify, can do almost the same thing.
CountDownLatch: lets you set a countdown counter; when it reaches zero, all waiting threads are released (sketched below).
Exchanger: a class that lets two threads meet and exchange data; more like a data structure.
Reader/Writer Locks: a lock that allows multiple concurrent readers but only a single writer.
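A small CountDownLatch sketch of the "wait for all workers, then combine" idea; the worker count and the empty task body are made up:

// Sketch: waiting for a fixed number of workers with CountDownLatch (numbers illustrative).
import java.util.concurrent.CountDownLatch;

public class LatchExample {
    public static void main(String[] args) throws InterruptedException {
        int workers = 4;
        CountDownLatch done = new CountDownLatch(workers);

        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                // ... do one worker's chunk of the job ...
                done.countDown();    // signal that this worker has finished
            }).start();
        }

        done.await();                // block until the count reaches zero
        System.out.println("all workers finished; combine results here");
    }
}
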
Deadlock: when two or more threads are waiting on conflicting conditions. The best defense is to design a lock hierarchy into the program. Deadlock is hard to debug because there can be multiple layers acquiring or requesting a lock, and it's not always easy to know which one is causing the problem. The book includes a class that can replace calls to synchronized and the other Java locks and reports an error when a deadlock condition occurs; it's slow, so it's not good for production, but it's useful for testing.
Lock Starvation: happens when multiple threads contend for one lock and one or more of them never get scheduled at a time when they could acquire it. It can be fixed with a fair lock, which makes sure each thread gets the lock eventually.
Reader/Writer lock starvation: happens if readers aren't prevented from acquiring the lock when a writer wants it. If readers keep getting the lock even while a writer is waiting, the writer may never get its turn.

Chapter 7
Skipped because it only dealt with GUIs and Swing

Chapter 8 Threads and Collection Classes:
Thread safe collections: Vector, Stack, Hashtable, ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet, ConcurrentLinkedQueue
Just because those classes are thread safe doesn't mean they can be used blindly in any threaded application.
There are two places synchronization can be managed with collection classes: in the collection itself or in your program.
The easy case is letting the collection manage it, either by using a thread-safe collection, by creating a synchronized wrapper, or by using the Collections.synchronized<CollectionType>() methods.
The harder case is when you need to do more than one operation on the collection atomically, like a get followed by an update. In that case you have to use synchronized blocks.
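A sketch of the compound-operation case, synchronizing on the map itself; the map contents and class name are illustrative:

// Sketch: a check-then-act compound operation must hold the lock across both steps.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class Scoreboard {
    private final Map<String, Integer> scores =
            Collections.synchronizedMap(new HashMap<>());

    // Each individual call on the map is thread safe, but get + put together is not,
    // so the whole read-modify-write is synchronized on the map.
    public void addPoints(String player, int points) {
        synchronized (scores) {
            Integer current = scores.get(player);
            scores.put(player, (current == null ? 0 : current) + points);
        }
    }
}
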
Thread notification classes: these can be used with threads and simplify the use of collections by providing methods that handle the out-of-space and out-of-data cases: ArrayBlockingQueue, LinkedBlockingQueue, SynchronousQueue, PriorityBlockingQueue, DelayQueue.
When using iterators you have to think carefully about what can happen: either synchronize on the collection around the block that uses the iterator, or use a class that iterates over a copy of the collection. The copy approach can lead to race conditions, so only use it if you don't mind the data being slightly out of date.
Producer/consumer pattern: specific threads create data and different threads consume it; more separation, fewer race-condition concerns.
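A minimal producer/consumer sketch built on ArrayBlockingQueue; the queue size, item count, and String items are arbitrary choices:

// Sketch: a producer and a consumer decoupled by a blocking queue (sizes illustrative).
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    queue.put("item-" + i);      // blocks if the queue is full (out of space)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    String item = queue.take();  // blocks if the queue is empty (out of data)
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}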

Chapter 9 Thread Scheduling:
The JVM or the OS has to decide which threads run on the CPUs when there are more threads than CPUs. There is no specification for how the JVM must manage threads, so different JVMs may manage them differently; the only requirement is some kind of priority-based scheduling, which lets the developer give threads different priorities so higher-priority threads get run more often.
Preemption: a higher-priority thread takes the CPU away from a lower-priority thread and does its work before the lower-priority thread gets scheduled again.
Time-slicing: threads of the same priority each get to run on the CPU for a short amount of time before another thread of the same priority gets to run. It's like kids in line for a slide: they ride one at a time, and going down once doesn't mean they're done, they may get back in line to ride again.
Priority Inversion: when a low-priority thread holds a lock that a high-priority thread is waiting on, the low-priority thread's priority is temporarily raised to match the high-priority thread's so it can finish its work and release the lock it's holding.
Complex Priorities: operating systems usually do something more to manage thread priorities; for example, they may add waiting time to a thread's priority so low-priority threads eventually get a turn even while higher-priority threads are still waiting.
Thread priorities in Java are really just suggestions to the JVM. Setting one thread to max priority and another to min guarantees nothing; it will usually make a difference, but there are no guarantees.
Java's threading model depends on the platform. In the green-thread model everything is managed by the JVM: threads are an abstraction that never extends past the single OS thread the JVM runs in, so even though threads are in use, everything is effectively single-threaded. The Windows model is one-to-one: create a Java thread on a Windows JVM and the JVM has the OS create a new native thread, which is true multithreading. Linux is similar to Windows. Solaris is quite different and too involved to explain here; read up on it if you care.

Chapter 10 Thread Pools:
A thread pool can increase the throughput of an application by managing threads more intelligently.
You want more threads available than CPUs so that when one thread blocks, another can work in its place.
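A small sketch of the idea using java.util.concurrent's ExecutorService; the pool size, task count, and task body are illustrative:

// Sketch: a fixed-size thread pool working through queued tasks (numbers illustrative).
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolExample {
    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        // A few more threads than CPUs, so a blocked thread doesn't idle a core.
        ExecutorService pool = Executors.newFixedThreadPool(cpus * 2);

        for (int i = 0; i < 50; i++) {
            final int taskId = i;
            pool.submit(() -> {
                // ... the task's work: I/O, computation, etc. ...
                System.out.println("task " + taskId + " ran on " + Thread.currentThread().getName());
            });
        }

        pool.shutdown();                       // stop accepting new work
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}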

Chapter 11 Task Scheduling:
The Timer class is like a single-threaded pool for tasks that should run after some amount of time.
TimerTask needs to be extended so your own classes can be scheduled with the Timer. Instances of your task should check whether they still need to run, because a task can be scheduled multiple times without getting a chance to run, and when it finally does run it might be running right after another instance that already ran and was rescheduled.
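A short Timer/TimerTask sketch; the delay, the staleness check, and the class name are illustrative:

// Sketch: scheduling a TimerTask with a simple "should I still run?" check (values illustrative).
import java.util.Timer;
import java.util.TimerTask;

public class ReminderTask extends TimerTask {
    private final long notAfter;   // skip the work if we fire too late

    public ReminderTask(long notAfterMillis) {
        this.notAfter = notAfterMillis;
    }

    @Override
    public void run() {
        if (System.currentTimeMillis() > notAfter) {
            return;                // stale: a later run already covers this work
        }
        System.out.println("reminder fired");
    }

    public static void main(String[] args) throws InterruptedException {
        Timer timer = new Timer(true);   // daemon timer thread
        timer.schedule(new ReminderTask(System.currentTimeMillis() + 10_000), 5_000);
        Thread.sleep(6_000);             // keep main alive long enough for the task to fire
    }
}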

Chapter 12 Threads and I/O:
In older versions of Java, I/O requests always blocked, which hurts the ability of threads and the system to get work done when entire threads halt waiting for data. More recent versions have a new I/O package that doesn't block on I/O requests: a single thread steps through all the I/O connections, checks which ones are ready, processes those, and then goes back to looking for ready connections. The main difference is that with blocking I/O a read or write finishes completely before the call returns, while with non-blocking I/O a read or write does only as much as it can at that moment, and your program has to handle the cases where not all the data was read or written.
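A rough sketch of that select-and-process loop with java.nio; the port number and buffer size are arbitrary and error handling is omitted:

// Sketch: one thread servicing many connections with a Selector (port and buffer size arbitrary).
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NonBlockingServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                        // wait until something is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int n = client.read(buffer);      // may return less than a full message
                    if (n == -1) {
                        client.close();               // connection is finished
                    }
                    // the program has to cope with partial data arriving across reads
                }
            }
        }
    }
}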

Chapter 13 Miscellaneous Thread Topics:
Thread Groups: every thread created belongs to a group; the default is the main group. You can interrupt all the threads in a group by calling the interrupt() method on the group.
Daemon threads: threads that serve other threads, like the garbage collector. When all user threads have finished, the daemon threads are stopped and the JVM can shut down.
Thread stacks aren't stored on the heap; they come out of the JVM's other memory, so creating new threads can cause an out-of-memory error if not enough memory is set aside for the JVM.

Chapter 14 Thread Performance:
Java programs are optimized as they run, so before measuring performance you need to run the code enough to warm it up.
Don't optimize early; you will make things overly complex and probably won't gain any performance. Wait until the application is in development, run regular benchmark tests, and only optimize when something isn't within the performance standards.
There is almost no performance difference between synchronized and unsynchronized collections in a single-threaded environment.
Switching from regular lock-protected variables to atomic variables gives a significant performance boost, though code complexity increases.
Don't overuse thread pools; if the application design makes sense with a thread pool, use it, otherwise don't. The performance improvement easily gets lost in the execution of the tasks themselves.

Chapter 15 Parallelizing Loops for Multiprocessor Machines:
Parallelize the outer loop; rewrite the code if needed so you can.
Do the parallelization where the CPU-intensive work is happening, which is usually some kind of loop, and don't worry about other places.
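A small sketch of splitting an outer loop across a pool of worker threads; the array sizes and the math inside the inner loop are stand-ins for real work:

// Sketch: parallelizing the outer loop of a nested computation (sizes and math illustrative).
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelOuterLoop {
    public static void main(String[] args) throws InterruptedException {
        int rows = 1000, cols = 1000;
        double[][] data = new double[rows][cols];

        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // Each task owns one value of the outer index, so rows are processed independently.
        for (int r = 0; r < rows; r++) {
            final int row = r;
            pool.submit(() -> {
                for (int c = 0; c < cols; c++) {
                    data[row][c] = Math.sqrt(row * cols + c);   // stand-in for real CPU work
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}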