Having spent a couple of evenings back porting Async to iOS 7 and OS X Mavericks (10.9) and releasing it as Async.legacy I've gone back to trying to squeeze some more performance out of the GrayScott Cellular Automata app that Simon Gladman presented at the last London Swift Meetup.
For me this was an interesting case to see how fast something that is almost entirely CPU and memory bound can be made and it gave me a chance to play. Not many things need optimising if well strutured but this was a case where it could clearly be relevant.
Simon's original code calculates about 10fps in debug mode and displays many of them. Built with optimisations it increases to about 41fps calculated but it very rarely updates the screen due to the timing mechanism he used rather than calling back to the main thread. All this was done on a 70 pixel square calculation.
Running the latest code on a 70 pixel square calculation it calculates between 1550 and 1600 frames most seconds for a speedup of about 40 times and it is displaying far more frames to the screen too (well assigning the images to the image property of the imageView, the screen framerate is far lower).
This post focusses on making the main solving work multi-threaded for performance and in the optimisation of the inner loop. At this point we are moving beyond the point where we are optimising by improving the style, purity and immutability of the code. Some of the changes (inlining simple functions) go directly against good style and should only be done in inner loops. The parallelisation of the main solver is also something which makes the code less clean and tidy as is the incorporation of the pixelData generation into the main loop.