Optimising Swift With Functional Style - 50x Speed boost from changing 1 Keyword

At yesterday's Swift London Meetup Simon Gladman (aka @FlexMonkey) presented the Gray Scott cellular automata application he had been developing to explore threading in iOS using NSOperation. During the presentation there were a couple of things that were apparent and looked possible to improve on. Firstly Simon had used a timer to work around a difficulty that he had in calling back onto the main thread and secondly he found that he got better performance using an NSMutableArray than using Swift Arrays. When I got home I forked the repo and got to work. This post describes the changes I made. The bulk were made together in parallel before I even ran the code but I will break down the changes.

This post describes the significant changes that resulted in needing less code, being clearer (at least in my view) and actually speeding up some sections by about 18 times. This speedup is in particular array processing code and largely the result of changing from NSMutableArray to a Swift array of structs which should be accessed with much less indirection. This improvement wasn't a direct path and if you browse the branches in my fork of the Repo you can see some dead ends and some of the steps along the way. The changes I'm discussing in this post can be seen in the pull request.

NSOperation to GCD (Grand Central Dispatch)

Simon used NSOperation because that felt familiar to him from the languages that he had developed in previously. I believe that GCD is now the recommended way of doing background and asynchronous calls on iOS and OS X although I often find with the Apple documentation that it doesn't guide you very well when you have a choice between different approaches in a similar area of the code so it is completely understandable. I haven't really looked at NSOperation much myself but I have used GCD and I understand how to make calls asynchronously and critically how to make them make to the main queue.

The NSOperation classes (the renderer and the solver) could be simplified by removing the classes entirely and leaving just the bare function to be called changing from:

to:

Then change of approach from NSOperation to GCD enabled the passing in of arguments to what could become pure functions which will be important in the next section.

The actual GCD code to dispatch the functions I kept within the ViewController and the whole flow is within a single function responsible for first dispatching the solver and then the renderer on the background queue and finally calling back to the main queue to update the image in the view.

As you can see with Swift block syntax using the GCD functions becomes quite a succinct way of handling concurrency and that passing results back to the main thread can be done by nesting a call back to the main queue within it. I've taken advantage of the Swift syntax that allows a last argument that is a block can be placed after the closing parenthesis of the function call. The trick to keeping the code clear is to keep the contents of the block small by just calling into a couple of functions rather than directly including significant logic (it will work but be hard to read).

I didn't mind capturing self in this instance because the viewcontroller is long lived in this case as the main viewcontroller of the app. In other situations other approaches such as holding only a weak reference to self and stopping execution if that becomes nil may be appropriate which would look something like this:

Functional and Pure Swift Style

The NSOperation style of concurrency forced Simon into setting up objects, calling methods on them to set properties that the main function (which takes no arguments) can use to perform the operation. By switching to GCD I was able to convert all those properties to arguments to what could be pure functions taking arguments and returning values based purely on those inputs.

Value Types - Use Swift Arrays instead of NSMutableArrays

I immediately switched from using NSMutableArrays to using a Swift Array of the GrayScottStruct that Simon had defined as the way of handling the bulk data. I probably should have done that as a separate change but I was sure that I wanted to try to use the Swift arrays as I thought that they should be at least as quick as the Foundation classes in principle and I figured I could find a way to make it work as quickly.

One immediate advantage of this is that I could remove the casts:

When the changes were made and I got it to compile it worked first time. However it was significantly slower (about 3x) than Simon's code. The initial thing that I did was return to creating the full array at the start (actually by copying the input which is the result of the last iteration) and then update by index. This was actually a reversion to Simon's behaviour and his comment about it performing better was very useful. The improvement was noticeable but not massive, Simon's 15% estimate sounds about right.

Value Types - GrayScottStruct should be a Struct - 54x Speedup

I had assumed that the GrayScottStruct type was actually a struct. This was foolish of me for two reasons, firstly assumption is not a good practice for such things and secondly I knew that these objects were being stored in NSMutableArrays which is only possible for object types.

The change of the way this type is declared changes the time for one pass of the solver from 0.065s (release build on iPad Retina Mini) down to 0.0012s! A 54 times speedup for changing one keyword. I guess (and may investigate at a later date) that this is the benefit from not having to allocate thousands of simple objects on the heap. This speedup took the performance with Swift arrays ahead of that of the NSMutableArray version of the code by a factor of about 18 in the Solver and about two in the renderer.

You can see the structure and the main solver function that is massively affected by the nature of the structure here:

As you can see it is heavily array and memory access bound so most code will not be affected to anything like the same degree but it does show you what difference datatypes do make. If you want to try the difference with class and struct you can checkout this version and experiment yourself.

Style Issues

Struct For Parameters

I elected to wrap the parameters (the settings from the user interface) in a small struct rather than pass them as separate arguments. This is not the only way to do it, a named tuple would also have been a good option or even just separate arguments but this made sense to me. You can see it here along with how clean it makes the function signature. You've already seen the call made to this from the the dispatch method (at the moment the viewcontroller has the parameters as individual properties and it would be a good idea to make it hold the structure instead but it wasn't an area I was changing so I haven't made that change yet). I also ensured that all the contents were non optional to keep the code using it safe and clean.

Semi-colons;

Where I've been editing lines anyway I've been removing them but they are still through a lot of the code. I generally don't use them unless I really want two statements on the same line.

What Is Next?

The code is fully optimised yet, the renderer function now dominates the execution time drawing 1x1 point rectangles, a CoreImage source or something similar may be able to speed that up massively (if the code gets fast lazy execution may mean that some images never need to be rendered). [UPDATE - See next post for implementation of efficient bitmap drawing so it is no longer the bottleneck] It may also be possible to dispatch the render to multiple queues and/or solve the next image while one is rendering to increase the parallelism.

As Simon mentioned in his talk this algorithm may be suitable for executing in metal on the GPU for massive performance. That will only really help if the rendering can also be sped up which may well be possible. If you have any ideas fork the code and send a pull request.

Links: My repo, Simon's repo, Swift London Meetup

Get in touch if you need iOS development or coaching. Objective-C or Swift. See the main webpage for contact details.
5 responses
after running for one second it crashes with: fatal error: UnsafeMutablePointer.initializeFrom non-following overlapping range
Sorry I should update the post. Which branch/build were you using? It is also worth noting that Swift has come a long way since I wrote that particularly in the latest release but structs will still tend to be faster than classes (due to ARC not being needed in many cases) but that classes methods are now much faster at least when final (or automatically made final). The UnsafePointer/UnsafeMutablePointer stuff is no longer needed except for the multi threaded cases as Arrays are now much faster too. If you are using a branch using unsafe pointers without getting them from the array using withUnsafeMutablePointer and executing the operations within that block you might want to find a different version. I'll try to update at some point with some more recent branch recommendations.
or ThreadsExperiment(63320,0xb0093000) malloc: *** error for object 0x7bb0fa04: incorrect checksum for freed object - object was probably modified after being freed. :/ both seem to be in grayScottPartialSolver
thanks for swift reply. I am using |fullspeed| branch.
1 visitor upvoted this post.