Posts for Tag: optimisation

Swift Arrays are not Threadsafe

Swift arrays are not necessarily contiguous blocks of memory (although they may be in some cases so some code may work in some cases but not in others). Some operations may trigger them to be converted to a contiguous block of memory and I believe that is what can particularly cause issues when there is access from multiple threads.

[Update - This post is more popular than I expected (thanks Reddit). I've added some additional notes at the bottom. Also the previous post about optimisation is probably more interesting and valuable.]

Even the code as described here may not work (and I haven't tested at all) when there are multiple reads or writes that access the same elements of the array. All the code I used was either reading from a constant array (safe from multiple threads/queues) or writing into non-overlapping ranges with no reads occurring (only seems safe with the technique described here).

.withUnsafeMutableBufferPointer

Swift Optimisation Again - Slides from Swift London 21st October 2014

Update (23rd October 2014) - Now with video link HERE.

Swift optimisation presentation refined and developed further.

Please contact me if you want the source version and please let me know if you make use of it or find it useful.

Description of the demo from a previous event. Note that for Xcode 6.1 you will need to use the 0x, 1x, 2x and fullSpeedXC61 branches instead of the 0, 1, 2 and fullSpeed branches respectively. Likewise master currently has an Xcode 6.1 version although I will probably merge that to the real master shortly.


London Swift Presentation - Swifter Swift

This is the Swifter Swift presentation I gave at the London Swift Meetup in Clapham today (15th September 2014). Apologies for the stock photos/standard template, I focussed on the content.

I also showed the optimisation levels as described in this blog post.

Things I Forgot To Mention

Use constants - declare with let unless avoidable, allows extra optimisation. Test before changing to mutable state within a function for performance reasons.

Currently the optimiser can do better within a particular file although this will be fixed at some point.

Github Repos

https://github.com/josephlord/GrayScott - Optimised version of Simon Gladwell's Cellular Automata project.

Async.legacy - My iOS7 / OS X 10.9 compatible fork.

Async - Original by Tobias Duemunk




From 41 frames per second to 1560 - Full app 38x speedup

Having spent a couple of evenings back porting Async to iOS 7 and OS X Mavericks (10.9) and releasing it as Async.legacy I've gone back to trying to squeeze some more performance out of the GrayScott Cellular Automata app that Simon Gladman presented at the last London Swift Meetup.

For me this was an interesting case to see how fast something that is almost entirely CPU and memory bound can be made and it gave me a chance to play. Not many things need optimising if well strutured but this was a case where it could clearly be relevant.

Simon's original code calculates about 10fps in debug mode and displays many of them. Built with optimisations it increases to about 41fps calculated but it very rarely updates the screen due to the timing mechanism he used rather than calling back to the main thread. All this was done on a 70 pixel square calculation.

Running the latest code on a 70 pixel square calculation it calculates between 1550 and 1600 frames most seconds for a speedup of about 40 times and it is displaying far more frames to the screen too (well assigning the images to the image property of the imageView, the screen framerate is far lower).

This post focusses on making the main solving work multi-threaded for performance and in the optimisation of the inner loop. At this point we are moving beyond the point where we are optimising by improving the style, purity and immutability of the code. Some of the changes (inlining simple functions) go directly against good style and should only be done in inner loops. The parallelisation of the main solver is also something which makes the code less clean and tidy as is the incorporation of the pixelData generation into the main loop.