Swift Performance - iOSDevUK

I presented today at iOSDevUK in Aberystwyth with some more material on Swift Performance and how to profile Swift. Also on wrapping classes in value types. Unfortunately much of the presentation was a demo and there is no video but I'm posting the materials here and will try to make a follow up post covering some more of the details in the next week or two (no promises).

Slides (note that slides a terrible communication mechanism and caveats and subtleties in the talk may be lost):

How Swift is Swift?

Slides, video and links related to my Swift Summit talk on Swift performance. Video should be available later is now available and it might be worth watching Airspeed's talk that preceded mine before watching it when it is available I'm also expecting a blog post from him shortly on his static vs. compile time talk which is also highly relevant to optimisation.

The key point in Swift is that as the compiler gets better there is no need to choose between nice code and fast code. With a little thought and knowledge it is usually possible to get close to the speed of lowish level C code with nice abstractions. Some may say that C++ is there already but I enjoy writing Swift code much more and C++ is hampered by it's C legacy and choices made many years ago (for good reasons) from becoming a truly nice language in my view (non-Nullable by default would be hard to retrofit for example).

There were also many other excellent talks at Swift Summit, and all were at least good. I'm looking forward to watching several again.

Swift 1.2 Update (Xcode 6.3 beta 2) - Performance

Apple have shipped another major update (release notes - registered devs only) only two weeks after shipping the beta 1. There are major updates to Swift Playgrounds, additional syntax support for more flexible `if let`, added a zip function, various other tweaks and fixed a tonne of bugs. This is all on top of the major changes in 1.2 beta 1 that I discussed at Swift London.

Erica Sadun has already blogged about the Playgrounds and `if let` changes and I'm sure that there will be plenty more over the next week. I don't intend to go over that ground but instead discuss the performance changes in Swift 1.2 versions. [Update: I should also have mentioned Jamesson Quave's post that covers the new zip function.]

The performance jumps are very significant in Swift 1.2 and certainly in Beta 2 the steps necessary to optimise your code are significantly changed. This post will cover some information on the improvements and how to get the best out of the Swift compiler. In most cases I have encountered the performance is now very close to C/C++ code and may be faster at times.

Swift 1.2 beta 2 Big Performance News

High performance can be achieved in Xcode 6.3 beta 2 without making code changes to achieve the result that were required to get close previously. Specifically the performance gains of these changes seem to have been largely eliminated:

  1. `final` methods/properties which used to be massive (although I still recommend it where possible for design/safety reasons).
  2. unsafeMutableBuffer instead of array (at least in -Ounchecked builds)
  3. moving code into the same file to allow the compiler to inline (with -whole-module-optimizations build option new in Xcode 6.3 beta 2).

There also seems to have been a slight general performance gain (about 5% above beta 1 from my initial tests).

Is Swift Fast Yet?

Yes. I've always believed that the language had potential to be as fast or faster than C because of strict compile time type checking, immutability and a lack of pointer aliasing concerns. With the Xcode 6.3 betas that promise is being delivered on. I'm sure the work is continuing and there is more to come but it is already fast in most cases. My thanks to the Apple team for making a fast platform but you have just taken away my ability to look impressive by speeding up code significantly with a few simple changes - you bastards.

Overcoming Swift Resistance Explored

This post is a response to David Owens' Swift Resistance Explored post.

First I want to acknowledge that Swift Debug builds are hideously slow. This isn't news to me or anyone who has seen my talk on Swift optimisation. To be clear this is a problem which Apple have said that they are working on but I still don't think it is an absolute block on any real development because there are fairly simple workarounds.

If you want to see the code under discussion it is all in my repository here with a log of my development. The first commit is the Zipfile that David made available in his blog post, the rest is my testing and development of the Swift version.

These are my performance figures (for 100 runs so it isn't directly comparable to David's):


Obj-C

Swift

Optimised

0.077 (1.8)

0.042 (1)

Debug

0.38 (9.1)

13.7939 (327)

The Importance of Being `final`

[UPDATE - Swift 1.2 from Xcode 6.3 Beta 2 brings performance benefits to non-final properties and methods making final unnecessary for performance from that time - see my initial reaction to Beta 2. I still recommend final where possible as it avoids need to consider effects of an object being subclassed and methods overwritten that change the behaviour.]

This post is intended to quantify, explain and show the performance hit that you take from not adding the word final to your classes (or their properties and methods). The post was triggered by a blog post complaining about Swift performance and showing some performance figures where Swift was significantly slower than Objective-C. In the optimised builds that gap could be closed just by adding a single final keyword. I'm grateful to David Owens for showing the code he was having trouble and giving me a base to demonstrate the difference final can make without it being me cherry picking any code of my choosing.

As with most performance issues this doesn't matter most of the time. Most code is waiting on user input, network responses or other slow things and it really doesn't matter that much if the code is being accessed a few times a second. However when you have got performance critical code and in particular that inner loop that is being executed hundreds of thousands of times a second it can make a huge difference.

What does final do

Swift Arrays are not Threadsafe

Swift arrays are not necessarily contiguous blocks of memory (although they may be in some cases so some code may work in some cases but not in others). Some operations may trigger them to be converted to a contiguous block of memory and I believe that is what can particularly cause issues when there is access from multiple threads.

[Update - This post is more popular than I expected (thanks Reddit). I've added some additional notes at the bottom. Also the previous post about optimisation is probably more interesting and valuable.]

Even the code as described here may not work (and I haven't tested at all) when there are multiple reads or writes that access the same elements of the array. All the code I used was either reading from a constant array (safe from multiple threads/queues) or writing into non-overlapping ranges with no reads occurring (only seems safe with the technique described here).

.withUnsafeMutableBufferPointer

Swift Optimisation Again - Slides from Swift London 21st October 2014

Update (23rd October 2014) - Now with video link HERE.

Swift optimisation presentation refined and developed further.

Please contact me if you want the source version and please let me know if you make use of it or find it useful.

Description of the demo from a previous event. Note that for Xcode 6.1 you will need to use the 0x, 1x, 2x and fullSpeedXC61 branches instead of the 0, 1, 2 and fullSpeed branches respectively. Likewise master currently has an Xcode 6.1 version although I will probably merge that to the real master shortly.


London Swift Presentation - Swifter Swift

This is the Swifter Swift presentation I gave at the London Swift Meetup in Clapham today (15th September 2014). Apologies for the stock photos/standard template, I focussed on the content.

I also showed the optimisation levels as described in this blog post.

Things I Forgot To Mention

Use constants - declare with let unless avoidable, allows extra optimisation. Test before changing to mutable state within a function for performance reasons.

Currently the optimiser can do better within a particular file although this will be fixed at some point.

Github Repos

https://github.com/josephlord/GrayScott - Optimised version of Simon Gladwell's Cellular Automata project.

Async.legacy - My iOS7 / OS X 10.9 compatible fork.

Async - Original by Tobias Duemunk




From 41 frames per second to 1560 - Full app 38x speedup

Having spent a couple of evenings back porting Async to iOS 7 and OS X Mavericks (10.9) and releasing it as Async.legacy I've gone back to trying to squeeze some more performance out of the GrayScott Cellular Automata app that Simon Gladman presented at the last London Swift Meetup.

For me this was an interesting case to see how fast something that is almost entirely CPU and memory bound can be made and it gave me a chance to play. Not many things need optimising if well strutured but this was a case where it could clearly be relevant.

Simon's original code calculates about 10fps in debug mode and displays many of them. Built with optimisations it increases to about 41fps calculated but it very rarely updates the screen due to the timing mechanism he used rather than calling back to the main thread. All this was done on a 70 pixel square calculation.

Running the latest code on a 70 pixel square calculation it calculates between 1550 and 1600 frames most seconds for a speedup of about 40 times and it is displaying far more frames to the screen too (well assigning the images to the image property of the imageView, the screen framerate is far lower).

This post focusses on making the main solving work multi-threaded for performance and in the optimisation of the inner loop. At this point we are moving beyond the point where we are optimising by improving the style, purity and immutability of the code. Some of the changes (inlining simple functions) go directly against good style and should only be done in inner loops. The parallelisation of the main solver is also something which makes the code less clean and tidy as is the incorporation of the pixelData generation into the main loop.