Am I holding it wrong? - AsyncSequence file reading experiments with iOS 15 beta 1

[Update: Compiler bug in Xcode 13 beta 1 confirmed. See the update below.]

[Update2: New post up with some rough and ready benchmarking of AsyncSequence file reading using URLSession]

What I wanted to know was how to efficiently and asynchronously read a file into a Data object using the new structured concurrency approach in Swift. The short version is that I didn't manage to get any of the async approaches to file reading working. I'm not yet sure whether I was doing it wrong or whether there are currently problems with the frameworks (I have filed a bug report with Apple: FB9177012). Either way, the async approaches were crashing for me, and I welcome responses letting me know what I'm doing wrong.


I couldn't see any API that looked exactly right for asynchronously reading a file. The Meet AsyncSequence talk from WWDC gave one possibility: handling the URL as an AsyncSequence of bytes. It didn't look very efficient, but I thought I would investigate to check. As you can see at the end of this post, I later realised that URLSession provides the method I need for a one-shot request (that also wasn't working). I would still like to test how the performance of all these approaches varies.

Update

Confirmation that it is a beta 1 bug, from someone who worked on some of the AsyncSequence stuff at Apple.

End update

Setup

I created a new iOS project with tests. I didn't touch the app but wrote performance tests and added a file with semi-random content (I used https://generatedata.com to make some random content, then kept appending it to a file until it got to 1MB, which I thought was enough for a useful test). I'm running Xcode 13 on Big Sur (on a 2013 MacBook Pro that unfortunately won't be able to run Monterey) and running the tests on my iPhone XS with iOS 15 to get realistic results, as I don't know whether the full optimisation and threading model of async/await is available in the Simulator on Big Sur.

The Tests

First to get a baseline I wrote a fully synchronous version:

class TestAsyncFileReadPerformanceTests: XCTestCase {

    let fileUrl: URL = {

        // Bundle(for:) returns a non-optional Bundle, so no force unwrap is needed here.
        let bundle = Bundle(for: TestAsyncFileReadPerformanceTests.self)

        return bundle.url(forResource: "randomData", withExtension: "txt")!

    }()


    func testPerformanceSyncExample() throws {

        // This is an example of a performance test case.

        var data: Data?

        self.measure {

            data = try! Data(contentsOf: fileUrl)

        }

        XCTAssert(data!.count > 1)

    }

}

This worked (although file caching seemed to be in effect, so I may need to do something else to get realistic performance data) and I was ready to add tests for the async approaches.
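One option for the caching concern is the .uncached reading option on Data. This is only a minimal sketch: the option is a hint to the system rather than a guarantee, so it may not fully defeat caching, and the helper name readUncached is made up for illustration.

```swift
import Foundation

/// Hypothetical helper: reads the whole file synchronously while hinting
/// that its contents should not be kept in the file-system cache.
func readUncached(_ url: URL) throws -> Data {
    // .uncached is a hint only; the system may still serve cached pages.
    try Data(contentsOf: url, options: .uncached)
}
```

In the baseline test this would replace the plain Data(contentsOf:) call inside the measure block.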

AsyncBytes approaches

Reduce

The first approach I took uses reduce. Note that, at least until you have successfully tested it, you should not really use this: in theory it is highly inefficient, having to create two arrays for every byte in the file (one holding a single item and one holding all the bytes so far). I am still interested in it, though, as it may be that the compiler can be much smarter and optimise it so that it performs well. If I get it working I'll see how it performs and may have a look at it in Hopper (disassembler).

var data: Data?

    func read(url: URL) async  -> Data? {

        do {

           return try await Data(url.resourceBytes.reduce([]) { $0 + [$1] })

        } catch { return nil }

    }

   func testPerformanceAsync() throws {

        let e = expectation(description: "Completed load")

        self.measure {

            async {

                self.data = await read(url: fileUrl)

                e.fulfill()

            }

            waitForExpectations(timeout: 5, handler: nil)

        }

        XCTAssert(data!.count > 1)

    }
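If the per-byte array allocation turns out to matter, AsyncSequence also offers reduce(into:), which mutates a single accumulator in place instead of building a new array on every step. A minimal sketch, assuming Apple platforms where URL.resourceBytes is available; the function name readReducingInto is hypothetical and I have not measured it against the version above:

```swift
import Foundation

/// Hypothetical variant of read(url:) that appends each byte to one Data
/// value in place rather than concatenating arrays.
@available(iOS 15.0, macOS 12.0, *)
func readReducingInto(url: URL) async -> Data? {
    do {
        return try await url.resourceBytes.reduce(into: Data()) { data, byte in
            data.append(byte)
        }
    } catch {
        // Same error handling as the original: any failure becomes nil.
        return nil
    }
}
```

This still visits every byte individually, so it only removes the intermediate arrays, not the per-element async iteration cost.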

Append

This is similar but doesn't rely on the compiler optimising the reduce. I tried different reserved capacities but with no joy.

    func read2(url: URL) async -> Data? {

        do {

            var array: [UInt8] = .init()

            array.reserveCapacity(50000)

            for try await byte in url.resourceBytes {

                array.append(byte)

            }

            return Data(array)

        } catch {

            return nil

        }

    }

    var data2: Data?

    func testPerformanceAsync2() throws {

        let e = expectation(description: "Completed load")

        self.measure {

            async {

                self.data2 = await read2(url: fileUrl)

                e.fulfill()

            }

            waitForExpectations(timeout: 5, handler: nil)

        }

        XCTAssert(data2!.count > 1)

    }
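Separately from the beta issue, one detail that may be worth ruling out: measure runs its block several times (ten by default), while the tests above create a single expectation outside the block and fulfil it on every pass, which XCTest may reject as over-fulfilment. A sketch that creates a fresh expectation per iteration, assuming the later Task { } spelling in place of beta 1's async { }; the class name and the temporary fixture file are hypothetical stand-ins for the bundled randomData.txt:

```swift
import XCTest

@available(iOS 15.0, macOS 12.0, *)
final class AsyncReadMeasureSketch: XCTestCase {

    func testMeasureWithFreshExpectation() throws {
        // Hypothetical fixture: a small temp file instead of the bundled one.
        let url = FileManager.default.temporaryDirectory
            .appendingPathComponent("measure-sketch.bin")
        try Data(repeating: 1, count: 4096).write(to: url)

        measure {
            // A new expectation for each pass, so each is fulfilled exactly once.
            let done = expectation(description: "Completed load")
            Task {
                let data = try? await url.resourceBytes
                    .reduce(into: Data()) { $0.append($1) }
                XCTAssertEqual(data?.count, 4096)
                done.fulfill()
            }
            // Wait only on this pass's expectation.
            wait(for: [done], timeout: 5)
        }
    }
}
```

This does not address the compiler bug itself; it only removes one possible source of failures from the test harness.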

URLSession - The recommended approach?

My final attempt used URLSession, which already has an async call to get the whole file. The result was similar: a crash. Using URLSession for local files feels wrong to me, but given we are using a URL there is no reason why it shouldn't work. It would be nice to have a convenience method on Data or URL, though.

func testPerformanceAsync4() throws {

        let e = expectation(description: "Completed load")

        let urlSession = URLSession.shared

        self.measure {

            async {

                (self.data, _) = try await urlSession.data(from: fileUrl)

                e.fulfill()

            }

            waitForExpectations(timeout: 5, handler: nil)

        }

        XCTAssert(data!.count > 1)

    } 
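The convenience method wished for above could be sketched as a small extension wrapping the same URLSession call. The initialiser init(asyncContentsOf:) is hypothetical and not part of Foundation; it assumes the async data(from:) method on URLSession.shared and inherits whatever behaviour that call has for file URLs:

```swift
import Foundation

@available(iOS 15.0, macOS 12.0, *)
extension Data {
    /// Hypothetical convenience initialiser (not a Foundation API):
    /// loads the full contents of `url` through URLSession's async call.
    init(asyncContentsOf url: URL) async throws {
        // The URLResponse is discarded; only the bytes are kept.
        let (data, _) = try await URLSession.shared.data(from: url)
        self = data
    }
}
```

A test could then call `let data = try await Data(asyncContentsOf: fileUrl)` from inside the task.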

Conclusion

I'm probably doing something obviously wrong. If you know what, please let me know and I'll update and credit you. If not, it is likely a beta release issue, and that is fine; that is what betas are there to sort out. I do really like the look of the Structured Concurrency changes, and I'm curious to test the performance of tight loops over async sequences (reading a byte at a time, etc.).