GSoC ’14 Progress: File I/O Performance and Promise Chains

In my last post I included a to-do list of sorts that I expected to complete before writing the next one. None of the items in the list have been crossed off, but the progress over the last couple of days call for another post so here goes.

First off: file I/O strategies. I had a few enlightening discussions on #perf with Yoric and avih about the strategy I proposed about log writing and file I/O in general. The strategy – having an OS.File instance open and repeatedly appending to it using write() – was deemed feasible, but then I started thinking about a problem Florian hinted – what happens when we try to read a log file that’s currently open for writing (and possibly during a pending write)?

I talked to Yoric about possible race conditions between reads and writes, and it turns out this isn’t a problem because OS.File does I/O on a single thread. However, he warned me that opening a file for reading while it was already open for writing might fail on Windows (in general, opening a file twice concurrently).

As a solution to this, I proposed that, instead of keeping the file open, we open it, write to it, and close it immediately whenever we need to append a message. Not keeping the file open means that we don’t have to worry about opening it twice simultaneously, but now I had to worry about overhead added from opening and closing the file every time. What would the impact be if, for example, 50 conversations were open and each of them had up to 30 incoming messages per second? Would the overhead added by opening/closing visibly impact performance in this kind of a situation?

I asked about this on #perf again, and this time avih responded with some valuable insight. He explained that OSes cache opens and closes (and even seeks) so that successively opening a file would cause negligible overhead. This was of course only considering OS level file handling, not counting overhead caused by the OS.File implementation.

Now that I was confident that opening/closing the file every time wasn’t totally insane, I wrote a small benchmark to compare performance between the two strategies for appending a string to a file 1000 times. I ran it on my MacBook’s SSD, a FAT32 USB 2 flash drive, and a HFS+ partition on my USB 3 hard drive. The results were similar: opening/closing the file every time was about 3-4 times slower than keeping it open (absolute values were between 0.5-1.5 seconds keeping it open, and 1.5-5 seconds opening/closing every time).

However, that was for 1000 consecutive writes – not likely in a realistic scenario, and even so, decent enough to go unnoticed by a user. As avih put it, “optimization is great, but if applied where it’s not really needed, then it just adds code and complexities overheads, but giving you nothing meaningful in return”. Of course, Florian might have something to say about it when he’s back 😉

With the strategy decided, I set about adapting the code accordingly, and realized it was still possible for a read to be called on a file during a pending write. I needed a queue system to ensure all operations on a given file happened one after another. Since all OS.File operations are represented by promises, I decided to map each file path to the promise for the ongoing operation on it. Then to queue an operation on a file, do the operation in the existing promise’s then. Here’s some code to make that clear:

let gFilePromises = new Map();

function queueFileOperation(aPath, aOperation) {
  // If there's no promise existing for the
  // given path already, set it to a
  // dummy pre-resolved promise.
  if (!gFilePromises.has(aPath))
    gFilePromises.set(aPath, Promise.resolve());

  let promise = gFilePromises.get(aPath).then(aOperation);
  gFilePromises.set(aPath, promise);
  return promise;
}

Now whenever I have to do any file operation, I just do |queueFileOperation(path, () => OS.File.foo(path, bar, …));| and presto! An async file I/O queue.

An interesting side effect of the above code snippet is that once a path is added to the map, it’s never removed (=memory leak). This is solved by a slight modification:

function queueFileOperation(aPath, aOperation) {
[...]
  let promise = gFilePromises.get(aPath).then(aOperation);
  gFilePromises.set(aPath, promise);
  promise.then(() => {
    // If no further operations have been
    // queued, remove the reference from the map.
    if (gFilePromises.get(aPath) == promise)
      gFilePromises.delete(aPath);
  });
  return promise;
}

And that’s about it! Long post, but it was a great learning experience for me and I figured it deserved one.

Cheers!

Leave a Reply

Your email address will not be published.