GSoC ’14 Progress

It’s been over a week since my last post, and my recollection of work done is a bit hazy :(. Here’s a quick summary though:

  • Tests! I’ve written pretty comprehensive tests for writing and reading of log files (including grouped-by-day reading). I learned about Assert.jsm – something so new that my tests broke one fine day because of ongoing changes in m-c (I expected it to be stable because do_check_* had already been marked deprecated in MDN).
  • Bug fixing. Writing tests uncovered several intricacies that have now been taken care of. One discovery that hasn’t been addressed yet is that if queued file operations exist during AsyncShutdown, they fail because stuff like JSON (and even dump) get removed from the scope. A simple (pending) solution is to yield on all of them when the “prpl-quit” notification is fired.
  • Thunderbird! I got a Thunderbird build running and tested the new API changes. The UI is properly listing and displaying logs. Gloda indexing has also been updated: I had to implement yet another promise queue to make sure we read only one conversation at a time (and not load all of them into memory). Debugging was a hassle and took ages: Components.utils.reportError is disabled from chrome code by default in Thunderbird! I searched around for hours to see why my code wasn’t being called when indeed it was – just that the reportError call was never firing.
  • In the middle of all this, I took a day to play with the convbrowser code to see if I could manipulate the way messages are added and so on. I succeeded in getting messages to be added in reverse, but bubbles grouping wasn’t working and my code got messy enough that I reverted everything. It was a good experiment though, a kind of warm-up exercise for the real task that looms ahead 😉
  • I also spotted and fixed a couple of minor miscellaneous bugs.

While the async-logs patches undergo review cycles, I plan to either play more with the convbrowser and churn out something concrete (my first goal is to make context messages load from the bottom up), or to start messing with databases – test out storage/retrieval performance and disk usage with full-text-search enabled if we store all messages in it for example. Maybe both!

Until next time! Hopefully the next progress update won’t take as long as this one did.

GSoC ’14 Progress: File I/O Performance and Promise Chains

In my last post I included a to-do list of sorts that I expected to complete before writing the next one. None of the items in the list have been crossed off, but the progress over the last couple of days call for another post so here goes.

First off: file I/O strategies. I had a few enlightening discussions on #perf with Yoric and avih about the strategy I proposed about log writing and file I/O in general. The strategy – having an OS.File instance open and repeatedly appending to it using write() – was deemed feasible, but then I started thinking about a problem Florian hinted – what happens when we try to read a log file that’s currently open for writing (and possibly during a pending write)?

I talked to Yoric about possible race conditions between reads and writes, and it turns out this isn’t a problem because OS.File does I/O on a single thread. However, he warned me that opening a file for reading while it was already open for writing might fail on Windows (in general, opening a file twice concurrently).

As a solution to this, I proposed that, instead of keeping the file open, we open it, write to it, and close it immediately whenever we need to append a message. Not keeping the file open means that we don’t have to worry about opening it twice simultaneously, but now I had to worry about overhead added from opening and closing the file every time. What would the impact be if, for example, 50 conversations were open and each of them had up to 30 incoming messages per second? Would the overhead added by opening/closing visibly impact performance in this kind of a situation?

I asked about this on #perf again, and this time avih responded with some valuable insight. He explained that OSes cache opens and closes (and even seeks) so that successively opening a file would cause negligible overhead. This was of course only considering OS level file handling, not counting overhead caused by the OS.File implementation.

Now that I was confident that opening/closing the file every time wasn’t totally insane, I wrote a small benchmark to compare performance between the two strategies for appending a string to a file 1000 times. I ran it on my MacBook’s SSD, a FAT32 USB 2 flash drive, and a HFS+ partition on my USB 3 hard drive. The results were similar: opening/closing the file every time was about 3-4 times slower than keeping it open (absolute values were between 0.5-1.5 seconds keeping it open, and 1.5-5 seconds opening/closing every time).

However, that was for 1000 consecutive writes – not likely in a realistic scenario, and even so, decent enough to go unnoticed by a user. As avih put it, “optimization is great, but if applied where it’s not really needed, then it just adds code and complexities overheads, but giving you nothing meaningful in return”. Of course, Florian might have something to say about it when he’s back 😉

With the strategy decided, I set about adapting the code accordingly, and realized it was still possible for a read to be called on a file during a pending write. I needed a queue system to ensure all operations on a given file happened one after another. Since all OS.File operations are represented by promises, I decided to map each file path to the promise for the ongoing operation on it. Then to queue an operation on a file, do the operation in the existing promise’s then. Here’s some code to make that clear:

let gFilePromises = new Map();

function queueFileOperation(aPath, aOperation) {
  // If there's no promise existing for the
  // given path already, set it to a
  // dummy pre-resolved promise.
  if (!gFilePromises.has(aPath))
    gFilePromises.set(aPath, Promise.resolve());

  let promise = gFilePromises.get(aPath).then(aOperation);
  gFilePromises.set(aPath, promise);
  return promise;
}

Now whenever I have to do any file operation, I just do |queueFileOperation(path, () => OS.File.foo(path, bar, …));| and presto! An async file I/O queue.

An interesting side effect of the above code snippet is that once a path is added to the map, it’s never removed (=memory leak). This is solved by a slight modification:

function queueFileOperation(aPath, aOperation) {
[...]
  let promise = gFilePromises.get(aPath).then(aOperation);
  gFilePromises.set(aPath, promise);
  promise.then(() => {
    // If no further operations have been
    // queued, remove the reference from the map.
    if (gFilePromises.get(aPath) == promise)
      gFilePromises.delete(aPath);
  });
  return promise;
}

And that’s about it! Long post, but it was a great learning experience for me and I figured it deserved one.

Cheers!

GSoC ’14 Progress: Async logger.js

Since my last post, I’ve made considerable progress in the quest to make file I/O async in logger.js.

  • I wrote some xpcshell tests for the log writing code, and tested it in a live Instantbird instance – it was good to go with a couple of minor mistakes in code.
  • Log reading is now also async, and the Instantbird log viewer has been updated for the API changes.
  • I’ve added error handling throughout. This is important because errors in Promises may be dropped to the floor if the Promise’s consumer doesn’t handle rejections, and especially in Promise chains where errors need to be propagated.
  • I had to figure out how to pass a Promise through an XPCOM interface. Turns out the return type needs to be declared as “jsval”.

All of this has been tested with the Instantbird log viewer and everything seems to be working smoothly. Coming up next:

  • The strategy I’m using for writing log files is to have an OS.File instance open and repeatedly append to it using write(). The file needs to be closed after arbitrary periods of inactivity.
  • Tests! Now that log reading is async, I need to write tests for writing/reading a conversation.
  • Other consumers of the imILogger API need to be updated (Thunderbird).

My next post will likely be when the items on the to-do list above have been crossed off. Until then, you can follow the progress of this effort on bug 955292.

GSoC ’14 Progress: Tasks and Promises

My semester finally finished last week, and we decided that a good way to start off my project would be to make chat history logging asynchronous. The current implementation in logger.js maintains an output stream to an nsIFile instance and synchronously writes to it. My job is to rewrite the relevant code using the OS.File API to make everything async.

Over the last couple of days, I’ve been researching on the Promise API and how it works with OS.File for async I/O. I learned how Task.js allows for clean code using the yield keyword, and sat around a lot pondering about preservation of sequence when working with async functions.

Things became clear once I realized that in Task.spawn(), what |yield <Promise>| was doing was simply letting the event loop continue until the Promise resolved.

I then proceeded to properly start rewriting the code (several failed attempts preceded my yield realization) for Conversation and System logs. I’ve attached a WIP to bug 955292 for feedback. The code is untested however, and my next task will be to write xpcshell tests to make sure everything is working.

Redirecting WordPress to a subdomain

After quite a few hours of messing with htaccess, I’ve finally figured out how to properly redirect my blog from http://nhnt11.com/blog to http://blog.nhnt11.com.

First of all, here were my redirecting goals:

  1. Avoid www:
    http://www.nhnt11.com/foo/bar -> http://nhnt11.com/foo/bar
  2. Remove trailing slashes from filenames:
    http://nhnt11.com/foo.bar/ -> http://nhnt11.com/foo.bar
  3. Redirect my blog:
    http://nhnt11.com/blog/foo/bar -> http://blog.nhnt11.com/foo/bar

You would think the htaccess rules would be simple. The obvious rules for each case respectively are:

  1. RewriteCond %{HTTP_HOST} ^www\.nhnt11\.com$
    RewriteRule ^(.*)$ http://nhnt11.com/$1 [L,QSA]
  2. # I don't want to mess with trailing slashes and stuff for the blog, so match only nhnt11.com.
    RewriteCond %{HTTP_HOST} ^nhnt11\.com$
    RewriteCond %{REQUEST_FILENAME} -f
    # Regex matches foo.bar/
    RewriteRule ^(.*\.[^/]+)/$ http://nhnt11.com/$1 [R=301,L]
  3. RewriteCond %{HTTP_HOST} ^nhnt11\.com$
    RewriteRule ^blog/(.*)$ http://blog.nhnt11.com/$1 [L,QSA,R=301]
    
    # nhnt11.com/blog -> blog.nhnt11.com (not covered in first rule)
    RewriteCond %{HTTP_HOST} ^nhnt11\.com$
    RewriteRule ^blog$ http://blog.nhnt11.com [L,QSA,R=301]

This should have done it. But no – for some reason, http://nhnt11.com/blog/foo/bar -> http://blog.nhnt11.com/foo/bar was refusing to work. It just gave me WordPress’s “Page not found” page.

After hours of tinkering (among other things, I could never be sure of what was working or whether my changes were being used – turns out browsers cache redirects so I had to keep clearing caches) I found out that WordPress has its own .htaccess file in the directory in which it was installed (/blog in my case). This was interfering with the .htaccess in my root directory – so I figured removing it must surely make everything start working. And it partly did. The problem now was that WordPress had been using it to serve up the right content for the fancy blog/foo/bar URLs (e.g. blog/category/foo).

A lot of tinkering later, I discovered that commenting out the redirection stuff in WordPress’s .htaccess wasn’t enough – I had to comment out the “RewriteEngine on” line as well! So maybe if I migrated the rules into the main .htaccess, it would work.

It did, with just one change – I added a condition to invoke those rewrites only for blog.nhnt11.com.

Here’s the final set of working rewrite rules:

RewriteEngine on

# Avoid www
RewriteCond %{HTTP_HOST} ^www\.nhnt11\.com$
RewriteRule ^(.*)$ http://nhnt11.com/$1 [L]

# Remove trailing slash for files
RewriteCond %{HTTP_HOST} ^nhnt11\.com$
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*\.[^/]+)/$ http://nhnt11.com/$1 [R=301,L]

# nhnt11.com/blog/foo/bar -> blog.nhnt11.com/foo/bar
RewriteCond %{HTTP_HOST} ^nhnt11\.com$
RewriteRule ^blog/(.*)$ http://blog.nhnt11.com/$1 [L,QSA,R=301]

# nhnt11.com/blog -> blog.nhnt11.com
RewriteCond %{HTTP_HOST} ^nhnt11\.com$
RewriteRule ^blog$ http://blog.nhnt11.com [L,QSA,R=301]

# WordPress fancy URL stuff
RewriteCond %{HTTP_HOST} ^blog\.nhnt11\.com$
RewriteRule ^index\.php$ - [L]
RewriteCond %{HTTP_HOST} ^blog\.nhnt11\.com$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

A more detailed GSoC post

A question I get asked often after receiving the obligatory congratulations from people is “What’s your project about?”. I’ll try in this post to answer that question satisfactorily.

First off, I’ll be working this summer with the Instantbird team, under the Mozilla umbrella. You can view my selected proposal here. My mentor will be Daniel Scruton (his internet name is usually aleth) – though if this summer is going to be anything like last year’s, my mentor is really #instantbird ;). (Here‘s a blog post that summarizes my project last year)

Many people (for some reason) have been under the impression that Instantbird itself is my project – it’s not. Instantbird is an existing cross-platform instant messaging client built on the Mozilla platform. Check it out if you haven’t already! I’ll be working to improve it this summer.

What am I working on? Two things.

  1. It’s pretty hard to search your chat history at the moment. You have to know on what date the message you’re searching for was sent/received, or know how to use grep. I’ll be working on getting all chat logs indexed and easy to search.
  2. I’ll be implementing infinite conversation scrollback – older messages should keep appearing as you scroll up.

These sound like two straightforward goals, but do involve quite a bit of work. Read my proposal for more details on the steps involved and some of the edge cases and potential difficulties I’ve predicted.

Hopefully this clears it up. I’ll be blogging my progress here through the summer, do subscribe if you’re interested!

GSoC ’13 is over!

Well, Summer of Code 2013 is officially over. My project has pretty much met what was in my proposal – the stats and ranking patch is r+ (though awaiting Florian’s approval) and should land soon.

There are a few leftover bugs that I’ll be working on when I have time (I’ve been pretty busy with my coursework lately). Many of them are small (and possibly quick to fix): bugs 2046, 2048, 2069, 2081, 2116, 2076, 2145, 2146, 2184, 2185.

Here’s a broad overview of my work over the summer:

  • Modified the tabbrowser to accept generic tabs (bug 426).
  • As an example usage of the new tabbrowser changes, made it possible to open about:* pages in a tab (bug 2002).
  • Wrote the initial new conversation tab, with a list of contacts and a filter box (bug 2015).
  • Added support for existing conversations in the new conversation tab and moved the management of possible conversation data to a new ConversationStatsService (bug 2055).
  • Wrote an API to request protocol plugins for chat room lists, and implemented it for JS-IRC. Made the stats service use this API to display IRC channels in the new conversation tab (bug 2066).
  • Experimented with IndexedDb to store conversation stats. Dropped it in favor of a JSON cache until we migrate to a more sophisticated database from which we can query possible conversations on the fly (and possibly incorporate log indexing?). Wrote code to store these conversation statistics and use them to sort possible conversations. Adapted Florian’s log sweeper code to parse JSON logs for stats (bug 2143).

I’ve learned a lot over the summer. That includes technical stuff – JavaScript, XUL and the Mozilla platform, XPCOM, etc. – as well as a much deeper insight into open source development. I’d contributed to open source before GSoC, but mostly just small patches. This summer I was actively involved in the development process, and got to see how much care and attention is paid to detail when making even small changes. I had to consider things I would’ve ignored before – localization, accessibility, edge cases, UI profiling, etc. Every UX decision was preceded by lengthy brainstorming and debating sessions on IRC. It wasn’t uncommon to spend 10 minutes (sometimes more) figuring out the best name for a variable!

A huge thanks to my mentor Benedikt Pfeifer, and the Instantbird team for all their help over the past three to four months, they’ve been a pleasure to work with! Be on the lookout for Instantbird 1.5, it’s going to be pretty awesome (:P).

PS
You can download a change log of my project from here, or if you have the source, you can generate one yourself using
$hg export `hg log -b default -u nhnt11 | grep changeset | cut -f3 -d ":"`

Progress report: September 6th

This post has been long overdue. Here’s the major progress over the last couple weeks:

  • Bug 2066 is merged! Awesometab now queries accounts for chat room lists and displays them (only implemented for IRC, you’ll have to flip the chat.irc.allowList preference to true in about:config).
  • Bug 2143 – Ranking and stats are working and have had a couple of review cycles. We decided that efficient and intelligent searching/filtering is a requirement and also that the log sweeping code could be reused for log indexing. Based on all this the IndexedDB has been dropped from this bug in favor of a simple JSON cache for now.

First thing I’ll be doing next is optimizing the chat room code from bug 2066 to eliminate UI lag while receiving chat rooms. After that, the priority will be landing bug 2143. Things will likely get slow after that while I have my first test cycle, but once that’s over (or possibly before, if 2143 lands quickly) I’ll be looking at optimization and better searching (implement a proper database, indexing, etc).

Cheers!

Weekly report: August 21st 2013

During the last 7 days, I didn’t quite get around to everything I wanted to do (which wasn’t helped by 4 of the days being holidays) but there was progress.

  • Bug 2066 (chat rooms and IRC’s LIST) is finally r+.
  • I experimented more with frecency scores (“frecency” is inspired by Firefox’s Places Frecency algorithm)
  • After a bit of experimenting, I finally decided to stop messing around and get some actual data to play with from logs.
  • Using Florian’s WIP from bug 1584, I put together an addon that goes through logs and obtains information on recency (date of most recent log file) and frequency (number of messages exchanged, with separate counters for incoming and outgoing)
  • Playing with the algorithm for a while, I finally ended up with the following:
    • Score = Total message count * frequency multiplier * recency multiplier
    • Here, frequency multiplier is the ratio of outgoing to incoming message counts, and recency multiplier is identical to what Firefox uses.
    • This gave me pretty accurate results, the conversations with the highest scores were definitely what I would most likely want to open if I pressed Command+T 🙂

Tomorrow I will be working to integrate this into the stats service and use these scores to sort conversations in the newtab.

PS.
I again notice that my work over a week seems to be concentrated over one or two days, during which I spend 10-15 hours each day on a kind of frenzy. Meh.