
Feng Yu

The Ontario government released its annual Sunshine List on March 24, detailing public sector employees earning more than $100,000 per year. We created a table so readers can explore the list in more detail, letting you search, sort and filter by name, salary and more.

This list is published each year on the government's website in a way that's hard to search, impossible to sort and difficult to navigate. The Globe wanted to pull the data from this year's list and publish it in a more usable way, as a tool for our reporters and our readers.

Here's a little background on how we made the tool. (Be warned: it gets technical.)

We started by building a scraper, a program that trawls web pages for content and saves it in a more structured way than copy-paste. Using a coding language called Python, we built a universal scraper that could pull all the data back to 1997, the first year it was released.
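For readers curious what a scraper like this looks like, here is a minimal sketch using only Python's standard library. It is not our production code: the sample page, the column names and the `scrape_rows` helper are all made up for illustration, and a real scraper would first download each year's pages before parsing them.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read (None outside <tr>)
        self._cell = None  # text pieces of the cell being read (None outside <td>)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty and are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_rows(html_text):
    """Return one list of cell strings per data row in the page."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical page fragment; the real pages are laid out differently.
SAMPLE_PAGE = """
<table>
  <tr><th>Employer</th><th>Surname</th><th>Given name</th><th>Position</th><th>Salary paid</th></tr>
  <tr><td>Example College</td><td>Doe</td><td>Jane</td><td>Professor</td><td>$101,234.56</td></tr>
  <tr><td>Example Hospital</td><td>Roe</td><td>Sam</td><td>Director</td><td>$150,000.00</td></tr>
</table>
"""

rows = scrape_rows(SAMPLE_PAGE)
```

Each entry in `rows` is a list of strings, one per table cell, which is easy to write out as CSV for cleaning.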



We cleaned this data using Google Refine, converting encoded HTML characters and renaming some categories. The next challenge was cutting the data down as much as possible. While there were only a few thousand records back in the 1990s, other years had as many as 79,000 records, making the file sizes very large. While Chrome and Firefox could handle it well, Internet Explorer chugged slowly with each new megabyte we pushed its way.
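We did the cleaning in Google Refine, but the same two steps (converting encoded HTML characters and renaming categories) can be sketched in Python. The category names and the `clean_record` helper below are invented for illustration, and the sketch assumes the category sits in the first column.

```python
import html

# Hypothetical rename table; the real category names differ.
CATEGORY_RENAMES = {
    "Hospitals & Boards of Public Health": "Hospitals",
}


def clean_record(record):
    """Decode HTML entities in every field, then rename the category (column 0)."""
    cleaned = [html.unescape(field).strip() for field in record]
    cleaned[0] = CATEGORY_RENAMES.get(cleaned[0], cleaned[0])
    return cleaned


raw = ["Hospitals &amp; Boards of Public Health", "O&#39;Neil", "Ren&eacute;e"]
cleaned = clean_record(raw)
```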

To keep file sizes down, we divided the master file into several chunks by category. We also opted not to store each record as a keyed JSON object, since repeating the keys in every record added unnecessary kilobytes to the file. Each record is stored as a standard JavaScript array instead.
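To show why dropping the keys matters, here is a rough Python sketch that splits records by category and writes each one as a bare array. The field names, the sample records and the two helper functions are assumptions made for this example, not our actual build script.

```python
import json
from collections import defaultdict

# Assumed column order; once the keys are gone, position carries the meaning.
FIELDS = ["category", "employer", "name", "position", "salary"]


def chunk_by_category(records):
    """Group records by category, keeping only positional values."""
    chunks = defaultdict(list)
    for rec in records:
        row = [rec[field] for field in FIELDS if field != "category"]
        chunks[rec["category"]].append(row)
    return dict(chunks)


def to_js_array(rows):
    """Serialize rows as a compact array literal with no extra whitespace."""
    return json.dumps(rows, separators=(",", ":"), ensure_ascii=False)


# Made-up sample records.
records = [
    {"category": "Colleges", "employer": "Example College", "name": "Doe, Jane",
     "position": "Professor", "salary": 101234.56},
    {"category": "Colleges", "employer": "Example College", "name": "Poe, Alex",
     "position": "Dean", "salary": 250000.0},
    {"category": "Hospitals", "employer": "Example Hospital", "name": "Roe, Sam",
     "position": "Director", "salary": 150000.0},
]

chunks = chunk_by_category(records)
keyed_size = len(json.dumps(records, separators=(",", ":")))
positional_size = sum(len(to_js_array(rows)) for rows in chunks.values())
```

With these three sample records, `positional_size` is already well below `keyed_size`, because the keyed version repeats all five key names in every record. Across tens of thousands of records the saving adds up.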

With a big dataset, programmers usually use a server-side language like PHP to query the data. But we were wary of doing this because the technical and administrative overhead seemed insurmountable within our timeframe. So we explored browser-side options and settled on SlickGrid, an open-source JavaScript plug-in that handles massive amounts of data very well. We had to customize the plug-in to add some extra functionality: currency sorting, historical comparisons and a universal search box.
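The real customizations were written in JavaScript inside SlickGrid, so the following is only a Python sketch of the logic behind two of them. Currency sorting means comparing salaries as numbers rather than as text, and a universal search box means checking every column for the search term. The function names and sample rows are invented for illustration.

```python
from decimal import Decimal


def parse_currency(text):
    """Turn a string such as '$1,050,000.00' into a number that sorts correctly."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def sort_by_salary(rows, column, descending=True):
    """Sort rows by the numeric value of the currency string in the given column."""
    return sorted(rows, key=lambda row: parse_currency(row[column]), reverse=descending)


def search_all_columns(rows, query):
    """Keep rows where any cell contains the query, ignoring case."""
    needle = query.casefold()
    return [row for row in rows
            if any(needle in str(cell).casefold() for cell in row)]


# Made-up sample rows: name, position, salary.
rows = [
    ["Doe, Jane", "Professor", "$101,234.56"],
    ["Roe, Sam", "Director", "$1,050,000.00"],
    ["Poe, Alex", "Dean", "$250,000.00"],
]

by_salary = sort_by_salary(rows, column=2)
matches = search_all_columns(rows, "dean")
```

A plain text sort would compare the salary strings character by character and get the order wrong, which is why the values are parsed into numbers first.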

Since we had data from the 2010 release, we added a pop-up feature that lets readers see how a salary increased or decreased from the previous year. In an earlier version of the table, this pop-up also included the 2010 employer name and position. But we had to cut those fields late in development because they nearly doubled the size of the 2010 dataset.
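The comparison itself is simple arithmetic. Here is a small Python sketch, assuming the previous year's salaries are stored in a lookup keyed by name. That key is an assumption for this example (and different people can share a name), and the `salary_change` helper and sample figures are made up.

```python
from decimal import Decimal


def salary_change(name, salary, previous_by_name):
    """Return (dollar change, percent change) against last year, or None if no match."""
    previous = previous_by_name.get(name)
    if previous is None:
        return None  # not on last year's list, so nothing to compare
    diff = salary - previous
    percent = (diff / previous * 100).quantize(Decimal("0.1"))
    return diff, percent


# Hypothetical 2010 lookup.
salaries_2010 = {"Doe, Jane": Decimal("100000.00")}

change = salary_change("Doe, Jane", Decimal("105000.00"), salaries_2010)
missing = salary_change("Roe, Sam", Decimal("150000.00"), salaries_2010)
```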

The final tool is very simple to use and, admittedly, not very flashy. But it lets readers dig a little deeper into the list, search specific jobs and find notable people.

If you have any questions, comments or suggestions about this interactive, reach us at community@globeandmail.com.
