Rowboat Guide

Quickly make sense of large and unfamiliar datasets.

How can we quickly get to know
large and unfamiliar datasets?

million_vehicles.tsv

1,000,000 rows | 112.21 MB

Open in Rowboat

Header charts, including donut charts, a range chart, histograms, and a ZIP Code map

Ellory

New England Rotary Enthusiast

It’s pretty rare to ever open a spreadsheet with absolutely zero context, but no matter your level of expertise, it takes some effort to familiarize yourself with a new dataset—all the more so when it’s a particularly large or opaquely structured file.

Let’s say you’re handed a 112 MB file with 1,000,000 rows of data.

Where do you start?

First, you need to open it. Massive files frequently crash commonly used tools like Excel or Google Sheets.

Even when those large files can be successfully opened, applying any type of filtering or sorting tends to be slow and frustrating.

Rowboat can open one million rows in one second—and that’s important for more than just the novelty of it.

Speed fundamentally changes how you’re able to understand your data.

There’s a lot that needs to happen between viewing a spreadsheet for the first time and actually making use of the information inside. Setting up custom code to run exploratory queries isn’t just time-consuming; it also pulls you away from the data you’re trying to understand.

Having a way to ask, answer, and pivot your questions in real time lets you quickly zero in on the threads you’re most interested in exploring as you get to know your data.

Let’s turn back to our not-so-hypothetical million-row file.

Rowboat automatically represents each column visually, so we can start making observations about this dataset the moment we open it.

In the header summary charts, we see date ranges, basic vehicle stats (make, model, type), and ZIP Codes.

From these summaries, we immediately know we’re dealing with data about vehicles located within the United States over a period of time.
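For a rough sense of what a per-column summary like this involves, here is a minimal Python sketch. It is not Rowboat’s implementation. It assumes a tab-separated file and uses a few invented rows; the column names `year`, `make`, and `zip` are placeholders, since this guide doesn’t list the file’s actual headers. Numeric columns get a min/max range, everything else gets its most common values, and ZIP Codes are kept as text so leading zeros survive.

```python
import csv
import io
from collections import Counter

# Toy rows invented for illustration only; they are NOT from million_vehicles.tsv.
# The column names "year", "make", and "zip" are assumptions.
SAMPLE_TSV = (
    "year\tmake\tzip\n"
    "1987\tToyota\t02139\n"
    "1994\tHonda\t02139\n"
    "2008\tToyota\t01002\n"
    "1989\tFord\t90012\n"
)


def summarize_columns(tsv_text, text_columns=("zip",), top_n=3):
    """Summarize each column: min/max for numeric columns, top values otherwise.

    Columns named in text_columns are always treated as categories, so
    identifiers such as ZIP Codes are not mistaken for numbers.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames or []:
        # Skip empty or missing cells.
        values = [row[column] for row in rows if row[column]]
        if not values:
            continue
        numbers = None
        if column not in text_columns:
            try:
                numbers = [float(v) for v in values]
            except ValueError:
                numbers = None  # at least one value is not numeric
        if numbers is None:
            summary[column] = {
                "kind": "category",
                "top": Counter(values).most_common(top_n),
            }
        else:
            summary[column] = {
                "kind": "range",
                "min": min(numbers),
                "max": max(numbers),
            }
    return summary


if __name__ == "__main__":
    for name, info in summarize_columns(SAMPLE_TSV).items():
        print(name, info)
```

Even this small version needs a decision per column (number or category?) before anything can be drawn, which is the kind of setup work the header charts take off your plate.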

A glance at the ZIP Code header shows a dot map with a high density of records on the East Coast.

Expanding the map makes it clear that almost all records are in Massachusetts.

There are a few other clusters that pop out, mainly around large cities like L.A., Dallas, and Chicago.

ZIP Code map, showing dots across the US, with a more concentrated cluster in the northeast

Now we have a general sense of the range of records present and can start exploring different subsets of the data.

By applying and removing filters and immediately seeing how this impacts the rest of the data, we can start to ask questions like:

What makes and models are most common among vehicles from the 1980s?

Header charts after filtering down to vehicle years in the 1980s

How does that compare to the 1990s?

Header charts after filtering down to vehicle years in the 1990s
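If you wanted to ask the same decade questions in code, a minimal Python sketch might look like the one below. The rows are invented, and the column names `year`, `make`, and `model` are assumptions rather than the file’s real headers.

```python
import csv
import io
from collections import Counter

# Toy rows invented for illustration only; column names are assumptions.
SAMPLE_TSV = (
    "year\tmake\tmodel\n"
    "1985\tToyota\tCorolla\n"
    "1988\tToyota\tCorolla\n"
    "1989\tFord\tF-150\n"
    "1993\tHonda\tAccord\n"
    "1996\tHonda\tAccord\n"
    "1997\tToyota\tCamry\n"
)


def top_makes_models(tsv_text, start_year, end_year, top_n=3):
    """Return the most common (make, model) pairs with start_year <= year <= end_year."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(
        (row["make"], row["model"])
        for row in reader
        if start_year <= int(row["year"]) <= end_year
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    print("1980s:", top_makes_models(SAMPLE_TSV, 1980, 1989))
    print("1990s:", top_makes_models(SAMPLE_TSV, 1990, 1999))
```

Each new question (a different decade, a different column) means editing and rerunning the script, whereas applying and removing a filter gives the answer in place.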

We can ask deeper questions about relationships between columns by linking them together:

Can we see when hybrid cars started to be manufactured and driven more widely?

A stacked histogram of hybrids over time, filtered to cars
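The numbers behind a stacked histogram like this are counts per year, split by category. As a minimal Python sketch under assumed column names (`year` and `fuel_type` are hypothetical, and the rows are invented), the share of hybrids per model year could be computed like this:

```python
import csv
import io
from collections import Counter

# Toy rows invented for illustration only; column names are assumptions.
SAMPLE_TSV = (
    "year\tfuel_type\n"
    "1998\tgasoline\n"
    "1998\tgasoline\n"
    "2004\tgasoline\n"
    "2004\thybrid\n"
    "2010\thybrid\n"
    "2010\thybrid\n"
    "2010\tgasoline\n"
    "2010\thybrid\n"
)


def hybrid_share_by_year(tsv_text):
    """Return {year: fraction of that year's rows whose fuel_type is 'hybrid'}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    totals = Counter()
    hybrids = Counter()
    for row in reader:
        year = int(row["year"])
        totals[year] += 1
        if row["fuel_type"].strip().lower() == "hybrid":
            hybrids[year] += 1
    # A Counter returns 0 for years with no hybrids, so the division is safe.
    return {year: hybrids[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for year, share in hybrid_share_by_year(SAMPLE_TSV).items():
        print(year, f"{share:.0%}")
```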

What does the relationship between miles driven per day and daily fuel use look like?

A scatterplot of miles per day against daily fuel use, filtered down to a section of the chart
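A scatterplot shows the shape of a relationship; one common way to put a single number on it is the Pearson correlation coefficient. The sketch below is a plain-Python version with invented values. It is not something this guide says Rowboat computes, and the variable names are made up for the example.

```python
import math

# Toy values invented for illustration only; they are not from the dataset.
MILES_PER_DAY = [10.0, 20.0, 30.0, 40.0]
FUEL_PER_DAY = [0.4, 0.8, 1.2, 1.6]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(round(pearson(MILES_PER_DAY, FUEL_PER_DAY), 3))
```

A value near 1 means fuel use rises almost in lockstep with miles driven, a value near -1 means it falls, and a value near 0 means no linear pattern. The scatterplot is still worth a look, because a single coefficient hides outliers and curved relationships.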

You can clear a filter at any time, and use the ← and → filter toggle buttons to switch between recent filter states as you explore.

It’s worth noting again how uncommon it is to open a file knowing nothing about it. But in just under a minute, we were able to gather quite a few insights about a large, unknown dataset.

Make your own discoveries with this dataset.

million_vehicles.tsv

1,000,000 rows | 112.21 MB

Open in Rowboat
