Bob Nadler, Jr. Logo

Summarizing Data with Pivot Tables in Ruport

Update: The Ruport patch I mention below has been accepted into Ruport core. The patch also includes support for other operations. See my commit for details.

Pivot tables are a useful data tool. They allow you to transform large amounts of tabular data into an easier to read format. They can be difficult to create, however, and some people have a hard time understanding how to create them. It took me awhile to fully grasp how they worked.

Pivot tables are mostly found in spreadsheet software, but there are other tools, too. One such tool is Ruport, a Ruby based reporting tool.

What is a Pivot Table

A pivot table is

a data summarization tool found in data visualization programs such as spreadsheets... pivot-table tools can automatically sort, count, and total the data stored in one table or spreadsheet and create a second table displaying the summarized data.Wikipedia

Let's say you have a table of sales data like the one below. The first column shows the regions in which sales occurred. The second shows the date of the sale. Finally, the third shows how many units were sold.

Now imagine that there were over 1000 lines of data like this, but only occurring in four regions and over the time span of one month. Not only would it be difficult to read, but we cannot tell how many units were sold in a certain region on a certain day without re-sorting and filtering the data... unless we use a pivot table.

A pivot table has three main parts: a column to group by, a column to "pivot" and a column to summarize.

  1. The group column usually consists of repeated data values that you want to, well, group. These values will become the row headings on the left-hand side of the new table.
  2. The pivot column values become the row headings at the top of the new table, these values are also grouped (this is the part that sometimes confuses people).
  3. Finally the summary column is aggregated by summing, counting, averaging, etc. The aggregations supported vary by software vendor, but all include the ones I just mentioned.

Example Pivot Table


Creating the Table

Ruport is an awesome Ruby library for generating reports using table based data. It also has some basic pivot table support.

Take a look at the irb session below. It uses the same example as above to create a Ruport::Data::Table.

>> require 'ruport'
>> t = :data => [
?> ['North', '1/1/2010', 5],
?> ['South', '1/1/2010', 2],
?> ['East', '1/1/2010', 4],
?> ['West', '1/1/2010', 1],
?> ['North', '1/15/2010', 10],
?> ['West', '1/15/2010', 9]],
?> :column_names => %w[Region Date Quantity]
>> puts t
| Region |   Date    | Quantity |
| North  | 1/1/2010  |        5 |
| South  | 1/1/2010  |        2 |
| East   | 1/1/2010  |        4 |
| West   | 1/1/2010  |        1 |
| North  | 1/15/2010 |       10 |
| West   | 1/15/2010 |        9 |

Performing the Pivot

Now, we want to pivot the table on the "Date" column, group by the "Region" column and summarize the values in the "Quantity" column. We do this using the pivot instance method on the Table object. Now our data is easier to read and is ready to be rendered using any Ruport formatter you'd like.

>> puts t.pivot 'Date', :group_by => 'Region',
?> :values => 'Quantity'
| Region | 1/15/2010 | 1/1/2010 |
| North  |        10 |        5 |
| South  |           |        2 |
| East   |           |        4 |
| West   |         9 |        1 |

One Caveat

Right now, the Ruport pivot method does not work exactly like this. Instead of summing the value column it only takes the first value encountered. It doesn't apply in this example since there is only one data point for each "pivot", but it will be a problem with larger datasets. I've written a patch that corrects this, but it hasn't been applied to the official repo. So for now, you can get the patch from my fork. I'll update this post once the patch has been accepted.

Email list
Image by husin.sani

No spam, ever. Unsubscribe at any time.