Google Sheets: How to organize your data correctly

In the company world, spreadsheets are everywhere and used – at different degrees – by everyone. In my experience as an analyst, I have been able to study why most of the people have trouble extracting meaningful information, and it’s all about a lack of clean-up and organization of their datasets.
There are dos and don’ts when it comes to prepare your data for further analyses and Ia m going to reveal them to you.

Don’ts

Here are a few examples that – please – you should never apply!

Too beautiful for analyses

The original data is already organized for visualization, but that organization makes further manipulations complicated:
For example, how do I retrieve totals for EN and ES in an easy way?

Tabs, tabs, tabs, …

Multiplying tabs for content that have similar properties complicates drastically cross-overs between the different datasets.
In this example, how do I make a chart of a monthly trend?

Different dimensions in a single column

The location, the type of pages and the categories are 3 different dimensions. Hence, they need 3 different columns

Several tables for similar content

In a similar fashion as dividing content in tabs, dividing contents with similar dimensions in tables is a no no. It might be more intuitive because more readable for a human, but it makes crossing ES and EN complex operations.

Dimensions vs. Metrics

First let’s define two major concepts: Dimensions & Metrics

Dimensions = Attributes of your data
Example: Site, country, page type, date, …

Metrics = Quantitative measurements
Example: Number of sessions, Pages/Session, Revenue, …

The golden rules of a clean dataset

  • In general, all the dimensions and metrics that we might need in the future should appear in one table.
  • Each dimension and each metric should have their own column => The table is called “flat”
  • Datasets that deserve to be split in different tables are datasets that don’t share similar attributes (different dimensions and different metrics)
  • In that case, as a general rule, set one sheet per table, and name your sheet properly
  • Any column of a table should be filterable and sortable
  • The data should be formatted correctly: numbers should be numbers, dates should be dates

At the end of the day, what counts is that the information should feel easy to manipulate