Which chart should you use to track your KPIs?
How we helped our clients visualise their metrics to aid decision making.

“Data is the new oil”. You’ve probably heard this phrase thrown around in the larger context of data science. But if you’ve ever sat on a big pile of data, you will realise that unless it yields insights and aids the decision-making process, data is useless.

We recently helped one of our customers with this exact issue. They wanted a customisable dashboard to give their key decision makers a 360-degree view of the relevant metrics. We took heavy inspiration from Tableau, Jupyter notebooks and other BI tools used by data wizards.

Our team was tasked with creating a data visualisation solution to aid our client’s data analysis efforts. Data visualisation involves plotting charts, and there are many types of chart to choose from:

  • Scatter plot
  • Histogram
  • Pie chart (and donut chart)
  • Tree map
  • Stacked area chart
  • 100% stacked area chart
  • Bar chart (categorical)
  • Bar chart (time as a category)
  • Line chart
  • Box plot

Wow, that’s a lot of charts. So which one do we use, and how do we decide which is best for our data? A common mistake when visualising data is to put too much emphasis on the data itself. Here is the short answer: the question you are trying to answer decides the chart you should use.

This article is part of a series in which we explore how to frame the right questions around data, so that we can visualise it and make sense of it. That, in turn, helps the chief decision makers drive the desired outcomes.

Before we start exploring our options, let’s first understand the two major types of data behind these graphs: (i) cross-sectional data and (ii) time series data.

i) Cross-sectional data

Cross-sectional data is data analysed as a single snapshot over a fixed period of time. In a cross-sectional analysis, we don’t need to know how the data changed during that period.

Let’s draw a chart to help us visualise the answer to the question “How many new customers did we acquire at each of the branches within the last week?”. We’ll plot a bar chart of the count (aggregation method) of data points from the last 7 days (data range), grouped by branch (grouping).
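To make the three parameters concrete, here is a minimal sketch of the aggregation behind such a bar chart, using only the Python standard library. The sample records, the dates, the branch codes other than JED and the function name `count_by_branch` are all made up for illustration; this is not the code behind our dashboard.

```python
from collections import Counter
from datetime import date, timedelta

# Made-up sample sign-ups as (signup_date, branch_code), for illustration only.
signups = [
    (date(2021, 2, 27), "JED"),  # falls outside the 7-day window below
    (date(2021, 3, 1), "JED"),
    (date(2021, 3, 2), "RUH"),
    (date(2021, 3, 4), "JED"),
    (date(2021, 3, 6), "JED"),
    (date(2021, 3, 6), "RUH"),
    (date(2021, 3, 7), "DMM"),
]

def count_by_branch(records, end, days=7):
    """Count records per branch in the `days`-day window ending on `end` (inclusive)."""
    start = end - timedelta(days=days - 1)                            # data range
    in_range = (branch for day, branch in records if start <= day <= end)
    return Counter(in_range)                                          # count, grouped by branch

counts = count_by_branch(signups, end=date(2021, 3, 7))

# A crude text "bar chart": one bar per branch, no time axis at all.
for branch, n in counts.most_common():
    print(f"{branch:<4}{'#' * n} {n}")
```

Notice that the dates are only used to filter the records. Once a record is inside the window, its date is thrown away, which is exactly what makes the result cross-sectional.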

Fig 1: Bar chart with cross-sectional data

 

Using Figure 1, we can see that the answer to our question is the Jeddah (JED) branch. This graph shows how the branches compare with each other, but it will not tell us how many new customers we acquired on a specific day in the given period. Consider the question “How many new customers did we acquire at each of the branches on Saturday?”. Clearly, this graph cannot answer that for us. And that’s where time series data comes in.

ii) Time series data

Time series data is a series of data points indexed by time. A time series helps us identify how a variable changes over time. As an example, let’s take a similar question: “Which day in the last week did we see the highest number of bookings?”

Let’s plot a line chart to answer this question with the following parameters: count (aggregation method) of data points in the last week (data range), plotted at a frequency of 1 day (frequency) and grouped by branch (grouping).
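The same idea can be sketched for a single branch, again with only the Python standard library. The bookings, the dates, the `RUH` branch code and the function name `daily_counts` are invented for illustration, so treat this as a sketch under those assumptions rather than our actual pipeline.

```python
from collections import Counter
from datetime import date, timedelta

# Made-up sample bookings as (booking_date, branch_code), for illustration only.
bookings = (
    [(date(2021, 3, 2), "RUH")]
    + [(date(2021, 3, 4), "RUH")] * 2
    + [(date(2021, 3, 5), "RUH")] * 4
    + [(date(2021, 3, 6), "RUH")]
    + [(date(2021, 3, 5), "JED")]  # another branch, filtered out below
)

def daily_counts(records, branch, start, days=7):
    """Return [(day, count), ...] for one branch: one entry per day, zero-filled."""
    per_day = Counter(day for day, b in records if b == branch)
    window = [start + timedelta(days=i) for i in range(days)]   # frequency of 1 day
    return [(day, per_day[day]) for day in window]

series = daily_counts(bookings, "RUH", start=date(2021, 3, 1))

# The time index is kept, so we can ask which day had the most bookings.
peak_day, peak = max(series, key=lambda point: point[1])
print(peak_day.isoformat(), peak)
```

Unlike the cross-sectional count, every day in the window gets its own value, including days with zero bookings, so the points can be joined into a line.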

Fig 2: Line chart with a single time series

 

It's pretty clear from the graph that the answer is Friday. In this case it was important for us to know how the metric (count of bookings, in this case) had changed over the course of the last week.

Note that the line chart above only has the data points for the Riyadh branch. Line charts can also include multiple series, each represented by a different line, so that we can compare them (in our case, bookings grouped by branch), as shown in Figure 3.
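For several series, the key detail is that every series is aligned to the same set of days, so the lines can be drawn on one shared time axis. Here is a small sketch with invented bookings and a hypothetical helper, `series_by_branch`:

```python
from collections import Counter
from datetime import date, timedelta

# Made-up sample bookings as (booking_date, branch_code), for illustration only.
bookings = [
    (date(2021, 3, 3), "RUH"),
    (date(2021, 3, 5), "RUH"),
    (date(2021, 3, 5), "RUH"),
    (date(2021, 3, 5), "RUH"),
    (date(2021, 3, 3), "JED"),
    (date(2021, 3, 4), "JED"),
    (date(2021, 3, 5), "JED"),
]

def series_by_branch(records, start, days=7):
    """Return {branch: [count per day]}, with all series aligned to the same days."""
    per_key = Counter((branch, day) for day, branch in records)
    window = [start + timedelta(days=i) for i in range(days)]
    branches = sorted({branch for _, branch in records})
    return {b: [per_key[(b, day)] for day in window] for b in branches}

table = series_by_branch(bookings, start=date(2021, 3, 1))

# One row per branch; each row would become one line on the chart.
for branch, counts in table.items():
    print(branch, counts)
```

Each row has exactly one value per day, so a plotting library can draw one line per branch without any further reshaping.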

Fig 3: Line chart with multiple time series

 

Looks like the bump in bookings on Friday wasn’t seen in the Jeddah branch. Now it’s up to the decision makers to understand why that might have happened. Perhaps there was a lockdown imposed due to the pandemic? Or was it because the online booking system failed? Answering these kinds of questions and acting on them is what makes data valuable.

How to decide between cross-sectional and time series graphs?

As a rule of thumb, cross-sectional data is used to compare quantities that are grouped by a criterion, e.g. sales by branch, sales by year or expenses by department. Time series data is used to understand how the data changes over a period of time, in order to find patterns in the variable itself.

So the first question we need to consider in our data engineering pipeline for visualising data points is “Do we need a time index to answer the question we are asking?”. If the answer is yes, choose a time series plot; otherwise, stick with a cross-sectional plot.
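In code, this rule of thumb comes down to whether we keep the date in the grouping key. The sketch below uses invented records and a hypothetical `aggregate` function to show the same data summarised both ways:

```python
from collections import Counter
from datetime import date

# Made-up sample records as (day, branch_code), for illustration only.
records = [
    (date(2021, 3, 4), "RUH"),
    (date(2021, 3, 5), "RUH"),
    (date(2021, 3, 5), "RUH"),
    (date(2021, 3, 5), "JED"),
]

def aggregate(records, needs_time_index):
    """Count per (day, branch) when time matters, otherwise per branch only."""
    if needs_time_index:
        # Time series shape: the day stays in the key.
        return Counter((day, branch) for day, branch in records)
    # Cross-sectional shape: the day is dropped.
    return Counter(branch for _, branch in records)

cross_sectional = aggregate(records, needs_time_index=False)
time_series = aggregate(records, needs_time_index=True)

print(dict(cross_sectional))
print(dict(time_series))
```

The cross-sectional result can compare branches but can no longer say anything about a particular day, while the time series result can still be summed back into the cross-sectional one.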

In the next part of the series, we’ll explore the different options for plotting both cross-sectional and time series data (remember the big list of graphs from earlier?) and what kind of data is well represented by each type of chart. Stay tuned for Part 2, where we’ll also look at what kind of questions are best answered by each type of chart.
