Manifesto of Internal Data Densification

Sep 5, 2019

Klaus Schulte

by Klaus Schulte & Rosario Gauna

Data Densification – you will have already come across this topic, when you have gone beyond ‘Show Me’ in Tableau:

  • You’ve added curvy aspects to your designs with curved lines or polygons,
  • you’ve created radial bar charts, or
  • you’ve created other visualizations, where the original data was simply not enough, to let your design idea come to life.

For all these mentioned and unmentioned use cases, you have used Data Densification to enrich the original data by adding extra points.

1. Data Densification

There are a lot of good blogs around describing the techniques that can be used for Data Densification. We can highly recommend for example Ken Flerlage’s video “An Introduction to Data Densification” as part of the #ThinkDataTuesday campaign by Tableau and the accompanied blogpost. Tableau greats like Jeffrey Shaffer, Jonathan Drummey, and Joe Mako have also talked and written about the topic.

About Data Densification

In Ludovic Tavernier’s and my (Klaus) session Surprise Me – Creating Advanced and Unique Charts in Tableau at Tableau Conference Europe in Berlin earlier this year, Ludovic explained the concept of Data Densification with this slide:

datadens.png
Data Densification techniques

When do we need Data Densification?

  • Let’s assume, the original data had only one record.
  • Now you have your design idea to visualize the data coming from this record as a rectangle. You cannot draw a rectangle with only one point, you need four points.
  • When you want to draw a curve, you need even more than four points, for a sigmoid curve some authors recommend for example 49 points. For shorter or longer curves, you may need less or more than 49 points.
  • So, for both design ideas, you have to add extra points.

How can we add extra points?

  • There are two techniques to add extra points, for both techniques, a set of records has to be created outside Tableau in a so-called model dataset.
  • The first technique uses pre-densified data in the model dataset. That means, the model dataset includes as many rows as needed for your design (e. g. four rows for a rectangle or 49(ish) rows for a curve).
  • The second technique uses bins to artificially densify the data. To create the bins in Tableau at a later stage, it needs only two rows to define the range in the model dataset, wherein the bins will be calculated (e. g. 1 & 4 for the rectangle and 1 & 49 for the curve) later.
  • For both techniques, the Cartesian product has to be built using a common link or a join calculation.
  • That means we have multiple copies of the data for the first technique (e. g. 49 copies for curvy designs). For the range & bins-technique the data has doubled.

In a Nutshell

Both techniques perform Data Densification on row level. The main advantage is that you could drill down to row level in your visualization and the design would still work. The biggest disadvantage is, obviously, that you have to duplicate your data. When you already have a ton of data, you probably don’t want to do this.

But: Do we really have to? Let’s try something different…

2. Internal Data Densification

The common techniques of Data Densification allow users to drill down to row level. There are for sure use cases, when you visualize row level data, for example when the data is pre-aggregated. We have seen this for example in many #makeovermonday datasets.

But in many, if not even in most (real world) visualizations, we visualize aggregations, often based on transactional data, like practiced many times with Tableau Superstore. And this is the key to a technique, that we have named Internal Data Densification.

About Internal Data Densification

While Data Densification densifies and multiplies the data by a factor n using a model dataset, Internal Data Densification creates a densificator from the original data and splits up the data into n partitions (these partitions do not necessarily have the same number of records). E. g. for n=2 we would double the data with Data Densification and build two (smaller) partitions with Internal Data Densification.

This can be done based on any dimension or measure in the original data, that has at least two distinct (for doubling the data) members, and that’s why this technique only works for aggregations.

This is important: Internal Data Densification is based on the assumption, that you always have enough members for the desired number of “copies” in your densification dimension for the level of detail in your view.

Let’s come back to this assumption after introducing the techniques.

How can we add “extra” points?

Internal Data Densification techniques
  • Use IF- or CASE-Statements for qualitative dimensions like names or geographical data.
  • Use Modulus for date dimensions like minutes, hours, days, weeks, or months and for any integer dimension or measure. (With the ASCII-function you can translate every name into an integer to be able to use Modulus as well.)
  • With Data Densification, each copy of the data is identical. Therefore, you can for example sum up averages. You can’t do this with Internal Data Densification. To get back the aggregation we’ve split up into our different partitions, we can use LODs. We come back to this later when we will show some use cases for this technique.

Let’s give an example with Tableau Superstore data and try to “double” the data with Internal Data Densification. The visualization we have in mind is to show category sales 2018 by month.

EXAMPLE 1

Using [Region] as densificator with an IF-Statement:

[Region_Dens] = IF [Region]="West" THEN 1 ELSE 2 END

Creating a view with MONTH(OrderDate) on columns, Category on rows and COUNTD(Region_dens) on label shows us, whether this densificator is valid:

Tableau - Inverted Data Densification 2019-08-24 13-54-17
Densificability Check

Yes, we have two distinct values (1, and 2) in each period for each category.

Would it also work for Sub-Category?

Tableau - Inverted Data Densification 2019-08-24 13-56-08.jpg
Densificability Check

No, we have periods with only one value (e. g. region West or one of the other regions) and some periods without any sales.

EXAMPLE 2

Using [Order Date] as densificator with Modulus:

[OrderDate_Dens] = DATEPART('day',[Order Date]) % 2

Modulus calculates a remainder. In this case, the formula above is calculating a “0” for even days (e. g. 8 = 2 x 4, remainder = 0) and a “1” for odd days (e. g. 7 = 2 x 3, remainder = 1).

You can use Modulus with any other dividend to define your densificator.

Example Modulus

When we look again at MONTH(OrderDate) on columns, category on rows and COUNTD(OrderDate_dens) on label, we can see that [OrderDate_Dens] is also a valid densificator.

Tableau - Inverted Data Densification 2019-08-24 14-03-51
Densificability Check

EXAMPLE 3

What if we needed four copies of the data? We can again use [Order Date] as our densificator:

[OrderDate_Dens_4x] = STR(DATEPART('day',[Order Date]) % 2)+"-"
+STR(DATEPART('week',[Order Date]) % 2)

This formula is combining a “0” for even and a “1” for odd days with a “0” for even and a “1” for odd weeks and is giving us four possible combinations (0-0,0-1,1-0,1-1).

Is it valid? Yes, it is!

Tableau - Inverted Data Densification 2019-08-24 14-11-51
Densificability Check

EXAMPLE 4

For the visualization we have in mind here (category sales 2018 by month) the number of “copies” you can create with this data is limited. We managed to create 10 copies by combining two densificators, but not more.

Superstore is not the perfect dataset for this technique. Perfect would be for example production data, where we have logs that come in regularly like every minute or every 10 seconds or something like that.

Trying modulus with data from my (Klaus) solar power plant (where the data comes in every 5 minutes), we managed to put together 3,456 valid combinations within minutes using modulus with days, hours and minutes on MONTH(Time)- and “Dimension”-level.

Tableau - Book4 2019-08-26 13-43-26
Granular data
Tableau - Book4 2019-08-26 13-42-21
Densificability Check

Enough “copies” to draw some nice curves 😎.

In a Nutshell

  • Internal Data Densification creates a densificator from the original data. This can be done based on any dimension or measure in the original data, that has at least two distinct members.
  • The technique performs Data Densification on aggregation level.
  • In order to use the technique, it must always be ensured that there are enough members for the desired number of “copies” in the densification dimension. Therefore, we highly recommend densificability checks (this term was made up by Ken Flerlage).

Ok, let’s head over to some use cases to demonstrate the capabilities of this technique. And these are massive!

3. Use Cases

Curved Line Charts

For our first use case, we want to create a Sigmoid-Style Line Chart. You can of course argue the purpose of such a chart – but hey, this is just to demonstrate the capabilities of Internal Data Densification.

Tableau - Book2 2019-08-26 20-31-41.jpg
A curvy line chart

To create this, we are leveraging Internal Data Densification in combination with range & bins-technique. To do this, we need one extra “copy” of our data.

i. Defining the Densificator

We want to show category profits for all years. Let’s check first if there is a valid densificator and try again Modulus with day of Order Date:

[Densificator] = DATEPART('day',[Order Date]) % 2
Tableau - Book2 2019-08-26 20-38-52
Densificability Check

Yes, we have a valid densificator!

ii. Setting the Range

For the range & bins-technique we have to set the range of 49 first. This is the number of points we want to use for our sigmoid-style curve. We call this calculated field path

[Path] = [Densificator] * 48

…and this gives us either a 0 or a 48. When we now create a bin and set the bin size to 1, we should get our 49 points.

Edit Bins [Path] 2019-08-26 21-08-28
Artificial densification with bins

Let’s check: Yes, 49 marks (you have to select “Show missing values” to get this)!

Tableau - Book2 2019-08-26 21-11-39
Range & Bins-Technique

iii. Defining the Measure

We have split up our data into two parts, therefore we have to redefine our Profit using a LOD:

[Profit LOD] = { FIXED MONTH([Order Date]),[Category]:SUM([Profit])}

We want to select YEAR(Order Date) as a filter at a later point. Make sure to set the filter as a context filter, otherwise you would have to add YEAR(Order Date) to this Fixed-LOD.

You can now use [Path] and [Profit LOD] and follow for example this blog by Toan Hoang to create curvy line charts.

Check out the curvy lines using this technique on Tableau Public.

Radial Bar Charts

In this exercise, I (Rosario) used the technique to create a radial chart with weekly sales in 2018.

Radial Bar Chart

i. Defining the Densificator

In this viz, we want to show weekly sales and two data points are required per week. Let’s again check first if there is a valid densificator and try Modulus, but now with the ASCII value of the first character of the Product Name field:

[Densificator] = ASCII([Product Name]) % 2
Densificability Check

Yes, we get a valid densificator.

ii. Defining the Measure

We have split up our data into two partitions, therefore we have to redefine our weekly sales with the following LOD:

[Weekly Sales] = {FIXED [Week]: SUM([Sales])}

In this exercise, using the technique allows each of the partitions to use different values for the radius:

[Radius] = 
0.15+(ATTR([Data_Dens]) * (sum([Weekly Sales] / {FIXED: MAX([Weekly Sales])}))

Resulting in:

  • When densificator equal 0: [Radius] = 0.15
  • When densificator equal 1: [Radius] = 0.15 + (sum([Weekly Sales] / {FIXED: MAX([Weekly Sales])}))

Play with the interactive version or download the workbook here.

Enriched Line Charts

Just a few weeks ago, I (Klaus) wrote about how to encode change in line charts.

10
Visualizing change between two marks

The blogpost describes how the data was tripled to achieve this. Again, let’s try it with Internal Data Densification.

i. Defining the Densificator

In this viz, we want to show category sales for all years. Let’s check first if there is a valid densificator and try again Modulus with day of Order Date:

Densificability Check

No, we have some months with only two distinct remainders and therefore the densificator isn’t valid.

What about this:

[Densificator] = int(Sales) % 3
Densificability Check

Yes, with Modulus of integer Sales, we get a valid one.

ii. Defining the Measure

We have split up our data into three partitions, therefore we have to redefine our Sales with the following LOD:

[Sales LOD] = { FIXED YEAR([Order Date]), MONTH([Order Date]),
              [Category]: SUM([Sales])}

Here, we’ve included YEAR(Order Date) in our LOD and can take year and category just as a dimension filter.

The rest is just finding and replacing both the old densificator (“Table Name”) and the old measure (Sales) with our new ones. That’s it!

You can find and download the updated workbook on Tableau Public.

Stock Portfolio

In this exercise, I (Rosario) used the technique to avoid pivoting data with three measures (Low, High and Price) contained in each daily record by stock. Thus, I could simplify the necessary calculations.

Stock Portfolio – #WorkoutWednesday 45/2017

The challenge in question was launched by the unique and incredible Andy Kriebel and corresponds to the 45th week of 2017 of #WorkoutWednesday. In that week, he challenged us to build a beautiful display style of a portfolio of stocks, an idea originally coming from Jeffrey Shaffer. This exercise was the first time where I experienced the benefits of using the Internal Data Densification technique. Below I share a summary of the main steps of the solution, applying the technique as I use it today.

i. Defining the Densificator

In this viz, we want to show the 52-week low and the 52-week high of a stock in the last year and the current price. Three data points are required per Stock (or Symbol). Let’s again check first if there is a valid densificator:

[Marks_Dens] = 
CASE DATEPART(‘day’, [Date]) % 3
WHEN 0 THEN “Low”
WHEN 1 THEN “High”
WHEN 2 THEN “Close”
END
Densificability Check

Yes, we get a valid densificator.

ii. Defining the Measures

We have split up our data into three partitions, therefore we get the value of the minimum price and the maximum price per stock in the last year and the current price per stock (the stock, in the table is identified with the name of Symbol) with the following LOD calculations:

[52-Week Low] =  {FIXED [Symbol] : MIN([High])}
[52-Week High] = {FIXED [Symbol] : MAX([High])}
[Price] = {FIXED [Symbol]: SUM(IIF([Date] = {FIXED: MAX([Date])}, [Close], NULL))}

Subsequently, for each of the three possible densifier values, associate a value within the x-axis.

[Mark Axis Value] = 
CASE [Marks_Dens]
    WHEN "Low" THEN0.004
    WHEN "High" THEN0.996
    WHEN "Close" THEN
    ([Price] -[52-Week Low])  /
    ([52-Week High] -[52-Week Low])
END

Finally, for each of the three possible densifier values, associate a measure with the label you wish to display.

[Mark Label Value] = 
CASE [Marks_Dens]
    WHEN "Low" THEN [52-Week Low]
    WHEN "High" THEN [52-Week High]
    WHEN "Close" THEN NULL
END
Formula Results

In this exercise, using the technique allows us to show for each of the stocks three marks (thanks to densification) and also for each of the marks to use independent measure values.

The updated version of the viz can be found here.

4. Summary

We both think, that Internal Data Densification is a very powerful technique, that allows you to create any chart in Tableau that would normally require the addition of extra points. We’ve shown four use cases but could have added many more.

Internal Data Densification can be used only under certain assumptions. Like described in our use cases, there has to be a valid densificator. Furthermore, it works only on aggregation level and best with large and granular data sets. Therefore, it is a nice addition to the existing Data Densification techniques, because these data sets are the cases, where Data Densification would ‘hurt’ the most. As mentioned before: when you already have tons of data, you don’t want to multiply it. For small or/and pre-aggregated data sets, you won’t have a problem to multiply your data.

That means, you have to know your data! We don’t recommend using this technique with live data, where you cannot be sure about what data comes in. Use it with completed/closed data sets or in cases, where you have full control over extract refreshes. Make sure that your densificator will work in ANY case!

That said we hope you enjoyed reading. Please use this technique and share your use cases with us on Twitter (@ProfDrKSchulte and @rosariogaunag) or in the comments below.

Thanks for feedback along the way to Jeffery Shaffer, Ken Flerlage and Ludovic Tavernier.

Rosario & Klaus