Wednesday, 29 November 2017

Tableau repositioning itself with regards to data preparation

For those of us who have used Tableau for years, the changes in every version always seem to remove some need for external tools/code for data preparation. Think of the introduction of filled maps, the union feature, the Excel data interpreter, the pivot and split, the spatial file connector (don't mention the PDF connector!). While certain Tableau partners/consultants are still keen on the Tableau-Alteryx stack, I'm not convinced of its long-term market viability, and neither was Gartner last time I checked. The latest announcement on Project Maestro is a rather aggressive move from Tableau's side into traditional Alteryx territory.

If you do want my advice, learn some basic scripting, some coding, regular expressions, some Unix, or even good editor skills. You can only go so far with 'friendly tools', and you still have to spend a lot of effort learning them, so you might as well learn an open source, transferable skill instead.

Sunday, 26 November 2017

Colouring by secondary source dimension in Tableau, avoiding the asterisk

The data below is so small that you might wonder why I bother blending rather than joining, or even creating a group. Let's say this is a demonstration of a technique that proved useful with much bigger and more complicated datasets, where the non-blending dimension of the primary source didn't have an obvious hierarchical relationship with the dimension in the secondary source that yielded the asterisk.

So we have two sources: the primary one lists European election constituency regions per UK nation; the secondary one lists all the MEPs with their region and party.
So, how do we blend the two and make a bar chart of the MEPs of each region with the appropriate party colour coding?

As you can see, once we put nation on the view as a dimension from the primary source, the secondary source field 'Party' cannot be used as a dimension, and we get the dreaded asterisk. Fear not; all is not lost.

There is a workaround, but it only works for cases like this, where there is a handful of parties. We create a separate calculated field for each party's MEPs, put Measure Names on colour, and throw all these per-party calculated fields on Measure Values (see screenshot above, calculations below):
if [Party]='CON' or [Party]='UUP' then [MEP] end

if [Party]='LAB' then [MEP] end

if [Party]='UKIP' then [MEP] end

if [Party]='SNP' then [MEP] end

if [Party]!='SNP' and [Party]!='CON' and [Party]!='LAB' and [Party]!='UUP' and [Party]!='UKIP' then [MEP] end
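The same pattern can be sketched outside Tableau. Below is a minimal Python sketch (the function name, party codes and MEP counts are illustrative, not from the actual workbook): one measure is split into one column per category, with CON and UUP sharing a field as in the calculations above.

```python
# Illustrative rows of (party, MEP count) — not the real election data.
rows = [('CON', 1), ('LAB', 1), ('UKIP', 1), ('CON', 1), ('GRN', 1)]

def per_party_measures(rows):
    """Mirror the calculated fields: each MEP count lands in its
    party's column (CON and UUP share one), everything else in
    a catch-all 'Other' column."""
    totals = {'CON/UUP': 0, 'LAB': 0, 'UKIP': 0, 'SNP': 0, 'Other': 0}
    for party, mep in rows:
        if party in ('CON', 'UUP'):
            totals['CON/UUP'] += mep
        elif party in ('LAB', 'UKIP', 'SNP'):
            totals[party] += mep
        else:
            totals['Other'] += mep
    return totals
```

With one column per party, each can then be coloured independently, which is exactly what Measure Names on colour achieves in the Tableau view.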
I've given a different scenario of avoiding the asterisk with calculated fields in a blend in a previous post here.

Using human vision's edge detection skills as a colour comparator


When visualising data we have to keep thinking about how human vision works and how we can work in synergy with our viewers' eyes and brains.
Human vision is very good at reconstructing the 3D world from the rather limited information the eye sees, and it can also do that when presented with a 2D image of the 3D world. Artists and scientists have studied and exploited these human abilities.

Basically, our brain can do edge detection: it can see that the side of the tablecloth is a bit darker than the top of the table, therefore the edge of the table is where the two meet. The luminance component of an image matters so much more to us than the chrominance that we spent nearly a century being amazed at black-and-white photography and cinema. Even our high-tech digital codecs use higher resolution for the luminance than for the chrominance components.

In my map visualisations I have tried to exploit this the other way around: using the eye's edge detection ability not so much to find the edge as to distinguish the light side from the dark side. See for example the two maps above. Once we put a hard border line in, it saturates our vision: we can no longer see whether the East of England or Yorkshire is the darkest blue. It is much like walking in a dark park at night as a cyclist with strong LED lights comes towards you: you can definitely see the cyclist's light, but you can no longer see the path or indeed much else.

The map on the left, on the other hand, allows us to make the most of the limited dynamic range: we can now see that the East of England is the darkest blue. It is harder to see the border between areas that are the same colour, but this is a Tableau map; that's what interaction is for. After all, quite often these maps are for people who know where the borders are: they are trying to see the borders suggested by the data, not the borders known a priori. Adding a hard border is much like a naive painter's approach, putting more effort into picturing what we know rather than what we see.

Saturday, 25 November 2017

Converting hexadecimal values in Tableau

While Tableau has a lot of basic maths and string functions, coping with hexadecimal numbers is not something it can do natively. Let's see how we can do this with calculated fields.

To make the solution easier we break the problem in two: interpret each hexadecimal digit, then put the results together to convert the whole number to decimal. We create a calculated field for the rightmost hex digit (the 1s):
ifnull(int(right([Hex No],1)), 
case right([Hex No],1) 
when 'a' then 10 
when 'b' then 11 
when 'c' then 12 
when 'd' then 13 
when 'e' then 14 
when 'f' then 15 
end)
We do the same for the next digit (the 16s), where we can use mid([Hex No],5,1), assuming our numbers are in the format 0x023c. Likewise for the next two digits, the 16^2s and the 16^3s. Then we bring everything together:
[16^3s]*16*16*16+[16^2s]*16*16+[16s]*16+[1s]
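For comparison, here is a minimal Python sketch of the same digit-by-digit approach (function names are mine; in Python itself you could of course just call int(s, 16)):

```python
def hex_digit(ch):
    """Interpret a single hex digit, mirroring the Tableau CASE calculation."""
    if ch.isdigit():
        return int(ch)
    return {'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15}[ch.lower()]

def hex_to_dec(hex_no):
    """Convert a string like '0x023c' to decimal, digit by digit,
    as the chained calculated fields do (assumes the '0x' prefix)."""
    total = 0
    for ch in hex_no[2:]:          # strip the '0x' prefix
        total = total * 16 + hex_digit(ch)
    return total
```

Running total * 16 per digit is the same arithmetic as the [16^3s]*16*16*16 + ... field, just folded into a loop.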

Sunday, 12 November 2017

Working with Eurostat unemployment data in Tableau

Eurostat publishes, among other things, unemployment data for the EU. The data comes in a weird format, a hybrid between tab- and comma-separated values.
Having turned it into proper comma-separated format, we remove the colons (Eurostat's missing-value marker) and pivot the years to get a more Tableau-friendly format:
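As a sketch of what that reshaping does, here is a pure-Python equivalent (the sample rows and values below are made up for illustration, not real Eurostat figures; the function name is mine):

```python
import csv
import io

# Hypothetical sample in the wide layout, after cleaning the tab/comma
# hybrid into plain CSV; ':' marks a missing value.
raw = """geo,2014Q1,2014Q2
CY,16.7,16.2
HR,17.8,:
"""

def pivot_years(text):
    """Melt the wide year-quarter columns into (geo, period, value)
    rows, dropping the ':' placeholders — what Tableau's pivot plus
    the colon clean-up achieves."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    out = []
    for row in rows[1:]:
        for period, value in zip(header[1:], row[1:]):
            if value != ':':
                out.append((row[0], period, float(value)))
    return out
```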
While the year and quarter can be split, a dateparse calculation is more helpful:
dateparse("yyyy'Q'q",[Pivot Field Names])
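The same parse can be sketched in Python (the function name is mine), mapping a label like '2015Q3' to the first day of that quarter:

```python
from datetime import date

def parse_quarter(label):
    """Parse a period label like '2015Q3' into the date of the
    quarter's first day, as dateparse("yyyy'Q'q", ...) does."""
    year, quarter = label.split('Q')
    return date(int(year), (int(quarter) - 1) * 3 + 1, 1)
```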
This finally allows us to do some analysis, looking at the unemployment trend for the ten countries of the 2004 accession. Cyprus, which I've chosen to highlight, is a definite outlier.
It had relatively low unemployment before entering the EU and a fairly mild recession until the Greek PSI, when unemployment rose and rose until the final bail-in and associated bank collapse in 2013. This was a double dip, or in the case of unemployment a double peak: the situation improved only to become worse again around the end of 2014 and beginning of 2015. Croatia was the only country of the group in a worse situation, until Q3 2016 when the two lines crossed over.

Sunday, 22 October 2017

Workaround for problems with the Split function when connecting Tableau to PostgreSQL

I'm a big fan of the split function in calculated fields, partly because far too often the fields in the sources I work with are concatenations of other fields, or because I've done a union of several CSV sources and some crucial bit of information is hidden in the table name/path. But recently I tried to use it with PostgreSQL and got the following error:
  • ERROR: function split_part(text[], unknown, integer) does not exist; Error while executing the query
Now, I could investigate this further on the PostgreSQL side of things, but I just want to get my thing to work and move on. So instead, here's the workaround I came up with:
mid([concat_str],start_of_split, find([concat_str],'/',start_of_split)-start_of_split)
where start_of_split is a fixed number of characters (maybe you want to do another find here) and '/' is the delimiter to split on.
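A Python sketch of the same workaround (the function name is mine; note that Python's find and slicing are 0-indexed, where Tableau's mid and find are 1-indexed):

```python
def split_after(concat_str, start_of_split, delimiter='/'):
    """Mimic the mid/find workaround: take the substring from
    start_of_split up to, but not including, the next delimiter."""
    end = concat_str.find(delimiter, start_of_split)
    return concat_str[start_of_split:end]
```

For example, split_after('path/to/file.csv', 5) picks out the piece between the first and second slashes, just as the mid/find combination does in the calculated field.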

Sunday, 20 August 2017

Tableau and negative zero

Another saga from the big data frontier: at work we have a source of GPS data, and I've been working with a colleague to aggregate it to a degree grid, not unlike the example linked. The data is in Hive, and bizarrely it has two columns for each dimension: latitude magnitude as a positive float and latitude sense as a string (N or S), and likewise for longitude. To keep things simple we round the magnitude in the custom SQL that connects to Hive, and we make the data signed again with a simple calculation in the Tableau source:
[round Latitude magnitude]*
(case [latitude sense]
when 'S' then -1
when 'N' then 1
end)
The rounding of the magnitude is fine, as we also keep the sense in the group-by dimensions in the custom SQL. The only special case is when the rounded magnitude is zero: had we done the sign assignment before the rounding, we'd have one bucket for zero instead of one for 0 to 0.5 and one for -0.5 to 0. But surely that shouldn't be an issue once we do the calculation above in Tableau?

It turns out that it is an issue. I'm not sure what's happening in the data engine (IEEE 754 floating point does have a signed zero), but the two zeros are treated differently, recreated here with the data of the post linked above.
Sure enough, looking at the two zeros, there's one for -0 and one for +0. So we refine our calculation to avoid multiplying zero by -1:

if [round Longitude magnitude]=0 then 0 else
[round Longitude magnitude]*
(case [longitude sense]
when 'W' then -1
when 'E' then 1
end)
end
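A Python sketch of what seems to be going on, plus the refined calculation (the function name is mine): IEEE 754 floats carry a sign bit even for zero, so +0 and -0 compare equal yet have different bit patterns, which an engine bucketing on raw bits would treat as two distinct values.

```python
import math
import struct

pos_zero = 0.0 * 1.0
neg_zero = 0.0 * -1.0   # IEEE 754 keeps the sign bit on zero

# Equal as values, different as bit patterns:
same_value = pos_zero == neg_zero
same_bits = struct.pack('>d', pos_zero) == struct.pack('>d', neg_zero)
sign_of_neg = math.copysign(1.0, neg_zero)   # -1.0, despite neg_zero == 0

def signed_round(magnitude, sense):
    """The refined calculation: special-case zero so a 'S'/'W' sense
    never produces a -0 bucket."""
    m = round(magnitude)
    if m == 0:
        return 0
    return m * (-1 if sense in ('S', 'W') else 1)
```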