Month: December 2013

Most Popular Names Visualized in Excel

A couple of months ago, I came across an article in the Atlantic that showed an animated gif of the most popular baby names by state by year. I decided to visualize this data using various add-ins and chart types in Excel.

Retrieving the Data

I found the data on the Social Security Administration website. This data showed the top 5 girl names in a table for each state for a given year. I needed to pull in multiple years, so I used Power Query and wrote a bit of M. I followed the guidance from a great blog post by Devin Knight:

  1. Choose From Web and enter the url for a page containing any year of data.
  2. Modify the M to create a function that accepts a year parameter and uses that to formulate the correct url for each year.
  3. Create a second query from scratch and provide an array of years.
  4. Convert this data to a table and insert a calculated column that pulls in the data from the first query.
  5. Expand the columns, rename columns, and change to appropriate data types where necessary.

For more detail, see Devin’s post. He does a great job of explaining and providing screenshots.

Once I had my data it was time to show it off.

Power Map

My first instinct was to use Power Map since its specialty is showing changes in geospatial data over time. After trying each type of chart in Power Map, I found that the region shading was the closest to what I wanted.

Power Map Baby NameAlthough I wasn’t just trying to duplicate the animated gif, I found it difficult to display this data in an equally effective way in Power Map due to the lack of ability to add data labels or customize annotations. There was no good way to show the most popular name on the state. I chose to use annotations to display the name. You can use data fields to populate the title and description of an annotation. The biggest drawback of using annotations is that there is no way to resize or relocate them. If I were to add annotations for each state, they would overlap and you wouldn’t be able to read many of them nor would you be able to see most of the map. I was able to add enough annotations that there was usually one annotation for each of the names shown on the map at any time. One other issue I noticed is that occasionally an annotation will fail to update on time, so the data in the annotation will be old and correspond to a previous year.  If you watch the video for my Power Map, you will see that the annotation for Virginia gets stuck on 1983 for a few years, but it eventually catches up. Update: Microsoft responded to my question on this.  We figured out that the annotations will update as expected if you set the Time to Date(Year) and then change the time settings to “Data Shows for an Instant”.  I have updated my workbook with these settings. You can watch the video of my Power Map tour here. Outside of the inability to add good labels, watching the states change color over time works well. You can have a legend that shows the names with the colors, but it will show every name from the data set and will not be an effective use of space.

Power View

Power View provides the ability to create bubble maps. I made a bubble map tiled by year.

Power View Baby NameIf you click on a year tile, you can then use the arrow keys to flip through the years; you can also use the scroll bar below the tiles. I think this provides a better visual than the Power Map for this situation since it is easier to see which names go with which states. Because of the bubble colors, I can see how the trend went from Jennifer to Jessica to Ashley in the 1980s. We also notice that there are 5+ top names from the 1990s forward, whereas the 1980s had 2 – 4.  The only real problem I encountered with the map in Power View was the limitation of the colors in the themes. When there are several names on the map, some of the colors can be very similar.  I tried several themes and couldn’t find one where this wasn’t the case.

Pivot Table

Next I decided to stray from maps and see if there were other chart choices that suited this data.  I created a pivot table with data bars and a timeline slicer.

Excel pivot baby nameI can still tell which baby names were most popular across the US and see how the trends change over time. The piece of information that I lose in moving away from maps is seeing how trends affect states that are geographically near each other. For instance, I can tell that Jessica was popular in 1985 in states such as California, Florida, New York, and Michigan. But I don’t immediately recognize that Jessica was popular only in New England, Florida, and the Mountain West and West Coast because Ashley had taken over the    Mid-Atlantic and Midwest. One advantage of the pivot table is that I can select multiple years. So I can learn things like Ashley was the most popular name overall in the 1990s.

Adventures with Apps for Office

Microsoft Research published several free apps for data visualization in Excel.  I decided to use a couple of them to visualize the baby name data. You can access these apps on the Insert tab.  Click Apps for Office -> See All.  Then click Find More Apps at the Office Store. One thing to note about these apps is that they require the data to be formatted in a certain way.  See the instructions in the links below for details on this.

apps for office

Streamgraph

A streamgraph shows how a set of numbers has changed over time using their relative area. It’s kind of like an area chart that doesn’t use an absolute Y axis.

steam graph baby names

I do not think this is an effective way to display the data.  Area graphs and streamgraphs both fail when trying to display a large number of values.  In this specific case, you can only pick a color and the shades are determined for you. It’s difficult to distinguish some of the shades from others. The fact that the values for the names can overlap and aren’t a specific shape (like a rectangle where you only have to judge length) make it difficult to interpret quickly.  You lose most of the advantages of pre-attentive processing because you really have to study the graph to get information from it.

Treemap

A treemap is a hierarchical view of data that breaks it into rectangles and uses size and color to communicate information.

tree map

For some reason there is a size limitation on this treemap so I couldn’t make it big enough to where all the names would fit in the boxes. Since the treemap plots the boxes so they fit into the overall rectangle, years are not placed in sequential order.  In my treemap, both the color and size represent the number of count of names for the year. Using a treemap removes the ability for the viewer to see trends over time.  I have to look in the bottom row for 1980 – 1982 and then jump to the top left to find 1985. I also had to summarize to top names by year, losing any geographical/state information.  One thing this does do well is compare 2 – 3 values for a name across years.  For instance, I can compare the shade of green for Jennifer in 1981, 1983, and 1985 to see the decline in popularity of Jennifer over time.

And the winner is…

I think the Power View does the best job of communicating all of the available information from the data set while being visually appealing. We can see geographic relationships and temporal relationships.  It’s easy to read and quickly see which names were top in which states at any year.

You can download my Excel file here if you would like to check it out or try to improve upon it.

Retrieving Lowest Level Hierarchy Members and Leaves in MDX

The Original Answer

I was answering questions on Stack Overflow when I came across a question about getting the last level of a hierarchy in MDX when you don’t know how many levels there will be. SSAS multidimensional allows for parent-child hierarchies without having to define a maximum number of levels.  A common use case for this is an employee reporting structure. You can create the same type of functionality is SSAS Tabular, but you have to build a flattened out structure and relate each level through DAX so there will always be a set number of levels even if they aren’t all populated.

This question was fun to answer because it can be done fairly simply and elegantly with MDX functions. I can get the members of the lowest level of the hierarchy with this:

[Dimension].[Hierarchy].LEVELS([Dimension].[Hierarchy].Levels.Count-1)

The Levels function can take a level number or name as the argument. [Hierarchy_expression].levels.count gives me the number of levels. I have to subtract one from that number when I supply that to the Levels function because it uses zero-based positioning (i.e. the 6th level is level 5).

I can accomplish the same thing using the Descendants function.

DESCENDANTS([Dimension].[Hierarchy].[All], ([Dimension].[Hierarchy].Levels.Count - 1), SELF)

I am giving the Descdendants function the distance as my second argument. I use the All member for the member expression so any descendant in the hierarchy could be returned.  Then you’ll see the same expression that I used above.  This time, I am subtracting one because this number represents the distance (number of levels) from the specified member. The description flag allows me to specify which descendants should be returned. It is optional and defaults to SELF if you leave it out, but I included it just to be explicit. It means that I am only returning the descendants on the level I specified.

The More Probable Answer

Once I had my answer all written up, I realized that I was answering the question that was asked, but I wasn’t sure that is what they really needed.  If you have a parent-child hierarchy, the bottom level might not mean anything in particular.  If the hierarchy is ragged, you can have leaves at various levels, so there would have to be a very interesting and specific question asked that would use the bottom level members as the answer.  I decided it was just as likely that the question might really be asking for all of the members that have no children (the leaves). I can use the Descendants function to answer this question as well.

DESCENDANTS([Dimension].[Hierarchy].[All], , LEAVES)

I once again supply my hierarchy expression.  I leave the level out and feed it a empty value and specify the description flag is LEAVES. This assumes that the cube is built with the HIDEMEMBERIF property set so that members are not repeated as children of themselves and there are no blank members.

Transitioning from thinking in T-SQL to thinking in MDX can be difficult, and sometimes the original poster doesn’t quite understand the way MDX works.  As I answer questions on Stack Overflow, I try to ask clarifying questions to get a better understanding of the reason for the question.  Many times someone will ask how to build a specific calculation, and other members will work with them to understand why they want to do it in order to provide a better answer.  I do this a lot at client sites as well.  I can save information workers lots of time by asking them what their end goal is rather than simply delivering an intermediate step when building their BI solution, providing them a complete and sometimes more accurate answer to their common analytical questions.