A couple of months ago, I came across an article in the Atlantic that showed an animated gif of the most popular baby names by state by year. I decided to visualize this data using various add-ins and chart types in Excel.
Retrieving the Data
I found the data on the Social Security Administration website. This data showed the top 5 girl names in a table for each state for a given year. I needed to pull in multiple years, so I used Power Query and wrote a bit of M. I followed the guidance from a great blog post by Devin Knight:
- Choose From Web and enter the url for a page containing any year of data.
- Modify the M to create a function that accepts a year parameter and uses that to formulate the correct url for each year.
- Create a second query from scratch and provide an array of years.
- Convert this data to a table and insert a calculated column that pulls in the data from the first query.
- Expand the columns, rename columns, and change to appropriate data types where necessary.
For more detail, see Devin’s post. He does a great job of explaining and providing screenshots.
Once I had my data it was time to show it off.
My first instinct was to use Power Map since its specialty is showing changes in geospatial data over time. After trying each type of chart in Power Map, I found that the region shading was the closest to what I wanted.
Although I wasn’t just trying to duplicate the animated gif, I found it difficult to display this data in an equally effective way in Power Map due to the lack of ability to add data labels or customize annotations. There was no good way to show the most popular name on the state. I chose to use annotations to display the name. You can use data fields to populate the title and description of an annotation. The biggest drawback of using annotations is that there is no way to resize or relocate them. If I were to add annotations for each state, they would overlap and you wouldn’t be able to read many of them nor would you be able to see most of the map. I was able to add enough annotations that there was usually one annotation for each of the names shown on the map at any time.
One other issue I noticed is that occasionally an annotation will fail to update on time, so the data in the annotation will be old and correspond to a previous year. If you watch the video for my Power Map, you will see that the annotation for Virginia gets stuck on 1983 for a few years, but it eventually catches up. Update: Microsoft responded to my question on this. We figured out that the annotations will update as expected if you set the Time to Date(Year) and then change the time settings to “Data Shows for an Instant”. I have updated my workbook with these settings. You can watch the video of my Power Map tour here. Outside of the inability to add good labels, watching the states change color over time works well. You can have a legend that shows the names with the colors, but it will show every name from the data set and will not be an effective use of space.
Power View provides the ability to create bubble maps. I made a bubble map tiled by year.
If you click on a year tile, you can then use the arrow keys to flip through the years; you can also use the scroll bar below the tiles. I think this provides a better visual than the Power Map for this situation since it is easier to see which names go with which states. Because of the bubble colors, I can see how the trend went from Jennifer to Jessica to Ashley in the 1980s. We also notice that there are 5+ top names from the 1990s forward, whereas the 1980s had 2 – 4. The only real problem I encountered with the map in Power View was the limitation of the colors in the themes. When there are several names on the map, some of the colors can be very similar. I tried several themes and couldn’t find one where this wasn’t the case.
Next I decided to stray from maps and see if there were other chart choices that suited this data. I created a pivot table with data bars and a timeline slicer.
I can still tell which baby names were most popular across the US and see how the trends change over time. The piece of information that I lose in moving away from maps is seeing how trends affect states that are geographically near each other. For instance, I can tell that Jessica was popular in 1985 in states such as California, Florida, New York, and Michigan. But I don’t immediately recognize that Jessica was popular only in New England, Florida, and the Mountain West and West Coast because Ashley had taken over the Mid-Atlantic and Midwest. One advantage of the pivot table is that I can select multiple years. So I can learn things like Ashley was the most popular name overall in the 1990s.
Adventures with Apps for Office
Microsoft Research published several free apps for data visualization in Excel. I decided to use a couple of them to visualize the baby name data. You can access these apps on the Insert tab. Click Apps for Office -> See All. Then click Find More Apps at the Office Store. One thing to note about these apps is that they require the data to be formatted in a certain way. See the instructions in the links below for details on this.
A streamgraph shows how a set of numbers has changed over time using their relative area. It’s kind of like an area chart that doesn’t use an absolute Y axis.
I do not think this is an effective way to display the data. Area graphs and streamgraphs both fail when trying to display a large number of values. In this specific case, you can only pick a color and the shades are determined for you. It’s difficult to distinguish some of the shades from others. The fact that the values for the names can overlap and aren’t a specific shape (like a rectangle where you only have to judge length) make it difficult to interpret quickly. You lose most of the advantages of pre-attentive processing because you really have to study the graph to get information from it.
A treemap is a hierarchical view of data that breaks it into rectangles and uses size and color to communicate information.
For some reason there is a size limitation on this treemap so I couldn’t make it big enough to where all the names would fit in the boxes. Since the treemap plots the boxes so they fit into the overall rectangle, years are not placed in sequential order. In my treemap, both the color and size represent the number of count of names for the year. Using a treemap removes the ability for the viewer to see trends over time. I have to look in the bottom row for 1980 – 1982 and then jump to the top left to find 1985. I also had to summarize to top names by year, losing any geographical/state information. One thing this does do well is compare 2 – 3 values for a name across years. For instance, I can compare the shade of green for Jennifer in 1981, 1983, and 1985 to see the decline in popularity of Jennifer over time.
And the winner is…
I think the Power View does the best job of communicating all of the available information from the data set while being visually appealing. We can see geographic relationships and temporal relationships. It’s easy to read and quickly see which names were top in which states at any year.
You can download my Excel file here if you would like to check it out or try to improve upon it.
4 thoughts on “Most Popular Names Visualized in Excel”
Wow this is really awesome! Nice job!
Good job, was a nice read! 🙂