Data Visualization, Excel, Microsoft Technologies, Power BI, Power Pivot, Power View

Infographic vs Power View

Someone on Twitter posted a link to an infographic on the 10 most visited cities in the world, which you can see below.  I’m interested in travel, and I like data, so I looked through it.

Most Visited Cities Infographic
Most Visited Cities Infographic from DailyInfographic.com

After a second or two of looking at it, my BI and dataviz nerdiness kicked in. Here were my thoughts:

  1. I know infographics are cool, and I love that they communicate an interesting data point to the general public in an easily consumable manner.  But I hate when they make them really large/long with a lot of text where you have to read through every bit of text to really get the info.  You lose a lot of the opportunities for preattentive processing with a format like this.
  2. I know I could make a great Power View from this data that could deliver information more efficiently, and I think it would look pretty cool.

Making the Power View

First, I had to get the data and put it in a Power Pivot model. I just copied the info from the infographic and put it in a single table in Excel.  You could split it out and put it into several tables (especially if you split the tourist attractions into 3 individual fields per city), but I didn’t need it for what I was making and the data was all at the same level of granularity in my single table.

Most Visited Excel table
Click to enlarge

I imported that into my Power Pivot model by selecting the table and  clicking the Add to Data Model button on the PowerPivot tab.  I had a vision of where I was going with my Power View, so I also imported a table of countries and flag images into Power Pivot. See yesterday’s post for info on how to import images into Power Pivot. I had originally split out the hotel price and the hotel name as two separate fields, but I found that I liked them together when I was building my Power View so I made a calculated field in Power Pivot to take care of that.  Here are my tables in Power Pivot:

Most visited power pivot 1
Click to enlarge
Most visited Power Pivot 2
Click to enlarge

You’ll notice on the Country Flags table, I had to change the table behavior properties to get the images to render properly in Power View. You can find this on the Advanced tab in Power Pivot. The important part is to set the default image to the flag image. Next I had to relate my most visited cities data to the appropriate flag.  Since there is one row per country in my Country Flags table, this was as easy as relating the country field in the Most Visited Cities table to the country field in the Country Flags table.

Most Visited Power Pivot 3
Click to enlarge

And finally, I built my report. You can download my workbook here.

Most Visited Power View
Click to enlarge

I used a map and cards to display the data. In the map, the bubble size represents number of visits. If you click on a bubble you filter the table below to see just the information for that location. If you hover over a bubble you can see the city name and visits per year. The table is sorted by visitors per year descending.

Why I Like This Power View

Let’s try an experiment: open the file in Excel and then open the infographic in your browser. See how long it takes you to answer the following questions with the infographic and then with the Power View.

  1. How many of the most visited cities are in the United States?
  2. If I were to plan a trip to visit 3 of the cities on this list, which ones would be most convenient to visit in one trip?
  3. What tourist attractions are in Hong Kong?
  4. How many visitors does Dubai get per year?
  5. In what country is Kuala Lumpur located? (I’m hoping you know this off the top of your head, but let’s pretend you need to look it up.)
  6. Which country has the lowest average hotel rate?

I think the Power View would answer all of these questions quicker or just as quick as the infographic.  If you have been looking at the infographic for a while, you might be pretty fast at #3, #4, and #5.  But you will be much slower at #1, #2, and #6 using the infographic.

There are two main reasons for this:

  1. The map provides visual information that you can take in quickly.  You can quickly see locations of these cities on the globe as well as their locations relative to each other. Because of preattentive processing (PDF) we can determine which places have the most visitors (size), which places are close to one another (spatial position), and where cities are located (color – the mauve bubbles stand out) without having to really think about it. I will note that both the Power View and the infographic do a good job of using enclosure to indicate which data points are related.  The Power View has the mauve bars to the left and the line above to separate each city.  I easily know that Malaysia is the country related to Kuala Lumpur and don’t confuse it with the information for Hong Kong above it.
  2. Interactivity allows us to quickly find information through sorting and filtering. For question #3, I just clicked on the bubble for Hong Kong to filter the table below and find that the tourist attractions in Hong Kong include Hong Kong Disneyland, Kowloon Walled City, and Ocean park Hong Kong. To find the lowest hotel rate, I just sorted my table by the Avg Cheap Hotel field ascending and looked in the first card to find Kuala Lumpur has hotels for $16. If you look for this information in the infographic you have to scan through every box until you find Hong Kong or look through all the boxes and remember the lowest hotel rate you have read.

Perhaps you think that it is unfair to compare an interactive Power View with a static infographic. I would argue that it makes a point.  When you build BI solutions including reports and ad hoc data discovery tools you should consider the needs of the end consumer to decide whether you need interactivity. They do make interactive infographics.  This author chose not to include that feature in her design. There are some advantages to the infographic around accessibility. The infographic can render in basically any browser on any device and can be downloaded as an image to where it doesn’t require internet connectivity. Power View maps currently use Silverlight and the maps require an internet connection. The preview of Power BI in Office 365 has an HTML view but it doesn’t yet work with maps, so you aren’t going to look at this Power View on an Android or iOS device.

I don’t think all infographics are inherently poorly designed; I don’t think this one is particularly bad.  I do think that the information could be better arranged to more effectively communicate to the audience. I recommend looking through Stephen Few’s presentation on Infographics and the Brain (PDF) for more info on this subject. I agree with Few that infographics are good when they simplify the data and tell a story to communicate a message with the data in addition to being entertaining.

Excel, Microsoft Technologies, Power BI, Power Pivot, Power View, SSAS, T-SQL

Fun With OPENROWSET

I’ve had several occasions to use OPENROWSET recently in T-SQL.  Although you can use it as an ad hoc alternative to a linked server to query a relational database, I’m finding it useful to get data from other sources. I’ll go into details of how to use it, but first I would like to acknowledge: OPENROWSET is not the answer to everything.  Sometimes you should use a linked server or SSIS or some other method.  I had some specific cases where I felt OPENROWSET made more sense for my task.

Importing Data from an Excel Spreadsheet

Technical need: I had some data that needed to be validated before importing into SQL Server from Excel as part of an upload process in an application.  If it didn’t pass validation, I did not want to import the data. Rather than use SSIS to import the data into a table and then check it, I used OPENROWSET to pull it directly from the Excel file into a temp table, ran all of my validation checks, and then imported the data using SSIS all within a stored procedure that was called by an application.  Field and table names have been changed to protect the innocent.

The stored procedure:

CREATE PROCEDURE dbo.MySproc
@datafile nvarchar(127) 
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    DECLARE
        @sql nvarchar(max)
    ,   @rowcount int
 SELECT
        @rowcount = 0

    DECLARE @FileExists int

    BEGIN TRY
        EXECUTE master.dbo.xp_fileExist @datafile, @FileExists out
        -- Value of 1 = file exists
        -- Value of 0 = file doesn't
        IF @FileExists = 0
        BEGIN
            SELECT
            CONCAT('The file ', @datafile, ' could not be found.') 
                                                   AS reasontext
        ,   NULL AS field2
        ,   NULL AS field3;            
            RETURN -1;
        END
    END TRY
    BEGIN CATCH
        SELECT
            ERROR_MESSAGE()  AS reasontext
        ,   NULL AS field2
        ,   NULL AS field3;
        RETURN -1;

    END CATCH
 TRUNCATE TABLE dbo.finaldest;
    IF OBJECT_ID('tempdb..##tempf') IS NOT NULL
    BEGIN
        DROP TABLE ##tempf;
    END;

--Create temp table to hold values from Excel file
Create table ##tempf (
Field1 [nvarchar](255),
Field2 [nvarchar](255),
Field3 [nvarchar](255),
Field4 [nvarchar](255),
Field5 [nvarchar](255),
Field6 [nvarchar](255)
);
--dynamic sql to populate temp table with values from Excel file
    SELECT
    @sql = N'INSERT INTO ##tempf
   SELECT
   [Field1],
   [Field2], 
   [Field3],
   [Field4],
   [Field5],
   [Field6],
    FROM OPENROWSET (
                ''Microsoft.ACE.OLEDB.12.0'',''Excel 12.0;Database='
                + @datafile
                + ';HDR=YES;IMEX=1'', 
                ''SELECT s.* FROM [Sheet1$] AS S 
                WHERE [Field2] IS NOT NULL AND [Field3] IS NOT NULL'')';

    BEGIN TRY 
        EXEC sp_executesql @sql;
    END TRY
    BEGIN CATCH
        SELECT
            'File not found or invalid file type' AS reasontext
            --ERROR_MESSAGE()  AS reasontext
        ,   NULL AS Field2
        ,   NULL AS Field3;
 IF OBJECT_ID('tempdb..##tempf', 'U') IS NOT NULL 
        BEGIN
            DROP TABLE ##tempf
        END
      RETURN -1
    END CATCH
/*Do lots of data validation here
...
*/
--if data validation tests are passed, write to final destination table
Insert INTO dbo.[finaldest]
   ([Field1],
      [Field2], 
      [Field3],
      [Field4],
      [Field5],
      [Field6])
   Select [Field1],
      [Field2], 
      [Field3],
      [Field4],
      [Field5],
      [Field6] 
      from ##tempf;

 Drop table ##tempf; 
END

To make this work you need to install the Microsoft ACE OLE DB 12.0 provider on your server. You cannot use the Jet OLEDB Provider with 64-bit SQL Server and 64-bit Excel. Once you have this installed you need to change some settings:

EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', 
N'AllowInProcess', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', 
N'DynamicParameters', 1
GO
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'Ad Hoc Distributed Queries', 1;
GO
RECONFIGURE;
GO

Here is a blog post that explains the errors you might get if you don’t have these settings correct. It also notes that you can go check your settings in the registry keys. For SQL Server 2012 the path is HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> MSSQL11.MSSQLSERVER -> Providers -> Microsoft.ACE.OLEDB.12.0.  You can then see that AllowInProcess and Dynamic parameters are set to 1.

RegKeyACEOLEDB

This provider seems to be a bit fussy and there are some things that took me a while to troubleshoot once I got started.  First, make sure that the location of the Excel file you are reading is accessible to the account that is running the SQL engine and doesn’t require admin access. The Linked Server errors that get returned with this really aren’t that helpful, so you have to use your Google-fu to work your way through it.  I used a global temporary table so it would be accessible across connections if it needed it to be (I was having trouble when I used a local temp table).  The application that calls this stored procedure queues up the calls and executes them serially so one can’t override the other.

Returning results from an Analysis Services cube in SQL

Technical need: I had some data (several facts with common dimensions) with calculations in a multidimensional cube and an application that needed that data but couldn’t/wouldn’t consume the MDX result set.

The query: For demonstration purposes I’m using a small query that returns the values for a drop-down box for selecting a member of the date dimension.

select a."[Measures].[DateKeys]" as DateKey, 
a."[Measures].[DateLevel]" as DateLevel, 
a."[Measures].[DateValues]" as DateValue 
from OPENROWSET 
('MSOLAP','Datasource=localhost; Initial Catalog=MyCubeDB;',
  
'with member [Measures].[DateValues] as
    Iif([Date].[Fiscal Calendar].currentmember.level_number = 2, "    " 
    + [Date].[Fiscal Calendar].currentmember.member_value, 
    [Date].[Fiscal Calendar].currentmember.member_value)
  member [Measures].[DateKeys] as 
    [Date].[Fiscal Calendar].currentmember.uniquename
  member [Measures].[DateLevel] as 
    [Date].[Fiscal Calendar].currentmember.level_number

select
 {[Measures].[DateValues], [Measures].[DateKeys], 
                           [Measures].[DateLevel]} on 0 ,
 {Exists(Descendants([Date].[Fiscal Calendar].[FY2014], 1, 
                         SELF_AND_BEFORE), , "MyMeasureGroup") } on 1
 from [MyCube]
') as a;

You can see that I explicitly named my fields, but you can also do a select * to get all the fields from the OPENROWSET. I had to put the fields from the cube in quotation marks to make it work because of the brackets in the field names. For Analysis Services, MSOLAP is the name of my provider.  You must have the correct version of this provider listed under the linked servers on the server where this is being executed.

MSOLAP ProviderFor SQL Server 2012 SP1, you can download the Microsoft Analysis Services OLE DB Provider for Microsoft SQL Server 2012 SP1 here. There is nothing special about the data source and initial catalog. They are exactly the same connection info you would use when connecting to SSAS through Management Studio.  The last part is simply the MDX query.  You should be able to copy that part out and run it in Management Studio connected to your SSAS database and get results.

Inserting Images Into a Database

Technical need: I was building a demo for Power BI and I needed to show some images in my Power View reports in Office 365.  In order for the images to show in Office 365, you must import the images into SQL Server and then import them into Power Pivot. First I downloaded images of flags for each country from the CIA world factbook.  I took the folder of flags and put it in my pictures folder. I already had a table in SQL Server that contained country names and codes.  So I just needed add a flag for each country.

The query:

  CREATE TABLE [dbo].[countryflagsdemo]
  (
  [id] int, 
  [country] varchar(100), 
   varchar(2),
  flagimage [varbinary](max)
  )
 --get ID, country, and code from existing table
  INSERT INTO dbo.countryflagsdemo (id, country, code)
  SELECT id, country, code FROM dbo.Countryflag1;

  DECLARE @i INT; 
  DECLARE @rows INT; 
  DECLARE @code VARCHAR(2);
  SET @i = 1;
--get max id value for loop
  SET @rows = (SELECT COUNT(country) FROM dbo.countryflagsdemo)
  WHILE (@i < @rows)
  BEGIN 

    DECLARE @sql VARCHAR(MAX); 
    SET @code = (SELECT code FROM dbo.countryflagsdemo WHERE [id] = @i);
    SELECT @sql = 'UPDATE dbo.countryflagsdemo 
      SET flagimage = 
      (SELECT BulkColumn FROM 
        OPENROWSET (BULK 
            ''C:\Users\mlongoria\Pictures\flags\large\' + @code 
               + '-lgflag.gif'', Single_Blob) as flagimage)
            WHERE  = ''' + @code + ''';'

    EXEC (@sql);
    SET @i = @i + 1; 
    --PRINT @code;

END 
GO

I figured out how to do this based upon this MSDN blog post. I inserted my other data first and then added the images.  I’m sure you could find a way to do this all in one select statement.  Since this was a one-time thing I didn’t see the need to find another way. It should be noted that the bulk insert must pull from a location on your computer; you can’t use links to the images on the internet. I downloaded the images in a folder. Each image name was the abbreviation for the country. If you are going to import this data into PowerPivot, be sure you change the table behavior properties so it will recognize the varbinary field as an image.

 

There may be other ways to accomplish my goals, but it’s nice to understand how to use OPENROWSET and have that in my toolbox.