Tableau Desktop is a powerful BI tool for the Data Warehouse. Tableau Prep is a promising but less mature product for cleansing new Datasets.
The weekly mood
This is the end of my first quarter working as a full-time Architect. During this period, I had the opportunity to deep-dive into different topics such as DevOps, API and Data Management. It feels like a year from a content perspective, but not more than a month from a collaborative standpoint. Indeed, it has been and still is a challenge for me to connect with other people within a distributed organisation in the middle of a pandemic. And as I am currently gaining visibility through a Data Lake project, I am reminded that, as in my previous position (I've been working for years as a Solution Engineer), this Architect role can sometimes be very unclear, difficult to endorse and to align with other people's expectations.
Architecture is a double-edged sword. On one side it brings people together around Software design principles; on the other it enforces constraints and authority. This leads to a moderate acceptance from peers, a responsibility that is often assumed to be better served by senior developers in their spare hours than by a fully dedicated team with random expertise and privileges... According to the Sun Certified Enterprise Architect Study Book (2010), the "Designer is concerned with what happens when a user presses a button, whereas the Architect is concerned with what happens when ten thousand users press a button".
See also: Software Architect - A Role, not a Job
Last but not least, the Architect role seems to require a predicate of the kind "takes care of" to be meaningful at all, e.g.:
- Enterprise architects care about Risk mitigation strategy around Digital Transformation, Standardization and Information governance.
- Software architects care about Design patterns, Agility and Testability.
- Infrastructure architects care about System provisioning and operation (SysOps), Security and configuration (SysAdmin), Software development lifecycle automation (DevOps).
- Product architects care about Unique selling proposition (USP), Minimum Viable Product (MVP) and Go to market (GTM).
- Solution architects care about pre-packaged Professional services and templates to repeat across as many customers as possible.
- Business architects care about individual requirements, custom solutions and Return on Investment (ROI).
As a Cloud Architect I like the idea of not really fitting into one unique category, but working across multiple ones with the right mindset. I especially agree with the cross-cutting principles of Modern Architecture as defined by Rajesh R V, Head of Architecture at Emirates: Adaptability, Governance, Decision, Change, Resiliency.
Today I'd like to complete my previous articles around Data Analytics with a look at Tableau, the corporate standard for Data visualization at my company, and probably the most popular reporting tool on the current Business Intelligence (BI) market. We'll start with a general overview of reporting technology and requirements, then introduce Tableau platform components and finally evaluate the capabilities of both clients Tableau Desktop and Tableau Prep.
From Reporting to Insights
A lot has changed in the BI area since I built my first report using BIRT 15 years ago, an Open-Source project that was mainly backed by IBM until the group acquired Cognos. I found it great to be able to put a database query in direct relation with a chart that I could integrate into my Java application. However, it was a significant effort to build, run and maintain a basic Report. Errors like "missing data" were badly handled. The end-result was quickly outdated and difficult to consume. I also worked on a project in which we used Crystal Reports under Windows. It was already after SAP had acquired Business Objects, but reports were still produced on schedule. Of course, other Mega-vendors like Oracle and Microsoft also tried to corner the BI market by shipping some Business Activity Monitoring (BAM) with their Data Warehouse (DW).
What reporting users actually asked for the most was the flexibility and ease to create, modify and run reports "as they need it", aka. Ad-Hoc Reporting, much like in Microsoft Excel, rather than having to ask and wait for IT. As per the Self-service software approach, administrators would just need to provide reporters with a read-access connection to their data sources. Second, Web-based user interfaces became so rich that no client installation should be required anymore. Third, creating reports could be done with no coding; visualizations could auto-refresh, show details on mouse-hover and set filters on selection. New emerging products provided exactly that. The market leaders are Microsoft Power BI and Salesforce Tableau. While I have a preference for the Open-Source projects Metabase, (Databricks) Redash and Apache Superset, I find Amazon QuickSight and Google Looker interesting alternatives. All are available as a managed service.
In a recent article, the research institute Gartner monitored data analytics trends and reflected both the state-of-the-art and promising innovations in BI & Analytics market in 2020:
- Advanced Analytics: Text Mining and Machine Learning (ML) support the delivery of insights from Big data, e.g. a Customer churn prediction.
- Dynamic Storyboards: Communication is not based on pre-computed Reports but on interactive, tailored Dashboards, e.g. a Summary of historical measures.
- Prescriptive Analytics: Assist the user with smart functions for both analytics design and the decisional process, e.g. a Department project prioritization.
- X-Analytics: As most data is unstructured and the technology matures, it becomes even more relevant to analyse other kinds of data like Pictures and Videos, e.g. Tumor expansion rates.
- Augmented Data Management: Metadata as a dimension of any Data analysis including provenance, relations and versions, e.g. a Navigation for Catalog data.
Tableau is a proprietary BI and Analytics platform. It is developed by Tableau software, a company founded in 2003 and acquired by Salesforce in 2019, the market leader for Customer Relationship Management (CRM) in the Cloud. Tableau enables Visual Analytics of static and live data through beautiful Reports which the user can interact with.
As compared to its direct competitors, Tableau generally offers a larger set of capabilities and extensions, requires less technical skills and offers better guidance and support to the user. The Total Cost of Ownership (TCO) is generally higher.
What's in the box
The Tableau platform consists of:
- Tableau Desktop
- Client application for Mac and Windows.
- Connect to source systems and create Reports.
- Tableau Prep
- Client application for Mac and Windows.
- Cleanse, blend and publish Datasets.
- Tableau Server
- Self-hosted deployment.
- Tableau Online
- Managed deployment.
- Tableau Extension API and Gallery
- Community add-ons.
Tableau technology
Tableau started at Stanford University with the Polaris research project. Based on their findings, Tableau's founders patented VizQL in 2003, a unique engine that dynamically generates code from visual elements and compiles it just-in-time (JIT). With this, non-technical users no longer required traditional scripting languages to query data, such as Structured Query Language (SQL) and Multi-dimensional eXpressions (MDX). The data was stored on the file system in a proprietary columnar format called Tableau Data Extract (TDE).
With the later rise of Big Data and Ad-Hoc Analytics, a Data store optimized for both large transactional (i.e. write) and analytical (i.e. read) workloads became a must-have. Tableau acquired the in-memory columnar database Hyper in 2016, a product originating from research led on Morsel-Driven Parallelism a few years before at the Technical University of Munich (TUM). This finally enabled Tableau to deliver maximal performance at minimal cost.
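To make the columnar idea concrete, here is a toy sketch (my own illustration, not Tableau's actual engine) of why formats like TDE and engines like Hyper store data column-wise: an analytical query such as "sum one measure over all rows" only needs to touch a single contiguous array.

```python
# The same small table stored row-wise and column-wise.
rows = [
    {"region": "EMEA", "product": "Kindle", "revenue": 120.0},
    {"region": "AMER", "product": "Echo",   "revenue": 200.0},
    {"region": "APAC", "product": "Kindle", "revenue": 80.0},
]

# Row store: scan every record, pick out one field each time.
total_row_store = sum(r["revenue"] for r in rows)

# Column store: each column is one contiguous array, so an aggregation
# reads only the data it actually needs.
columns = {
    "region":  [r["region"] for r in rows],
    "product": [r["product"] for r in rows],
    "revenue": [r["revenue"] for r in rows],
}
total_column_store = sum(columns["revenue"])  # touches only one array

assert total_row_store == total_column_store == 400.0
```

Both layouts yield the same answer; the difference is how much data a read-heavy workload has to scan, which is exactly the trade-off analytical stores optimize for.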
In addition, Tableau recognized the potential of using rich graphical libraries like D3js to win users' hearts, providing unstructured Data discovery based on ElasticSearch (Tableau Ask Data) and investing in Smart capabilities.
Hands-on
Downloading a trial is as easy as visiting the product homepage, typing in your corporate e-mail and clicking on download. Since Linux is not supported, I'm leaving my Linux workstation aside in favor of a MacBook.
Tableau Desktop for Mac
An application's root project is called a Workbook. You can get started with sample workbooks or create your own. A Workbook contains 4 different kinds of tabs accessible from the bottom bar:
- Data Sources: A Data Source is a basic connection to a source system among common file formats, databases and business applications (see also the List of connectors).
- Worksheets: A Worksheet is a Business view of the Data source using various kind of representations, from tables to charts (see also the common types of charts).
- Dashboards: A visual Report wrapping over one or multiple Worksheets (ex. booking numbers, moving average and forecast for the sales quarter).
- Stories: A set of Worksheets or Dashboards, that can be used for both providing a visual narrative to the reporter (what is in this view and how to use it), and a presentation deck to the audience (what does this report mean for my current business).
All those assets can be published to a Tableau server for sharing with other users. Visual items (2, 3, 4) can be exported as Image or PowerPoint format. The share button also allows you to embed views as external content to any Web portal/page via Tableau JavaScript API.
In order to connect to Snowflake, as a logical follow-up to my previous posts, I needed to provide my Operating System with the following prerequisites:
- Snowflake ODBC Driver
- iODBC SDK
- Data Source Name (DSN) as per the below script.
$ cat << EOF > ~/Library/ODBC/odbc.ini
[ODBC Data Sources]
SnowflakeTNCAD = Snowflake
[SnowflakeTNCAD]
Server      = <my_account>.<my_region>.snowflakecomputing.com
UID         = tncad
Schema      = CORE
Warehouse   = COMPUTE_WH
Role        = SYSADMIN
Driver      = /opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib
Description = SnowflakeTNCAD
Locale      = en-US
Tracing     = 0
EOF
$ /Applications/iODBC/iODBC\ Test.command
iODBC Demonstration program
This program shows an interactive SQL processor
Driver Manager: 03.52.1419.0910

Enter ODBC connect string (? shows list): ?
DSN            | Driver
------------------------------------------------------------------------------
SnowflakeTNCAD | Snowflake

Enter ODBC connect string (? shows list): DSN=SnowflakeTNCAD;PWD=<my_password>
Driver: 2.22.0 (Snowflake)

SQL> select count(1) from demo_db.core.message;

COUNT(1)
--------------------
2

result set 1 returned 1 rows.
Once this is done, adding a Data Source to Tableau Desktop is pretty straightforward.
However, you need to understand your data model and decide on your selection strategy before moving forward. At first, I wrongly dragged-and-dropped all my tables to the relationships pane and set the corresponding keys on their linkage.
This merely transferred the physical data model to the Worksheet with some additional key information, whereas I actually expected to work on a consolidated Dataset with highlighted measures and dimensions. Then I figured out that fact tables need to be explicitly "opened" before you drag-and-drop dimensions onto them. This enables table joins and transfers a logical data model to the Worksheet.
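Logically, what happens when you open a fact table and attach dimensions to it amounts to a join over a star schema. Here is a minimal sketch in SQL (via Python's sqlite3), with made-up table and column names, of the consolidated Dataset such a logical model produces:

```python
import sqlite3

# Hypothetical star schema: one fact table and one dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_booking (user_id INTEGER, amount REAL);
    INSERT INTO dim_user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO fact_booking VALUES (1, 100.0), (1, 50.0), (2, 70.0);
""")

# Consolidated dataset: a measure (amount) aggregated alongside a
# dimension (name), which is what the Worksheet ultimately works on.
result = con.execute("""
    SELECT u.name, SUM(f.amount) AS total
    FROM fact_booking f
    JOIN dim_user u ON u.user_id = f.user_id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()

print(result)  # [('alice', 150.0), ('bob', 70.0)]
```

The point of Tableau's drag-and-drop modelling is that the reporter never has to write this join by hand.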
I also got irritated by the data preview. First, Tableau wrongly ignores numeric data types, automatically categorizing every column as discrete ("Abc") instead of continuous ("#"). Second, the default column order "Datasource order" is just the alphabetical order of table and column names. So I fixed this manually, hoping to simplify further report configuration.
Tableau guides the user in selecting an appropriate kind of chart depending on the input measures and dimensions they drag-and-drop onto "shelves". The "Filter" shelf contains attributes that you can preset and expose to the user via a "Card". The "Mark" shelf contains customizations, in our case a unique color per user and weekday labels instead of the numerical values provided by the Dataset. One can also click the "Label" button to show all labels or Min./Max. within the chart. "Columns" are measures sticking to the X-Axis and "Rows" are variants sticking to the Y-Axis. You can easily swap the axes from the top toolbar. Finally, Actions allow you to modify the layout depending on the context.
To summarize, even if Tableau can theoretically integrate SQL, Python and R scripts, all common design tasks are truly 0-Coding, actually mostly Drag-N-Drop, so the learning curve is extremely short once you have overcome the first few hurdles. In a nutshell, Tableau says "goodbye" to SQL for the purpose of BI, and empowers the citizen to work with data. It is probably one of the most productive and efficient tools you can imagine for creating professional, lean and interactive reports.
Filter parameters and contextual actions increase the user's autonomy and analysis spectrum, which is especially meaningful when reporter and consumer are two different persons. In terms of timeliness, the data extract can be triggered from an external scheduler via API, or self-refreshed through the "live-data" mode. The first option is definitely relevant atop a DW where the provisioning pipeline already runs on schedule, so that no unnecessary computing costs are generated. The second option is appropriate for smaller volumes and constantly changing source systems such as Business Applications.
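As a sketch of the first option, an external scheduler can call Tableau's REST API ("Update Workbook Now") to kick off an extract refresh. Host, site and workbook IDs below are placeholders, the API version is an assumption, and authentication (the sign-in token) is omitted for brevity; check the REST API reference for your server version before relying on the exact path.

```python
# Build the endpoint an external scheduler would POST to in order to
# trigger an extract refresh. No network call is made here; a real
# client would send the POST with an X-Tableau-Auth token header.
API_VERSION = "3.9"  # assumption; depends on your server release

def refresh_url(host: str, site_id: str, workbook_id: str) -> str:
    """Endpoint of the 'Update Workbook Now' REST call (hedged sketch)."""
    return (f"https://{host}/api/{API_VERSION}"
            f"/sites/{site_id}/workbooks/{workbook_id}/refresh")

url = refresh_url("tableau.example.com", "my-site-luid", "my-workbook-luid")
print(url)
```

In practice the official `tableauserverclient` Python package wraps this kind of call, so the URL construction stays an implementation detail.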
A very big drawback of Tableau is that it does not support technical change well. My organisation didn't use Tableau before migrating from Redshift to Snowflake DW a couple of years ago, but if we had to do it now, then we would also have to re-build every single report from scratch. Same issue in case you are changing structures such as schema names. It seems that those who approved our Tableau adoption either didn't know, didn't understand, or didn't care.
If I were the one to decide, then it would definitely be a no-go. I am also very concerned about some of our departments gaining Self-service access to raw tables, then running their KPI calculations directly inside Tableau reports. Indeed, they not only fritter away any opportunity for data governance, but also store some intellectual property inside a tool which does not handle it as such. In fact, Tableau is even worse than closed-source: it is closed-architecture. I must say that I am very suspicious about it.
I will leave the pricing model over to some neutral information further below.
Tableau Prep Builder for Mac
The need for discovery, cleansing and profiling has been around for a while in the Data Management industry. A key to success is a smooth user experience and integration with surrounding systems.
Here we will use the Kaggle Dataset Amazon Consumer Reviews of Amazon Products. The text file in CSV format includes information on various products manufactured by Amazon (e.g. Kindle), as well as consumer rating and review text.
Tableau correctly recognizes the file encoding, header line, field separator and data types (date, discrete and continuous values) without any manual input.
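A naive sketch of what such automatic recognition amounts to (my own simplification, not Tableau Prep's actual logic): sample the column values, try a date format, then a number, and fall back to a discrete string.

```python
import csv
import io
from datetime import datetime

def infer_type(values):
    """Guess a column type from sampled string values."""
    def is_date(v):
        try:
            datetime.strptime(v, "%Y-%m-%d")  # one format, for brevity
            return True
        except ValueError:
            return False
    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False
    if all(is_date(v) for v in values):
        return "date"
    if all(is_number(v) for v in values):
        return "continuous"
    return "discrete"

# A tiny stand-in for the reviews CSV (real columns differ).
sample = io.StringIO("date,rating,product\n2020-06-01,5,Kindle\n2020-06-02,4,Echo\n")
reader = csv.DictReader(sample)
records = list(reader)
types = {col: infer_type([r[col] for r in records]) for col in reader.fieldnames}
print(types)  # {'date': 'date', 'rating': 'continuous', 'product': 'discrete'}
```

Real products additionally sniff the delimiter, the encoding and the header row, and they tolerate a share of outlier values per column.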
In order to discover the data, you can scroll to the right and find the metadata column "Sample Values", or click on the second flow step from the above pane "view and clean data".
Unfortunately, I didn't find enough time and interest to look deeper into further capabilities such as Data Blending and Masking.
Once finished, Tableau Prep facilitates exporting the result or publishing it to Tableau Server. However, I don't see any support for the development life-cycle (ex. versioning, testing) nor for data governance (ex. documentation, auditing).
Negative experience
Tableau Prep Builder has a recommendations button that suggested removing some identifier fields and adjusting the datatype of URL columns. I proceeded with the second one, expecting to support the data distribution analysis with URL-specific metadata such as subdomains. This started a background process eating my whole CPU and never completing.
$ top
PID   COMMAND       %CPU  TIME      #TH  #WQ  #PORT  MEM   PURG  CMPR  PGRP  PPID
9040  Tableau Prep  91.8  03:00.02  8/1  1    150    207M  576K  0B    9038  9038
9060  Tableau Prep  39.0  01:55.29  14   1    409    95M+  0B    0B    9038  9038
According to the documentation, Tableau automatically defines a sample size based on the number of rows and columns of the Dataset. However, it doesn't take processing complexity or local computing resources into account. In my case, the sample size was equal to the Dataset size of 28K rows x 20 columns. Reducing the sample size to 1000 rows didn't solve the above issue, even after a restart. In the header I could see a progress bar stuck at about 20%, "validating flow and generating schema".
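For contrast, here is what a more resource-aware sampling policy could look like: cap the total number of cells rather than only counting rows and columns. This is purely my own sketch (the budget figure is arbitrary, not Tableau's formula).

```python
def sample_rows(n_rows: int, n_cols: int, cell_budget: int = 200_000) -> int:
    """Rows to sample so that rows * cols stays within a cell budget."""
    return min(n_rows, max(1, cell_budget // n_cols))

# The Dataset from this article: 28K rows x 20 columns = 560K cells.
# Under a 200K-cell budget it would be sampled down to 10K rows instead
# of being profiled in full.
print(sample_rows(28_000, 20))  # 10000
```

A real implementation would also weigh per-column cost (URL parsing is heavier than summing integers) and available memory.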
Others reported similar issues on this community forum, but the answer isn't very satisfactory.
Tableau costs
Tableau offers a 14-day free trial.
After that, it is accessible to users who subscribe to one of the following licenses:
- Tableau Creator: $70/month, Desktop user, Min. 1 per instance
- Tableau Explorer: $42/month, Web-only user
- Tableau Viewer: $15/month, Read-Only user
The subscription pricing table is publicly available here.
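To get a feel for the budget, here is a rough monthly total for a hypothetical 50-person team, using the public list prices quoted above (the team mix is made up, and actual negotiated quotes may differ):

```python
# Public list prices per user per month (USD), as quoted above.
PRICES = {"Creator": 70, "Explorer": 42, "Viewer": 15}

# Hypothetical team: 2 report builders, 8 web analysts, 40 readers.
team = {"Creator": 2, "Explorer": 8, "Viewer": 40}

monthly = sum(PRICES[role] * count for role, count in team.items())
print(f"${monthly}/month")  # $1076/month
```

Note how the Viewer seats, despite being the cheapest, dominate the head count; the license mix is where most of the negotiation happens.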
What else can you do for free
Tableau Public is a free platform dedicated to sharing your reports with the world. Those are typically based on non-confidential datasets and open data. An example is the Global COVID-19 Tracker.
Tableau also enjoys a growing Open Source community around the development of system and graphical integrations: https://tableau.github.io.
Take away
Tableau Desktop became very popular by re-inventing the art of data reporting and limiting technical requirements to their strict minimum. Although our hands-on example is based on a pre-calculated measure, a reporter might be tempted to skip the DW entirely. To be clear, Tableau has convenient storage and compute capabilities, but is by far not as transparent and powerful as a DW.
Tableau Prep Builder is a newer, less mature application which supports blending and cleansing across multiple Data sources, so that reporters can explore and fix broken data before aggregating it. A useful addition to the Desktop license; however, transformation complexity and scalability are limited by the unrevealed weaknesses of the proprietary engine.
Unfortunately for Tableau, my overall impression of the software is critical, given very promising open-source alternatives as well as new capabilities acquired by major analytical platforms and released out-of-the-box for the purpose of reporting. If, like my organisation, you are paying for Tableau anyway, then be aware that it comes with significant cost and vendor lock-in, which you might be able to reduce step-by-step.