Sunday, February 8, 2015

Published Data Sources: The Good and The Insufficient

The Good

Publishing data sources to the server allows us to:

- Centralize data sources (much like a data mart)
- Share them with all the authenticated users
- Reduce workbook sizes
- Increase workbook publishing/upload speed
- Schedule data refresh with fixed frequency

The Insufficient

However, published data sources are not so easy to use. A number of product defects or design oversights may have hindered the adoption of server-based data sources.

- Switching to server data sources breaks a workbook in many ways.

Many of us start designing vizzes using a file-based data source. In the end we replace it with its server image. Then everything breaks.

- Unable to change data types in data sources

It feels like a given that we can convert the data type of any dimension or measure. But if a dimension or measure comes from a server data source, we can't convert its data type. It is not clear why. The conversion belongs to the application or the workbook; it shouldn't have anything to do with the data source.

- Incomplete property information

We may have multiple data sources of the same name across sites, projects and owners. However, the data source properties don't show the site, project or owner, and they don't show the last update time. So we can't uniquely identify a server data source, and we don't know when it was last updated.
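
To make the ambiguity concrete, here is a minimal Python sketch. It is not Tableau's API: the `DataSourceRef` class, its fields, the `group_by_name` helper and the sample values are all hypothetical, and only model the idea that a name alone does not identify a source while the full site/project/owner/name combination does.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceRef:
    """Hypothetical identity of a published data source (not a Tableau API type)."""
    site: str
    project: str
    owner: str
    name: str


def group_by_name(sources):
    """Group references by display name to expose names that are ambiguous."""
    groups = defaultdict(list)
    for src in sources:
        groups[src.name].append(src)
    return dict(groups)


# Made-up sample values: two different sources share the name "Sales".
sources = [
    DataSourceRef("Production", "Finance", "alice", "Sales"),
    DataSourceRef("QA", "Finance", "bob", "Sales"),
    DataSourceRef("Production", "Finance", "alice", "Inventory"),
]

# Names that map to more than one full identity cannot be told apart by name alone.
ambiguous = {
    name: refs for name, refs in group_by_name(sources).items() if len(refs) > 1
}
```

If the properties dialog exposed all four fields, the two "Sales" sources in this sketch would be distinguishable at a glance.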

- No access to data sources from different sites on the same server

A Tableau server is organized as site > project > owner > file. Each site has its own collection of data sources, and all sites share a single sign-on through the Tableau server. From one workbook, however, we are not allowed to access sources on different sites: a workbook is limited to a single Tableau server and to data sources from a single site on it. To us, a site is just a partition for a specific function, such as development, QA or production. We hope that this limitation can be lifted. Cross-site access should be controlled through permission settings.
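
The restriction can be pictured with a small Python model. This is only an illustration under stated assumptions: the `Workbook` class, its `connect` method and the sample names are hypothetical and do not correspond to any Tableau API or to how the server enforces the rule internally.

```python
from dataclasses import dataclass, field


@dataclass
class Workbook:
    """Hypothetical model of a workbook bound to one site (not a Tableau API type)."""
    name: str
    site: str
    connections: list = field(default_factory=list)

    def connect(self, source_site: str, source_name: str) -> None:
        # Models the restriction described above: only same-site sources are allowed.
        if source_site != self.site:
            raise PermissionError(
                f"{self.name!r} is on site {self.site!r} and cannot use "
                f"{source_name!r} from site {source_site!r}"
            )
        self.connections.append((source_site, source_name))


wb = Workbook(name="Quarterly Report", site="Dev")
wb.connect("Dev", "Sales")  # same site: accepted

try:
    wb.connect("Production", "Sales")  # different site: rejected in this model
    cross_site_allowed = True
except PermissionError:
    cross_site_allowed = False
```

In this model the check depends only on the site, not on who the user is, which is the point of the complaint: a per-user permission check would be a more flexible rule.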

Summary

Server-based data sources are great in many respects, and we would love to use them as much as possible. However, their design leaves much to be desired, which seems to hamper their adoption and deployment. We hope that these defects can be fixed soon.


7 comments:

  1. I think a lot of the shortcomings you describe are based on your implementation of the features...not that Tableau offers a lot of advice on ideal application architecture.

    I prefer that sites are completely isolated for security purposes. In my opinion, sites should only be used for that purpose. If you do SDLC, then you can easily accomplish that without sites by using project folders.

    In our organization, the only way you get access to enterprise data is through Tableau Server. So there isn't a need to migrate from local to server data sources. We centrally manage the data to ensure key definitions are used consistently in all reports. It also ensures we don't get a mass proliferation of data sources and we don't have massive workbooks published to the server with a lot of duplicate data.

    By the way, you can change the data type for server sources. You just need to duplicate the field or you might have to cast it as a different type with a calculation.

    Also, you can see the last time a server extract was refreshed from the connect to data screen in Desktop.

    Replies
    1. Mark, thanks for shedding light on server data source management. I wonder if you could write up an article detailing your best practices on the topic that people can refer to.

      In general, Tableau positions itself as the data visualization platform for the masses. It has penetrated sectors such as hospitals, banks, schools and non-profits, which are not well versed or trained in SDLC, nor certified in ISO 9000 or Six Sigma.

      Regarding your comments, I think that as a tool, Tableau should give people more flexibility regarding sites. It doesn't have to do with SDLC. It's up to the administrator to set up permissions according to the needs of the organization.

      There is no security concern, because I have legal access to all sites (single sign-on) but can't access data sources on them simultaneously.

      A workaround is something we resort to when we can't do something in the normal course of action. Not being allowed to change data types is something I don't understand. It doesn't have anything to do with the data source. It's only part of the application. Thanks for the tip though. I expect a more elegant solution than that from Tableau in the future.

      Yes, I can check the update time on the server. But first, I need to uniquely identify the data source I am using. The server doesn't prevent us from using the same name across different sites, projects, owners etc. Adding more properties to the data source menu shouldn't be so hard to do.

    2. I'll definitely consider this as a topic for my next post. Thanks for your contributions to the community!

    3. For example, our sites are divided into Production, Dev and QA, mainly from a standard SDLC point of view. We have workbooks that are in Dev or QA. Most data sources are well audited and validated, and we put them in the Production site. Then none of the workbooks in Dev and QA can access them, unless we copy the sources everywhere. That's why we need access to data sources in different sites.

    4. You don't need sites to accomplish your SDLC controls. You could do the same thing with project folders without the downsides.

    5. I certainly could. The sites were already set up before I started using them. Then I found that sites have limitations which are not well justified.

    6. Mark's blog entry: http://ugamarkj.blogspot.com/2015/02/implementing-tableau-server-top-10.html?m=1
      Both of you, nice work. I think both your entries complement each other.
