In this article, we will outline how the San Francisco Giants increased their time-to-insights by 50% by leaning on Data Clymer to implement a modern data system.
The Challenges: Dirty Data, Siloed Data, Hard to AccessData, and Slow Analytics
Every entertainment organization wants to better understand their customers, ticketing sales, partner campaign success, and venue data. But how do you do so with siloed data and no centralized data platform?
This is exactly what the Giants were struggling with. Aside from these two obstacles, they were also faced with dirty data thanks to duplicated fan records, making data hard to access for the entire organization. Those barriers seemed too high to even start an analysis.
The Solution: Snowflake, Matillion, Melissa Data, and Tableau
With an eye on data culture, the Data Clymer team could see the strain lagging reporting times and lack of democratized data was putting on their analytics team. As a result, this reduced the trust the three-time champions had in their data.
Striving for democratized data for their sales and marketing teams, Data Clymer implemented a modern cloud data platform.
Why Snowflake?
Snowflake is of course a best-in-class cloud data warehouse provider, and we at Data Clymer only implement best-in-class tools, so naturally this is the new enterprise data warehouse for the Giants. Plus, Snowflake’s native support of JavaScript Object Notation (JSON files made it an easy solution to support the results from Melissa. Making it easy to store those results as a variant data type directly in Snowflake and then easily flatten the data into a structured format for further transformation processing.
Why Matillion?
Matillion is a cloud-native ETL/ELT tool built for Snowflake. It works especially well with Snowflake because:
- Matillion doesn’t store or transform any data natively, but instead leverage the horsepower of Snowflake to handle all the heavy lifting under the covers.
- Computing can be dedicated to the Matillion jobs, so they don’t affect analytical queries against the data.
- The ability to create custom meta-data driven ELT applications by creating configuration tables in Snowflake, so when you want to extract a new table from a source, you don’t even need to touch the Matillion code, but rather just add a new entry into a configuration table.
Why Melissa?
Melissa is a great Master Data Management (MDM) solution in conjunction with Matillion. The diagram below illustrates the MDM process.
What this meant for the Giants was it enabled them to clean email addresses, mailing addresses (access to the US Postal Service address database, among other libraries), identify if the address is business or residential, and give them the latitude and longitude of the address. Then, the data is sent through another Melissa service that matches using a variety of rules. Currently, the rules start with the fan’s first and last name as an anchor, then match across email address, mailing address, and phone number against the entire fan database of millions of records. Melissa returns a master key for all matches.
The output of this process is a “mapping” table that contains all the original fan contact data and the master key, providing the ability to trace back from the master key to the raw keys. And finally, that is rolled that up into master contact, account, and person tables. The Giants also track historical changes to the master tables to understand what changes may have occurred over time with the master keys. For those of you familiar with data warehousing best practices, we used a slowly changing dimension type 2 design pattern. That means that any change to the data in a master row triggers a new row to be inserted and the old row to be marked as historical.
The Results: Faster Analytics, Pre-packaged Solutions, and Partner Growth
With all their data in a single location, analysis was much faster. Actually, 50% faster according to Rocky Koplik, vice president of analytics. Projects that were once impossible due to level of effort are now within reach for the Giants.
Two major factors influence these results:
-
Democratized Data – data now in the hands of several departments
-
Cleaned data – fan record datasets reduced by 15% after de-duplication
Within weeks of having their system, the Giants had a better understanding of their customers. As summed up by Koplik, “We used to base this on TDC account number, which overestimated new customers being attracted through offers. Now, we can see when a new customer truly enters our database. We can better assess the types of offers that draw their interest and follow their customer path forward. The ability to more completely understand how individuals interact with the Giants by more easily seeing where people have individual accounts, business accounts or both.”
In addition, their solutions now help the Giants understand the following:
-
Ticketing – usage, sales, and packaging efficiencies
-
Stadium data – total entrances, seat data, concession sales
-
Forecasting – marketing, ticket pricing, and partner campaigns
-
Ad-hoc game-day analytics