In this article, we will outline how the San Francisco Giants increased their time-to-insights by 50% by leaning on Data Clymer to implement a modern data system.

The Challenges: Dirty Data, Siloed Data, Hard to Access, and Slow Analytics

Every entertainment organization wants to be better understand their customers, ticketing sales, partner campaign success, and stadium data. But how do you do so with siloed data and no centralized data platform?

This is exactly what the Giants were struggling with. Aside from these two bottlenecks, they were also faced with dirty data thanks to duplicated fan records. Making data hard to access for the entire organization. Resulting in barriers that felt too high to even start an analysis.

The Solution: Snowflake, Matillion, Melissa Data, and Tableau

With an eye on data culture, the Data Clymer team could see the strain the lagging reporting times and lack of democratized data was putting on their Analytic’s team. All while reducing the trust the three-time champions had in their data.

Striving for democratized data for their Sales and Marketing teams, Data Clymer implemented a modern cloud data platform.

Why Snowflake?

Snowflake is of course a best-in-class cloud data warehouse, and we at Data Clymer only implement best-in-class tools, so naturally this is the new enterprise data warehouse for the Giants. Plus, Snowflake’s native support of JSON files made it an easy solution to support the results from Melissa. Making it easy to store those results as a variant data type directly in Snowflake and then easily flatten the data into a structured format for further transformation processing.  

Why Matillion?

Matillion is a cloud-native ETL/ELT tool built for Snowflake. It’s an especially good tool in conjunction with Snowflake because:

  1. Matillion doesn’t store or transform any data natively, but instead leverage the horsepower of Snowflake to handle all of the heavy lifting under the covers.   
  2. We can dedicate compute to the Matillion jobs so they don’t affect analytical queries against the data.
  3. We were able to create our own custom meta-data driven ELT applications by creating configuration tables in Snowflake, so when you want to extract a new table from a source, you don’t even need to touch the Matillion code, but rather just add a new entry into a configuration table.

Why Melissa?

Melissa is a great Master Data Management (MDM) solution in conjunction with Matillion. Below is a diagram and description of the MDM process:

What this meant for the Giants was it enabled us to clean email addresses, mailing addresses (access to the US Postal Service address database, among other libraries), identify if the address is business or residential, and give us the latitude / longitude of the address. Then, with this normalized data, we send it through another Melissa service that matches using numerous rules. Currently, our rules start with the fan’s first and last name as an anchor, then match across email address, mailing address, and phone number against our entire fan database (millions of records).  Melissa returns a master key for all matches.

The output of all this is a “mapping” table that contains all of the original fan contact data and the master key, so this is how we can always trace back from the master key to the raw keys.    And finally, we roll that up into master contact, account, and person tables.   We also track historical changes to the master tables to understand what changes may have occurred over time with the master keys.  For those of you familiar with data warehousing best practices, we used a slowly changing dimension type 2 design pattern.  That means that any change to the data in a master row triggers a new row to be inserted and the old row to be marked as historical.

The Results: Faster Analytics, Pre-packaged Solutions, and Partner Growth

With all their data in a single location, analysis was much faster. Actually, 50% faster according to Rocky Koplik, VP of Analytics. Projects that were once impossible due to level of effort are now within reach for the Giants.

2 major factors influence these results:

  1. Democratized Data – data now in the hands of several departments

  2. Cleaned data – fan record datasets reduce by 15% after de-duplication

Within weeks of having their system, the Giants had a better understanding of their customers. As summed up by Koplik, “We used to base this on TDC account number, which overestimated new customers being attracted through offers. Now, we can see when a new customer truly enters our database. We can better assess the types of offers that draw their interest and follow their customer path forward. The ability to more completely understand how individuals interact with the Giants by more easily seeing where people have individual accounts, business accounts or both.”

In addition, their pre-packed solutions now help the Giants understand the following:

  1. Ticketing – usage, sales, and packaging efficiencies

  2. Stadium data – total entrances, seat data, concession sales

  3. Forecasting – marketing, ticket pricing, and partner campaigns

  4. Ad-hoc game-day analytics

Data Clymer is a premier consulting firm specializing in full-stack analytics and data culture transformation. Our proven methodology includes full data stack implementation, data democratization, custom training, and analytics to enable data-driven decisions across the organization. We have curated a set of best practices from our deep expertise in LookerTableauSnowflakeRedshiftBigQueryPanoplyMatillonDBT, Sigma, and Fivetran.

If you need professional help, contact us or follow us on LinkedIn.