Luke Stanke

Data Science – Analytics – Psychometrics – Applied Statistics

Sports Analytics Mirrors Real Life: My Big Takeaways from SSAC17

Last weekend I finished up my first Sloan Sports Analytics Conference. For many in the analytics world this attending means running into big-name sports analytics figures: Bill James, Jessica Gelman, Nate Silver, Bill Barnwell, and Daryl Morey, to name a few. And while the conference boasts big names that packed very large auditoriums, there were some excellent smaller sessions that anyone in attendance could have taken a lot from. Here are a few of my bigger takeaways:


Perception doesn’t always meet reality.

In any given sport there are 2-3 organizations who on the leading edge of innovation. These organizations are consistent performers and are constantly seeking competitive advantages in-game and in-business. Teams are now tracking in great detail the physical excursion of players in practice to minimize injuries, keeping players healthy, and keeping the in-game product at a high-level for their fans. Even more impressive are some of the fan experience analytics. Some teams are tracking many sources of data – beyond box score, weather, and attendance data to understand fans habits.

The conference highlights big wins in sports analytics. But in hallway conversations with other organizations – those not presenting – they were quick to confide they felt way behind. And that’s not really a surprise – while sports analytics has been a hot topic for nearly a decade the business only began to boom in the past 5 years. A team on the leading edge five years ago can easily maintain their competitive advantage. This leaves teams late to the game playing major catch-up.

"Today, you will need to understand this, whether you are the first or last on the team" – Luis Scola on analytics #SSAC17 #futureofbball

— Sloan Sports Conf. (@SloanSportsConf) March 4, 2017

There’s an overused quote: “sports are a microcosm of life”. But this phrase also applies to analytics. In any industry it’s easy to believe that all organizations – especially your competitors – have a strong analytics culture and are building an ROI analytics behemoth. It’s just not the case. Reality is a handful of organizations – paraphrasing, but I don’t think it’s out of line – have “figured analytics out”.


A culture of analytics goes beyond the team (on the field/court).

Let me re-iterate from my earlier points: it’s important to use analytics to improve player performance and the in-game product, but the culture begins with leaders who drive business decisions with data. There are team executives across multiple sports that drive success using both their expertise of the game and advanced analytics. But there are also leaders who are not convinced of analytics and believe intuition and the eye-test are still the true way to gain competitive advantage. This is just isn’t true: research has largely shown leaders leveraging data and their expertise produce the best outcomes in any industry.

Building an analytics culture also requires capable employees who are domain experts, are have capacity with advancing technologies, can code, and can distill all the data into clear stories for the organization to use. People with these skills are nearly impossible to find across any industry – that’s why most executives call them unicorns.

Be able to program

— Daryl Morey (@dmorey) March 3, 2017

That’s the thing, finding the right people for a team is hard. Sometimes the right move is promoting an analytics savvy individual from within the organization; sometimes it’s hiring a fresh-out-of-school MBA graduate.

Both can backfire. That person internally promoted might be a strong communicator with the business but not have the expertise to place appropriate probabilities to leads, leading to wasted effort by ticket reps. And that recent MBA grad might know how to build the best models to place probabilities to leads but might not know how to talk to the business, leading to the exact same problem.

I’m not saying I have a solution for this. But it’s clear that every organization faces the same problem. And that’s not just in the sports world, but also in every industry, everywhere. It’s clear that sports teams are trending towards the integrated use of analytics across all lines of business but because talent is hard to find there have been some growing pains in some markets.


Make sure a story can be told from the data.

In every session I heard the same thing – directly and indirectly: analytics professionals need the capacity for data-based storytelling. And those stories need to be actionable. In a session with basketball professionals, coach Vinny Del Negro stated he receives a 40-page analytics binder before every game. That’s 40-pages of information 82 times a season. Meaning coach Del Negro has to absorb 3200 pages to develop in-game strategies for a dozen players. It’s too much information to digest, so he often just scans it for a handful of key takeaways he can bring to his team.

"There are two types of coaches: those that embrace analytics and those that are unemployed" @adirshiffman #SSAC17 #sustaininggreatness

— Sloan Sports Conf. (@SloanSportsConf) March 4, 2017

When it comes to the players, women’s professional Sue Bird tries to put aside analytics on herself to focus on her teammates strengths. That message carried from session to session: most pros already know their tendencies. But most were less familiar with their teammates and competition. And that’s where analytics is most useful on-court. With these digestible bites of data she makes in-game decisions on which open teammate should get a pass based on situational statistics of each player.

Regardless of background – in pro sports or in another industry – it’s clear that the analytics professionals have to be able to take complicated ideas and bake them into a 3-5 point story that has clear meaning and easily executable. Without this skillset analytics is practically fruitless. But with it teams can develop improved efficiencies in-game or in-business.

Recent NFL History

Please be patient: loading every play from every game since 2002. Trust me, its worth the wait!

13 Tips for Growing Data Science Teams

I recently presented to a local organization on setting/scaling up data science teams. We had a thoughtful one-hour conversation about the pseudo-prerequisites to accelerating the adoption of data science within an organization. Our conversation kept coming back to the same point: data scientists are a magnifying glass for an organization – whether if an organization has a strong analytic culture or if its’ systems are disparate – the current status will only be amplified with data scientists. But when data scientists unearth issues – in the data, in the infrastructure, or even the culture – its an opportunity to improve organizational productivity and culture. Now for some subtext:

In October of 2012, Harvard Business Review dubbed the Data Scientist as the sexiest job of the 21st century. Four years later, the immense demand remains unchanged for individuals who can write code to sift through both known-critical and seemingly superfluous data, correctly develop and apply statistical models, and effectively communicate key insights to improve organizational outcomes. In fact, as success stories continue to pour into the media, organizations – many you are very familiar with, but shall remain nameless – are hastily moving to add individuals with these skillsets onto their teams to explore their environments. With big data and analytics raising the eyebrows of most of c-suite executives, business units are catalyzing for the deep dive into the data science waters. While there are plenty of insights regarding the ideal skillset of a data scientist – hint: it’s the soft skills – very few mention the prerequisites for scaling data science teams. This begs the question: so if I am building a data science team, what should I do ASAP to make sure my team is successful? What follows are some of the overlooked aspects of building data science teams that my peers and I have personally experienced – some are even blockers for today’s most highly effective data science teams.

1. Access to data.

This shouldn’t come as a surprise, but data scientists work with data, preferably in the rawest form its collected. Giving permission to data might seem straightforward but some organizations will limit data scientists to data marts or cubes (pre aggregated data). With most data scientists spending a big chunk of their time playing data “Go Fish”, making data available allows data scientists to focus on what they do best.

2. Appropriate governance.

With freedom comes responsibility. Providing appropriate access to individuals who most likely have the highest analytical ceilings of an organization also means having strong data governance practices in place. Organizations should be thoughtful about the risk and reward of limiting access. Applying some basic data governance practices will allow data scientists understand the data, the data flow, and data quality issues.

3. Provide an analytic sandbox.

Data scientists are constantly building models and structuring data for analysis – its 80% of the job. Create a space for teams to build models and share data so that processes do not have to be re-run from scratch each time will save valuable business time.

4. Leverage data visualization tools.

This I believe is a big one. First, data science is more than just model building, it’s collecting critical insights about an organization and effectively communicating those insights. One of the best ways a data scientist can do this is by building tools that allow key stakeholders to see and interact with the story the data tells. Second, good data tools also build a culture of data fluency by allowing the business to interact and understand the who, what, and where of the business. They allow an organization to take action. The tools build trust around using data to improve the business and getting to the when, why and how.

5. Keep your team structures flexible.

If you are introducing the concept of data science teams to your organization for the first time you’ll probably spend hundreds of hours conceptualizing the ideal organizational structure. That’s fine, but just know that a team’s structure is likely to evolve in the first few years depending on the people and existing processes and structures you already have in place. In a nutshell, put more of a premium on collaboration over team structure.

6. Integrate data scientists with existing business units.

It’s great when other data scientists can collaborate with each other. It’s just as important that data scientists work with business units to better understand and solve problems. This would elevates data scientists from reactive stats-gatherers to proactive partners

7. Be sure your data science sponsors are 100% committed to supporting the team.

Data science return on investment can be accelerated – or limited – by leader support. These individuals are evangelists and enforcers for data science work across the organization. In some cases there are leaders who will leverage the idea of data scientists – not actually implement work, but hype the potential work – just to move up the organizational ranks, which hinders the effectiveness of data scientists and the organization.

8. Hire for diverse skillsets.

Prioritize culture and communication, but also make sure that your data scientists bring varying analytic skillsets. If you are trying to build a robust data science team it should not just be data scientists. Consider having supporting roles that might include a project manager, business analysts, data visualization specialists, UX designer/front-end developer and/or ETL developers

9. Develop internal team skillsets.

This should apply to all individuals in an organization because skillsets should be constantly evolving to match the tools and technologies that improve the business; consider embracing open-source technologies and cloud-based tools.

10. Be willing to fail.

Experimentation is a cornerstone of data science. It allows the business to learn what not to do. Experimentation shouldn’t happen on big parts of the business, they should be in small, controllable chunks. This is the heart of data science and many organizations might be afraid that they are going to lose. In reality they are going learn quickly about what does and doesn’t grow their business and should adopt the fail fast mantra.

11. Iterate quickly.

This is another cornerstone of data science. Data science projects shouldn’t take months to spin out initial insights. The team should learn what works and what don’t from the data using short cycles and focus on quick wins for the organization. If something works, then move it to a larger production environment.

12. Be comfortable with data not being perfect.

Because you are iterating quickly data might not be perfect. Data science is about action – avoid slowing down to do significant QA on data, extensive data collection, and/or complex modeling – at least to start.

13. Plan now for scaling up.

While you should start small and get some quick wins, start thinking about how you will want to scale and automate insights from the data science practice.

Wrap things up: highly effective data science teams don’t appear overnight. Building a successful team means developing people, culture, processes, science, and technologies. To do so, there are a number of things that can be done – many of these steps are rarely outlined for an organization but I’ve tried to highlight some of these components above. Most of the points above are just good business practices, practices that are highlighted by data scientists.

Developing a Dynamic Ticket Viz

Tableau Public now supports Google Spreadsheets. I didn’t think of this as anything special. But after I realized cloud-based spreadsheets means dashboards that can be automatically update. I set out on a way to constantly update data in Tableau. Given my limited toolset – mainly R, Tableau, and a handful of AWS tools.

The Data

I decided I wanted to monitor NFL ticket prices on an hourly basis. I’d hit the API for 253 regular season, United States-based games – 3 are in London. This is not easy since there is no pre-built datasets. I’d need to hit a secondary ticket market API and store that data somewhere.

API Data Extraction

I decided to use secondary ticket market data from the StubHub API. To stream data, I created a cloud instance of R and Rstudio on an AWS Elastic Cloud Compute instance. FYI: EC2 is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. In short, it’s a cloud computer. 

Data Storage

Now comes the interesting part. I’m taking data from the API and processing it on EC2. I need to store it somewhere. Since I’m using Amazon already, used its siple files storage system: Simple Storage Service ­– or S3 – provides developers and IT teams with secure, durable, highly-scalable cloud storage.

Data Processing

The data from StubHub is compiled every hour. Its also stored on S3. Using that data I am processing and want to send it to Google Spreadsheets. And technically I was able to do this from R, but it required constant manual authentication for the R-Google handshake every 4 hours. That was a pain, and would require me to wake-up at night so I decided to go a different route.

More Handshakes

Good news. I could save data from EC2/RStudio into S3. From there I could load data from S3 into Google Spreadsheets using the Blockspring API that sends .csv files from S3 to Google. Also with the API I can update data hourly. Bonus. 


In case you are wondering here’s whats going on:


I wish this was as easy as I wrote it down to be. But a lot is going on.

The Visualization


There are two different things happening in this visualization. First are the line charts. Second are the stadium shapes. These are not difficult to pull, but take some time to do. Remember data was pulled from StubHub. In this data are IDs for stadium and the corresponding zones. This means if I have the shape files I can plot data as shapes. Here is a stadium for an upcoming game.


Good news. The colors on each of the sections of the stadium are the result of custom polygons in the html code. This means I can download the html from the webpage and extract the polygons from the graphic. I did this using some simple R code. This extracts the svg. That’s the good news.

stadium <-

   paste0(loc, team, '.htm') %>%

   xml2::read_html() %>%

   rvest::html_nodes('.svgcontainer svg path') %>%

   .[1:length(.) - 1]

stadium %>%

   as.character() %>%

   c('<svg>', ., '</svg>') %>%

   stringr::str_replace_all('class=\"st0\"', '') %>%

   writeLines(., file(paste0(loc, team, '.txt')))



The bad news: Another tool is needed. This just directs html on how to create the polygon, but it doesn’t have the actual points along the polygon ­– which is what I want. This requires a reformat of the data to a file type called .wkt. There are connectors for R to do this, but I didn’t have time to learn it, so I used an online site to make this happen. After I did this I saved the shapes – digestible by Tableau – back to the EC2/Rstudio instance. After that It’s pretty easy to read wkt files into R. This lets me get x and y values and some basic indicies.


More bad news: hand coding. The converter didn’t carry some of the underlying data over to the shapes. I couldn’t find a way around other than hand coding 9,900 rows of data. This meant looking at all sections in 32 stadiums. And some of them were very unorganized. But once this is done, I can connect ticket prices back to zones in stadiums.

Send a MMS Text

Prior to the start of this final Iron Viz feeder I joked with a few colleagues my next viz would be able to text a picture of it to you. And about 48 hours prior to submitting my visualization – while my data was still aggregating – I decided I needed to figure out how to do this. My coding skills outside of R are very limited, but there are a number of cool packages available in R that would allow me to send a text message. Here is how I did it:

Twilio is a messaging tool. I can send or receive text or phone calls with the service. Through the service I can make an API call and send the text message. The httr package makes it easy to do. Below is the code I used to send the message:


paste0('', 'XXXX', '/Messages.json'),

config=authenticate('XXXXX', 'XXXXX', "basic"),

body = list(

   Body='See which NFL games had the hottest ticket prices in week 2:!/vizhome/Whohasthehottestticketsrightnow/HottestNFLTicket',



   To=paste0("+1", stringr::str_extract(as.character(isolate(input$phone_number)), "[[:digit:]]+")),

   MediaUrl = "",

   encode = 'json'


Here I am sending my API a authenticated message in json format. I’m specifying the to number, from number, the text message and image URL. Everything is hard-coded with the exception of the input number – which I’ll talk about in a minute.

But first the image send from the MMS text. To do this I used the webshot package in R. This package requires phatom.js, ImageMagick, and GraphicsMagick. Once those were installed onto my EC2 instance I could just run the code below and it would grab an image of any website – Including the published tableau dashboard.




selector = ".tableau-viz",

delay = 8,

vwidth = '800') %>%

webshot::resize('300%') %>%


Then I just place this onto a webserver using the put_object function from the aws.s3 function:


The code you see above is essential for sending the text. But I still need a user interface to go with the code. I relied on Shiny, which is a web application framework for R. I created an open text box and a submit button which then triggered the code you see above. I wanted an alert to pop-up saying the message was sent, but I didn’t have time since this was less than 6 hours before the deadline for the visualization to be submitted and I still didn’t even know what my visualization was going to look like.

I thought the functionality turned out all right considering I was still learning how to program it just hours before the due date. I just added a bunch of text clarifying how the text worked to the user. It’s overkill and takes away from the experience a bit, but it’s still pretty cool that it works.


#TableauTorch Mobile Visualization

Visualize an .png in Tableau

I’ve always thought it was really awesome how some people could replicate images in Tableau. I thought I would give it a shot. I figured it’d take a long time to figure out, but it turns out it’s pretty easy to get key underlying data in R. After doing that you can quickly plot that information in Tableau.

So let’s start with a photo of downtown Minneapolis:


Well, I want to reduce the resolution of the photo, so I used pixlr to change the width of the photo to 200 pixels.

Once we do this we can load this photo into R. We can do that using the readPNG function which is part of the png package.

image <- 

This produces an array that is height-by-width-by-array (RGB) in size. With this information I can edit the shape of the file.

###  Create empty space for the image
image2 <-
      nrow = dim(image)[1]*dim(image)[2],
      ncol = 5
  ) %>%
  dplyr::as_data_frame() %>%
    row = X1,
    column = X2,
    red = X3,
    green = X4,
    blue = X5
  ) %>%
    row = image %>% dim() %>% .[1] %>% seq(1, .) %>% rep(image %>% dim() %>% .[2]),
    column = image %>% dim() %>% .[2] %>% seq(1, .) %>% rep(each = (image %>% dim() %>% .[1]))

### Fill in the data 
for(w in (image %>% dim() %>% .[2]) %>% seq(1, .)) {
  for(h in (image %>% dim() %>% .[1]) %>% seq(1, .)) {
    image2$red[image2$column == w & image2$row == h] <- image[h, w, 1]
    image2$green[image2$column == w & image2$row == h] <- image[h, w, 2]
    image2$blue[image2$column == w & image2$row == h] <- image[h, w, 3]

Now let's add in the hexcolor as another column. The code I am writing allows for up to 27 levels of color. We'll also save the output

image2 %>%
    red = round(red * 3, 0)/3,
    green = round(green * 3, 0)/3,
    blue = round(blue * 3, 0)/3,
    color = rgb(red, green, blue, maxColorValue = 1)
  ) %>%

Now we can load this data into Tableau. Once we load it in, lets add the columns and rows into column and row pills as dimensions.

Screen Shot 2016-03-14 at 2.02.56 PM

Unfortunately, my photo is upside down. There a few fixes for this, the easiest is to reverse the row axis.

Screen Shot 2016-03-14 at 2.05.02 PM

Now the color of each point isn't what I am looking for, exactly, I am going to create some custom palettes that include all of the colors that are currently in the visualization. Here you can see all the colors in the graphic:

Screen Shot 2016-03-14 at 2.09.25 PM

The list totals 27 colors. Since a custom palette contains 16 colors, I'm going to make 2 palettes for the colors.
If you haven't ever made a custom palette, read this document.

If we create the colors in the same order they are sorted in the image above, the colors will be aligned from top to bottom and it makes it easy to load the correct color to the correct shape. The color change results in this viz:

Screen Shot 2016-03-14 at 2.24.46 PM

We've essentially done it! But now I want to edit the skyline so that it changes color based on the time of the day. The color of the sky is #AAAAFF. When it's mid-day, I want it to be bright blue, and when it's night I want it to be a dark blue. To do this. I need to make a dual axis chart. One chart includes all dots in the part of the sky, and another for all of the buildings.

To do this, I need to create two variables that will help sort things out later: Sky? and Exclude Sky. Both have the same formula:

IF [Color] = '#AAAAFF'
THEN 'Yes'

I'm going to change the shape type for both charts to be based off of each of these variables. Then I am going to hide the sky one one viz, the buildings on the other, and then create a dual axis chart with synchronized axes.

gif create dual axis

Now lets create the sky color. I want it to start getting dark by 7PM and to be fully dark by 9PM. I also want it to start getting light by 7AM and be fully light by 9AM. I'm going to use the logit function to make this happen. Here's the function for Sky Color.

IF DATEPART("hour", NOW()) >= 12
THEN EXP((DATEPART("hour", NOW()) - 7.5)/2)/(1 + (EXP((DATEPART("hour", NOW()) - 7.5)/2)))
ELSE EXP((19.5 - DATEPART("hour", NOW()))/2)/(1 + (EXP((19.5 - DATEPART("hour", NOW()))/2)))

I'll make this as the color of my sky on the Sky portion of this viz. But let's make the lights show up when it's dark by creating a Modified Color variable. Here's the code:

IF DATEPART('hour', NOW()) < 7 OR DATEPART('hour', NOW()) >= 19
THEN IF [Color] = '#AAAAAA'
    ELSE [Color]
ELSE [Color]

Now I'll replace the [color] variable on the color option.

skychange color

Now I'll put this in a dashboard and publish it:

Who should be the NFL MVP

It’s the end of the National Football League regular season tomorrow. With that, comes talk about who the best player — the MVP — will be. Since 2000, the MVP award has been won by 11 times by quarterbacks and 4 times by running backs. In fact, since the award has been given out only two defensive players — Alan Page in 1971 and Lawrence Taylor in 1986 have ever won MVP. The award has been given out to a special teams player just once — to Mark Moseley in the strike-shortened 1982 season. His recognition is a bit of an oddity, and some suggest he might not have even been the best kicker that year.

It’s easy to suggest what Most Valuable means — a player that has contributed the most to his team’s success — but it’s a lot harder to quantify that. It’s what smart statisticians that work for big sports companies work on all the time and think about with more data than I have access to, but I’m going to give it a shot. Using play-by-play data from Armchair Analysis, I created a model that gives the chances of winning of a game given the context of the games — things like time to play in the game, down, distance to go, distance from goal, and the difference in score.

Using this information I could then figure out what the chances of team are to win at any given point in the game. Take this situation: Let’s pretend Aaron Rodgers has the ball on his opponents 40 yard line. It’s 2nd and 7 yards to go with 7:50 remaining in the game and the Packers are down by four. In this situation, Rodgers and the Packers have a 45.92% chance to win the game. On the next play he is forced to run, and he gains 34 yards. The new odds of winning the game for the packers are 50.02%. This mean Rodgers contributed .051 points to his team’s chances of winning. We can figure this out for every player on every play. And then we can add it all up to figure out who’s the MVP.

So who is the MVP?

NFL Box Scores

I used to go to an awesome site to get important/cool data for playing fantasy football — Advanced Football Analytics — but it was bought by ESPN and the secret sauce/important information was no longer available to me. I had a few options: 1) cry like a baby; 2) find a different site; 3) find the data and develop my own analytics. I choose to create my own thing. I’ve developed a boxscore data tool in Tableau. It’s still early on but here’s what it looks like so far: