Luke Stanke

Data Science – Analytics – Psychometrics – Applied Statistics

Sports Analytics Mirrors Real Life: My Big Takeaways from SSAC17

Last weekend I wrapped up my first Sloan Sports Analytics Conference. For many in the analytics world, attending means running into big-name sports analytics figures: Bill James, Jessica Gelman, Nate Silver, Bill Barnwell, and Daryl Morey, to name a few. And while the conference boasts big names that packed very large auditoriums, there were also excellent smaller sessions that anyone in attendance could have taken a lot from. Here are a few of my bigger takeaways:

 

Perception doesn’t always meet reality.

In any given sport there are two or three organizations on the leading edge of innovation. These organizations are consistent performers and are constantly seeking competitive advantages in-game and in-business. Teams now track the physical exertion of players in practice in great detail to minimize injuries, keep players healthy, and keep the in-game product at a high level for their fans. Even more impressive are some of the fan-experience analytics: some teams track many sources of data – well beyond box scores, weather, and attendance – to understand fans’ habits.

The conference highlights big wins in sports analytics. But in hallway conversations, people from other organizations – those not presenting – were quick to confide they felt way behind. And that’s not really a surprise: while sports analytics has been a hot topic for nearly a decade, the business only began to boom in the past five years. A team that was on the leading edge five years ago can easily maintain its competitive advantage, which leaves teams late to the game playing major catch-up.

"Today, you will need to understand this, whether you are the first or last on the team" – Luis Scola on analytics #SSAC17 #futureofbball

— Sloan Sports Conf. (@SloanSportsConf) March 4, 2017

There’s an overused quote: “sports are a microcosm of life.” The phrase applies to analytics, too. In any industry it’s easy to believe that all organizations – especially your competitors – have a strong analytics culture and are building an ROI-generating analytics behemoth. It’s just not the case. The reality is that only a handful of organizations – paraphrasing here, but I don’t think it’s out of line – have “figured analytics out.”

 

A culture of analytics goes beyond the team (on the field/court).

Let me reiterate my earlier point: it’s important to use analytics to improve player performance and the in-game product, but the culture begins with leaders who drive business decisions with data. There are team executives across multiple sports who drive success using both their expertise of the game and advanced analytics. But there are also leaders who are not convinced by analytics and believe intuition and the eye test are still the true way to gain a competitive advantage. That just isn’t true: research has largely shown that leaders who combine data with their expertise produce the best outcomes in any industry.

Building an analytics culture also requires capable employees who are domain experts, are comfortable with advancing technologies, can code, and can distill all the data into clear stories for the organization to use. People with all of these skills are nearly impossible to find in any industry – that’s why most executives call them unicorns.

Be able to program https://t.co/CDQaQwNCcG

— Daryl Morey (@dmorey) March 3, 2017

That’s the thing: finding the right people for a team is hard. Sometimes the right move is promoting an analytics-savvy individual from within the organization; sometimes it’s hiring a fresh-out-of-school MBA graduate.

Both can backfire. The internal promotion might be a strong communicator with the business but lack the expertise to assign appropriate probabilities to sales leads, leading to wasted effort by ticket reps. And the recent MBA grad might know how to build the best models for scoring those leads but not know how to talk to the business, leading to the exact same problem.

I’m not saying I have a solution for this. But it’s clear that every organization faces the same problem – and not just in the sports world, but in every industry, everywhere. Sports teams are trending toward the integrated use of analytics across all lines of business, but because talent is hard to find there have been growing pains in some markets.

 

Make sure a story can be told from the data.

In every session I heard the same thing, directly or indirectly: analytics professionals need the capacity for data-based storytelling, and those stories need to be actionable. In a session with basketball professionals, coach Vinny Del Negro said he receives a 40-page analytics binder before every game. That’s 40 pages of information 82 times a season – nearly 3,300 pages to absorb in order to develop in-game strategies for a dozen players. It’s too much information to digest, so he often just scans it for a handful of key takeaways he can bring to his team.

"There are two types of coaches: those that embrace analytics and those that are unemployed" @adirshiffman #SSAC17 #sustaininggreatness

— Sloan Sports Conf. (@SloanSportsConf) March 4, 2017

When it comes to the players, women’s professional basketball player Sue Bird tries to set aside the analytics about herself and focus on her teammates’ strengths. That message carried from session to session: most pros already know their own tendencies, but they are less familiar with their teammates and the competition – and that’s where analytics is most useful on the court. With digestible bites of data, Bird makes in-game decisions about which open teammate should get a pass based on each player’s situational statistics.

Regardless of the setting – pro sports or any other industry – analytics professionals have to be able to take complicated ideas and boil them down into a three-to-five-point story that has clear meaning and is easy to execute. Without this skillset, analytics is practically fruitless. With it, teams can develop improved efficiencies in-game and in-business.

Recent NFL History

Please be patient: loading every play from every game since 2002. Trust me, it’s worth the wait!

NFL Drive Effectiveness 2016

13 Tips for Growing Data Science Teams

I recently presented to a local organization on setting up and scaling data science teams. We had a thoughtful one-hour conversation about the pseudo-prerequisites for accelerating the adoption of data science within an organization. Our conversation kept coming back to the same point: data scientists are a magnifying glass for an organization – whether it has a strong analytic culture or disparate systems, the current state will only be amplified by data scientists. But when data scientists unearth issues – in the data, in the infrastructure, or even in the culture – it’s an opportunity to improve organizational productivity and culture. Now for some subtext:

In October of 2012, Harvard Business Review dubbed data scientist the sexiest job of the 21st century. Four years later, demand remains immense for individuals who can write code to sift through both known-critical and seemingly superfluous data, correctly develop and apply statistical models, and effectively communicate key insights to improve organizational outcomes. In fact, as success stories continue to pour into the media, organizations – many you are very familiar with, but which shall remain nameless – are hastily moving to add individuals with these skillsets to their teams to explore their environments. With big data and analytics raising the eyebrows of most C-suite executives, business units are gearing up for a deep dive into the data science waters. While there are plenty of insights on the ideal skillset of a data scientist – hint: it’s the soft skills – very few mention the prerequisites for scaling data science teams. That raises the question: if I am building a data science team, what should I do ASAP to make sure it’s successful? What follows are some of the overlooked aspects of building data science teams that my peers and I have personally experienced – some are even blockers for today’s most highly effective data science teams.

1. Access to data.

This shouldn’t come as a surprise, but data scientists work with data, preferably in the rawest form it’s collected in. Granting access to data might seem straightforward, but some organizations limit data scientists to data marts or cubes (pre-aggregated data). With most data scientists spending a big chunk of their time playing data “Go Fish,” making raw data available lets them focus on what they do best.

2. Appropriate governance.

With freedom comes responsibility. Providing appropriate access to individuals who likely have the highest analytical ceilings in the organization also means having strong data governance practices in place. Organizations should be thoughtful about the risk and reward of limiting access. Applying some basic data governance practices will allow data scientists to understand the data, the data flow, and data quality issues.

3. Provide an analytic sandbox.

Data scientists are constantly building models and structuring data for analysis – it’s 80% of the job. Creating a space where teams can build models and share data, so that processes don’t have to be re-run from scratch each time, will save valuable business time.

4. Leverage data visualization tools.

This, I believe, is a big one. First, data science is more than just model building: it’s collecting critical insights about an organization and effectively communicating those insights. One of the best ways a data scientist can do this is by building tools that allow key stakeholders to see and interact with the story the data tells. Second, good data tools also build a culture of data fluency by allowing the business to interact with and understand the who, what, and where of the business. They allow an organization to take action. The tools build trust around using data to improve the business and getting to the when, why, and how.

5. Keep your team structures flexible.

If you are introducing the concept of data science teams to your organization for the first time you’ll probably spend hundreds of hours conceptualizing the ideal organizational structure. That’s fine, but just know that a team’s structure is likely to evolve in the first few years depending on the people and existing processes and structures you already have in place. In a nutshell, put more of a premium on collaboration over team structure.

6. Integrate data scientists with existing business units.

It’s great when data scientists can collaborate with each other. It’s just as important that data scientists work with business units to better understand and solve problems. This elevates data scientists from reactive stats-gatherers to proactive partners.

7. Be sure your data science sponsors are 100% committed to supporting the team.

Data science return on investment can be accelerated – or limited – by leader support. These individuals are evangelists and enforcers for data science work across the organization. In some cases there are leaders who will leverage the idea of data scientists – not actually implement work, but hype the potential work – just to move up the organizational ranks, which hinders the effectiveness of data scientists and the organization.

8. Hire for diverse skillsets.

Prioritize culture and communication, but also make sure that your data scientists bring varying analytic skillsets. If you are trying to build a robust data science team, it should not be just data scientists. Consider supporting roles that might include a project manager, business analysts, data visualization specialists, a UX designer/front-end developer, and/or ETL developers.

9. Develop internal team skillsets.

This should apply to all individuals in an organization because skillsets should be constantly evolving to match the tools and technologies that improve the business; consider embracing open-source technologies and cloud-based tools.

10. Be willing to fail.

Experimentation is a cornerstone of data science. It allows the business to learn what not to do. Experimentation shouldn’t happen on big parts of the business; it should happen in small, controllable chunks. This is the heart of data science, and many organizations might be afraid they are going to lose. In reality they are going to learn quickly about what does and doesn’t grow their business, and they should adopt the fail-fast mantra.

11. Iterate quickly.

This is another cornerstone of data science. Data science projects shouldn’t take months to spin out initial insights. The team should learn what works and what doesn’t from the data using short cycles, focusing on quick wins for the organization. If something works, move it to a larger production environment.

12. Be comfortable with data not being perfect.

Because you are iterating quickly, the data might not be perfect. Data science is about action – avoid slowing down for significant QA on the data, extensive data collection, and/or complex modeling – at least to start.

13. Plan now for scaling up.

While you should start small and get some quick wins, start thinking about how you will want to scale and automate insights from the data science practice.

To wrap things up: highly effective data science teams don’t appear overnight. Building a successful team means developing people, culture, processes, science, and technologies. There are a number of things that can be done to get there – many of these steps are rarely outlined for an organization, so I’ve tried to highlight some of them above. Most of the points above are just good business practices – practices that happen to be highlighted by data scientists.

NFL Analytical MVP 2016

Which NFL tickets are hottest?

Developing a Dynamic Ticket Viz

Tableau Public now supports Google Spreadsheets. At first I didn’t think of this as anything special, but then I realized cloud-based spreadsheets mean dashboards that can update automatically. So I set out to find a way to constantly update data in Tableau with my limited toolset – mainly R, Tableau, and a handful of AWS tools.

The Data

I decided I wanted to monitor NFL ticket prices on an hourly basis. I’d hit the API for the 253 regular-season games played in the United States – the other 3 are in London. This is not easy since there are no pre-built datasets; I’d need to hit a secondary ticket market API and store that data somewhere.

API Data Extraction

I decided to use secondary ticket market data from the StubHub API. To stream the data, I created a cloud instance of R and RStudio on an AWS Elastic Compute Cloud (EC2) instance. FYI: EC2 is a web service that provides resizable compute capacity in the cloud, designed to make web-scale cloud computing easier for developers. In short, it’s a computer in the cloud.
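The post doesn’t include the extraction code itself, but the shape of it is a simple authenticated GET repeated on a schedule. Here is a minimal sketch using httr and jsonlite; the endpoint, query parameters, and token handling are assumptions on my part rather than details from the post:

# Minimal sketch of one pull from a secondary-ticket-market API.
# The endpoint and parameters below are placeholders – check the StubHub
# API documentation for the real ones.
library(httr)
library(jsonlite)

get_listings <- function(event_id, token) {
  resp <- httr::GET(
    'https://api.stubhub.com/search/inventory/v2',             # assumed endpoint
    query = list(eventid = event_id, rows = 250),              # assumed parameters
    httr::add_headers(Authorization = paste('Bearer', token))
  )
  httr::stop_for_status(resp)
  jsonlite::fromJSON(httr::content(resp, as = 'text'), flatten = TRUE)
}

On the EC2 box, a function like this can be called for each game ID from an hourly scheduled job.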

Data Storage

Now comes the interesting part. I’m taking data from the API and processing it on EC2, and I need to store it somewhere. Since I’m already using Amazon, I used its simple file storage system: Simple Storage Service – or S3 – which provides developers and IT teams with secure, durable, highly scalable cloud storage.
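The post doesn’t show the write step; a minimal sketch with the aws.s3 package might look like the following, assuming AWS credentials are already set as environment variables on the EC2 instance (the bucket name and object naming are mine):

library(aws.s3)

# Write each hourly pull to its own timestamped object so nothing is
# overwritten. The bucket name and key pattern are hypothetical.
save_pull <- function(listings, bucket = 'nfl-ticket-pulls') {
  aws.s3::s3saveRDS(
    listings,
    object = paste0('stubhub/', format(Sys.time(), '%Y-%m-%d-%H'), '.rds'),
    bucket = bucket
  )
}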

Data Processing

The data from StubHub is compiled every hour and stored on S3. I process that data and want to send it to Google Spreadsheets. Technically I was able to do this straight from R, but it required manually re-authenticating the R–Google handshake every 4 hours. That was a pain – it would have required me to wake up at night – so I decided to go a different route.

More Handshakes

Good news: I could save data from EC2/RStudio into S3, and from there I could load it into Google Spreadsheets using the Blockspring API, which sends .csv files from S3 to Google. The API also lets me update the data hourly. Bonus.

Intermission

In case you are wondering, here’s what’s going on:

[Diagram: data flows from the StubHub API to EC2/RStudio, then to S3, through Blockspring into Google Spreadsheets, and on to Tableau Public]

I wish this were as easy as I make it sound. But a lot is going on.

The Visualization

[Screenshot of the visualization: line charts of ticket prices alongside stadium shapes]

There are two different things happening in this visualization. First are the line charts; second are the stadium shapes. These aren’t difficult to pull, but they take some time. Remember, the data was pulled from StubHub, and it includes IDs for each stadium and its corresponding zones. That means if I have the shape files, I can plot the data as shapes. Here is a stadium for an upcoming game.

[Screenshot of a StubHub stadium seating map for an upcoming game]

Good news: the colors on each section of the stadium are the result of custom polygons in the page’s HTML. That means I can download the HTML from the webpage and extract the polygons from the graphic, which I did with some simple R code that pulls out the SVG. That’s the good news.

library(magrittr)   # for the pipe operator

# Pull the stadium's SVG section paths out of the saved StubHub page and
# write them back out as a cleaned-up SVG snippet.
stadium <-
  paste0(loc, team, '.htm') %>%
  xml2::read_html() %>%
  rvest::html_nodes('.svgcontainer svg path') %>%
  .[-length(.)]                                  # keep every path except the last

stadium %>%
  as.character() %>%
  c('<svg>', ., '</svg>') %>%                    # wrap the paths in an <svg> tag
  stringr::str_replace_all('class="st0"', '') %>%
  writeLines(., file(paste0(loc, team, '.txt')))

 

 

The bad news: another tool is needed. The SVG just tells the browser how to draw each polygon; it doesn’t contain the actual points along the polygon – which is what I want. This requires reformatting the data into a file type called WKT (well-known text). There are R packages that can do this, but I didn’t have time to learn one, so I used an online converter. After that I saved the shapes – now digestible by Tableau – back to the EC2/RStudio instance. From there it’s pretty easy to read WKT files into R, which gives me x and y values and some basic indices.
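The reading step isn’t shown in the post; here is a minimal sketch using the sf package (my choice of package, not necessarily the one used at the time), which parses WKT and returns x/y coordinates plus index columns identifying each polygon:

library(sf)

# Assumes one WKT polygon per line in the converted file (file name is mine).
wkt    <- readLines(paste0(loc, team, '.wkt'))
shapes <- sf::st_as_sfc(wkt)                          # parse WKT strings into geometries
coords <- as.data.frame(sf::st_coordinates(shapes))   # columns: X, Y, plus polygon indices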

 

More bad news: hand coding. The converter didn’t carry some of the underlying data over to the shapes, and I couldn’t find a way around it other than hand-coding 9,900 rows of data. That meant looking at every section in all 32 stadiums – and some of them were very disorganized. But once that was done, I could connect ticket prices back to zones in stadiums, as sketched below.
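Once the zone information is attached to the shapes, the connection itself is a plain join – a sketch with hypothetical table and column names:

library(dplyr)

# 'stadium_shapes' is the hand-coded polygon table and 'ticket_prices' the
# hourly StubHub pull; both are assumed to carry stadium and zone identifiers.
viz_data <- stadium_shapes %>%
  dplyr::left_join(ticket_prices, by = c('stadium_id', 'zone_id'))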

Send an MMS Text

Prior to the start of this final Iron Viz feeder, I joked with a few colleagues that my next viz would be able to text a picture of itself to you. About 48 hours before submitting my visualization – while my data was still aggregating – I decided I needed to figure out how to do it. My coding skills outside of R are very limited, but there are a number of cool R packages that would let me send a text message. Here is how I did it:

Twilio is a messaging service: I can send or receive texts or phone calls with it, and through its API I can send the text message. The httr package makes the API call easy. Below is the code I used to send the message:

# Send the MMS through the Twilio REST API. The account SID, auth token,
# and image host are redacted as XXXX.
httr::POST(
  paste0('https://api.twilio.com/2010-04-01/Accounts/', 'XXXX', '/Messages.json'),
  config = httr::authenticate('XXXXX', 'XXXXX', "basic"),
  body = list(
    Body = 'See which NFL games had the hottest ticket prices in week 2: https://public.tableau.com/profile/stanke#!/vizhome/Whohasthehottestticketsrightnow/HottestNFLTicket',
    From = "6122550403",
    Method = "GET",
    # The phone number comes from the Shiny input described below.
    To = paste0("+1", stringr::str_extract(as.character(isolate(input$phone_number)), "[[:digit:]]+")),
    MediaUrl = "https://XXXXXX.com/dashboard.png"
  ),
  encode = 'json'
)

Here I am sending the API an authenticated message in JSON format, specifying the to number, the from number, the text message, and the image URL. Everything is hard-coded except the input phone number – which I’ll talk about in a minute.

But first, the image sent with the MMS. To do this I used the webshot package in R, which requires PhantomJS, ImageMagick, and GraphicsMagick. Once those were installed on my EC2 instance, I could run the code below and it would grab an image of any website – including the published Tableau dashboard.

# Screenshot the published dashboard, then enlarge and compress the image.
webshot::webshot(
  "https://public.tableau.com/profile/stanke#!/vizhome/Whohasthehottestticketsrightnow/Print",
  "dashboard.png",
  selector = ".tableau-viz",   # capture only the embedded viz element
  delay = 8,                   # give Tableau time to render before the capture
  vwidth = 800) %>%
  webshot::resize('300%') %>%
  webshot::shrink()

Then I just put the image onto a web server using the put_object function from the aws.s3 package:

aws.s3::put_object('dashboard.png')

The code you see above is essential for sending the text, but I still needed a user interface to go with it. I relied on Shiny, a web application framework for R: I created an open text box and a submit button that triggers the code above. I wanted an alert to pop up saying the message was sent, but I ran out of time – this was less than 6 hours before the submission deadline and I still didn’t know what my visualization was going to look like.
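For reference, the wiring only takes a few lines of Shiny. This is a sketch of the pattern rather than the actual app – the button ID and labels are hypothetical, except for phone_number, which appears in the POST code above:

library(shiny)

ui <- fluidPage(
  textInput('phone_number', 'Your phone number'),
  actionButton('send', 'Text me the dashboard')
)

server <- function(input, output, session) {
  observeEvent(input$send, {
    # The httr::POST() call shown above goes here; it reads
    # isolate(input$phone_number) to build the 'To' field.
  })
}

shinyApp(ui, server)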

I think the functionality turned out all right considering I was still learning how to program it just hours before the due date. I added a bunch of text clarifying to the user how the texting works. It’s overkill and takes away from the experience a bit, but it’s still pretty cool that it works.

 

5 tips for mobile dashboards (that are good for any device)

Designing mobile data tools can be intimidating, particularly because we think we don’t have a lot of space to tell the same story we would on other devices. The format of data tools – including dashboards – for phones can appear rather limiting, but that’s just a myth. While it would be nice to be device-agnostic – to ignore the different ways data is now consumed (phones, tablets, desktops, and so on) – we just can’t. It’s not best practice. We consume information differently on each device, so we need to design around each experience. Given the shifting landscape of how we consume information (hint: it’s increasingly mobile), we need to develop appropriate data tools now. With that, here are five device-agnostic data tool development best practices I follow (prioritized here because of mobile design).

Tell a clear and direct, but guided story.

This is my number one rule for any dashboard or data tool. Remember that our tools should answer the initial question asked by our stakeholders. But as we answer the question we should also shine a light on what is likely a deeper, actionable issue. To get to the actionable issue we need to provide context – and this means allowing users to “explore” the data. I use the term explore loosely because we want to give them the feeling that they are diving into the data and blazing their own trail, when in reality we have curated the data and are guiding users through the story. This approach is similar to the one followed by the authors of Data Fluency. Don’t be afraid to come up with a storyboard of how you want to guide stakeholders.

Use the least astonishing interaction: scroll first, then swipe or tap.

Consider the principle of least astonishment: if a necessary feature has a high astonishment factor, it may be necessary to redesign the feature. In practice this means keeping it simple and choosing what audiences expect, which is usually scrolling down/swiping up. From a storytelling point of view, scrolling down/swiping up keeps the story on a single page, which makes going back and re-reading or re-interpreting something a lot easier than swiping or tapping through.

When it comes to dropdown menus, use parsimony.

First, try to avoid dropdowns altogether. But if you need a number, limit yourself to three dropdown menus. If you can get users to the exact source of data they need, do that. Dropdowns that apply filters or parameters make visualizations complicated. It’s not that dropdowns are bad; they just need to be customized for a mobile device. Affording each dropdown its necessary space takes away from the functionality of the data tool. Don’t forget that dropdown menus have low discoverability – meaning you have to touch the dropdown menu to know it’s there. One last thing: users like to see all of the options in a dropdown, so consider that space as well.

Cursors don’t exist so skip hover functionality.

With desktop dashboards and other data tools we often hide additional data in tooltips (that extra stuff that shows up in a small box when we hover over or click on something). Sometimes tooltips show up on hover – when you are using a desktop with a mouse, of course – and other times they show up on a click or a tap. Once again, consider discoverability: since it’s impossible to tell whether a visualization will have a tooltip – unless you explicitly state it – it’s best to avoid them.

Let the data and visualizations breathe.

Keep your visualization organized: grids are good. Larger fonts are good. Consistent formatting is good. And don’t forget: white space is good. Yes, white space. We don’t need to cram things in – but you already knew that. If we can give our audiences some space to process the findings, then we don’t need to simplify our data tools to a few basic charts. More complexity should be accompanied by more whitespace!

Just one last thing: designing for a mobile format is a blessing. We don’t feel the same obligation to fill empty space on a dashboard – even when doing so doesn’t add value. Mobile tools force us to think about what’s important and actionable. That pressure leads to better visualizations and better tools, which should – if done right – lead to better outcomes for our stakeholders.

#TableauTorch Mobile Visualization