Luke Stanke

Data Science – Analytics – Psychometrics – Applied Statistics

Who Should Be the NFL MVP?

The National Football League regular season ends tomorrow. With that comes talk about who the best player, the MVP, will be. Since 2000, the MVP award has been won 11 times by quarterbacks and 4 times by running backs. In fact, since the award was first given out, only two defensive players have ever won MVP: Alan Page in 1971 and Lawrence Taylor in 1986. The award has gone to a special teams player just once, Mark Moseley in the strike-shortened 1982 season. His recognition is a bit of an oddity, and some suggest he might not have even been the best kicker that year.

It’s easy to say what Most Valuable means (the player who has contributed the most to his team’s success), but it’s a lot harder to quantify. Statisticians at big sports companies work on this all the time, with far more data than I have access to, but I’m going to give it a shot. Using play-by-play data from Armchair Analysis, I created a model that estimates a team’s chance of winning a game given the game situation: time remaining, down, distance to go, distance from the goal line, and the score difference.
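The post doesn’t say what kind of model it uses. A common choice for win probability is logistic regression on game-state variables like the ones listed above, so here is a minimal, dependency-free sketch under that assumption. The `GameState` fields and the `WinProbabilityModel` class are hypothetical names, and a real model would likely need interaction terms (for example, score difference matters more as time runs out).

```python
import math
from dataclasses import dataclass


@dataclass
class GameState:
    """Hypothetical game-state record, from the offense's point of view."""
    seconds_left: float   # time remaining in regulation
    down: int             # 1 through 4
    yards_to_go: float    # distance needed for a first down
    yards_to_goal: float  # distance from the opponent's goal line
    score_diff: float     # offense score minus defense score


def features(state):
    return [state.seconds_left, state.down, state.yards_to_go,
            state.yards_to_goal, state.score_diff]


def sigmoid(z):
    # Split on the sign of z to avoid overflow in math.exp
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class WinProbabilityModel:
    """Logistic regression fit by batch gradient descent (an assumed model choice)."""

    def fit(self, states, won, lr=0.1, epochs=2000):
        rows = [features(s) for s in states]
        n, k = len(rows), len(rows[0])
        # Standardize each feature so one learning rate works for all of them;
        # a constant column gets std 1.0 to avoid dividing by zero.
        self.mean = [sum(r[j] for r in rows) / n for j in range(k)]
        self.std = [math.sqrt(sum((r[j] - self.mean[j]) ** 2 for r in rows) / n) or 1.0
                    for j in range(k)]
        scaled = [self._scale(r) for r in rows]
        self.w, self.b = [0.0] * k, 0.0
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * k, 0.0
            for x, y in zip(scaled, won):
                err = self._prob(x) - y  # gradient of log loss w.r.t. the logit
                grad_b += err
                for j in range(k):
                    grad_w[j] += err * x[j]
            self.b -= lr * grad_b / n
            self.w = [wj - lr * g / n for wj, g in zip(self.w, grad_w)]
        return self

    def _scale(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.mean, self.std)]

    def _prob(self, x):
        return sigmoid(self.b + sum(wj * xj for wj, xj in zip(self.w, x)))

    def predict(self, state):
        """Estimated probability that the team with the ball wins."""
        return self._prob(self._scale(features(state)))
```

Fit it on one row per play, with `won` set to 1 when the team with the ball went on to win; then `model.predict(GameState(470, 2, 7, 40, -4))` would estimate the chance for a 2nd-and-7 at the opponent’s 40, down four, with 7:50 left.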

Using this model, I can figure out a team’s chances of winning at any point in the game. Take this situation: let’s pretend Aaron Rodgers has the ball on his opponent’s 40-yard line. It’s 2nd and 7 with 7:50 remaining in the game, and the Packers are down by four. In this situation, Rodgers and the Packers have a 45.92% chance to win the game. On the next play he is forced to run, and he gains 34 yards. The Packers’ new chance of winning is 50.02%. This means Rodgers added 0.041 (4.1 percentage points) to his team’s chances of winning. We can figure this out for every player on every play, then add it all up to figure out who’s the MVP.
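This play-by-play bookkeeping is commonly called win probability added (WPA). Here is a minimal sketch, assuming each play has already been credited to a single player and both probabilities are from that player’s team’s perspective. The `Play` record is hypothetical, and crediting one player per play is a simplification, since blockers and receivers also contribute.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Play:
    player: str       # player credited with the play (a simplifying assumption)
    wp_before: float  # his team's win probability before the snap
    wp_after: float   # his team's win probability after the play


def win_probability_added(play):
    # Both values must be from the same team's perspective, even when
    # possession changes on the play (e.g. an interception).
    return play.wp_after - play.wp_before


def total_wpa(plays):
    """Sum each player's win probability added over all of his plays."""
    totals = defaultdict(float)
    for play in plays:
        totals[play.player] += win_probability_added(play)
    return dict(totals)


def mvp(plays):
    """Player with the largest total win probability added."""
    totals = total_wpa(plays)
    return max(totals, key=totals.get)
```

For the scramble in the example, `win_probability_added(Play("Aaron Rodgers", 0.4592, 0.5002))` is 0.5002 - 0.4592, or about 0.041.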

So who is the MVP?

NFL Box Scores

I used to get important/cool data for playing fantasy football from an awesome site, Advanced Football Analytics, but it was bought by ESPN and the secret sauce/important information was no longer available to me. I had a few options: 1) cry like a baby; 2) find a different site; 3) find the data and develop my own analytics. I chose to create my own thing. I’ve developed a box score data tool in Tableau. It’s still early on, but here’s what it looks like so far:

2014 Defenses, or Now It Makes Sense Why the Packers Fell Apart in the NFC Championship

After creating charts for the outcomes of offensive possessions by series, I decided to replicate the charts for a team’s defense.

Here’s the premise: the success of a team depends on offense, defense, and special teams — wow, no surprise — and sometimes the strength varies by quarter — again, wow and sarcasm. I decided to chart what happened by quarter based on four possible outcomes.

The outcomes for defenses are:

  • Giving up touchdowns
  • Allowing field goal attempts: regardless of the outcome, the offense put itself in position to score
  • Forcing punts
  • Creating turnovers — either by fumble recovery, interception, or turnover on downs.

I charted this information by quarter and made the bars wider for quarters where the defense was on the field more often. In the charts below, wider bars mean the defense faced more possessions during that quarter.
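The tabulation behind charts like these can be sketched as follows: count each possession’s outcome within its quarter, convert counts to shares (bar heights), and keep the possession total (bar width). The input format, a list of `(quarter, outcome)` pairs with overtime coded as quarter 5, and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict

OUTCOMES = ("Touchdown", "Field goal attempt", "Punt", "Turnover")


def outcome_shares_by_quarter(possessions):
    """possessions: iterable of (quarter, outcome) pairs, quarter as an int."""
    counts = defaultdict(Counter)
    for quarter, outcome in possessions:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts[quarter][outcome] += 1

    table = {}
    for quarter in sorted(counts):
        total = sum(counts[quarter].values())
        table[quarter] = {
            "possessions": total,  # sets the bar width
            "shares": {o: counts[quarter][o] / total for o in OUTCOMES},  # bar heights
        }
    return table
```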

Here is the Green Bay Packers defensive chart for 2014:

[Chart: 2014 Green Bay Packers defensive possession outcomes by quarter]

If you are like me, you noticed that 40% of possessions in the 4th quarter against the Packers defense ended in touchdowns, more than double the rate of any other quarter. This chart, along with the Packers’ offensive possession chart, puts the outcome of the NFC Championship in context. The five 4th-quarter and overtime drives for the Seahawks offense ended: Punt, Interception, Touchdown, Touchdown, Touchdown. If we make a lot of assumptions about the numbers, then 3 touchdowns over 5 possessions happens about once every 9(!!!!) games against the Packers. If we also take into account the offensive performance, a similar type of implosion would occur once every 30 games! When the team fell apart last year, I thought the odds of that happening had to be one-in-a-million, but they were much, much better than expected!
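The post doesn’t list its assumptions, but one simple version treats each possession as independent with the same touchdown probability `p`, which makes the number of touchdowns in `n` possessions binomial. This sketch computes the chance of at least `k` touchdowns; `p` is a parameter you supply, not a figure from the post. The result is sensitive to it: with `p = 0.4`, the fourth-quarter rate in the chart, three or more touchdowns in five possessions comes out near 0.32, so the once-every-9 estimate evidently rests on different inputs.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k successes in n independent tries."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def games_per_occurrence(prob):
    """Convert a per-game probability into 'about once every N games'."""
    return 1 / prob
```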

Interesting narratives for other teams:

  • Indianapolis produced turnovers more often in the 4th quarter, but still gave up a lot of touchdowns.
  • Arizona’s defense was great, and it got better each quarter of the game.
  • Miami gave up in the 4th quarter — well documented and backed by the numbers.

Here are the defensive charts for all the other teams:

[Chart: 2014 defensive possession outcomes by quarter for all other teams]

And a close-up for each team’s defense:


The 2014 Green Bay Packers Were A First Half Team: Offensive Possessions Results By Quarter

“the [Green Bay] Packers are a second half team”

That was the text message I got from a friend at halftime of a recent game between Green Bay and Seattle. I disagreed.

I did some web searching, and found the answer at The story was clear: Green Bay scored more in the first half. But I wanted more. Were more points scored because they had the ball more often? How efficient were they at scoring touchdowns? The work at inspired me to come up with visualizations that answered my questions, which I was able to build after downloading 2014 box score data.

For my first iteration, I wanted to know how drives end: how often do drives result in touchdowns? And I wanted to know that for each quarter, and for each team.

Drive outcomes fall into four categories:

  • Touchdowns – The most successful outcome of being on offense, putting six points on the board
  • Field Goal Attempts – The offense put the team in a position to get three points. It’s important to note that an offense is not penalized for a missed field goal; that’s a special teams issue, not an offense issue. Also, a good defense might help with field position, but that’s not a factor right now. Maybe I’ll adjust for that in future iterations.
  • Punts – A sign of ineffective/unsuccessful offense.
  • Turnovers – This includes fumble recoveries, interceptions, turnovers on downs, and safeties. Statistically, these are worse than a punt, so they get their own category. I considered treating defensive scores from turnovers as another level, but those are statistically rare. And a score after a turnover is really a product of special teams, not offensive quality.
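Mapping raw drive results into these four buckets is a simple lookup. The result labels below are hypothetical, since box score sources name them differently, so the keys would need adjusting to the actual data. Drives that end because a half runs out fit none of the four categories; this sketch raises an error for any unmapped label so each such case has to be handled on purpose.

```python
# Hypothetical raw drive-result labels mapped to the four categories.
CATEGORY_BY_RESULT = {
    "Touchdown": "Touchdown",
    "Field Goal": "Field goal attempt",
    "Missed Field Goal": "Field goal attempt",   # the offense isn't penalized for a miss
    "Blocked Field Goal": "Field goal attempt",
    "Punt": "Punt",
    "Fumble Lost": "Turnover",
    "Interception": "Turnover",
    "Turnover on Downs": "Turnover",
    "Safety": "Turnover",
}


def categorize_drive(result):
    """Return the outcome category for a raw drive result label."""
    try:
        return CATEGORY_BY_RESULT[result]
    except KeyError:
        raise ValueError(f"unmapped drive result: {result!r}") from None
```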

The graphic I settled on was inspired by the Statistical Atlas work by Nathan Yau, housed at his Flowing Data website. As you can see from the links, there are some similarities.

[Chart: 2014 Green Bay Packers offensive drive outcomes by quarter]

The height of the graphic represents the percentage of drives resulting in a touchdown, field goal attempt, punt, or turnover, totaling 100%. Each outcome is broken up by quarter, with quarters distinguished by a color gradient. The width of each quarter’s bar is based on total possessions in that quarter: more possessions means wider bars. In the Green Bay plot, the Packers had fewer possessions in the first and third quarters, which means narrower bars.

The result: a clear story that the Packers were most effective at scoring touchdowns earlier in the game.
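One way to compute a layout like the one described, sketched under the assumption that each outcome gets its own panel: within a panel, each quarter’s bar is as wide as its share of all possessions and as tall as the share of that quarter’s drives ending in the outcome. The input shape and function name are illustrative, not taken from the original charts; the resulting rectangles could be drawn with any plotting tool.

```python
def outcome_panel(quarter_stats, outcome, panel_width=1.0):
    """Lay out variable-width bars for one outcome panel.

    quarter_stats maps quarter -> {"possessions": int, "shares": {outcome: float}}.
    Returns one dict per quarter with left edge, width, and height.
    """
    total = sum(q["possessions"] for q in quarter_stats.values())
    bars, left = [], 0.0
    for quarter in sorted(quarter_stats):
        stats = quarter_stats[quarter]
        width = panel_width * stats["possessions"] / total  # more possessions, wider bar
        bars.append({
            "quarter": quarter,
            "left": left,
            "width": width,
            "height": stats["shares"].get(outcome, 0.0),  # share of that quarter's drives
        })
        left += width
    return bars
```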

After putting this together, I decided I should do it for every team. Each row shows the teams in one division, ordered alphabetically. Remember, this is 2014 NFL data.

[Chart: 2014 offensive drive outcomes by quarter for every team, one division per row]

Or for a closer look at individual teams — again, ordered alphabetically within each division:


MPS Unveils Behavior Dashboard

Just an update on my work:

Over the last year, I’ve spent significant time developing a behavior data tool for the principals of Minneapolis Public Schools. We developed the tool because behavior was one of a few areas where MPS principals could use feedback from the data (not because we were federally mandated to do it, although other people in the district might think so). A bunch of people, mostly schools’ Behavior Deans and Assistant Principals, had been entering this data. Almost nobody was reflecting on that information, so we thought a behavior tool was a prime opportunity for reflection and a change in practices.

The data was at everyone’s fingertips, so the district decided to publish parts of it on its website. It was even covered in the Southwest Journal.

Now the public has access to some of the data, but I believe that we should reveal even more information about the district to the public.

ESPN: Green Bay Packers 1-on-1 Blocking Drills

Over the past four years I’ve worked with Rob Demovsky of ESPN analyzing the Green Bay Packers one-on-one offensive/defensive line drill. We’ve found it to be a fantastic predictor of making the team. I’m going to post some more detailed analysis in the future, but in the meantime here is the article: Packers 1-v-1 Blocking Data.

Reflections on Behavioral Referral Data Tool

Last fall, I spent significant time working on a data tool for schools that gathered information from many different sources. To create the tool, I had to develop reproducible code to gather information from our student information system, build an intermediate data warehouse, and then have the data tool pull information from that warehouse and update every night. Developing just the warehouse took over a month, and that was the easy part.
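A nightly refresh of an intermediate warehouse can be sketched with Python’s built-in `sqlite3` module. Everything here is hypothetical: the `referrals` source table, the `referral_facts` warehouse table, and their columns stand in for the real student information system, and a full delete-and-reload is just the simplest refresh strategy. Running it every night would be left to a scheduler such as cron.

```python
import sqlite3


def refresh_warehouse(source, warehouse):
    """Full nightly reload of referral records into the warehouse; returns the row count."""
    rows = source.execute(
        "SELECT referral_id, student_id, school_id, referral_date, behavior FROM referrals"
    ).fetchall()
    warehouse.execute(
        """CREATE TABLE IF NOT EXISTS referral_facts (
               referral_id   INTEGER PRIMARY KEY,
               student_id    INTEGER,
               school_id     INTEGER,
               referral_date TEXT,
               behavior      TEXT)"""
    )
    with warehouse:  # commits on success, rolls back if any statement fails
        warehouse.execute("DELETE FROM referral_facts")
        warehouse.executemany(
            "INSERT INTO referral_facts VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)


def referrals_by_school(warehouse):
    """The kind of summary a dashboard would read from the warehouse, not the source."""
    return warehouse.execute(
        "SELECT school_id, COUNT(*) FROM referral_facts GROUP BY school_id ORDER BY school_id"
    ).fetchall()
```

Because the dashboard only queries the warehouse, a slow or failed nightly load never touches the source system, and a rerun simply reloads the same rows.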

I think the hardest part of developing any data tool is making sure it drives stakeholders to action. That means moving beyond group-level summary statistics. (Side note: admittedly, below is a screenshot of summary information, but the meat of the data tool contains information that is private and much more actionable.) Getting to action means not thinking like a data analyst, but, in this case, thinking like a school principal, administrator, or teacher. If I were a principal, what information would I need to decrease referrals?

Our team’s work led to two key questions. First, what is the collective context of referrals? Knowing the who, what, where, why, and when could help principals understand how students are being referred. Schools could then develop school-level plans and/or student-level plans. So the first question gets at decreasing referrals, but doesn’t answer questions about actions. Second, what happens after a student is referred? The decisions school leaders make about what to do with students after a referral can significantly affect the amount of instruction missed and opportunities to learn. Giving data back to school leaders regarding their disciplinary actions gives the staff a chance to reflect and improve their practices.

To learn more, contact me at

Thoughts on Standardized Testing

Let’s face it, standardized testing has been a white-hot topic since the adoption of No Child Left Behind. Many voices have expressed distaste for testing, some even questioning the validity of standardized tests in Minnesota altogether or blaming the standardized test for diluting the quality of education provided to students. This blame on end-of-year comprehensive assessments seems unwarranted.

The Usual Suspects

Perhaps the most common complaint about standardized testing is that it promotes “teaching to the test”. An argument can be made against this. Tests are designed to measure students’ knowledge of particular K-12 academic standards as determined by the state. Therefore, teachers are teaching to the academic standards. While it might be frustrating for parents to hear about what likely won’t be taught, they can take solace in the clear delineation of the important skills that all students should be taught during a school year.

Complaints that standardized tests don’t measure critical thinking are also unlikely to hold up. While there are other routes for assessing critical thinking skills, vast research has been completed on techniques for writing multiple-choice questions that test critical thinking. For a few fun non-verbal examples, search the web for “Raven’s Progressive Matrices”.

Limitations with Standardized Testing

There are limits to standardized testing. Test length and cost constrain what a test can reveal about student learning, but test developers argue that longer tests yield diminishing returns. Content in some subjects is hard to test, especially subjects that require specific skills, such as writing or scientific thinking. Short-answer and essay questions could be a viable alternative.

The use, cost, and benefits of short-answer and/or essay test questions are laid out nicely in a summary by Dr. Samuel Livingston, a distinguished scholar working at Educational Testing Service. Constructed-response test questions can measure skills that multiple-choice questions cannot, but when scores on the two types of questions are highly related, Dr. Livingston (and others) recommends the use of multiple-choice questions.

Additional factors related to constructed-response questions are also weighed: short-answer and essay questions take longer to administer, are more expensive to score, and can be difficult to score consistently because raters can disagree on such question types.

Prometric, a subsidiary of Educational Testing Service, surveyed 82 test development groups; of those groups, 22% used short-answer or essay questions. About 2 in 5 used only multiple-choice questions, but over half of the groups would like to use additional question types. The top three limitations: cost, technology, and time.

While more affordable than short-answer and essay questions, multiple-choice questions are not cheap. The Handbook of Test Development, considered the go-to book for test developers, states that multiple-choice questions cost between $300 and $1,000 per question to develop. This means millions of dollars have been poured into the development of test questions alone. The reason for the high cost is the vetting process a test question must endure.

A single question will be reviewed by more than a dozen active educators to determine if the content is appropriate and aligned to standards. The question format will also be scrutinized to ensure that it conforms to best practices. A question will also be reviewed by a diverse panel to determine if cultural bias exists in the format of the question. A question might cycle through this process several times or be dropped along the way. A question that makes the cut is then embedded within a test, but not counted toward the final score. Test developers then analyze the question to determine if it is reliable, fair, and valid.

Other Arguments Against Testing

To keep costs down, large testing companies are developing software to automatically score constructed-response tests. This is another common complaint against standardized testing, but automated scoring is typically used in tandem with human raters. In the near future, improved artificial intelligence technologies could make short-answer or essay questions viable on state assessments.

Some believe standardized testing is not meant to be high-stakes for teachers. Most experts agree, including Minnesota policy makers. In fact, the State of Minnesota bases only 35% of its pilot teacher evaluation system on test scores. Also included are teacher practice ratings covering four domains (planning, instruction, environment, and professionalism) and results from a student engagement survey.

Accountability & Transparency

Standardized tests are meant to hold educators accountable to the same standard. Without them, every district, school, and teacher could report favorable results. In this way, standardized tests set a bar to which all schools are held accountable.

When these scores are published, they put pressure on districts and schools to perform to the standard. This is tough for historically poor-performing schools, but with metrics that also measure student growth from year to year, more vivid descriptions and definitions of school success are available. These measures can also identify schools that are closing or widening the achievement gap.

Historically, research has shown that tests have had a positive effect on student achievement and are especially useful when feedback is provided. But the National Research Council’s study on testing suggests that testing policies have had no negative effects and, in some cases, have produced small improvements in educational outcomes above and beyond all other features of the current educational landscape. Even disdain for testing seems to be a myth, as several polls have shown that parents and teachers support testing.

While standardized testing can still be improved, there is no reason to hold multiple-choice tests culpable for educational inadequacies. Standardized tests minimize bias and put all students on the same metric. Test developers are measuring higher-order thinking of state standards with the resources available. And with schools completing standardized tests across Minnesota, all educators are held accountable for advancing student achievement.