Luke Stanke

Data Science – Analytics – Psychometrics – Applied Statistics

Which NFL tickets are hottest?

Developing a Dynamic Ticket Viz

Tableau Public now supports Google Spreadsheets. At first I didn't think of this as anything special, but then I realized that cloud-based spreadsheets mean dashboards that can update automatically. So I set out to find a way to constantly update data in Tableau with my limited toolset – mainly R, Tableau, and a handful of AWS tools.

The Data

I decided I wanted to monitor NFL ticket prices on an hourly basis. I'd hit the API for the 253 regular-season games played in the United States – the other 3 are in London. This is not easy since there are no pre-built datasets. I'd need to hit a secondary ticket market API and store that data somewhere.

API Data Extraction

I decided to use secondary ticket market data from the StubHub API. To stream the data, I created a cloud instance of R and RStudio on an AWS Elastic Compute Cloud (EC2) instance. FYI: EC2 is a web service that provides resizable compute capacity in the cloud, designed to make web-scale cloud computing easier for developers. In short, it's a cloud computer.
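In case it helps, here's a rough sketch of what one hourly pull might look like from R with httr – the endpoint path, query parameters, and object names below are illustrative placeholders, not the exact calls I used:

library(httr)
library(jsonlite)

##  Hypothetical helper: pull current listings for one game from a
##  StubHub-style inventory search endpoint using an OAuth bearer token.
get_listings <- function(event_id, token) {
  resp <- httr::GET(
    'https://api.stubhub.com/search/inventory/v2',   # illustrative endpoint
    httr::add_headers(Authorization = paste('Bearer', token)),
    query = list(eventid = event_id, rows = 250)
  )
  httr::stop_for_status(resp)
  jsonlite::fromJSON(httr::content(resp, as = 'text', encoding = 'UTF-8'))
}

##  Loop over the 253 game IDs each hour (e.g., scheduled with cron on the EC2 box)
# listings <- purrr::map(game_ids, get_listings, token = my_token)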

Data Storage

Now comes the interesting part. I'm taking data from the API and processing it on EC2, and I need to store it somewhere. Since I'm already using Amazon, I used its simple file storage system: Simple Storage Service – or S3 – which provides developers and IT teams with secure, durable, highly scalable cloud storage.
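For what it's worth, the hourly save to S3 can be a one-liner with the aws.s3 package – the data frame name, object path, and bucket below are placeholders, and AWS keys are assumed to be set as environment variables:

library(aws.s3)

##  Write the current hour's listings (placeholder data frame) to a timestamped
##  .csv in a placeholder bucket; AWS credentials come from environment variables.
aws.s3::s3write_using(
  listings_df,
  FUN = readr::write_csv,
  object = paste0('hourly/listings_', format(Sys.time(), '%Y%m%d_%H'), '.csv'),
  bucket = 'nfl-tickets'
)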

Data Processing

The data from StubHub is compiled every hour and stored on S3. I process that data and want to send it to Google Spreadsheets. Technically I was able to do this from R, but it required manual authentication for the R-Google handshake every 4 hours. That was a pain – it would have required me to wake up at night – so I decided to go a different route.

More Handshakes

Good news: I could save data from EC2/RStudio into S3. From there I could load the data into Google Spreadsheets using the Blockspring API, which sends .csv files from S3 to Google. The API also lets me update the data hourly. Bonus.

Intermission

In case you are wondering, here's what's going on:

[Diagram: StubHub API → EC2/RStudio → S3 → Blockspring → Google Spreadsheets → Tableau Public]

I wish this were as easy as I make it sound, but a lot is going on.

The Visualization

[Screenshot: the finished dashboard]

There are two different things happening in this visualization: the line charts and the stadium shapes. Neither is difficult to pull, but both take some time. Remember, the data was pulled from StubHub, and it includes IDs for each stadium and its corresponding zones. This means that if I have the shape files, I can plot the data as shapes. Here is a stadium for an upcoming game.

[Screenshot: StubHub stadium map for an upcoming game]

Good news: the colors on each section of the stadium are the result of custom polygons in the HTML code. This means I can download the HTML from the webpage and extract the polygons from the graphic. I did this using some simple R code that pulls out the SVG. That's the good news.

##  Pull the zone polygons out of the stadium page's SVG
stadium <-
  paste0(loc, team, '.htm') %>%
  xml2::read_html() %>%
  rvest::html_nodes('.svgcontainer svg path') %>%
  .[-length(.)]   # drop the last <path> node

##  Wrap the paths in an <svg> tag, strip the class attribute, and save
stadium %>%
  as.character() %>%
  c('<svg>', ., '</svg>') %>%
  stringr::str_replace_all('class=\"st0\"', '') %>%
  writeLines(., file(paste0(loc, team, '.txt')))


The bad news: another tool is needed. The SVG just tells the browser how to draw each polygon; it doesn't contain the actual points along the polygon – which is what I want. This requires reformatting the data into a file type called .wkt. There are connectors for R to do this, but I didn't have time to learn them, so I used an online site to make it happen. I then saved the shapes – now digestible by Tableau – back to the EC2/RStudio instance. From there it's pretty easy to read .wkt files into R, which gives me x and y values and some basic indices.
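The post doesn't hinge on any particular reader, but as a sketch, the sf package can parse the .wkt files and hand back the x/y coordinates plus index columns – the file layout (one geometry per line) is an assumption here:

library(sf)
library(magrittr)

##  Parse WKT strings (assumed one geometry per line) and extract coordinates;
##  st_coordinates returns X, Y plus L* index columns identifying ring/feature.
stadium_xy <-
  readLines(paste0(loc, team, '.wkt')) %>%
  sf::st_as_sfc() %>%
  sf::st_coordinates() %>%
  dplyr::as_data_frame()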


More bad news: hand coding. The converter didn't carry some of the underlying data over to the shapes, and I couldn't find a way around it other than hand coding 9,900 rows of data. That meant looking at every section in 32 stadiums – and some of them were very unorganized. But once that was done, I could connect ticket prices back to zones in each stadium.

Send an MMS Text

Prior to the start of this final Iron Viz feeder, I joked with a few colleagues that my next viz would be able to text a picture of itself to you. About 48 hours before submitting my visualization – while my data was still aggregating – I decided I needed to figure out how to do this. My coding skills outside of R are very limited, but there are a number of cool packages in R that let me send a text message. Here is how I did it:

Twilio is a messaging service: I can send or receive texts or phone calls with it. Through the service I can make an API call that sends the text message, and the httr package makes this easy to do. Below is the code I used to send the message:

httr::POST(
  paste0('https://api.twilio.com/2010-04-01/Accounts/', 'XXXX', '/Messages.json'),
  config = httr::authenticate('XXXXX', 'XXXXX', 'basic'),
  body = list(
    Body = 'See which NFL games had the hottest ticket prices in week 2: https://public.tableau.com/profile/stanke#!/vizhome/Whohasthehottestticketsrightnow/HottestNFLTicket',
    From = "6122550403",
    Method = "GET",
    To = paste0("+1", stringr::str_extract(as.character(isolate(input$phone_number)), "[[:digit:]]+")),
    MediaUrl = "https://XXXXXX.com/dashboard.png"
  ),
  encode = 'form'   # Twilio expects form-encoded parameters
)

Here I am sending an authenticated request to the API with form-encoded parameters (the .json in the URL just asks Twilio to respond in JSON). I'm specifying the to number, the from number, the text message, and the image URL. Everything is hard-coded with the exception of the input number – which I'll talk about in a minute.

But first, the image sent with the MMS text. To grab it I used the webshot package in R. This package requires PhantomJS, ImageMagick, and GraphicsMagick. Once those were installed on my EC2 instance, I could just run the code below and it would grab an image of any website – including the published Tableau dashboard.

webshot::webshot(
  "https://public.tableau.com/profile/stanke#!/vizhome/Whohasthehottestticketsrightnow/Print",
  "dashboard.png",
  selector = ".tableau-viz",
  delay = 8,
  vwidth = 800
) %>%
  webshot::resize('300%') %>%
  webshot::shrink()

Then I just place this onto a web server using the put_object function from the aws.s3 package:

aws.s3::put_object('dashboard.png', object = 'dashboard.png', bucket = 'XXXXXX')

The code you see above is essential for sending the text, but I still needed a user interface to go with it. I relied on Shiny, which is a web application framework for R. I created an open text box and a submit button that triggered the code you see above. I wanted an alert to pop up saying the message was sent, but I ran out of time – this was less than 6 hours before the submission deadline and I still didn't know what my visualization was going to look like.
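For context, here's a stripped-down sketch of that Shiny piece – send_text() is a hypothetical wrapper around the httr::POST call above:

library(shiny)

ui <- fluidPage(
  textInput('phone_number', 'Text a picture of the dashboard to:'),
  actionButton('send', 'Send')
)

server <- function(input, output, session) {
  ##  When the button is clicked, fire the Twilio request with the entered number.
  observeEvent(input$send, {
    send_text(input$phone_number)   # hypothetical wrapper around the httr::POST above
  })
}

shinyApp(ui, server)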

I thought the functionality turned out all right considering I was still learning how to program it just hours before the due date. I added a bunch of text clarifying to the user how the texting works. It's overkill and takes away from the experience a bit, but it's still pretty cool that it works.


5 tips for mobile dashboards (that are good for any device)

Designing mobile data tools can be intimidating, particularly because we think we don't have a lot of space to tell the same story we would on other devices. The format of data tools – including dashboards – for phones can appear rather limiting, but that's just a myth. While it would be nice to be device agnostic – to ignore the different ways data is now consumed (phones, tablets, desktop, and so on) – we just can't. It's not best practice. We consume information differently by device, so we need to design around each experience. Given the shifting landscape of how we consume information (hint: it's increasingly mobile), we need to develop appropriate data tools now. With that, here are five data tool development best practices I follow for any device, prioritized with mobile design in mind.

Tell a clear and direct, but guided story.

This is my number one rule for any dashboard or data tool. Remember that our tools should answer the initial question asked by our stakeholders. But as we answer the question we should also shine a light on what is likely a deeper, actionable issue. To get to that issue we need to provide context – and this means allowing users to "explore" the data. I use the term explore loosely: we want to give users the feel that they are diving into the data and blazing their own trail, but in reality we have curated the data and are guiding them through the story. This approach is similar to the one followed by the authors of Data Fluency. Don't be afraid to come up with a storyboard of how you want to guide stakeholders.

Use the least astonishing interaction: scroll first, then swipe or tap.

Consider the principle of least astonishment: if a necessary feature has a high astonishment factor, it may be necessary to redesign the feature. This means keeping it simple and choosing what audiences expect, which is usually scrolling down/swiping up. From a storytelling point of view, scrolling down/swiping up keeps the story on a single page, which makes going back and re-reading or re-interpreting something a lot easier than swiping or tapping does.

When it comes to dropdown menus, use parsimony.

First, try to avoid dropdowns altogether. But if you need a number: limit yourself to three dropdown menus. If you can get users to the exact source of data they need, do that. Dropdowns that apply filters or parameters make visualizations complicated. It's not that dropdowns are bad; they just need to be customized for a mobile device. Affording each dropdown its necessary space takes away from the functionality of the data tool. Don't forget that dropdown menus have low discoverability – you have to touch the dropdown menu to know it's there. One last thing: users like to see all of the options in a dropdown, so consider that space as well.

Cursors don’t exist so skip hover functionality.

With desktop dashboards and other data tools, we often hide additional data in tooltips (that extra stuff that shows up in a small box when we hover over or click on something). Sometimes tooltips appear on a cursor hover – when you are using a desktop with a mouse, of course – and other times they appear on a click or a tap. Once again, consider discoverability: since it's impossible to tell when a visualization is going to have a tooltip – unless you explicitly state it – it's best to avoid them.

Let the data and visualizations breathe.

Keep your visualization organized: grids are good. Larger fonts are good. Consistent formatting is good. And don't forget white space is good. Yes, white space. We don't need to cram things in – but you already knew that. If we give our audiences some space to process the findings, then we don't need to simplify our data tools to a few basic charts. More complexity should be accompanied by more white space!

Just one last thing: designing for a mobile format is a blessing. We don't feel the same obligation to fill an empty space on a dashboard – even when the filler doesn't add value. Mobile tools force us to think about what's important and actionable. This pressure lets us make better visualizations and better tools, which should – if done right – lead to better outcomes for our stakeholders.

#TableauTorch Mobile Visualization

Most Downloaded R Packages of 2016

I was curious which packages are the most downloaded from CRAN. A quick Google search of "most downloaded R packages 2016" produces outdated articles (see here, here, and here). Luckily, the RStudio CRAN mirror keeps logs of the number of downloads for each package every day. With a few lines of code (sketched below) the data was downloaded and aggregated, then placed in Tableau where packages were given weekly ranks. Here are the results:
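As for those few lines of code, one way to pull and aggregate the logs looks roughly like this – a sketch of the approach rather than the exact code behind the viz (note that a full year of logs is several gigabytes):

library(dplyr)

##  Download each day's log from the RStudio CRAN mirror and count downloads by package.
logs_2016 <-
  seq(as.Date('2016-01-01'), as.Date('2016-12-31'), by = 'day') %>%
  purrr::map_df(function(day) {
    url <- paste0('http://cran-logs.rstudio.com/2016/', day, '.csv.gz')
    tmp <- tempfile(fileext = '.csv.gz')
    download.file(url, tmp, quiet = TRUE)
    readr::read_csv(tmp)
  })

top_packages <-
  logs_2016 %>%
  dplyr::count(package, sort = TRUE)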

Considering the Possibilities: The Story Behind My IronViz Submission

Getting Started: The data

When developing my visualization for the IronViz competition, I wasn't sure where to begin. I thought I needed a good dataset that would stand out from all the other competitors. From early submissions, I saw a lot of competitors using similar datasets – I'm no different in this regard, but I think my process for getting there was much different. The data I decided to use was more a product of my design philosophy for this competition than of finding a unique, distinguishing dataset. I actually considered a number of really interesting ideas – all of which I pursued at some point during the month – but I decided to go a different route altogether.

Originally, I focused on trying to find a very unique dataset. I scraped the web to find the locations of major grocery stores and then determined the distance from the centroid of every census tract to the nearest grocery store. The plan was to look for trends in this data and make a color-coded clickbait map. I was almost done with the proof-of-concept when I stumbled upon a different idea: as a Blue Apron user, I considered scraping recipe ratings and gathering insights about the best and worst meals. Finally, I gathered geodata on food deserts in the United States at the census tract level, with the goal of identifying patterns between food deserts and the characteristics of the people who live there. I decided to save this for a future project as well.

What bothered me about all of this was that I kept finding really cool datasets that would be interesting in general, but I wasn't really showcasing the capabilities of Tableau Desktop. In one last search, I came across a National Geographic visualization about food consumption. I thought it was really interesting and wanted to see the underlying data. Luckily, National Geographic provided a link to FAOSTAT, a site that contains information about food consumption around the world. I wanted data about overall food habits in my data tool, but I kept focusing on and wondering about world meat consumption. I knew that meat consumption was a big issue – as populations develop, more meat is consumed, and generally speaking meat has a large impact on the environment. It also fit nicely with the theme #foodtipsmonth, so meat consumption became the focus of my visualization, as did showcasing the possibilities of data visualization in Tableau.

Tableau: The Focus of the Competition

Because it's in written form, I feel like the context of this next section might be missed. Tableau is a great tool, but it falls short in some areas. In this next section, I discuss what I feel are some of those shortfalls.

Story Points. When developing this visualization – this data tool – I really wanted to tell the story of meat consumption across the world, the differences in meat consumption by major world regions, and the impact it has on the environment. I also wanted to highlight some features I wish were present in Tableau but, generally, are still not possible without a lot of work. I tried really hard to hack as much of the data tool as I could; I think that speaks more to the competition than to a really interesting dataset.

The final design I came up with reminds me of my first-ever encounter with Tableau. I wanted to make a dynamic report combining both text and data-based graphics – because I believe we should go beyond just a dashboard; as developers we should create a tool with some context. Other experts like Stephanie Evergreen agree. When we create dashboards with lots of filters on the right side of the screen and multiple visuals scattered across a single page, the story within the data gets lost. I wanted to go back to the style of my first project in Tableau – a mini report with interesting graphics that enhance the story.

I really wanted to use the Story feature in Tableau. I've used it once before, but I found the functionality rather limited. Story Points uses a lot of screen real estate. The borders are too large. The captions take up too much space. And what if I wanted to put the caption at the bottom of the page? Or what if I didn't want captions altogether? I see the concept as perfect both for mobile design and as a best practice for focusing users. I think it'd be really cool to use it on mobile so that when I swipe left or right, the story moves to the next card. I saw all of these as opportunities to highlight in this competition, and when I saw a tweet from Jewel Loree on how others use Story Points, it was the tipping point.

Interactive Text. In part because of my design philosophy, I think copy (text) is a big, integral part of letting the data tell the story. It's always been my desire to have text that is interactive and linked to charts in Tableau. For instance, if the text focuses on a specific subset of the data, and that subset is color-coded in a graphic, I might want to highlight that information by hovering my mouse over the text; when I do, it highlights the corresponding part of the chart. This way the graphic and the copy are linked and interactive. We see this now in some of the HTML/JavaScript/D3.js pieces from the New York Times, the Wall Street Journal, and other major online news sources. With the data tool I developed here, I tried to create interactive text by embedding visualizations within the text to show exactly what I want. It's not perfect, but I think the concept is there.

R Server. Finally, it'd be great to showcase R's capabilities in Tableau Public, but it's not available, so all the cool stuff I do in R has to be done ahead of time. This isn't a big deal, but it'd be nice to showcase the possibilities. I use R A LOT. Often if I want to do some cool graphic in Tableau, I prep the data ahead of time in R. This is no different, but it'd be great to be able to use it on Public. Here is about a third of the code I used for the IronViz challenge:

meat_summary <- meat_summary %>%

##  Select only meat data from 1992 to 2011
  dplyr::filter(
    stringr::str_detect(Item, 'meat') | stringr::str_detect(Item, 'Meat') | stringr::str_detect(Item, 'Seafood'),
    Year <= 2011, Year >= 1992,
    Item != 'Meat'
  ) %>%

##  Edit variables
  dplyr::mutate(
    Item = ifelse(Item == 'Meat, Aquatic Mammals', 'Meat, Other', Item),
    Region = ifelse(Country == 'World', 'World', Region)
  ) %>%

##  Select specific columns
  dplyr::select(Region, Country, Item, Year, Value, Population, Total) %>%

##  Set aggregation level
  dplyr::group_by(Region, Country, Item, Year, Population) %>%

##  Create aggregated variables
  dplyr::summarise(
    Value = sum(Value),
    Total = sum(Total)
  ) %>%

##  Set new aggregation levels
  dplyr::ungroup() %>%
  dplyr::group_by(Region, Year, Item) %>%

##  Create aggregated variables
  dplyr::summarise(
    `Value Wt` = sum(Value*Population)/sum(Population),
    Total = sum(Total),
    Population = sum(Population)
  ) %>%
  dplyr::ungroup() %>%

##  Left join meat_id -- for sorting meats
  dplyr::left_join(meat_id) %>%

##  Order rows in data frame
  dplyr::arrange(Region, Year, meat_id) %>%

##  Set aggregation levels
  dplyr::group_by(Region, Year) %>%

##  Get cumulative sum of variables (for stacked bar charts)
  dplyr::mutate(
    ValueStacked = cumsum(`Value Wt`),
    TotalStacked = cumsum(Total)
  ) %>%

##  Set aggregation levels
  dplyr::ungroup() %>%
  dplyr::group_by(Region, Year) %>%

##  Create steam chart data.
  dplyr::mutate(
    ValueSteam = ValueStacked - (max(ValueStacked)/2),
    TotalSteam = TotalStacked - (max(TotalStacked)/2)
  )

Other Design Considerations

Mobile or Desktop? I struggled with the layout. Should it be for mobile? Or should it be a larger tool made for a desktop? When I think about how dashboards and tools are used – and how most information is consumed today – I think mobile design. First, the constraint forces the designer to create a tool that clearly and concisely tells the story at hand. Second, designers are forced to create clean, relatively simple data graphics – whether it's a bubble chart, a line graph, or some other way of displaying data. With desktop tools, we data tool designers are more likely to make lazy decisions and might not think about the full data tool experience.

Plus, a competitive event like the IronViz Feeder necessitates a zig while the competition zags.

Scrolling vs. Drilling down. When it comes to mobile design, you basically have two options: scroll or drill down. I'd normally design for scrolling. I really think that's just what people like to do – look at most websites today; most have infinite scrolling to keep the user engaged. The only reason I chose the drill-down route was to show off how I'd like to see Story Points look in the future.

Is seafood a meat? According to FAOSTAT: no. But if I asked a vegetarian whether they'd call it meat: maybe, but generally yes. When creating the tool, I had to decide whether or not to include seafood as a meat. Clearly, overconsumption of seafood, particularly fish, is an issue. So in the end I did include seafood and fish. Most other articles out there don't consider seafood meat, so my story is slightly different.

Bacon Charts. I was playing around with the data and made a mistake in R doing the calculations, but I liked the end result – I thought it generally resembled bacon. I added a cosine function in Tableau to give it a wave and came up with a food-themed bacon chart. In this chart, the white space represents an earlier time when meat consumption was lower. It also looks like bacon. Mmmmm, bacon. See, sometimes mistakes can be good. Bacon.

Testing for multiple browsers. I use Tableau Desktop on a Mac and I check my work on Tableau Public using Chrome. I always worry about rendering – fonts in particular. They tend to render differently on different systems, so when doing my work I also check how things render in Internet Explorer on a PC. Doing the quality assurance on this project was not fun: I kept finding that my interactive text boxes were not aligning in Internet Explorer the way they were in Chrome. This meant lots of tweaks. Over a few hours I worked out all of the minutiae.

Visualize a .png in Tableau

I've always thought it was really awesome how some people can replicate images in Tableau, so I thought I would give it a shot. I figured it'd take a long time to figure out, but it turns out it's pretty easy to get the key underlying data in R. After that, you can quickly plot the information in Tableau.

So let’s start with a photo of downtown Minneapolis:

[Photo: downtown Minneapolis]

Well, I want to reduce the resolution of the photo, so I used pixlr to change the width of the photo to 200 pixels.

Once we do this, we can load the photo into R using the readPNG function from the png package.

image <- 
  png::readPNG('.../mpls.png')

This produces an array that is height-by-width-by-channel (RGB) in size. With this information I can reshape the data into a long format with one row per pixel.

###  Create empty space for the image
image2 <-
  data.frame(
    matrix(
      NA,
      nrow = dim(image)[1]*dim(image)[2],
      ncol = 5
    )
  ) %>%
  dplyr::as_data_frame() %>%
  dplyr::rename(
    row = X1,
    column = X2,
    red = X3,
    green = X4,
    blue = X5
  ) %>%
  dplyr::mutate(
    row = image %>% dim() %>% .[1] %>% seq(1, .) %>% rep(image %>% dim() %>% .[2]),
    column = image %>% dim() %>% .[2] %>% seq(1, .) %>% rep(each = (image %>% dim() %>% .[1]))
  )

### Fill in the data 
for(w in (image %>% dim() %>% .[2]) %>% seq(1, .)) {
  for(h in (image %>% dim() %>% .[1]) %>% seq(1, .)) {
    image2$red[image2$column == w & image2$row == h] <- image[h, w, 1]
    image2$green[image2$column == w & image2$row == h] <- image[h, w, 2]
    image2$blue[image2$column == w & image2$row == h] <- image[h, w, 3]
  }
}

Now let's add the hex color as another column. The rounding below snaps each color channel to four levels, which limits the image to a small palette of colors. We'll also save the output.

image2 %>%
  dplyr::mutate(
    red = round(red * 3, 0)/3,
    green = round(green * 3, 0)/3,
    blue = round(blue * 3, 0)/3,
    color = rgb(red, green, blue, maxColorValue = 1)
  ) %>%
  readr::write_csv('.../mpls.csv')

Now we can load this data into Tableau. Once it's loaded, let's put the column and row fields on the Columns and Rows shelves as dimensions.

[Screenshot: the pixel grid plotted in Tableau]

Unfortunately, my photo is upside down. There are a few fixes for this; the easiest is to reverse the row axis.
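Another option is to flip the row index in R before writing the .csv, which accomplishes the same thing upstream:

##  Alternative fix: invert the row index so the image plots right side up.
image2 <- dplyr::mutate(image2, row = max(row) - row + 1)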

[Screenshot: the image after reversing the row axis]

Now, the color of each point isn't exactly what I'm looking for, so I'm going to create some custom palettes that include all of the colors currently in the visualization. Here you can see all the colors in the graphic:

[Screenshot: the list of colors used in the image]

The list totals 27 colors. Since a custom palette contains 16 colors, I'm going to make 2 palettes.
If you haven't ever made a custom palette, read this document.
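Since the colors already live in the .csv, one way to avoid typing them by hand is to print the Preferences.tps entries from R – the palette names below are placeholders, and the snippet only writes the XML to the console for pasting into Preferences.tps:

##  Print <color-palette> blocks (16 colors each) for Tableau's Preferences.tps,
##  using the sorted colors saved in mpls.csv; palette names are placeholders.
colors <- sort(unique(readr::read_csv('.../mpls.csv')$color))
chunks <- split(colors, ceiling(seq_along(colors) / 16))

for (i in seq_along(chunks)) {
  cat(sprintf('<color-palette name="Minneapolis %d" type="regular">\n', i))
  cat(sprintf('  <color>%s</color>\n', chunks[[i]]), sep = '')
  cat('</color-palette>\n')
}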

If we create the colors in the same order they are sorted in the image above, the colors will be aligned from top to bottom, which makes it easy to assign the correct color to the correct shape. The color change results in this viz:

[Screenshot: the viz with the custom palettes applied]

We've essentially done it! But now I want to edit the skyline so that it changes color based on the time of day. The color of the sky is #AAAAFF. When it's mid-day, I want it to be a bright blue, and when it's night I want it to be a dark blue. To do this, I need to make a dual-axis chart: one chart includes all of the dots in the sky, and the other all of the buildings.

To do this, I need to create two variables that will help sort things out later: Sky? and Exclude Sky. Both have the same formula:

IF [Color] = '#AAAAFF'
THEN 'Yes'
ELSE 'No'
END

I'm going to change the shape type for both charts based on each of these variables. Then I'm going to hide the sky on one viz and the buildings on the other, and create a dual-axis chart with synchronized axes.

[GIF: creating the dual-axis chart]

Now let's create the sky color. I want it to start getting dark by 7 PM and be fully dark by 9 PM. I also want it to start getting light by 7 AM and be fully light by 9 AM. I'm going to use a logistic (inverse logit) function to make this happen. Here's the formula for Sky Color.

IF DATEPART("hour", NOW()) >= 12
THEN EXP((DATEPART("hour", NOW()) - 7.5)/2)/(1 + (EXP((DATEPART("hour", NOW()) - 7.5)/2)))
ELSE EXP((19.5 - DATEPART("hour", NOW()))/2)/(1 + (EXP((19.5 - DATEPART("hour", NOW()))/2)))
END

I'll use this as the color of the sky on the Sky portion of the viz. But let's also make the lights show up when it's dark by creating a Modified Color variable. Here's the code:

IF DATEPART('hour', NOW()) < 7 OR DATEPART('hour', NOW()) >= 19
THEN IF [Color] = '#AAAAAA'
    THEN '#FFFFAA'
    ELSE [Color]
    END
ELSE [Color]
END

Now I'll replace [Color] with [Modified Color] on the Color shelf.

[GIF: the sky color updating]

Now I'll put this in a dashboard and publish it:

Target Store Locations with rvest and ggmap

I just finished developing a presentation for the Target Analytics Network showcasing geospatial and mapping tools in R. I decided to use Target store locations as part of a case study in the presentation. The problem: I didn't have any store location data, so I needed to get it from somewhere on the web. Since there are some great tools in R for this – mainly rvest for scraping and ggmap for geocoding – it wasn't a problem. Instead of just doing the work, I thought I should share what the process looks like:

First, we can go to the Target website and find stores broken down by state.

[Screenshot: Target's store locator listing stores by state]

After finding this information, we can use the rvest package to scrape it. The URL is so nicely formatted that you can easily grab any state if you know the state's mailing code, so let's set the state first.

# Set the state -- Minnesota's mailing code is MN.
state <- 'MN'

Now we can build the URL for that state's store list.

# Set the URL to borrow the data.
TargetURL <- paste0('http://www.target.com/store-locator/state-result?stateCode=', state)

Now that we have the URL, let’s grab the html from the webpage.

# Download the webpage.
TargetWebpage <-
  TargetURL %>%
  xml2::read_html()

Now we have to find the location of the table in the html code.

[Screenshot: inspecting the store table in the page's html]

Once we have found the html table, there are a number of ways we could extract it. I like to copy the XPath location. It's a bit lazy, but for the purpose of this exercise it makes life easy.

Once we have the XPath location, it's easy to extract the table from Target's webpage. First we pipe the html through the html_nodes function, which isolates the html responsible for creating the store locations table. After that we use html_table to parse the html table into an R list. We then use the data.frame function to turn the list into a data frame, and the select function from the dplyr library to pick specific variables. The one wrinkle is that the city, state, and zip code are in a single column. It's not really a problem for this exercise – maybe it's just the perfectionist in me – so let's use the separate function from the tidyr library to give city, state, and zip code their own columns.

# Get all of the store locations.
TargetStores <-
  TargetWebpage %>%
  rvest::html_nodes(xpath = '//*[@id="stateresultstable"]/table') %>%
  rvest::html_table() %>%
  data.frame() %>%
  dplyr::select(`Store Name` = Store.Name, Address, `City/State/ZIP` = City.State.ZIP) %>%
  tidyr::separate(`City/State/ZIP`, into = c('City', 'Zipcode'), sep = paste0(', ', state)) %>%
  dplyr::mutate(State = state) %>%
  dplyr::as_data_frame()

Let's get the coordinates for these stores. We can pass each store's address through the geocode function, which obtains the information from the Google Maps API — you can only geocode up to 2,500 locations per day for free using the Google API.

# Geocode each store
TargetStores %<>%
  dplyr::bind_cols(
    ggmap::geocode(
      paste0(
        TargetStores$`Store Name`, ', ',
        TargetStores$Address, ', ',
        TargetStores$City, ', ',
        TargetStores$State, ', ',
        TargetStores$Zipcode
      ),
      output = 'latlon',
      source = 'google'
    )
  )

Now that we have the data, let's plot it. In order to plot this data, we need to put it in a spatial data frame — we can do this using the SpatialPointsDataFrame and CRS functions from the sp package. We need to specify the coordinates, the underlying data, and the projection.

# Make a spatial data frame
TargetStores <-
  sp::SpatialPointsDataFrame(
    coords = TargetStores %>% dplyr::select(lon, lat) %>% data.frame,
    data = TargetStores %>% data.frame,
    proj4string = sp::CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
  )

Now that we have a spatial data frame, we can plot these points — I'm going to plot some other spatial data frames as well to add context for the Target store point data.

# Plot Target in Minnesota
plot(mnCounties, col = '#EAF6AE', lwd = .4, border = '#BEBF92', bg = '#F5FBDA')
plot(mnRoads, col = 'darkorange', lwd = .5, add = TRUE)
plot(mnRoads2, col = 'darkorange', lwd = .15, add = TRUE)
plot(mnRivers, lwd = .6, add = TRUE, col = '#13BACC')
plot(mnLakes, border = '#13BACC', lwd = .2, col = '#EAF6F9', add = TRUE)
plot(TargetStores, add = TRUE, col = scales::alpha('#E51836', .8), pch = 20, cex = .6)

[Map: Target store locations in Minnesota]

Yes! We’ve done it. We’ve plotted Target stores in Minnesota. That’s cool and all, but really we haven’t done much with the data we just obtained. Stay tuned for the next post to see what else we can do with this data.

UPDATE: David Radcliffe of the Twin Cities R User group presented something similar using Walmart stores.