This post describes how to build an interactive data visualization web app.

Its three parts give a concrete example of a full visualization workflow: preprocessing the data, graphing it, and building the app.

The case is adapted from how the R Shiny app for the soccer dashboards project was actually created. Because of the nature and goal of the project, this interactive visualization pays close attention to how different groups of users would use the service. Most tasks concern how to graph categorical data.


The first part deals with the raw data: data transformation, with a focus on reshaping data structures for visualization. It starts with reshaping data for single graphs and extends to reshaping for the R Shiny app, which involves a lot more repetition and automation.

In this case, the challenges of preprocessing (reshaping) the data for visualization arise from how the data was collected and generated, and from how it is structured and stored in ways quite far from ready to use. Data collection is beyond the scope of this post; you can, however, check out THIS TOOL by Ben Torvaney, which allows for the collection of sport event data.


Intro

First of all, let’s load all the packages that we will use for this post.

library(readxl)
library(data.table)
library(dplyr)
library(stringr)
library(tidyverse)
library(ggsoccer) #provides annotate_pitch() and theme_pitch(), used in Part 2

sample data

Let’s load the sample data and take a brief look at it. The data comes from manually annotated match video, cleaned and de-identified for the purpose of this demo.

#Load the data
df <- read_excel("C:/Users/User/Desktop/sportapp/KEN-UG/actions.xlsx")
head(df)
MatchID  Period  Team      Player          Type       Event      Result      X   Y   X2  Y2  Time
FIFAWCQ  H1      Kenya(H)  Daniel Sakari   NA         INT        NA          29  18  NA  NA  0 : 20
FIFAWCQ  H1      Kenya(H)  Abdalla Hassan  pass       pass       complete    36  10  29  13  0 : 22
FIFAWCQ  H1      Kenya(H)  Daniel Sakari   pass       pass       incomplete  29  14  35  21  0 : 23
FIFAWCQ  H1      Kenya(H)  Abdalla Hassan  challenge  Fouled     RefStop     31  13  NA  NA  0 : 25
FIFAWCQ  H1      Kenya(H)  Daniel Sakari   set piece  free kick  complete    37  14  21  21  0 : 36
FIFAWCQ  H1      Kenya(H)  Eugene Asike    pass       pass       complete    21  24  25  48  0 : 39

data description

This dataset contains soccer match event data. MatchID identifies the match. A football match is played in two halves with a break in between, so Period marks either the 1st or 2nd half of the match. Team is one of the two opposing sides contesting the match, and Player is a player within a team, which normally fields 11 players at a time. Type is the event category, Event describes the action, and Result is the outcome of the event. X, Y, X2 and Y2 are the start and end coordinates of an event, and Time is the timestamp of the action as annotated from the match video.

Part 1 - Reshaping the Data for Presentation

minutes played

A common metric used to evaluate soccer players is the number of minutes played in competitive matches. In this data, the Time column, together with the 'Sub In' and 'Sub Out' values in the Event column, shows how long a player participated in a match.

#filter for period markers
markers.data <- df %>%
  filter(Event == 'start' | Event == 'FT' | Event == 'HT')

#filter for player actions
actions.data <- df %>%
  filter(Player == 'Joseph Okumu')

#bind the dataframes
full.data <- rbind(actions.data, markers.data)

full.data %>%
  #Mins is assumed to be a numeric match-minute column; it is not among the columns printed by head(df)
  mutate(mins_played = case_when(
    Event == 'Sub Out' ~ Mins,              #subbed off: played up to that minute
    Event == 'Sub In'  ~ max(Mins) - Mins,  #subbed on: played from that minute to the end
    TRUE               ~ max(Mins))) %>%    #otherwise: played the whole match
  #keep only substitution rows and the full-time marker
  filter(Event %in% c('Sub Out', 'Sub In', 'FT')) %>%
  #player rows were bound first, so a substitution row, if any, comes before FT
  select(mins_played) %>%
  filter(row_number()==1)
mins_played
96
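
The snippet above relies on a numeric Mins column, which is not among the columns printed by head(df). Here is a minimal sketch of one way such a column could be derived. It assumes every Time value is a string of the form "M : S" and that the clock keeps running across both halves; neither is confirmed by the data shown. The toy tibble and the Secs column are illustrative only.

```r
library(dplyr)
library(stringr)

# Toy rows using the same "M : S" layout as the Time column (values are made up).
toy <- tibble::tibble(
  Event = c("start", "pass", "Sub Out", "FT"),
  Time  = c("0 : 0", "0 : 22", "63 : 10", "96 : 4")
)

toy <- toy %>%
  mutate(
    Mins = as.integer(str_extract(Time, "^\\d+")),  # digits before the colon
    Secs = as.integer(str_extract(Time, "\\d+$"))   # digits after the colon
  )

toy
```

If the real data restarts the clock in the second half, Mins would additionally need an offset based on Period.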

fouls

Another common metric used to evaluate soccer players is the fouls committed.

df %>%
  filter(Player == 'Joseph Okumu') %>%
  filter(Event == 'Foul') %>% 
  select(Event) %>%
  count(Event)
Event n
Foul 2

cards

The discipline of a game can be evaluated by compiling the number of yellow cards and red cards accumulated.

df %>%
  filter(Player == 'Joseph Okumu') %>%
  filter(Result == 'yellow card' | Result == 'red card') %>% 
  select(Result) %>%
  count(Result)
Result n
yellow card 1

shots

Because the main objective of a soccer game is to outscore your opponent, goals are an immediate metric. There were no goals in the data collected. However, a common event that leads to a goal is a shot, and this is another metric that can be used for evaluation.

df %>%
  filter(Player=='Joseph Okumu') %>%
  filter(Type == 'Shot') %>%
  select(Type, Event) %>%
  group_by(Type, Event) %>%
  count(Event) %>%
  pivot_wider(names_from = Event, values_from = n) %>%
  knitr::kable(align = "l", format = "html", table.attr = "style='width:30%;'") %>%
  kableExtra::kable_styling()
Type

general event summary

A summary of events can also be useful when evaluating sporting performance within a match.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
df %>% 
  filter(Player == 'Joseph Okumu') %>%
  select(Type, Event, Result) %>% 
  group_by(Type, Event) %>% 
  mutate(Event = fct_recode(Event, `INT/Tkl` = 'Tackle', `INT/Tkl` = 'INT')) %>% 
  mutate(Result = fct_recode(Result, good = 'RefAdv', good = 'won', good = 'sucsessful', good = 'complete', good = 'possession gain', `not good` = 'out of bounds', `not good` = 'incomplete', `not good` = 'possession loss', `not good` = 'save', `not good` = 'RefStop', `not good` = 'block', `not good` = 'unsucsessful', `not good` = 'lost', `good` = 'goal', `not good` = 'yellow card', `not good` = 'red card')) %>% 
  mutate(Result = fct_explicit_na(Result, "not good")) %>% 
  mutate(Result = replace(Result, Type=='Shot', 'good')) %>% 
  mutate(Result = replace(Result, Event=='INT/Tkl', 'good')) %>% 
  count(Result) %>% 
  pivot_wider(names_from = Result, values_from = n) %>% 
  drop_na(Type) %>% 
  mutate(across(where(is.numeric), ~ tidyr::replace_na(.x, 0))) %>% 
  mutate(total = good + `not good`) %>% 
  mutate(`%` = good/total * 100) %>% 
  subset(select = -c(`not good`)) %>% 
  mutate_if(is.numeric, ~round(., 0)) %>% 
  filter(!((Event == 'Fouled') | (Type == 'restart') | (Type == 'Shot') | (Event == 'Foul'))) %>%
  knitr::kable(caption = "Event Summary") %>%
  kableExtra::kable_styling()
Event Summary
Type       Event            good  total  %
challenge  aerial duel      13    13     100
challenge  dribble          3     3      100
challenge  loose ball duel  1     2      50
pass       pass             32    42     76
set piece  free kick        4     8      50

Reshaping the data for the R Shiny app calls for a lot more repetition and automation. Here, that means we want one chunk of code to produce the same graphs and tables automatically and repeatedly for different teams or players. For instance, earlier we created a summary of events for one player; now we want to plot events for any player. Later, when selecting the player to evaluate in the Shiny app, we want to choose dynamically which team's or player's data is visualized and presented.
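
As a rough illustration of that idea, the fouls query from earlier can be wrapped in a function that takes the player and the event as arguments. This is only a sketch: count_player_events() and its argument names are hypothetical, and the toy tibble stands in for df.

```r
library(dplyr)

# Hypothetical helper: count one player's events of a given kind.
count_player_events <- function(data, player_name, event_name) {
  data %>%
    filter(Player == player_name, Event == event_name) %>%
    count(Event)
}

# Toy stand-in for df (values are made up).
toy <- tibble::tibble(
  Player = c("Player A", "Player A", "Player B", "Player A"),
  Event  = c("Foul", "pass", "Foul", "Foul")
)

fouls_a <- count_player_events(toy, "Player A", "Foul")
fouls_a
```

The same function can then be called once per player or per event type, which is the pattern the Shiny app relies on when the user changes a selection.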

It may sound abstract and vague at the moment, but when we get to Part 3, we will get a more concrete idea of how that works.

Part 2 - Creating the Graphs

For this case, we will use a single visualization. We’ll visualize Events on a soccer pitch using the coordinate data in the X, Y, X2 & Y2 columns.

df %>% filter(Player == 'Joseph Okumu') %>%
  #Filter out match markers and other events that are not player actions
      filter(!(Event == 'Sub Out' | Event == 'Sub In' | Event == 'Fairplay Start' 
               | Event == 'start' | Event == 'kick off' | Event == 'RefStop' | Event == 'fair play' | Event == 'HT' | Event == 'FT' | Event == 'Out Of Scope' )) %>%
  #Filter out events that do not include possession gain/loss
      filter(!(Type == 'pass' & Result == 'complete') & !(Event == 'launch' & Result == 'out of bounds') 
             & !(Event == 'throw' & Result == 'complete') & !(Event == 'touch' & Result == 'out of bounds')
             & !(Type == 'set piece' & Event == 'corner kick')
             #the two clauses below originally tested Event twice; the second test is taken to be on Result
             & !(Event == 'launch' & Result == 'possession loss') & !(Event == 'block' & is.na(Result))
             & !(Event == 'touch' & is.na(Result)) & !(Event == 'launch' & is.na(Result))
             & !(Event == 'touch' & Result == 'possession loss') & !(Event == 'pass' & is.na(Result))
             & !(Event == 'block' & Result == 'out of bounds') & !(Event == 'dribble' & is.na(Result))
             & !(Event == 'pass' & Result == 'save') & !(Event == 'launch' & Result == 'block') 
             & !(Event == 'block' & Result == 'possession loss')) %>%
  #annotate_pitch() and theme_pitch() come from the ggsoccer package
  ggplot(aes(x=X, y=Y, color = Result, shape = Event)) +
      annotate_pitch(fill = 'springgreen4', colour = 'white') +
      geom_point()+
      theme_pitch()
Action Map
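
The plot above uses only the start coordinates X and Y. As a hedged sketch of how the end coordinates X2 and Y2 could be brought in, passes can be drawn as arrows with geom_segment(). The toy data frame is illustrative, and the 0 to 100 coordinate scale is an assumption based on ggsoccer's default pitch dimensions.

```r
library(ggplot2)
library(ggsoccer)

# Toy passes with start (X, Y) and end (X2, Y2) coordinates (illustrative values).
toy_passes <- data.frame(
  X  = c(31, 21, 33), Y  = c(90, 75, 88),
  X2 = c(36, 28, 9),  Y2 = c(64, 95, 60)
)

pass_map <- ggplot(toy_passes) +
  annotate_pitch(fill = "springgreen4", colour = "white") +
  # One arrow per pass, from where it started to where it ended
  geom_segment(aes(x = X, y = Y, xend = X2, yend = Y2),
               arrow = arrow(length = unit(0.2, "cm"))) +
  theme_pitch()

pass_map
```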

Part 3 - Creating the R Shiny App

Now that we are done with the data transformation, and we are familiar with visually presenting data, let’s get started with creating a Shiny App. We will make use of the data objects and plot prototype that we saw in the first two parts.

User Interface Layout

For the shiny app, I wanted to get the quantitative assessments for a player from a specific match, and so I narrowed down my Inputs and Outputs to the following:

inputs

The Shiny app shall have three drop-down inputs that act as filters on the dataset. The drop-downs filter the data sequentially: first by MatchID, then by Team, then by Player.
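
The three sequential drop-downs described above could be sketched as follows. This is a minimal sketch rather than the app's actual code: the input ids ("match", "team", "player"), the helpers teams_for() and players_for(), and the toy data are all assumptions made for illustration.

```r
library(shiny)
library(dplyr)

# Toy stand-in for the match data (values are made up).
toy <- tibble::tibble(
  MatchID = c("M1", "M1", "M1", "M2"),
  Team    = c("Home", "Home", "Away", "Home"),
  Player  = c("P1", "P2", "P3", "P4")
)

# Hypothetical helpers that compute the valid choices for each drop-down.
teams_for <- function(data, match_id) {
  data %>% filter(MatchID == match_id) %>% pull(Team) %>% unique()
}
players_for <- function(data, match_id, team_name) {
  data %>%
    filter(MatchID == match_id, Team == team_name) %>%
    pull(Player) %>%
    unique()
}

ui <- fluidPage(
  selectInput("match",  "Match",  choices = unique(toy$MatchID)),
  selectInput("team",   "Team",   choices = NULL),
  selectInput("player", "Player", choices = NULL)
)

server <- function(input, output, session) {
  # Refresh the team choices whenever the match changes
  observe({
    req(input$match)
    updateSelectInput(session, "team", choices = teams_for(toy, input$match))
  })
  # Refresh the player choices whenever the match or the team changes
  observe({
    req(input$match, input$team)
    updateSelectInput(session, "player",
                      choices = players_for(toy, input$match, input$team))
  })
}

# shinyApp(ui, server)  # run interactively to try the drop-downs
```

Keeping the choice logic in plain functions makes it easy to check outside the app before wiring it to the inputs.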



outputs

Once the data has been filtered, we can use it to present the user with information through summary tables and plots. Some of the outputs are shown below:

  1. A reduced dynamic table that displays player-specific match events. It is used to validate the events by checking each event's timestamp in the dataset against its timestamp in the video.
Player Events Table
Type       Event        Result    Time    X   Y   X2  Y2
pass       pass         complete  0 : 49  31  90  36  64
pass       pass         complete  1 : 47  21  75  28  95
pass       pass         complete  2 : 5   33  88  9   60
pass       pass         complete  2 : 26  26  69  33  97
pass       pass         complete  2 : 36  29  88  70  93
challenge  aerial duel  won       4 : 6   30  85  NA  NA

  2. A player match summary that can include different types of quantitative information. For this case, we combine minutes played, fouls, cards, shots and goals from Part 1.

  3. A player dashboard similar to the Event Summary table built in Part 1.

  4. An Action Map, similar to the one built in Part 2.
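
To make the first output in the list above more concrete, here is a hedged sketch of how a filtered events table could be rendered. player_events() and make_server() are hypothetical names, the inputs with ids "match", "team" and "player" are assumed to be defined elsewhere in the UI, and the toy data is made up.

```r
library(shiny)
library(dplyr)

# Hypothetical helper: the reduced events table for one player in one match.
player_events <- function(data, match_id, team_name, player_name) {
  data %>%
    filter(MatchID == match_id, Team == team_name, Player == player_name) %>%
    select(Type, Event, Result, Time, X, Y, X2, Y2)
}

# Toy stand-in for the match data (values are made up).
toy <- tibble::tibble(
  MatchID = "M1", Team = "Home", Player = c("P1", "P1", "P2"),
  Type = c("pass", "challenge", "pass"),
  Event = c("pass", "aerial duel", "pass"),
  Result = c("complete", "won", "incomplete"),
  Time = c("0 : 49", "4 : 6", "5 : 10"),
  X = c(31, 30, 40), Y = c(90, 85, 50),
  X2 = c(36, NA, 45), Y2 = c(64, NA, 55)
)

ui <- fluidPage(tableOutput("events"))

# Builds a server function around a given dataset.
make_server <- function(data) {
  function(input, output, session) {
    output$events <- renderTable({
      req(input$match, input$team, input$player)  # wait until all three are chosen
      player_events(data, input$match, input$team, input$player)
    })
  }
}
```

The summary, dashboard and Action Map outputs would follow the same pattern, with renderPlot() in place of renderTable() for the map.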

For easier UI building, I’d suggest checking out shinyuieditor, a package by Nick Strayer.

Hosting the App

Once we have built the Shiny app, we can host it using RStudio’s hosting service, shinyapps.io.
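
Deployment to shinyapps.io is usually done with the rsconnect package. The sketch below wraps the two calls in a hypothetical helper, deploy_dashboard(); the account name, token, secret and app directory are placeholders you would replace with the values from your own shinyapps.io account.

```r
# Hypothetical wrapper around the two rsconnect calls needed for deployment.
# Requires the rsconnect package to be installed when the function is called.
deploy_dashboard <- function(app_dir, account, token, secret) {
  # Register the shinyapps.io account on this machine
  rsconnect::setAccountInfo(name = account, token = token, secret = secret)
  # Upload and deploy the app found in app_dir
  rsconnect::deployApp(appDir = app_dir)
}

# Example call (placeholders, not real credentials):
# deploy_dashboard("path/to/app", "<account>", "<token>", "<secret>")
```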

To interact with the shiny app for this case, CLICK HERE

Future directions?

  • tabset panel view for different user groups
  • more visualizations for player and team comparisons
  • interactive action maps