This post describes how to make an interactive data visualization web app. The case is adapted from the way the R Shiny App for the soccer dashboards project was actually created. Given the nature and goal of the project, the interactive visualization paid close attention to how different groups of users would use the service. Most tasks concern how to graph categorical data.
The first part deals with the raw data: data transformation, with a focus on reshaping data structures for visualization. It starts with reshaping data for single graphs, then extends to reshaping data for an R Shiny App, which involves a lot more repetition and automation.
In this case, the challenges of data preprocessing (reshaping data for visualization) arise from two things: how the data was collected and generated, which is beyond the scope of this post (you can, however, check out this tool by Ben Torvaney, which allows for the collection of sports event data), and how the data is structured and stored, which is quite far from ready to use.
First of all, let’s load all the packages that we will use for this post.
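```r
library(readxl)     # read_excel()
library(data.table)
library(dplyr)
library(stringr)
library(tidyverse)  # also attaches ggplot2, tidyr and forcats
library(ggsoccer)   # annotate_pitch() and theme_pitch() for the pitch plots
# knitr and kableExtra are called with :: below, so they only need to be installed
```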
Let’s load the sample data. The data came from manually annotated match video, cleaned and de-identified for the purpose of this demo. Let’s take a brief look at it.
```r
# Load the data (adjust the path to where actions.xlsx is saved)
df <- read_excel("C:/Users/User/Desktop/sportapp/KEN-UG/actions.xlsx")
head(df)
```
MatchID | Period | Team | Player | Type | Event | Result | X | Y | X2 | Y2 | Time |
---|---|---|---|---|---|---|---|---|---|---|---|
FIFAWCQ | H1 | Kenya(H) | Daniel Sakari | | INT | | 29 | 18 | | | 0 : 20 |
FIFAWCQ | H1 | Kenya(H) | Abdalla Hassan | pass | pass | complete | 36 | 10 | 29 | 13 | 0 : 22 |
FIFAWCQ | H1 | Kenya(H) | Daniel Sakari | pass | pass | incomplete | 29 | 14 | 35 | 21 | 0 : 23 |
FIFAWCQ | H1 | Kenya(H) | Abdalla Hassan | challenge | Fouled | RefStop | 31 | 13 | | | 0 : 25 |
FIFAWCQ | H1 | Kenya(H) | Daniel Sakari | set piece | free kick | complete | 37 | 14 | 21 | 21 | 0 : 36 |
FIFAWCQ | H1 | Kenya(H) | Eugene Asike | pass | pass | complete | 21 | 24 | 25 | 48 | 0 : 39 |
This dataset contains soccer match event data. `MatchID` is an identifier for each match played. A full football match is played in two halves with a break in the middle, so `Period` represents either the 1st or the 2nd half (period) of the match. A football match is contested by two opposing sides called teams, represented by the `Team` column. The `Player` column identifies a player within a team, which normally has 11 players on the pitch at a time. `Type` is the event category, `Event` is an action descriptor, and `Result` is the outcome of the event. `X`, `Y`, `X2` and `Y2` are the start and end coordinates of an event, while `Time` is the time stamp of an action annotated from the match video.
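`Time` is stored as text such as `0 : 20` or `2 : 5`. Assuming it means elapsed minutes and seconds (the second part never exceeds 59 in the sample rows), a small helper can turn it into a number for arithmetic such as time on the pitch. `time_to_seconds()` is a hypothetical name, not part of the original project.

```r
library(stringr)

# Hypothetical helper. Assumption: Time holds elapsed match time as
# "minutes : seconds" text, e.g. "2 : 5" means 2 minutes 5 seconds.
time_to_seconds <- function(time) {
  parts <- str_split_fixed(time, ":", 2)       # two-column character matrix
  mins  <- as.numeric(str_trim(parts[, 1]))    # text before the colon
  secs  <- as.numeric(str_trim(parts[, 2]))    # text after the colon
  mins * 60 + secs
}

time_to_seconds(c("0 : 20", "2 : 5", "45 : 59"))  # 20 125 2759
```

For example, `df %>% mutate(Secs = time_to_seconds(Time))` would add a numeric column alongside the original text.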
A common metric used to evaluate soccer players is the number of minutes played in competitive matches. In this data, the `Time` column, together with the substitution events in the `Event` column (`Event == 'Sub In'` or `Event == 'Sub Out'`), shows how long a player has participated in a match.
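```r
# Mins is not in the raw data: take the whole-minute part of the "M : S"
# Time stamp (an assumption about how Mins was derived)
df <- df %>%
  mutate(Mins = as.numeric(str_extract(Time, "\\d+")))

# Filter for period markers (kick-off, half-time, full-time)
markers.data <- df %>%
  filter(Event %in% c('start', 'HT', 'FT'))

# Filter for the player's actions
actions.data <- df %>%
  filter(Player == 'Joseph Okumu')

# Bind the data frames: player actions first, then the markers
full.data <- rbind(actions.data, markers.data)

full.data %>%
  mutate(mins_played = case_when(
    Event == 'Sub Out' ~ Mins,                            # played until subbed off
    Event == 'Sub In'  ~ max(Mins, na.rm = TRUE) - Mins,  # played from coming on to the end
    TRUE               ~ max(Mins, na.rm = TRUE))) %>%    # played the whole match
  # Keep only substitution rows and the full-time marker
  filter(Event %in% c('Sub Out', 'Sub In', 'FT')) %>%
  select(mins_played) %>%
  # A substitution row (if any) precedes the FT marker, so take the first row
  filter(row_number() == 1)
```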
mins_played |
---|
96 |
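The same minutes-played rule can be packaged as a function and checked on a toy data frame. This is a sketch under two assumptions: a numeric `Mins` column exists, and a player has at most one substitution event. Unlike the pipeline above, it takes the full-time minute from the `FT` marker rather than from `max(Mins)`; `minutes_played()` and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper implementing the rule used above:
#   subbed off -> minute of the substitution
#   subbed on  -> full-time minute minus the substitution minute
#   otherwise  -> full-time minute
minutes_played <- function(events, player) {
  ft <- max(events$Mins[events$Event == "FT"], na.rm = TRUE)
  sub <- events %>%
    filter(Player == player, Event %in% c("Sub In", "Sub Out")) %>%
    slice_head(n = 1)
  if (nrow(sub) == 0) return(ft)
  if (sub$Event == "Sub Out") sub$Mins else ft - sub$Mins
}

# Toy match (not real data): full time at minute 94
toy <- data.frame(
  Player = c(NA, "A", "B", "C", NA),
  Event  = c("start", "Sub Out", "Sub In", "pass", "FT"),
  Mins   = c(0, 60, 60, 30, 94)
)

minutes_played(toy, "A")  # 60
minutes_played(toy, "B")  # 34
minutes_played(toy, "C")  # 94
```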
Another common metric used to evaluate soccer players is the number of fouls committed.
```r
df %>%
  filter(Player == 'Joseph Okumu') %>%
  filter(Event == 'Foul') %>%
  select(Event) %>%
  count(Event)
```
Event | n |
---|---|
Foul | 2 |
A player’s discipline in a game can be evaluated by compiling the number of yellow cards and red cards accumulated.
```r
df %>%
  filter(Player == 'Joseph Okumu') %>%
  filter(Result == 'yellow card' | Result == 'red card') %>%
  select(Result) %>%
  count(Result)
```
Result | n |
---|---|
yellow card | 1 |
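`count()` only reports outcomes that actually occur, which is why the table above has a yellow card row but no red card row. For a dashboard that compares players, explicit zeros are often clearer. A sketch using tidyr’s `complete()`; `card_counts()` and the toy data are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: yellow and red cards for every player in `events`,
# with explicit zeros for players who were not booked
card_counts <- function(events) {
  players <- unique(events$Player[!is.na(events$Player)])
  events %>%
    filter(!is.na(Player), Result %in% c("yellow card", "red card")) %>%
    count(Player, Result) %>%
    # Add the missing player/card combinations with a count of 0
    complete(Player = players, Result = c("yellow card", "red card"),
             fill = list(n = 0L)) %>%
    pivot_wider(names_from = Result, values_from = n)
}

# Toy data (not from the match)
toy <- data.frame(
  Player = c("A", "A", "B", NA, "C"),
  Result = c("yellow card", "complete", "yellow card", "complete", "red card")
)
card_counts(toy)
```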
Because the main objective of a soccer game is to outscore your opponent, goals are an immediate metric. The data collected contains no goals; however, a common event that leads to a goal is a shot, which is another metric that can be used for evaluation.
```r
df %>%
  filter(Player == 'Joseph Okumu') %>%
  filter(Type == 'Shot') %>%
  select(Type, Event) %>%
  group_by(Type, Event) %>%
  count(Event) %>%
  pivot_wider(names_from = Event, values_from = n) %>%
  knitr::kable(align = "l", format = "html", table.attr = "style='width:30%;'") %>%
  kableExtra::kable_styling()
```
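Type |
---|

No shots were recorded for this player in this match, so the filtered table has no rows and only the `Type` column remains.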
A summary of events can also be relied on when evaluating sporting performance within a match.
```r
# R Markdown option: hide warnings and messages (such as the "unknown levels"
# warnings from forcats) in the rendered output of later chunks
knitr::opts_chunk$set(warning = FALSE, message = FALSE)

df %>%
  filter(Player == 'Joseph Okumu') %>%
  select(Type, Event, Result) %>%
  # Report interceptions and tackles together
  mutate(Event = fct_recode(Event, `INT/Tkl` = 'Tackle', `INT/Tkl` = 'INT')) %>%
  # Collapse the raw results into a binary good / not good outcome
  mutate(Result = fct_collapse(Result,
    good = c('RefAdv', 'won', 'sucsessful', 'complete', 'possession gain', 'goal'),
    `not good` = c('out of bounds', 'incomplete', 'possession loss', 'save',
                   'RefStop', 'block', 'unsucsessful', 'lost',
                   'yellow card', 'red card'))) %>%
  # Events without a recorded result count as not good...
  mutate(Result = fct_explicit_na(Result, "not good")) %>%
  # ...except shots and interceptions/tackles, which always count as good
  mutate(Result = replace(Result, Type == 'Shot', 'good')) %>%
  mutate(Result = replace(Result, Event == 'INT/Tkl', 'good')) %>%
  count(Type, Event, Result) %>%
  pivot_wider(names_from = Result, values_from = n) %>%
  drop_na(Type) %>%
  # A missing good / not good combination means zero occurrences
  mutate(across(where(is.numeric), ~ replace_na(.x, 0))) %>%
  mutate(total = good + `not good`,
         `%` = good / total * 100) %>%
  select(-`not good`) %>%
  mutate(across(where(is.numeric), ~ round(.x, 0))) %>%
  # Fouls and shots are reported separately; restarts are left out
  filter(!((Event == 'Fouled') | (Type == 'restart') | (Type == 'Shot') | (Event == 'Foul'))) %>%
  knitr::kable(caption = "Event Summary") %>%
  kableExtra::kable_styling()
```
Type | Event | good | total | % |
---|---|---|---|---|
challenge | aerial duel | 13 | 13 | 100 |
challenge | dribble | 3 | 3 | 100 |
challenge | loose ball duel | 1 | 2 | 50 |
pass | pass | 32 | 42 | 76 |
set piece | free kick | 4 | 8 | 50 |
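The `%` column is the share of an event’s outcomes that were classified as good, rounded to whole numbers:

$$
\% = \frac{\text{good}}{\text{total}} \times 100, \qquad \text{total} = \text{good} + \text{not good}
$$

For passes, $\frac{32}{42} \times 100 \approx 76.2$, which rounds to 76 (so $42 - 32 = 10$ passes were not good); for free kicks, $\frac{4}{8} \times 100 = 50$.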
Reshaping the data for an R Shiny App calls for a lot more repetition and automation. Here, repetition and automation come from the idea that we want one chunk of code to automatically produce graphs and tables for the same purpose, repeatedly, for different teams or players. For instance, earlier we created a summary of events for one player; now we want to plot events by player. Later, when selecting the player to evaluate in the Shiny App, we want to be able to dynamically choose which team’s or player’s data is visualized and presented.
It may sound abstract and vague at the moment, but when we get to Part 3, we will get a more concrete idea of how that works.
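As a small, concrete taste of that repetition, here is a minimal sketch that turns a per-player summary into a function and applies it to every player. The function name `player_event_counts()` and the toy data are hypothetical; the point is that `player` is the only value that changes between runs, which is exactly what a Shiny drop-down will later supply.

```r
library(dplyr)

# Hypothetical helper: event counts for one player, tagged with the player's name
player_event_counts <- function(data, player) {
  data %>%
    filter(Player == player) %>%
    count(Type, Event) %>%
    mutate(Player = player, .before = 1)
}

# Toy data (not from the match)
toy <- data.frame(
  Player = c("A", "A", "A", "B"),
  Type   = c("pass", "pass", "challenge", "pass"),
  Event  = c("pass", "pass", "aerial duel", "pass")
)

# Repeat the same summary for every player and stack the results
all_players <- bind_rows(lapply(unique(toy$Player), player_event_counts, data = toy))
all_players
```

The same pattern works for plots: a function that takes `data` and `player` and returns a ggplot object can be called once per player, or once per Shiny input change.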
For this case, we will use a single visualization: we’ll plot events on a soccer pitch using the coordinate columns. The action map below places each event at its starting point (`X`, `Y`); `X2` and `Y2` hold the end points.
```r
df %>%
  filter(Player == 'Joseph Okumu') %>%
  # Filter out markers and stoppages that do not include a possession gain/loss
  filter(!is.na(Event),
         !Event %in% c('Sub Out', 'Sub In', 'Fairplay Start', 'start', 'kick off',
                       'RefStop', 'fair play', 'HT', 'FT', 'Out Of Scope')) %>%
  # Filter out event/result combinations that are not shown on the action map
  filter(!(Type == 'pass' & Result == 'complete') & !(Event == 'launch' & Result == 'out of bounds')
         & !(Event == 'throw' & Result == 'complete') & !(Event == 'touch' & Result == 'out of bounds')
         & !(Type == 'set piece' & Event == 'corner kick')
         # The next two conditions originally tested Event twice, which can never
         # be true; testing Result is assumed to be the intent
         & !(Event == 'launch' & Result == 'possession loss') & !(Event == 'block' & is.na(Result))
         & !(Event == 'touch' & is.na(Result)) & !(Event == 'launch' & is.na(Result))
         & !(Event == 'touch' & Result == 'possession loss') & !(Event == 'pass' & is.na(Result))
         & !(Event == 'block' & Result == 'out of bounds') & !(Event == 'dribble' & is.na(Result))
         & !(Event == 'pass' & Result == 'save') & !(Event == 'launch' & Result == 'block')
         & !(Event == 'block' & Result == 'possession loss')) %>%
  ggplot(aes(x = X, y = Y, color = Result, shape = Event)) +
  # Pitch markings and theme from the ggsoccer package
  annotate_pitch(fill = 'springgreen4', colour = 'white') +
  geom_point() +
  theme_pitch()
```
Action Map
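The action map above uses only the start coordinates. A hedged sketch of how `X2` and `Y2` could be used as well: draw each pass as an arrow from (`X`, `Y`) to (`X2`, `Y2`). It assumes the coordinates are on ggsoccer’s default 0 to 100 scale, which the sample values suggest but the data description does not state; `pass_map()` and the toy data are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(ggsoccer)

# Hypothetical helper: one arrow per pass, coloured by its result
pass_map <- function(data, player) {
  passes <- data %>%
    filter(Player == player, Type == "pass", !is.na(X2), !is.na(Y2))
  ggplot(passes) +
    annotate_pitch(fill = "springgreen4", colour = "white") +
    geom_segment(aes(x = X, y = Y, xend = X2, yend = Y2, colour = Result),
                 arrow = arrow(length = unit(0.15, "cm"))) +
    theme_pitch()
}

# Toy passes (coordinates borrowed from the sample rows, players renamed)
toy <- data.frame(
  Player = c("A", "A", "B"),
  Type   = c("pass", "pass", "pass"),
  Result = c("complete", "incomplete", "complete"),
  X  = c(36, 29, 21), Y  = c(10, 14, 24),
  X2 = c(29, 35, 25), Y2 = c(13, 21, 48)
)
p <- pass_map(toy, "A")
p
```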
Now that we are done with the data transformation and are familiar with presenting data visually, let’s get started on creating a Shiny App. We will make use of the data objects and the plot prototype that we saw in the first two parts.
For the Shiny app, I wanted quantitative assessments of a player in a specific match, so I narrowed down my inputs and outputs to the following:
The Shiny app shall have three drop-down inputs that act as filters on the dataset. Each drop-down shall filter the data sequentially, with the first filter being by `MatchID`, followed by `Team`, then `Player`. This should look like this.
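A minimal sketch of the three sequential drop-downs, assuming the event data is available as `df` (as loaded earlier) when the app starts. The helper `choices_for()` and the input IDs `match`, `team` and `player` are hypothetical names; each `updateSelectInput()` call refills the next drop-down from the rows left after the previous filters.

```r
library(shiny)
library(dplyr)

# Hypothetical helper: sorted distinct values of one column (sort() drops NAs)
choices_for <- function(data, column) {
  sort(unique(data[[column]]))
}

ui <- fluidPage(
  selectInput("match", "Match", choices = NULL),
  selectInput("team", "Team", choices = NULL),
  selectInput("player", "Player", choices = NULL),
  tableOutput("actions")
)

server <- function(input, output, session) {
  # df is the event data loaded earlier in the post
  updateSelectInput(session, "match", choices = choices_for(df, "MatchID"))

  # Each step narrows the data used to fill the next drop-down
  match_data <- reactive({
    req(input$match)
    filter(df, MatchID == input$match)
  })
  observeEvent(match_data(), {
    updateSelectInput(session, "team", choices = choices_for(match_data(), "Team"))
  })

  team_data <- reactive({
    req(input$team)
    filter(match_data(), Team == input$team)
  })
  observeEvent(team_data(), {
    updateSelectInput(session, "player", choices = choices_for(team_data(), "Player"))
  })

  player_data <- reactive({
    req(input$player)
    filter(team_data(), Player == input$player)
  })

  # The selected player's actions, with the columns shown in the table below
  output$actions <- renderTable({
    select(player_data(), Type, Event, Result, Time, X, Y, X2, Y2)
  })
}

app <- shinyApp(ui, server)
```

Run it with `runApp(app)`. Keeping each filtering step in its own `reactive()` lets the summary tables and the action map reuse `player_data()` instead of repeating the filters.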
Once the data has been filtered, we can use it to present the user with information through summary tables and plots. Some of the outputs are shown below:
Type | Event | Result | Time | X | Y | X2 | Y2 |
---|---|---|---|---|---|---|---|
pass | pass | complete | 0 : 49 | 31 | 90 | 36 | 64 |
pass | pass | complete | 1 : 47 | 21 | 75 | 28 | 95 |
pass | pass | complete | 2 : 5 | 33 | 88 | 9 | 60 |
pass | pass | complete | 2 : 26 | 26 | 69 | 33 | 97 |
pass | pass | complete | 2 : 36 | 29 | 88 | 70 | 93 |
challenge | aerial duel | won | 4 : 6 | 30 | 85 | NA | NA |
- A player match summary, which can include different types of quantitative information; in this case, we concatenate the minutes played, fouls, cards, shots and goals from Part 2.
- A player dashboard similar to the Action Summary Table built in Part 2.
- An Action Map, similar to the one built in Part 2.
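The player match summary can be sketched as one function that returns a single row per player. The definitions are assumptions drawn from the earlier code: fouls are `Event == 'Foul'`, cards and goals are `Result` values (`'yellow card'`, `'red card'`, `'goal'`), shots are `Type == 'Shot'`, and minutes played are computed separately and passed in. `player_match_summary()` and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: one summary row for one player in one match
player_match_summary <- function(data, player, mins_played) {
  p <- filter(data, Player == player)
  tibble::tibble(
    Player           = player,
    `Minutes played` = mins_played,
    Fouls            = sum(p$Event == "Foul", na.rm = TRUE),
    `Yellow cards`   = sum(p$Result == "yellow card", na.rm = TRUE),
    `Red cards`      = sum(p$Result == "red card", na.rm = TRUE),
    Shots            = sum(p$Type == "Shot", na.rm = TRUE),
    Goals            = sum(p$Result == "goal", na.rm = TRUE)
  )
}

# Toy data (not from the match)
toy <- data.frame(
  Player = c("A", "A", "A", "A", "B"),
  Type   = c("challenge", "challenge", "Shot", "pass", "Shot"),
  Event  = c("Foul", "Foul", "shot", "pass", "shot"),
  Result = c("yellow card", NA, "goal", "complete", "save")
)
player_match_summary(toy, "A", mins_played = 96)
```

In a Shiny server, the same call can sit inside `renderTable()`, fed by the filtered data for the selected player.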
For better UI building, I’d suggest checking out shinyuieditor, a package by Nick Strayer.
Once the Shiny App is built, we can host it on shinyapps.io, RStudio’s hosting service.
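Deployment to shinyapps.io is typically done with the rsconnect package. A sketch, assuming the app lives in a folder called `soccer-dashboard` and the account credentials are stored in environment variables; the function, folder and variable names are placeholders, not values from the original project.

```r
# Hypothetical deployment helper (defined here, not called)
deploy_dashboard <- function(app_dir = "soccer-dashboard") {
  # Register this machine with the shinyapps.io account; the name, token
  # and secret are shown in the shinyapps.io account settings
  rsconnect::setAccountInfo(
    name   = Sys.getenv("SHINYAPPS_NAME"),
    token  = Sys.getenv("SHINYAPPS_TOKEN"),
    secret = Sys.getenv("SHINYAPPS_SECRET")
  )
  # Bundle the app files in app_dir and upload them
  rsconnect::deployApp(appDir = app_dir)
}
```

Because the app then runs on a remote server, the data file has to be placed inside the app folder and read with a relative path such as `read_excel("actions.xlsx")`, rather than the local `C:/Users/...` path used earlier.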
To interact with the Shiny app for this case, click here.