Home > Back-end >  How do I write a function to loop through year and country data and count the countries by year?
How do I write a function to loop through year and country data and count the countries by year?

Time:11-13

I am working with football data from 2010 to 2020 for teams and I want to make an excel file with data that has the index of the share of countries in a particular team in a particular year. Currently I am using R with the code :

data_collect <- ARS_data %>%
filter(Team == "ARS") %>%
filter(Year == 2010) %>%
count(Nation) 

sum(data_collect$n)

After this, I am manually inputting data into the excel sheet based on the table created. This is the data that I am using :

excel sheet with data with football team stats

I want to find a way in R that I can use to automate this process and then export the result back into excel.

My results look like this right now:

Excel table with countries listed and number of players in a particular year from that country.

I would really appreciate any help with this!

CodePudding user response:

Using some mock data as example data you could achieve your desired result like so:

  1. Use count to get the counts by team, year and nation
  2. Use add_count to add the total number of players per team and year
  3. Use pivot_wider to convert to wide format
library(dplyr)
library(tidyr)

data_collect <- ARS_data %>%
  count(Team, Year, Nation) %>% 
  add_count(Team, Year, wt = n, name = "#players") %>% 
  pivot_wider(names_from = Nation, values_from = n, values_fill = 0)

data_collect
#> # A tibble: 3 × 8
#>   Team   Year `#players`   ESP   FRA   GER   ITA   ENG
#>   <chr> <int>      <int> <int> <int> <int> <int> <int>
#> 1 ARS    2010          9     2     4     1     2     0
#> 2 ARS    2011          6     4     0     2     0     0
#> 3 ARS    2012          5     1     1     2     0     1

There are several options to export to excel. Using the writexl package you could do:

writexl::write_xlsx(list(mysheet = data_collect), path = tempfile(fileext = ".xlsx")) 

Simply replace path by your desired filename and mysheet by your desired sheet name.

DATA

set.seed(42)
ARS_data <- 
  data.frame(
    Team = "ARS",
    Year = sample(2010:2012, 20, replace = TRUE),
    Nation = sample(c("ENG", "GER", "ITA", "ESP", "FRA"), 20, replace = TRUE)
  )
  • Related