Merge several similar data frames from function using map_df

Time:07-31

I have the following code:

library(tidyverse)
library(rvest)

read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    mutate(Season = year, .before = Pos)
}
read_prem_league(2015)

This generates the following tibble:

#> # A tibble: 20 x 12
#>    Season   Pos Team                   Pld     W     D     L    GF    GA GD      Pts
#>           <int> <chr>                <int> <int> <int> <int> <int> <int> <chr> <int>
#>  1   2015     1 Manchester City (C)     38    27     5     6    83    32  51      86
#>  2   2015     2 Manchester United       38    21    11     6    73    44  29      74
#>  3   2015     3 Liverpool               38    20     9     9    68    42  26      69
#>  4   2015     4 Chelsea                 38    19    10     9    58    36  22      67
#>  5   2015     5 Leicester City          38    20     6    12    68    50  18      66
#>  6   2015     6 West Ham United         38    19     8    11    62    47  15      65
#>  7   2015     7 Tottenham Hotspur       38    18     8    12    68    45  23      62
#>  8   2015     8 Arsenal                 38    18     7    13    55    39  16      61
#>  9   2015     9 Leeds United            38    18     5    15    62    54  8       59
#> 10   2015    10 Everton                 38    17     8    13    47    48 -1       59
#> 11   2015    11 Aston Villa             38    16     7    15    55    46  9       55
#> 12   2015    12 Newcastle United        38    12     9    17    46    62 -16      45
#> 13   2015    13 Wolverhampton Wande~    38    12     9    17    36    52 -16      45
#> 14   2015    14 Crystal Palace          38    12     8    18    41    66 -25      44
#> 15   2015    15 Southampton             38    12     7    19    47    68 -21      43
#> 16   2015    16 Brighton & Hove Alb~    38     9    14    15    40    46 -6       41
#> 17   2015    17 Burnley                 38    10     9    19    33    55 -22      39
#> 18   2015    18 Fulham (R)              38     5    13    20    27    53 -26      28
#> 19   2015    19 West Bromwich Albio~    38     5    11    22    35    76 -41      26
#> 20   2015    20 Sheffield United (R)    38     7     2    29    20    63 -43      23
#> # ... with 1 more variable: `Qualification or relegation` <chr>

I now want to combine the seasons from 2004 to 2015 into a single data frame using the map_df function.

Is something like map_df(read_prem_league(2004:2015)) on the right track? What I'm struggling with is how to pass a range of years to the function.

Answer:

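I suspect that not every season's page contains data in the format your function expects. We can keep only the years that work with your function by wrapping it in purrr::possibly(). We set its otherwise argument to NULL, so a year that fails returns NULL instead of throwing an error, and then call compact() to drop those NULL elements. After that, we can combine the remaining tables with bind_rows(). For bind_rows() to work, the Pts column has to be converted to character, presumably because it is parsed as integer for some seasons and as character for others, and bind_rows() refuses to combine columns of incompatible types.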
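To see the possibly()/compact()/bind_rows() pattern in isolation, here is a minimal offline sketch. `fake_season()` is a hypothetical stand-in for read_prem_league() with made-up data: it throws an error for 2004 and returns Pts as character only for 2006, mimicking the two problems described above.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical stand-in for read_prem_league(): no network access, made-up data.
# It errors for 2004 and returns `Pts` as character for 2006 only,
# imitating a season whose table was parsed differently.
fake_season <- function(year) {
  if (year == 2004) stop("table 5 is not the league table")
  pts <- c(90L, 80L)
  if (year == 2006) pts <- as.character(pts)
  tibble(Pos = 1:2, Team = c("Team A", "Team B"), Pts = pts) %>%
    mutate(Season = year, .before = Pos,
           Pts = as.character(Pts))  # harmonise the type before binding
}

seasons <- map(set_names(2004:2006),
               possibly(fake_season, otherwise = NULL)) %>%
  compact()      # drop the NULL left by the failed year

names(seasons)   # "2005" "2006"
all_seasons <- bind_rows(seasons)
```

If you delete the `Pts = as.character(Pts)` line, bind_rows() stops with an error about combining an integer and a character `Pts` column, which is why the answer converts that column before binding.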

As a next step, you can inspect the years for which your function couldn't retrieve data.
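possibly() throws away the reason a year failed. One way to inspect those years (a swap-in, not what the answer uses) is purrr::safely(), which returns a `result`/`error` pair for every input. The sketch below again uses a hypothetical offline stand-in, `fake_season()`, that fails for two made-up years.

```r
library(purrr)

# Hypothetical stand-in for read_prem_league() (no network access):
# it fails for 2004 and 2007 and returns a small data frame otherwise.
fake_season <- function(year) {
  if (year %in% c(2004, 2007)) stop("unexpected table layout for ", year)
  data.frame(Season = year, Pos = 1:2)
}

# Each element is list(result = ..., error = ...), named by year.
results <- map(set_names(2004:2008), safely(fake_season))

# Keep only the error objects; successful years have error = NULL.
errors <- compact(map(results, "error"))
failed_years <- as.integer(names(errors))
failed_years                          # 2004 2007
map_chr(errors, conditionMessage)     # why each of those years failed
```

With the real data already bound into `test_ls`, a quicker check is `setdiff(2004:2015, unique(test_ls$Season))`, which lists the years that are missing from the result.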

library(tidyverse)
library(rvest)


read_prem_league <- function(year) { 
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>% 
    html_table() %>% 
    getElement(5) %>%
    mutate(Season = year, .before = Pos,
           Pts = as.character(Pts)) # we need to convert Pts to `character`
}


test_ls <- map(set_names(2004:2015),
               # if `read_prem_league` throws an error, use `NULL` as the result
               possibly(read_prem_league, otherwise = NULL)) %>%
  # let's get rid of those `NULL` elements
  compact() %>%
  bind_rows()


test_ls 

#> # A tibble: 180 × 12
#>    Season   Pos Team               Pld     W     D     L    GF    GA GD    Pts  
#>     <int> <int> <chr>            <int> <int> <int> <int> <int> <int> <chr> <chr>
#>  1   2005     1 Chelsea (C)         38    29     8     1    72    15  57   95   
#>  2   2005     2 Arsenal             38    25     8     5    87    36  51   83   
#>  3   2005     3 Manchester Unit…    38    22    11     5    58    26  32   77   
#>  4   2005     4 Everton             38    18     7    13    45    46 −1    61   
#>  5   2005     5 Liverpool           38    17     7    14    52    41  11   58   
#>  6   2005     6 Bolton Wanderers    38    16    10    12    49    44  5    58   
#>  7   2005     7 Middlesbrough       38    14    13    11    53    46  7    55   
#>  8   2005     8 Manchester City     38    13    13    12    47    39  8    52   
#>  9   2005     9 Tottenham Hotsp…    38    14    10    14    47    41  6    52   
#> 10   2005    10 Aston Villa         38    12    11    15    45    52 −7    47   
#> # … with 170 more rows, and 1 more variable:
#> #   `Qualification or relegation` <chr>

Created on 2022-07-31 by the reprex package (v0.3.0)
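Since the question asks about map_df(): map_df() (an alias of map_dfr()) does the mapping and the row-binding in one call. It takes the interval as its first argument `.x` and the function as `.f`, so the function is applied to each year separately rather than to the whole vector. Because bind_rows() skips NULL elements, compact() is not strictly needed in that form. purrr 1.0.0 marks map_dfr() as superseded in favour of map() followed by list_rbind(). A minimal offline sketch, again with a hypothetical stand-in `fake_season()` and made-up data:

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical offline stand-in for read_prem_league(); it fails for 2004.
fake_season <- function(year) {
  if (year == 2004) stop("no league table found")
  tibble(Season = year, Pos = 1:2, Pts = c("90", "80"))
}

# The years go in `.x`, the error-tolerant function in `.f`;
# map_dfr() row-binds the results, skipping the NULL from 2004.
all_seasons <- map_dfr(2004:2006, possibly(fake_season, otherwise = NULL))

# purrr >= 1.0.0 style: map(), drop the NULLs, then list_rbind().
all_seasons2 <- map(2004:2006, possibly(fake_season, otherwise = NULL)) %>%
  compact() %>%
  list_rbind()
```

With the real function, the first form would read `map_dfr(2004:2015, possibly(read_prem_league, otherwise = NULL))`.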
