Merge several similar data frames from function using map_df


I have the following code:

library(tidyverse)
library(rvest)

read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    mutate(Season = year, .before = Pos)
}

Which generates the following tibble:

#> # A tibble: 20 x 12
#>    Season   Pos Team                   Pld     W     D     L    GF    GA GD      Pts
#>           <int> <chr>                <int> <int> <int> <int> <int> <int> <chr> <int>
#>  1   2015     1 Manchester City (C)     38    27     5     6    83    32 51       86
#>  2   2015     2 Manchester United       38    21    11     6    73    44 29       74
#>  3   2015     3 Liverpool               38    20     9     9    68    42 26       69
#>  4   2015     4 Chelsea                 38    19    10     9    58    36 22       67
#>  5   2015     5 Leicester City          38    20     6    12    68    50 18       66
#>  6   2015     6 West Ham United         38    19     8    11    62    47 15       65
#>  7   2015     7 Tottenham Hotspur       38    18     8    12    68    45 23       62
#>  8   2015     8 Arsenal                 38    18     7    13    55    39 16       61
#>  9   2015     9 Leeds United            38    18     5    15    62    54 8        59
#> 10   2015    10 Everton                 38    17     8    13    47    48 -1       59
#> 11   2015    11 Aston Villa             38    16     7    15    55    46 9        55
#> 12   2015    12 Newcastle United        38    12     9    17    46    62 -16      45
#> 13   2015    13 Wolverhampton Wande~    38    12     9    17    36    52 -16      45
#> 14   2015    14 Crystal Palace          38    12     8    18    41    66 -25      44
#> 15   2015    15 Southampton             38    12     7    19    47    68 -21      43
#> 16   2015    16 Brighton & Hove Alb~    38     9    14    15    40    46 -6       41
#> 17   2015    17 Burnley                 38    10     9    19    33    55 -22      39
#> 18   2015    18 Fulham (R)              38     5    13    20    27    53 -26      28
#> 19   2015    19 West Bromwich Albio~    38     5    11    22    35    76 -41      26
#> 20   2015    20 Sheffield United (R)    38     7     2    29    20    63 -43      23
#> # ... with 1 more variable: `Qualification or relegation` <chr>

I now want to combine the seasons from 2004 to 2015 into one data frame using the map_df function.

Is map_df(read_prem_league(2004:2015)) on the right track? What I'm struggling with is how to pass a range of years to the function.

CodePudding user response:

I think not all of the years have data in the format your function expects. We can keep the years that do work by wrapping the function in purrr::possibly(): we set the otherwise argument to NULL and then call compact() to drop those elements. After that we can call bind_rows(). For bind_rows() to work, we need to convert the Pts column to character.
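To see the possibly() / compact() / bind_rows() pattern in isolation, here is a minimal sketch that runs without network access. fake_reader() is a hypothetical stand-in for read_prem_league() that deliberately fails for odd years; it is not part of the original code.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for read_prem_league(): fails for odd years and
# returns a one-row tibble otherwise, so nothing is downloaded.
fake_reader <- function(year) {
  if (year %% 2 == 1) stop("no usable table for ", year)
  tibble(Season = year, Pts = as.character(year - 1950))
}

# possibly() returns a wrapped function that yields NULL instead of an error
safe_reader <- possibly(fake_reader, otherwise = NULL)

seasons <- 2004:2009 %>%
  set_names() %>%        # name each element by its year
  map(safe_reader) %>%   # failed years become NULL
  compact() %>%          # drop the NULL elements
  bind_rows()            # stack the remaining tibbles

seasons
```

On the call from the question: map_df() takes the vector of inputs as its first argument and the function as its second, so the shape would be map_df(2004:2015, read_prem_league) rather than map_df(read_prem_league(2004:2015)), which calls the function once with the whole vector. Without possibly(), that call would still stop at the first year that fails.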

As a next step, you can inspect the years for which your function couldn't retrieve data.
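One way to find those years, sketched under the assumption that the results are kept as a named list before compact() is called. fake_reader() and the failing years 2005 and 2007 are made up for illustration.

```r
library(purrr)

# Hypothetical reader that fails for two arbitrary years
fake_reader <- function(year) {
  if (year %in% c(2005, 2007)) stop("unexpected page layout for ", year)
  data.frame(Season = year)
}

years <- 2004:2009
results <- map(set_names(years), possibly(fake_reader, otherwise = NULL))

# Years whose result is NULL are the ones the function could not handle
failed_years <- years[map_lgl(results, is.null)]
failed_years

# Calling the unwrapped function on one of them surfaces the actual error message
first_error <- tryCatch(
  fake_reader(failed_years[1]),
  error = function(e) conditionMessage(e)
)
first_error
```

With the real function, the same idea would be to run read_prem_league() directly on a failed year and look at which table index or column differs on that page.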


library(tidyverse)
library(rvest)

read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    mutate(Season = year, .before = Pos,
           Pts = as.character(Pts)) # we need to convert Pts to `character`
}

test_ls <- map(set_names(2004:2015),
               # if `read_prem_league` throws an error use `NULL` as result
               possibly(read_prem_league, otherwise = NULL)) %>%
  # let's get rid of those `NULL` elements
  compact()

# stack the remaining seasons into one tibble
test_ls %>% bind_rows()


#> # A tibble: 180 × 12
#>    Season   Pos Team               Pld     W     D     L    GF    GA GD    Pts  
#>     <int> <int> <chr>            <int> <int> <int> <int> <int> <int> <chr> <chr>
#>  1   2005     1 Chelsea (C)         38    29     8     1    72    15  57   95   
#>  2   2005     2 Arsenal             38    25     8     5    87    36  51   83   
#>  3   2005     3 Manchester Unit…    38    22    11     5    58    26  32   77   
#>  4   2005     4 Everton             38    18     7    13    45    46 −1    61   
#>  5   2005     5 Liverpool           38    17     7    14    52    41  11   58   
#>  6   2005     6 Bolton Wanderers    38    16    10    12    49    44  5    58   
#>  7   2005     7 Middlesbrough       38    14    13    11    53    46  7    55   
#>  8   2005     8 Manchester City     38    13    13    12    47    39  8    52   
#>  9   2005     9 Tottenham Hotsp…    38    14    10    14    47    41  6    52   
#> 10   2005    10 Aston Villa         38    12    11    15    45    52 −7    47   
#> # … with 170 more rows, and 1 more variable:
#> #   `Qualification or relegation` <chr>

Created on 2022-07-31 by the reprex package (v0.3.0)
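
The reason for converting Pts can be reproduced offline as well. The sketch below assumes that one season's Pts column was parsed as integer and another's as character; the two tibbles and the "54[a]" value are invented for illustration and are not taken from the scraped pages.

```r
library(dplyr)

season_a <- tibble(Team = "A", Pts = 86L)      # Pts parsed as integer
season_b <- tibble(Team = "B", Pts = "54[a]")  # hypothetical stray text forces character

# bind_rows() refuses to combine an integer column with a character column
clash <- tryCatch(
  bind_rows(season_a, season_b),
  error = function(e) "cannot combine"
)

# Converting Pts to character in every season first lets the stack succeed
fixed <- bind_rows(
  mutate(season_a, Pts = as.character(Pts)),
  mutate(season_b, Pts = as.character(Pts))
)
fixed
```

Doing the conversion inside the reader function, as in the answer, means every season has the same column type before bind_rows() is called.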
