I have the following code:
read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    mutate(Season = year, .before = Pos)
}
read_prem_league(2015)
Which generates the following tibble:
#> # A tibble: 20 x 12
#> Season Pos Team Pld W D L GF GA GD Pts
#> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int>
#> 1 2015 1 Manchester City (C) 38 27 5 6 83 32 51 86
#> 2 2015 2 Manchester United 38 21 11 6 73 44 29 74
#> 3 2015 3 Liverpool 38 20 9 9 68 42 26 69
#> 4 2015 4 Chelsea 38 19 10 9 58 36 22 67
#> 5 2015 5 Leicester City 38 20 6 12 68 50 18 66
#> 6 2015 6 West Ham United 38 19 8 11 62 47 15 65
#> 7 2015 7 Tottenham Hotspur 38 18 8 12 68 45 23 62
#> 8 2015 8 Arsenal 38 18 7 13 55 39 16 61
#> 9 2015 9 Leeds United 38 18 5 15 62 54 8 59
#> 10 2015 10 Everton 38 17 8 13 47 48 -1 59
#> 11 2015 11 Aston Villa 38 16 7 15 55 46 9 55
#> 12 2015 12 Newcastle United 38 12 9 17 46 62 -16 45
#> 13 2015 13 Wolverhampton Wande~ 38 12 9 17 36 52 -16 45
#> 14 2015 14 Crystal Palace 38 12 8 18 41 66 -25 44
#> 15 2015 15 Southampton 38 12 7 19 47 68 -21 43
#> 16 2015 16 Brighton & Hove Alb~ 38 9 14 15 40 46 -6 41
#> 17 2015 17 Burnley 38 10 9 19 33 55 -22 39
#> 18 2015 18 Fulham (R) 38 5 13 20 27 53 -26 28
#> 19 2015 19 West Bromwich Albio~ 38 5 11 22 35 76 -41 26
#> 20 2015 20 Sheffield United (R) 38 7 2 29 20 63 -43 23
#> # ... with 1 more variable: `Qualification or relegation` <chr>
I now want to combine the seasons from 2004 to 2015 using the map_df function. Is something like
map_df(read_prem_league(2004:2015))
on the right track? What I'm struggling with is how to pass a range of years to the function.
CodePudding user response:
I think not all the years contain data in the format you expect. We can collect all the years that work with your function by wrapping it in purrr::possibly(). We set the otherwise argument to NULL and then call compact() to drop those NULL elements. After that we can call bind_rows(). For bind_rows() to work, we need to convert the Pts column to character, presumably because it is not parsed as the same type for every season, and bind_rows() cannot combine an integer column with a character column.
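Note that map() and map_df() take the vector of years and the function as two separate arguments, whereas map_df(read_prem_league(2004:2015)) calls the function once with the whole vector. The pattern described above can be tried offline with a small sketch. Here fake_league() is a hypothetical stand-in for read_prem_league() that needs no network access: it fails for 2004 and returns Pts as character for 2006, to mimic the two problems.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for read_prem_league(); not part of the original code.
fake_league <- function(year) {
  if (year == 2004) stop("table not found")   # simulate a season that fails
  pts <- c(95L, 83L)
  if (year == 2006) pts <- as.character(pts)  # simulate an inconsistent column type
  tibble(Season = year, Pos = 1:2, Pts = pts) %>%
    mutate(Pts = as.character(Pts))           # harmonise the type before binding
}

# The years and the function are passed separately to map()
seasons <- map(set_names(2004:2006),
               possibly(fake_league, otherwise = NULL)) %>%
  compact() %>%    # drop the NULL returned for 2004
  bind_rows()      # stack the remaining tibbles

seasons
```

Without the as.character() step, bind_rows() should stop with an error here, because Pts would be integer for 2005 and character for 2006.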
As a next step, you can inspect the years for which your function could not retrieve data.
library(tidyverse)
library(rvest)
read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    mutate(Season = year, .before = Pos,
           Pts = as.character(Pts)) # we need to convert Pts to `character`
}
test_ls <- map(set_names(2004:2015),
               # if `read_prem_league` throws an error, use `NULL` as the result
               possibly(read_prem_league, otherwise = NULL)) %>%
  # let's get rid of those `NULL` elements
  compact() %>%
  bind_rows()
test_ls
#> # A tibble: 180 × 12
#> Season Pos Team Pld W D L GF GA GD Pts
#> <int> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <chr>
#> 1 2005 1 Chelsea (C) 38 29 8 1 72 15 57 95
#> 2 2005 2 Arsenal 38 25 8 5 87 36 51 83
#> 3 2005 3 Manchester Unit… 38 22 11 5 58 26 32 77
#> 4 2005 4 Everton 38 18 7 13 45 46 −1 61
#> 5 2005 5 Liverpool 38 17 7 14 52 41 11 58
#> 6 2005 6 Bolton Wanderers 38 16 10 12 49 44 5 58
#> 7 2005 7 Middlesbrough 38 14 13 11 53 46 7 55
#> 8 2005 8 Manchester City 38 13 13 12 47 39 8 52
#> 9 2005 9 Tottenham Hotsp… 38 14 10 14 47 41 6 52
#> 10 2005 10 Aston Villa 38 12 11 15 45 52 −7 47
#> # … with 170 more rows, and 1 more variable:
#> # `Qualification or relegation` <chr>
Created on 2022-07-31 by the reprex package (v0.3.0)
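To find out which years were dropped, one option is to keep the list returned by map() before calling compact() and look at the names of its NULL elements. The following is a minimal offline sketch of that idea; fake_league() is again a hypothetical stand-in for read_prem_league(), and the failing years in it are made up for illustration.

```r
library(purrr)

# Hypothetical stand-in for read_prem_league(); the failing years are invented.
fake_league <- function(year) {
  if (year %in% c(2004, 2007)) stop("table not found")
  data.frame(Season = year, Pos = 1:2)
}

# Keep the raw list so that the failed seasons are still visible as NULL
results <- map(set_names(2004:2008),
               possibly(fake_league, otherwise = NULL))

# set_names() named each element after its year, so the names of the
# NULL elements are the seasons that failed
failed_years <- names(keep(results, is.null))
failed_years
```

With the real function, calling read_prem_league() directly on one of the failed years shows the underlying error, which should indicate whether the page is missing or the table is simply not the fifth one on that page.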