Complete column names with another dataframe column in R

Time:11-12

I have this table:

library(rvest)
library(tidyverse)
tables_team_pl <- read_html('https://www.win-or-lose.com/football-team-colours/')
color_table <- tables_team_pl %>% html_table() %>% pluck(1) %>% select(-Away)

and also this one:

table_1 <- structure(list(Team = c("Arsenal", "Aston Villa", "Blackburn", 
"Bolton", "Chelsea", "Everton", "Fulham", "Liverpool", "Manchester City", 
"Manchester Utd", "Newcastle Utd", "Norwich City", "QPR", "Stoke City", 
"Sunderland", "Swansea City", "Tottenham", "West Brom", "Wigan Athletic", 
"Wolves")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-20L))

As you can see, the names in the second table are incomplete. For example, Manchester Utd should be Manchester United, as in the first table.

So, all I need is to complete this second table extracting the same names from the first table.

So, I will have table_1 corrected: Manchester Utd should change to Manchester United, Blackburn should change to Blackburn Rovers, and so on. The complete names should come from the first table.

Also, the second table has QPR, which should be "Queens Park Rangers".

Any help?

CodePudding user response:

We may use a stringdist join from the fuzzyjoin package:

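library(fuzzyjoin)
library(dplyr)
# Match team names phonetically (soundex), keeping every row of table_1
stringdist_left_join(table_1, color_table, by = "Team", method = "soundex") %>%
     # Prefer the full name from color_table; fall back to the original name
     transmute(Team = coalesce(Team.y, Team.x)) %>%
     distinct()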
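
Soundex compares how names sound, so an initialism such as QPR is unlikely to match "Queens Park Rangers" and would be left unchanged by the coalesce() step. One possible way to cover such cases is a small hand-written lookup applied before the join. This is only a sketch: the manual_fixes vector and the three-row teams tibble are hypothetical illustrations, not part of the original tables.

```r
library(dplyr)

# Hypothetical lookup for abbreviations that phonetic matching cannot resolve
manual_fixes <- c("QPR" = "Queens Park Rangers")

# Small stand-in for table_1 (assumption, for illustration only)
teams <- tibble(Team = c("Arsenal", "QPR", "Chelsea"))

# Names missing from the lookup index to NA, so coalesce() keeps the original
fixed <- teams %>%
  mutate(Team = coalesce(unname(manual_fixes[Team]), Team))
```

The same mutate() step could be applied to table_1 before passing it to stringdist_left_join(), so that the expanded name matches the first table directly.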