Home > Software engineering >  Merge two dataframes by different columns with a loop
Merge two dataframes by different columns with a loop

Time:09-17

I am trying to automatically match two dataframes with a (for?) loop.

> df_key
  country_election keyword1 keyword2   keyword3 keyword4  keyword5
1           France    Paris    Rome      Madrid   London Marseille
2            Spain Valencia   Berlin Manchester   Zurich     Milan
> df_country
         city        country
1       Paris         France
2        Rome          Italy
3      Madrid          Spain
4      London United Kingdom
5   Marseille         France
6    Valencia          Spain
7      Berlin        Germany
8  Manchester United Kingdom
9      Zurich    Switzerland
10      Milan          Italy

In this example I would like to match every keyword in df_key with df_country to add country columns.

  country_election keyword1 country_1 keyword2 country_2   keyword3      country_3
1           France    Paris    France    Rome      Italy     Madrid          Spain
2            Spain Valencia     Spain   Berlin   Germany Manchester United Kingdom

FInally, I'd also like to have a series of dummy variables checking whether country_i is equal to country_election. Thanks a lot for your help.

df_key <- structure(list(country_election = c("France", "Spain"), keyword1 = c("Paris", "Valencia"), 
keyword2 = c("Rome ", "Berlin"), keyword3 = c("Madrid", "Manchester"), keyword4 = c("London", "Zurich"), 
keyword5 = c("Marseille", "Milan")), class = "data.frame", row.names = c(NA, -2L))

df_country <- structure(list(city = c("Paris", "Rome", "Madrid", "London", "Marseille", "Valencia", 
"Berlin", "Manchester", "Zurich", "Milan"), country = c("France", "Italy", "Spain", "United Kingdom", 
"France", "Spain", "Germany", "United Kingdom", "Switzerland", "Italy")), 
class = "data.frame", row.names = c(NA, -10L))

CodePudding user response:

You can match the city names, extract the country and create new columns. If the column order is important, extract the numeric part from it and order the data.

cols <- sub('keyword', 'country', names(df_key[-1]))
df_key[cols] <- df_country$country[match(as.matrix(df_key[-1]), df_country$city)]
df_key[order(as.numeric(sub('\\D ', '', names(df_key))), na.last = FALSE)]

#  country_election keyword1 country1 keyword2 country2   keyword3
#1           France    Paris   France     Rome    Italy     Madrid
#2            Spain Valencia    Spain   Berlin  Germany Manchester

#        country3 keyword4       country4  keyword5 country5
#1          Spain   London United Kingdom Marseille   France
#2 United Kingdom   Zurich    Switzerland     Milan    Italy
  • Related