Replace cells with a matching ID in R-CodePudding

I a dataframe (xlsx file) in the form: (this dataframe has 152069 rows).

Source	Target
DORTMUND	ANTWERP
MUMBAI	SPIJKENISSE
XIOALAN	BEILUN
ETTRINGEN	BREMERHAVEN
HILTER	BREMERHAVEN

and I also have another dataframe with the Ids and Names: (this dataframe has 10200 rows with unique id's for each name).

ID	Name
2678	DORTMUND
6049	MUMBAI
9873	XIOALAN
3014	ETTRINGEN
4055	HILTER
338	ANTWERP
8323	SPIJKENISSE
824	BEILUN
1272	BREMERHAVEN

I would like to replace the data of the first dataframe with their appropriate id (in the second dataframe). Do you have any suggestions on how to do this? Thank you in advance.

CodePudding user response：

Use match of first dfrm data agains second dfrm Name to create an index to apply to the ID column of the second dataframe. Obviously this should be done on R object for which you have adequate backups.

txt1 <-"| Source   | Target         |
  | DORTMUND | ANTWERP        |
  | MUMBAI   | SPIJKENISSE    |
  | XIOALAN  | BEILUN         |
  |ETTRINGEN |BREMERHAVEN     |
  |HILTER    |BREMERHAVEN     |"

 txt2 <- "| ID       | Name           |
  | 2678     | DORTMUND       |
  | 6049     | MUMBAI         |
  | 9873     | XIOALAN        |
  | 3014     | ETTRINGEN      |
  | 4055     | HILTER         |
  | 338      | ANTWERP        |
  | 8323     | SPIJKENISSE    |
  | 824      | BEILUN         |
  | 1272     | BREMERHAVEN    |"
  inp1 <-read.delim(text=txt1, sep="|")[,2:3]
  inp2 <-read.delim(text=txt2, sep="|")[,2:3]

> inp1[] <- lapply(inp1,trimws) 
> inp2[] <- lapply(inp2,trimws)

> inp1[] <- lapply(inp1, function(col){inp2$ID[match(col,inp2$Name)]})
> inp1
  Source Target
1   2678    338
2   6049   8323
3   9873    824
4   3014   1272
5   4055   1272

CodePudding user response：

With base R, you can use match to pair the names together then we can get the ID for those to replace for each column:

df1$Source <- df2$ID[match(df1$Source, df2$Name)]
df1$Target <- df2$ID[match(df1$Target, df2$Name)]

Output

  Source Target
1   2678    338
2   6049   8323
3   9873    824
4   3014   1272
5   4055   1272

Another option would be to mutate across and use match again:

library(tidyverse)

df1 %>% 
  mutate(across(everything(), ~ df2$ID[match(.x, df2$Name)]))

Another option would be to pivot to long form, then join the data together then pivot back wide (but not very efficient).

df1 %>% 
  pivot_longer(everything()) %>% 
  left_join(., df2, by = c("value" = "Name")) %>% 
  select(-value) %>% 
  group_by(grp = ceiling(row_number()/2)) %>% 
  pivot_wider(names_from = "name", values_from = "ID") %>% 
  select(-grp)

Data

df1 <- structure(list(Source = c("DORTMUND", "MUMBAI", "XIOALAN", "ETTRINGEN", 
"HILTER"), Target = c("ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN", 
"BREMERHAVEN")), class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(ID = c(2678L, 6049L, 9873L, 3014L, 4055L, 338L, 
8323L, 824L, 1272L), Name = c("DORTMUND", "MUMBAI", "XIOALAN", 
"ETTRINGEN", "HILTER", "ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN"
)), class = "data.frame", row.names = c(NA, -9L))