I a dataframe (xlsx file) in the form: (this dataframe has 152069 rows).
Source | Target |
---|---|
DORTMUND | ANTWERP |
MUMBAI | SPIJKENISSE |
XIOALAN | BEILUN |
ETTRINGEN | BREMERHAVEN |
HILTER | BREMERHAVEN |
and I also have another dataframe with the Ids and Names: (this dataframe has 10200 rows with unique id's for each name).
ID | Name |
---|---|
2678 | DORTMUND |
6049 | MUMBAI |
9873 | XIOALAN |
3014 | ETTRINGEN |
4055 | HILTER |
338 | ANTWERP |
8323 | SPIJKENISSE |
824 | BEILUN |
1272 | BREMERHAVEN |
I would like to replace the data of the first dataframe with their appropriate id (in the second dataframe). Do you have any suggestions on how to do this? Thank you in advance.
CodePudding user response:
Use match
of first dfrm data agains second dfrm Name
to create an index to apply to the ID column of the second dataframe. Obviously this should be done on R object for which you have adequate backups.
txt1 <-"| Source | Target |
| DORTMUND | ANTWERP |
| MUMBAI | SPIJKENISSE |
| XIOALAN | BEILUN |
|ETTRINGEN |BREMERHAVEN |
|HILTER |BREMERHAVEN |"
txt2 <- "| ID | Name |
| 2678 | DORTMUND |
| 6049 | MUMBAI |
| 9873 | XIOALAN |
| 3014 | ETTRINGEN |
| 4055 | HILTER |
| 338 | ANTWERP |
| 8323 | SPIJKENISSE |
| 824 | BEILUN |
| 1272 | BREMERHAVEN |"
inp1 <-read.delim(text=txt1, sep="|")[,2:3]
inp2 <-read.delim(text=txt2, sep="|")[,2:3]
> inp1[] <- lapply(inp1,trimws)
> inp2[] <- lapply(inp2,trimws)
> inp1[] <- lapply(inp1, function(col){inp2$ID[match(col,inp2$Name)]})
> inp1
Source Target
1 2678 338
2 6049 8323
3 9873 824
4 3014 1272
5 4055 1272
CodePudding user response:
With base R, you can use match
to pair the names together then we can get the ID
for those to replace for each column:
df1$Source <- df2$ID[match(df1$Source, df2$Name)]
df1$Target <- df2$ID[match(df1$Target, df2$Name)]
Output
Source Target
1 2678 338
2 6049 8323
3 9873 824
4 3014 1272
5 4055 1272
Another option would be to mutate across and use match
again:
library(tidyverse)
df1 %>%
mutate(across(everything(), ~ df2$ID[match(.x, df2$Name)]))
Another option would be to pivot to long form, then join the data together then pivot back wide (but not very efficient).
df1 %>%
pivot_longer(everything()) %>%
left_join(., df2, by = c("value" = "Name")) %>%
select(-value) %>%
group_by(grp = ceiling(row_number()/2)) %>%
pivot_wider(names_from = "name", values_from = "ID") %>%
select(-grp)
Data
df1 <- structure(list(Source = c("DORTMUND", "MUMBAI", "XIOALAN", "ETTRINGEN",
"HILTER"), Target = c("ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN",
"BREMERHAVEN")), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(ID = c(2678L, 6049L, 9873L, 3014L, 4055L, 338L,
8323L, 824L, 1272L), Name = c("DORTMUND", "MUMBAI", "XIOALAN",
"ETTRINGEN", "HILTER", "ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN"
)), class = "data.frame", row.names = c(NA, -9L))