I have been struggling with this question for a couple of days. I need to scan every row from a data frame and then assign an univocal identifier for each rows based on values found in a second data frame. Here is a toy exemple.
df1<-data.frame(c(99443975,558,99009680,99044573,599,99172478))
names(df1)<-"Building"
V1<-c(558,134917,599,120384)
V2<-c(4400796,14400095,99044573,4500481)
V3<-c(NA,99009680,99340705,99132792)
V4<-c(NA,99156365,NA,99132794)
V5<-c(NA,99172478,NA, 99181273)
V6<-c(NA, NA, NA,99443975)
row_number<-1:4
df2<-data.frame(cbind(V1, V2,V3,V4,V5,V6, row_number))
The output I expect is what follows.
row_number_assigned<-c(4,1,2,3,3,2)
output<-data.frame(cbind(df1, row_number_assigned))
Any hints?
CodePudding user response:
You could use a tidyverse
approach:
library(dplyr)
library(tidyr)
df1 %>%
left_join(df2 %>%
pivot_longer(-row_number) %>%
select(-name),
by = c("Building" = "value"))
This returns
Building row_number
1 99443975 4
2 558 1
3 99009680 2
4 99044573 3
5 599 3
6 99172478 2
CodePudding user response:
Here's an efficient method using the arr.ind
feature of thewhich
function:
sapply( df1$Building, # will send Building entries one-by-one
function(inp){ which(inp == df2, # find matching values
arr.in=TRUE)[1]}) # return only row; not column
[1] 4 1 2 3 3 2
Incidentally your use of the data.frame(cbind(.))
construction is very dangerous. A much less dangerous, and using fewer keystrokes as well, method for dataframe construction would be:
df2<-data.frame( V1=c(558,134917,599,120384),
V2=c(4400796,14400095,99044573,4500481),
V3=c(NA,99009680,99340705,99132792),
V4=c(NA,99156365,NA,99132794),
V5=c(NA,99172478,NA, 99181273),
V6=c(NA, NA, NA,99443975) )
(It didn't cause coding errors this time but if there were any character columns it would changed all the numbers to character values.) If you learned this from a teacher, can you somehow approach them gently and do their future students a favor and let them know that cbind()
will coerce all of the arguments to the "lowest common denominator".