Home > front end >  How to assign unambiguous values for each row in a data frame based on values found in rows from ano
How to assign unambiguous values for each row in a data frame based on values found in rows from ano

Time:12-12

I have been struggling with this question for a couple of days. I need to scan every row from a data frame and then assign an univocal identifier for each rows based on values found in a second data frame. Here is a toy exemple.

df1<-data.frame(c(99443975,558,99009680,99044573,599,99172478))
names(df1)<-"Building"
V1<-c(558,134917,599,120384)
V2<-c(4400796,14400095,99044573,4500481)
V3<-c(NA,99009680,99340705,99132792)
V4<-c(NA,99156365,NA,99132794)
V5<-c(NA,99172478,NA, 99181273)
V6<-c(NA, NA, NA,99443975)
row_number<-1:4
df2<-data.frame(cbind(V1, V2,V3,V4,V5,V6, row_number))

The output I expect is what follows.

row_number_assigned<-c(4,1,2,3,3,2)
output<-data.frame(cbind(df1, row_number_assigned))

Any hints?

CodePudding user response:

You could use a tidyverse approach:

library(dplyr)
library(tidyr)

df1 %>% 
  left_join(df2 %>% 
              pivot_longer(-row_number) %>% 
              select(-name),
            by = c("Building" = "value"))

This returns

  Building row_number
1 99443975          4
2      558          1
3 99009680          2
4 99044573          3
5      599          3
6 99172478          2

CodePudding user response:

Here's an efficient method using the arr.ind feature of thewhich function:

 sapply( df1$Building, # will send  Building entries one-by-one
            function(inp){ which(inp == df2, # find matching values
                               arr.in=TRUE)[1]})  # return only row; not column
 [1] 4 1 2 3 3 2

Incidentally your use of the data.frame(cbind(.)) construction is very dangerous. A much less dangerous, and using fewer keystrokes as well, method for dataframe construction would be:

df2<-data.frame( V1=c(558,134917,599,120384),
  V2=c(4400796,14400095,99044573,4500481),
  V3=c(NA,99009680,99340705,99132792),
  V4=c(NA,99156365,NA,99132794),
  V5=c(NA,99172478,NA, 99181273),
  V6=c(NA, NA, NA,99443975) )

(It didn't cause coding errors this time but if there were any character columns it would changed all the numbers to character values.) If you learned this from a teacher, can you somehow approach them gently and do their future students a favor and let them know that cbind() will coerce all of the arguments to the "lowest common denominator".

  • Related