I've got a small incomplete dataframe which I'm to complete by drawing ID numbers from a significantly larger one. The structure of the small one (df) is as follows:
$ Date: chr [1:9] "2022-10-04" "2022-10-05" "2022-10-06" "2022-10-07"
$ Descr Activity: chr [1:9] "Mixed farming" "Support activities for crop production" "Support activities for animal production (other than farm animal boarding and care) n.e.c." "Post-harvest crop activities"
The larger one (df2) follows this pattern (there are many more columns in this df, but I've only posted snip of the relevant one):
$ Descr: chr [1:9] "Mixed farming" "Support activities for crop production" "Support activities for animal production (other than farm animal boarding and care) n.e.c." "Post-harvest crop activities"
$ SIC: chr [1:9] "01500" "01610" "01629" "01630"
I'm trying to source the 'SIC' column by matching the contents of the two 'Descr' columns by doing this (if the Descr doesn't match, "N/A" is what I'd like to be the row's place):
df$SIC <- ifelse(match(df$`Descr Activity`, df2$Descr), df2$SIC, "N/A")
However the above only posts the first 'SIC' value from the df2 sheet, and applies it to every row rather than trying to match the activity with the relevant 'SIC'. On the advice of others, I've tried a right_join:
df <- df2 %>% select(Descr, SIC) %>% right_join(df, by=`Descr Activity`)
But that results in a:
Error in standardise_join_by(by, x_names = x_names, y_names = y_names, :
object 'Descr Activity' not found
I've been getting other errors when trying variations, like:
Must subset columns with a valid subscript vector.
x Subscript has the wrong type `data.frame<>`.
i It must be numeric or character.
Does anyone have any advice on where I'm going wrong?
CodePudding user response:
match
returns the position of the first argument in the second.
Use the return of match
to extracto the index positions of SIC. The values not found wil be set to NA.
df$SIC <- df2$SIC[match(df$`Descr Activity`, df2$Descr)]
CodePudding user response:
The name of the join column is different in each data.frame, so you need to specify what the name is in each:
df <- df2 %>% select(Descr, SIC) %>% right_join(df, by=c(`Desc`=`Descr Activity`))