Say I have a data frame, df1
, with many columns, and I'm interested in only one column. something like this:
outcome Feature1 Feature2
Person1 Yes 10 11
Person10 Yes 28.3 9
Person11 Yes 1 0
Person12 No 100 23
Pt33 Yes 34 67
Pt40x No 42 98
pt80xx No 0 11
And I have another data frame, df2
, with only some of df1
and it does not have the outcome
column. Something like this:
Feature1 Feature2
Person1 10 11
Person10 28.3 9
Pt33 34 67
pt80xx 0 11
I want to add the outcome
column from df1
to df2
, only for the existing rows in df2
, so the final result:
Outcome Feature1 Feature 2
Person1 Yes 10 11
Person10 Yes 28.3 9
Pt33 Yes 34 67
pt80xx No 0 11
CodePudding user response:
You could use the filtering join, semi_join()
from dplyr
:
library(dplyr)
semi_join(df1, df2, by = c("Feature1", "Feature2"))
# outcome Feature1 Feature2
# Person1 Yes 10.0 11
# Person10 Yes 28.3 9
# Pt33 Yes 34.0 67
# pt80xx No 0.0 11
The mutating joins like left_join()
will drop the row names, so you need to convert row names to a explicit column before using them.
left_join(tibble::rownames_to_column(df2), df1, by = c("Feature1", "Feature2"))
# rowname Feature1 Feature2 outcome
# 1 Person1 10.0 11 Yes
# 2 Person10 28.3 9 Yes
# 3 Pt33 34.0 67 Yes
# 4 pt80xx 0.0 11 No
CodePudding user response:
Use the row.names
from 'df2' with 'Outcome' as column name to extract the values from 'df1' in base R
df2$outcome <- df1[row.names(df2), 'outcome']
df2 <- df2[names(df1)]
-output
> df2
outcome Feature1 Feature2
Person1 Yes 10.0 11
Person10 Yes 28.3 9
Pt33 Yes 34.0 67
pt80xx No 0.0 11
Or it is just subsetting based on the row.names
df2 <- df1[row.names(df2),]
df2
outcome Feature1 Feature2
Person1 Yes 10.0 11
Person10 Yes 28.3 9
Pt33 Yes 34.0 67
pt80xx No 0.0 11
data
df1 <- structure(list(outcome = c("Yes", "Yes", "Yes", "No", "Yes",
"No", "No"), Feature1 = c(10, 28.3, 1, 100, 34, 42, 0), Feature2 = c(11L,
9L, 0L, 23L, 67L, 98L, 11L)), class = "data.frame", row.names = c("Person1",
"Person10", "Person11", "Person12", "Pt33", "Pt40x", "pt80xx"
))
df2 <- structure(list(Feature1 = c(10, 28.3, 34, 0), Feature2 = c(11L,
9L, 67L, 11L)), class = "data.frame", row.names = c("Person1",
"Person10", "Pt33", "pt80xx"))