Adding a column from another dataframe with only some of the rows-CodePudding

Say I have a data frame, df1, with many columns, and I'm interested in only one column. something like this:

               outcome   Feature1   Feature2
Person1          Yes        10         11
Person10         Yes        28.3       9
Person11         Yes        1          0
Person12         No         100        23
Pt33             Yes        34         67
Pt40x            No         42         98
pt80xx           No         0          11

And I have another data frame, df2, with only some of df1 and it does not have the outcome column. Something like this:

                Feature1   Feature2
Person1           10         11
Person10          28.3       9
Pt33              34         67
pt80xx            0          11

I want to add the outcome column from df1 to df2, only for the existing rows in df2, so the final result:

              Outcome  Feature1   Feature 2
Person1         Yes      10         11
Person10        Yes      28.3       9
Pt33            Yes      34         67
pt80xx          No       0          11

CodePudding user response：

You could use the filtering join, semi_join() from dplyr:

library(dplyr)

semi_join(df1, df2, by = c("Feature1", "Feature2"))

#          outcome Feature1 Feature2
# Person1      Yes     10.0       11
# Person10     Yes     28.3        9
# Pt33         Yes     34.0       67
# pt80xx        No      0.0       11

The mutating joins like left_join() will drop the row names, so you need to convert row names to a explicit column before using them.

left_join(tibble::rownames_to_column(df2), df1, by = c("Feature1", "Feature2"))

#    rowname Feature1 Feature2 outcome
# 1  Person1     10.0       11     Yes
# 2 Person10     28.3        9     Yes
# 3     Pt33     34.0       67     Yes
# 4   pt80xx      0.0       11      No

CodePudding user response：

Use the row.names from 'df2' with 'Outcome' as column name to extract the values from 'df1' in base R

df2$outcome <- df1[row.names(df2), 'outcome']
df2 <- df2[names(df1)]

-output

> df2
         outcome Feature1 Feature2
Person1      Yes     10.0       11
Person10     Yes     28.3        9
Pt33         Yes     34.0       67
pt80xx        No      0.0       11

Or it is just subsetting based on the row.names

df2 <- df1[row.names(df2),]
df2
         outcome Feature1 Feature2
Person1      Yes     10.0       11
Person10     Yes     28.3        9
Pt33         Yes     34.0       67
pt80xx        No      0.0       11

data

df1 <- structure(list(outcome = c("Yes", "Yes", "Yes", "No", "Yes", 
"No", "No"), Feature1 = c(10, 28.3, 1, 100, 34, 42, 0), Feature2 = c(11L, 
9L, 0L, 23L, 67L, 98L, 11L)), class = "data.frame", row.names = c("Person1", 
"Person10", "Person11", "Person12", "Pt33", "Pt40x", "pt80xx"
))

df2 <- structure(list(Feature1 = c(10, 28.3, 34, 0), Feature2 = c(11L, 
9L, 67L, 11L)), class = "data.frame", row.names = c("Person1", 
"Person10", "Pt33", "pt80xx"))