I have two data sets. The aim is to add a row to df1 if a new code appears in the df2 and the df2$LABEL is set to PRIV. I tried simply with the if clause, but it didn't work for me. I think maybe there is a solution with dplyr library. Logics I used (maybe will be helpful):
df1$CODE != df2$CODE & df2$LABEL == "PRIV"
Dummy examples of datasets with dput (df1 first and df2 second):
structure(list(OUTLETID = c("11N", "12B", "17C",
"44O"), CODE = c(4623,
4623, 4111, 4343),
PRICE = c(1.45, 1.45, 5.45, 5.03), LABEL = c("PRIV", "PRIV", "NON PRIV", "NON PRIV")),
row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
structure(list(CODE = c(4623, 4205, 4551),
PRICE = c(1.45, 1.17, 5.99), LABEL = c("PRIV", "PRIV", "NON PRIV")),
row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
So in this example, a new row with CODE=4205, PRICE=1.17 and LABEL="PRIV" should be added to df1. As in df2 there is no OUTLETID, should be just NA in that column. This code should work for the general case, bigger datasets, not just for this small example.
CodePudding user response:
As you have to deal with large data sets, please find below a solution with the data.table
package:
library(data.table)
setDT(df1)
setDT(df2)
rbind(df1,df2[CODE %in% setdiff(df2$CODE, df1$CODE) & LABEL == "PRIV",], fill = TRUE)
#> OUTLETID CODE PRICE LABEL
#> 1: 11N 4623 1.45 PRIV
#> 2: 12B 4623 1.45 PRIV
#> 3: 17C 4111 5.45 NON PRIV
#> 4: 44O 4343 5.03 NON PRIV
#> 5: <NA> 4205 1.17 PRIV
Not clear for me if you want to keep or remove the "NON PRIV" label from the df1
. So if you want to simultaneously remove the rows with "NON PRIV" label from df1
you can use the following line of code:
rbind(df1[LABEL == "PRIV"],df2[CODE %in% setdiff(df2$CODE, df1$CODE) & LABEL == "PRIV",], fill = TRUE)
#> OUTLETID CODE PRICE LABEL
#> 1: 11N 4623 1.45 PRIV
#> 2: 12B 4623 1.45 PRIV
#> 3: <NA> 4205 1.17 PRIV
Created on 2021-10-26 by the reprex package (v0.3.0)