How to remove rows if values from a specified column in data set 1 does not match the values of the

Time:01-31

I have 2 data sets; both include an ID column with the same IDs. I have already removed rows from the first data set. For the second data set, I would like to use dplyr to remove any rows whose IDs no longer appear in the first data set.

In other words, every ID in DF2 must also be in DF1; if it is not, that row must be removed from DF2.

For example:

DF1
ID X Y Z
1  1 1 1
2  2 2 2
3  3 3 3
5  5 5 5
6  6 6 6

DF2
ID A B C
1  1 1 1
2  2 2 2
3  3 3 3
4  4 4 4
5  5 5 5
6  6 6 6
7  7 7 7

DF2 once rows have been removed
ID A B C
1  1 1 1 
2  2 2 2 
3  3 3 3 
5  5 5 5 
6  6 6 6 

I used anti_join(), which shows me the rows that differ, but I cannot figure out how to use dplyr to remove from DF2 the rows whose IDs do not appear in the first data set.

CodePudding user response:

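Try with paste if whole rows should be compared, or with %in% on the ID column alone

# compare entire rows: paste each row's values into one string per row
i1 <- do.call(paste, DF2) %in% do.call(paste, DF1)
# if it is only to compare the 'ID' columns (this overwrites the i1 above)
i1 <- DF2$ID %in% DF1$ID
# keep the rows of DF2 whose ID is present in DF1
DF3 <- DF2[i1,]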
DF3
  ID A B C
1  1 1 1 1
2  2 2 2 2
3  3 3 3 3
5  5 5 5 5
6  6 6 6 6

# the complement: rows of DF2 whose ID is missing from DF1
DF4 <- DF2[!i1,]
DF4
  ID A B C
4  4 4 4 4
7  7 7 7 7

data

DF1 <- structure(list(ID = c(1L, 2L, 3L, 5L, 6L), X = c(1L, 2L, 3L, 5L, 6L),
    Y = c(1L, 2L, 3L, 5L, 6L), Z = c(1L, 2L, 3L, 5L, 6L)),
    class = "data.frame", row.names = c(NA, -5L))

DF2 <- structure(list(ID = 1:7, A = 1:7, B = 1:7, C = 1:7),
    class = "data.frame", row.names = c(NA, -7L))
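
Since the question asks for dplyr, the same ID test could also be written with filter(). This is only a minimal sketch: it assumes dplyr is installed and rebuilds DF1 and DF2 from the example data so that it runs on its own; the names DF3 and DF4 simply mirror the ones used above.

```r
library(dplyr)

# Example data, rebuilt from the question so the sketch is self-contained
DF1 <- data.frame(ID = c(1L, 2L, 3L, 5L, 6L), X = c(1L, 2L, 3L, 5L, 6L),
                  Y = c(1L, 2L, 3L, 5L, 6L), Z = c(1L, 2L, 3L, 5L, 6L))
DF2 <- data.frame(ID = 1:7, A = 1:7, B = 1:7, C = 1:7)

# Keep the rows of DF2 whose ID also occurs in DF1
DF3 <- DF2 %>% filter(ID %in% DF1$ID)

# The removed rows, i.e. IDs that occur only in DF2
DF4 <- DF2 %>% filter(!ID %in% DF1$ID)
```

Here DF3 should hold IDs 1, 2, 3, 5 and 6, and DF4 should hold IDs 4 and 7.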

CodePudding user response:

# Load package
library(dplyr)

# Create example data frames
df1 <- data.frame(
  ID = 1:6,
  X = 1:6,
  Y = 1:6,
  Z = 1:6
)
df2 <- data.frame(
  ID = 1:7,
  X = 1:7,
  Y = 1:7,
  Z = 1:7
)

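# Keep every row of df1; rows that exist only in df2 (here ID 7) are dropped.
# With no `by` argument, the join uses all columns the two data frames share.
df1 %>%
  left_join(df2)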

Joining, by = c("ID", "X", "Y", "Z")
  ID X Y Z
1  1 1 1 1
2  2 2 2 2
3  3 3 3 3
4  4 4 4 4
5  5 5 5 5
6  6 6 6 6
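
For the data in the question, where DF1 and DF2 share only the ID column, a filtering join may be a closer fit. A rough sketch, assuming dplyr is installed: semi_join() keeps the rows of DF2 that have a matching ID in DF1 without adding any columns from DF1, and anti_join() returns the rows that would be removed. The names kept and dropped are made up for illustration.

```r
library(dplyr)

# Example data from the question: DF1 has already lost IDs 4 and 7
DF1 <- data.frame(ID = c(1L, 2L, 3L, 5L, 6L), X = c(1L, 2L, 3L, 5L, 6L),
                  Y = c(1L, 2L, 3L, 5L, 6L), Z = c(1L, 2L, 3L, 5L, 6L))
DF2 <- data.frame(ID = 1:7, A = 1:7, B = 1:7, C = 1:7)

# Rows of DF2 with a matching ID in DF1; only DF2's columns are returned
kept <- semi_join(DF2, DF1, by = "ID")

# Rows of DF2 with no matching ID in DF1 (the ones being removed)
dropped <- anti_join(DF2, DF1, by = "ID")
```

Assigning kept back to DF2 would give the trimmed data set the question describes, with columns ID, A, B and C.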