Taking a subset of a main dataset based on the values of another data frame that is a subset of the

Time:10-21

I have these two datasets: df, the main data frame, and g, a data frame I created.

df = data.frame(x = seq(1,20,2),y = letters[1:10] )
df

g = data.frame(xx = c(2,3,4,5,7,8,9) )

I want to subset df to the rows whose x value appears in the xx column of g, as follows:

m = df[df$x==g$xx,]

But the result only keeps rows where the two columns match at the same position, not every row whose x value appears in g$xx.

output

> m
  x y
2 3 b

I don't know what error I am making.

CodePudding user response:

Maybe you need to use %in% instead of ==

> df[df$x %in% g$xx,]
  x y
2 3 b
3 5 c
4 7 d
5 9 e
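
To see why == returned only one row, it may help to look at the two comparisons on their own. This is a minimal sketch reusing df and g from the question; the names eq_mask and in_mask are just illustrative. == compares the vectors position by position and recycles the shorter one, while %in% asks, for each element of df$x, whether it appears anywhere in g$xx.

```r
df <- data.frame(x = seq(1, 20, 2), y = letters[1:10])
g  <- data.frame(xx = c(2, 3, 4, 5, 7, 8, 9))

# == is element-wise: g$xx (length 7) is recycled to length 10,
# so df$x[i] is only compared with the value at the same position.
# R normally warns here because 10 is not a multiple of 7.
eq_mask <- suppressWarnings(df$x == g$xx)

# %in% tests membership: is each df$x value present anywhere in g$xx?
in_mask <- df$x %in% g$xx

which(eq_mask)  # positions where the values happen to line up
which(in_mask)  # positions of every shared value
```

Only the second element lines up by position (3 == 3), which is why the original subset contained a single row.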

You can also use inner_join from dplyr:

library(dplyr)
df %>% 
  inner_join(g, by = c("x" = "xx"))

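intersect can be useful too. It returns the shared values rather than row positions, so pair it with %in% instead of indexing with it directly:

df[df$x %in% intersect(df$x, g$xx),]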

CodePudding user response:

Using merge:

> merge(df, g, by.x = "x", by.y = "xx")
  x y
1 3 b
2 5 c
3 7 d
4 9 e
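
One difference between filtering and joining is worth knowing before picking an approach. The sketch below uses a hypothetical g_dup (not part of the question) in which a value is repeated: %in% keeps each matching row of df once, with its original row names, while merge returns one row per matching pair and renumbers the rows.

```r
df <- data.frame(x = seq(1, 20, 2), y = letters[1:10])
# g_dup is a made-up variant of g in which the value 3 appears twice
g_dup <- data.frame(xx = c(3, 3, 5))

by_filter <- df[df$x %in% g_dup$xx, ]
by_merge  <- merge(df, g_dup, by.x = "x", by.y = "xx")

nrow(by_filter)      # one row per matching row of df
nrow(by_merge)       # one row per matching (df, g_dup) pair
rownames(by_filter)  # original row names from df are kept
```

If g may contain repeated values and only a subset of df is wanted, the %in% form is probably the safer default.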