If a data frame contains a column whose members have different classes, how do I filter out the rows

Time:10-28

This question is a bit difficult for me to ask because I ran into the problem in a large spatial dataset that was given to me and that I cannot post here. In this data, one of the columns contains the geometry of spatial objects that differ in their classes, e.g. some are POLYGON, some are LINE, etc. However, these are not the only classes the objects have; they often also have sfg or sfc. So they are multi-class objects, and the same column contains objects of different classes.

I wanted to create a minimal toy example, but I don't know how, because I can't find a way to assign custom classes to the individual elements of a data frame column.

Therefore, the question is composed of two parts. In the first part I will post my code for creating example toy data, which doesn't work, but I think that the logic behind it will be quite easy to grasp and I am hoping that someone will be able to correct my code and create the example data.

In the second part, I will post the actual question.

First part:

a <- 1:5
b <- 6:10

df <- data.frame(a, b)

class(df$b[1]) <- c("X", "Y")
class(df$b[2]) <- c("X", "Z")
class(df$b[3]) <- c("X", "Y", "Z")
class(df$b[4]) <- c("Z", "Y")
class(df$b[5]) <- c("Y")

This doesn't work: if we call, for example, class(df$b[1]), it returns "integer" instead of c("X", "Y"). But let's imagine that it did work, because that's what my data looks like.

Second part:

I want to keep only the rows where b has the class Y. I was trying to use the following code:

df %>% 
  filter("Y" %in% class(b))

But it doesn't work. In my real dataset, I just get an empty data frame back. It seems to me that class(b) inside filter() evaluates the class of the whole b column instead of its individual elements, and therefore fails. But I don't know how to change that.

Any help would be appreciated.

CodePudding user response:

Your "Y" %in% class(x) test works for a single object, but inherits() is the generally recommended way, e.g., inherits(x, "Y"). However, only lists can hold items of different classes, so the column has to be a list column, and you'll need to apply inherits() to each list element. I would try df %>% filter(sapply(b, inherits, "Y")).
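To illustrate why the column-level check fails, here is a minimal sketch in base R. The list b and its classes X, Y and Z are hypothetical stand-ins for the real geometry column, so treat it as an illustration rather than a description of the actual data.

```r
# Hypothetical toy column: a list whose elements each carry their own classes
b <- list(
  structure(1, class = c("X", "Y")),
  structure(2, class = c("X", "Z")),
  structure(3, class = "Y")
)

# class() of the whole column is just "list", so this is a single FALSE
whole_column <- "Y" %in% class(b)

# Checking each element separately gives one TRUE/FALSE per row
per_element <- vapply(b, inherits, logical(1), what = "Y")
```

A single FALSE passed to filter() is recycled to every row, which would explain the empty data frame in the question, whereas the per-element vector has one value per row.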

Here's my attempt at a reproducible example:

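library(dplyr)  # provides %>% and filter()

df = data.frame(a = 1:3)
# b must be a list column so that each element can keep its own class
df[["b"]] = list(1, 2, 3)

class(df$b[[1]]) = "X"
class(df$b[[2]]) = c("X", "Y")
class(df$b[[3]]) = "Y"

# keep the rows whose element of b inherits from "Y"
df %>% filter(sapply(b, inherits, "Y"))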
#   a b
# 1 2 2
# 2 3 3
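If you would rather not depend on dplyr, the same idea can be sketched in base R. This is only one possible variant, using the same hypothetical toy data; vapply() is used in place of sapply() so that the result is guaranteed to be a logical vector.

```r
# Same hypothetical toy data, built with base R only
df <- data.frame(a = 1:3)
df[["b"]] <- list(
  structure(1, class = "X"),
  structure(2, class = c("X", "Y")),
  structure(3, class = "Y")
)

# One TRUE/FALSE per row: does this element of b inherit from "Y"?
keep <- vapply(df$b, inherits, logical(1), what = "Y")

# Subset the rows; drop = FALSE keeps the result a data frame
result <- df[keep, , drop = FALSE]
```

Here keep can also be inspected on its own to see which rows match before subsetting.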