(dataframe) Remove all rows of a group (e.g. abcdefg) if any of its rows has a problem (e.g. cdf)

Time:04-07

Some of my data looks wrong, and I want to remove every row of any fruit that has a price below 0. How can I do this?

fruit year price
apple  2021    2
apple  2020   -9
apple  2019    3
banana 2021    9
banana 2020    7
banana 2019    5
orange 2021    7
orange 2020    2
orange 2019   -3

->

fruit year price
banana 2021    9
banana 2020    7
banana 2019    5

Answer:

There are several possible solutions; here are three:

base R

dat[!dat$fruit %in% unique(dat[dat$price < 0, "fruit"]),]
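To see how the base R one-liner works, it can be split into steps. This sketch rebuilds the question's data as a plain data.frame named dat (an assumption, since the answer does not show how dat was created) and keeps each intermediate result:

```r
# Rebuild the question's data (assumed to be a plain data.frame named dat)
dat <- data.frame(
  fruit = c("apple", "apple", "apple", "banana", "banana", "banana",
            "orange", "orange", "orange"),
  year  = c(2021, 2020, 2019, 2021, 2020, 2019, 2021, 2020, 2019),
  price = c(2, -9, 3, 9, 7, 5, 7, 2, -3)
)

# Step 1: fruits with at least one negative price ("apple" and "orange")
bad_fruits <- unique(dat[dat$price < 0, "fruit"])

# Step 2: TRUE for rows whose fruit is not in bad_fruits.
# %in% binds tighter than !, so this is !(dat$fruit %in% bad_fruits)
keep <- !dat$fruit %in% bad_fruits

# Step 3: subset the rows; only the three banana rows remain
result <- dat[keep, ]
result
```

Note that dat[rows, "fruit"] returns a plain vector only for a data.frame; on a tibble it returns a one-column tibble, so dat$fruit[dat$price < 0] is the safer form there.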

dplyr

With all():

library(dplyr)
dat %>% 
  group_by(fruit) %>% 
  filter(all(price >= 0))

Or, with any():

dat %>% 
  group_by(fruit) %>% 
  filter(!any(price < 0))
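The all() and any() versions keep the same groups because, by De Morgan's law, all(price >= 0) is the same as !any(price < 0). A small base R check (no dplyr needed) evaluates both conditions once per fruit with tapply(), using the question's data rebuilt as dat:

```r
dat <- data.frame(
  fruit = c("apple", "apple", "apple", "banana", "banana", "banana",
            "orange", "orange", "orange"),
  year  = c(2021, 2020, 2019, 2021, 2020, 2019, 2021, 2020, 2019),
  price = c(2, -9, 3, 9, 7, 5, 7, 2, -3)
)

# One TRUE/FALSE per fruit for each filter condition
with_all <- tapply(dat$price, dat$fruit, function(p) all(p >= 0))
with_any <- tapply(dat$price, dat$fruit, function(p) !any(p < 0))

with_all
# Both conditions agree for every fruit; only banana is TRUE
identical(with_all, with_any)
```

filter() on a grouped tibble keeps all rows of a group whose condition is TRUE, which is why only banana survives. The result is still grouped by fruit; add ungroup() afterwards if later steps should not be grouped.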

Output:

# A tibble: 3 x 3
# Groups:   fruit [1]
  fruit   year price
  <chr>  <int> <int>
1 banana  2021     9
2 banana  2020     7
3 banana  2019     5

Answer:

First, your data, df:

   fruit year price
1  apple 2021     2
2  apple 2020    -9
3  apple 2019     3
4 banana 2021     9
5 banana 2020     7
6 banana 2019     5
7 orange 2021     7
8 orange 2020     2
9 orange 2019    -3

You can use the following code to remove all rows of every group that contains a negative price:

df <- df[with(df, ave(price >= 0, fruit, FUN = all)), ]

df

Output:

   fruit year price
4 banana 2021     9
5 banana 2020     7
6 banana 2019     5

As you can see, only banana remains, because it is the only fruit with no negative prices.
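The key step is ave(): it splits price >= 0 by fruit, applies all() to each piece, and writes that group's single result back to every row of the group. This gives one TRUE/FALSE per row that can be used directly as a row index. A short sketch that exposes the intermediate vectors, rebuilding the same df as in the Data section:

```r
df <- data.frame(
  fruit = c("apple", "apple", "apple", "banana", "banana", "banana",
            "orange", "orange", "orange"),
  year  = c(2021, 2020, 2019, 2021, 2020, 2019, 2021, 2020, 2019),
  price = c(2, -9, 3, 9, 7, 5, 7, 2, -3)
)

# Per-row check: FALSE only for the two negative prices (rows 2 and 9)
ok_row <- df$price >= 0

# Per-group check broadcast back to rows:
# FALSE for all apple and orange rows, TRUE for the banana rows
ok_group <- ave(ok_row, df$fruit, FUN = all)
ok_group

# Keep only rows belonging to fully valid groups
kept <- df[ok_group, ]
kept
```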

Data

df <- data.frame(fruit = c("apple", "apple", "apple", "banana", "banana", "banana", "orange", "orange", "orange"),
                 year = c(2021, 2020, 2019, 2021, 2020, 2019, 2021, 2020, 2019),
                 price = c(2, -9, 3, 9, 7, 5, 7, 2, -3))