Removing 0s from a variable only if not present in the entire df r

Time:07-30

I have the following test data frame:

df1 <- data.frame(site = c('1' , '1' , '1' , '1' , '2' , '2' , 
                           '2' , '2' , '3' , '3' , '3' , '3') , 
                  species = c('A' , 'B' , 'C' , 'D' , 'A' , 'B' , 
                              'C' , 'D' , 'A' , 'B' , 'C' , 'D') , 
                  value = c('1' , '0' , '0' , '4' , '0' , '0' , 
                            '3' , '4' , '0' , '0' , '0' , '1')) 

I need to filter out a species only if it has a value of 0 at every site. A species should be kept, including its rows with 0, if it has a value >= 1 at one or more sites.

A tidyverse method is preferred.

CodePudding user response:

You can try this (with suggestion from benson23)

library(dplyr)

df1 %>% 
  group_by(species) %>% 
  filter(!all(value == "0"))
# A tibble: 9 × 3
# Groups:   species [3]
  site  species value
  <chr> <chr>   <chr>
1 1     A       1    
2 1     C       0    
3 1     D       4    
4 2     A       0    
5 2     C       3    
6 2     D       4    
7 3     A       0    
8 3     C       0    
9 3     D       1

CodePudding user response:

If your value column is a factor (the default for string columns in data.frame() before R 4.0.0), convert it to its numeric value before comparing and filtering:

library(dplyr)

df1 %>% 
  group_by(species) %>%
  filter(any(as.numeric(as.character(value)) >= 1))

  # # A tibble: 9 x 3
  # # Groups:   species [3]
  # site  species value
  # <fct> <fct>   <fct>
  # 1 1     A       1    
  # 2 1     C       0    
  # 3 1     D       4    
  # 4 2     A       0    
  # 5 2     C       3    
  # 6 2     D       4    
  # 7 3     A       0    
  # 8 3     C       0    
  # 9 3     D       1   
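
To see why the as.character() step matters, here is a small base R sketch. It assumes value was stored as a factor; the vector f is made up for illustration and is not part of the question's data.

```r
# Hypothetical factor holding a few of the values from the 'value' column
f <- factor(c("1", "0", "0", "4", "3"))

# as.numeric() on a factor returns the internal level codes, not the labels
as.numeric(f)
#> [1] 2 1 1 4 3

# Converting to character first recovers the numbers that were typed in
as.numeric(as.character(f))
#> [1] 1 0 0 4 3
```

With the codes, a "0" becomes 1, so a test such as >= 1 would be true for every row and nothing would be filtered out.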

CodePudding user response:

dplyr option, using any() inside filter():

df1 <- data.frame(site = c('1' , '1' , '1' , '1' , '2' , '2' , 
                           '2' , '2' , '3' , '3' , '3' , '3') , 
                  species = c('A' , 'B' , 'C' , 'D' , 'A' , 'B' , 
                              'C' , 'D' , 'A' , 'B' , 'C' , 'D') , 
                  value = c('1' , '0' , '0' , '4' , '0' , '0' , 
                            '3' , '4' , '0' , '0' , '0' , '1'))

library(dplyr)
df1 %>%
  group_by(species) %>%
  filter(any(value != 0))
#> # A tibble: 9 × 3
#> # Groups:   species [3]
#>   site  species value
#>   <chr> <chr>   <chr>
#> 1 1     A       1    
#> 2 1     C       0    
#> 3 1     D       4    
#> 4 2     A       0    
#> 5 2     C       3    
#> 6 2     D       4    
#> 7 3     A       0    
#> 8 3     C       0    
#> 9 3     D       1

Created on 2022-07-29 by the reprex package (v2.0.1)

base R option:

df1 <- data.frame(site = c('1' , '1' , '1' , '1' , '2' , '2' , 
                           '2' , '2' , '3' , '3' , '3' , '3') , 
                  species = c('A' , 'B' , 'C' , 'D' , 'A' , 'B' , 
                              'C' , 'D' , 'A' , 'B' , 'C' , 'D') , 
                  value = c('1' , '0' , '0' , '4' , '0' , '0' , 
                            '3' , '4' , '0' , '0' , '0' , '1'))

subset(df1, ave(value != 0, species, FUN = any))
#>    site species value
#> 1     1       A     1
#> 3     1       C     0
#> 4     1       D     4
#> 5     2       A     0
#> 7     2       C     3
#> 8     2       D     4
#> 9     3       A     0
#> 11    3       C     0
#> 12    3       D     1


CodePudding user response:

Using base R with %in%: first collect the 'species' that have one or more 'value' entries not equal to 0, then keep every row of the entire dataset whose 'species' is in that set

subset(df1, species %in% species[value != 0])
   site species value
1     1       A     1
3     1       C     0
4     1       D     4
5     2       A     0
7     2       C     3
8     2       D     4
9     3       A     0
11    3       C     0
12    3       D     1
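
The two steps can also be inspected separately. A minimal sketch, assuming df1 is rebuilt as in the question with character columns; keep and result are names introduced here for illustration.

```r
df1 <- data.frame(
  site    = rep(c("1", "2", "3"), each = 4),
  species = rep(c("A", "B", "C", "D"), times = 3),
  value   = c("1", "0", "0", "4", "0", "0", "3", "4", "0", "0", "0", "1"),
  stringsAsFactors = FALSE
)

# Step 1: species with one or more non-zero values ('keep' is a hypothetical name)
keep <- unique(df1$species[df1$value != 0])
keep
#> [1] "A" "D" "C"

# Step 2: keep every row (zeros included) whose species is in that set
result <- df1[df1$species %in% keep, ]
nrow(result)
#> [1] 9
```

Because the test is on membership rather than on the row's own value, the 0 rows of A, C and D survive, and only species B is dropped.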

Or the same approach with dplyr filter

library(dplyr)
df1 %>%
    filter(species %in% species[value != 0])
  site species value
1    1       A     1
2    1       C     0
3    1       D     4
4    2       A     0
5    2       C     3
6    2       D     4
7    3       A     0
8    3       C     0
9    3       D     1

CodePudding user response:

Keep the rows of every species whose values sum to something other than 0, e.g.

library(dplyr)
df1 %>% 
  group_by(species) %>% 
  filter(sum(as.numeric(as.character(value))) != 0)  # as.character() guards against factor level codes
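
One caveat with the sum test: it relies on the values being non-negative, as they are in the question. A sketch with made-up data (toy, by_sum and by_any are not from the question) where a species has values that cancel to zero, compared against any(value != 0):

```r
library(dplyr)

# Hypothetical data: species X has non-zero values that sum to 0
toy <- data.frame(
  species = c("X", "X", "Y", "Y"),
  value   = c(-2, 2, 0, 0)
)

# sum() test drops X as well, because -2 + 2 == 0
by_sum <- toy %>%
  group_by(species) %>%
  filter(sum(value) != 0) %>%
  ungroup()

# any() test keeps X and drops only the all-zero species Y
by_any <- toy %>%
  group_by(species) %>%
  filter(any(value != 0)) %>%
  ungroup()

nrow(by_sum)
#> [1] 0
nrow(by_any)
#> [1] 2
```

For count data like the question's, both tests give the same result.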