R subset rows that are NAs except where row contains a specific string

Time:10-31

I am trying to understand how a specific case, "c7" in this example, compares with the rest of the population, and I am using a boxplot to visualize this. I have a dataframe in R with the following columns:

gene    case    log2fc      symbol
g1        c1    0.236291026 GG
g2        c2    0.073854478 GG
g3        c6    0.722921499 GG
g4        c7    0           GG
g5        c8    0.925691334 GG
g1        c3    0.412097286 HH
g2        c4    0.98899995  HH
g3        c5    0.494138717 HH
g4        c7    0.996523937 HH
g5        c9    0           HH

I would like to remove the rows where log2fc is 0, except for the rows of this specific case, c7, and then draw the boxplot. So far, I have managed to convert the 0s to NAs and remove those rows from the entire dataframe, but I am not sure how to remove rows conditionally.

df[df == 0] <- NA

CodePudding user response:

In base R:

df[df$log2fc != 0 | df$case == 'c7',]
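To make the one-liner above easy to try, here is a minimal sketch that rebuilds the example data from the question and applies the same condition. The object names `kept`, `df_na` and `kept_na` are only illustrative. It also shows a variant for the situation described in the question, where the zeros have already been turned into NAs; in that case a comparison like `log2fc != 0` returns NA rather than FALSE, so `is.na()` is the safer test.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  gene   = rep(c("g1", "g2", "g3", "g4", "g5"), 2),
  case   = c("c1", "c2", "c6", "c7", "c8", "c3", "c4", "c5", "c7", "c9"),
  log2fc = c(0.236291026, 0.073854478, 0.722921499, 0, 0.925691334,
             0.412097286, 0.98899995, 0.494138717, 0.996523937, 0),
  symbol = rep(c("GG", "HH"), each = 5)
)

# Keep a row if its log2fc is non-zero OR it belongs to case c7
kept <- df[df$log2fc != 0 | df$case == "c7", ]

# Variant: zeros were already converted to NA, as in the question
df_na <- df
df_na[df_na == 0] <- NA
# Keep a row if log2fc is not missing OR it belongs to case c7
kept_na <- df_na[!is.na(df_na$log2fc) | df_na$case == "c7", ]
```

Note that in the NA variant the c7 row for GG is kept, but its log2fc is NA rather than 0, so it would not appear in a boxplot. If the zero for c7 should stay a zero, it is probably simpler to filter first and skip the NA conversion.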

Tidyverse solution:

library(dplyr)

df %>%
  filter(log2fc != 0 | case == 'c7')

CodePudding user response:

df %>%
  filter(log2fc != 0 | case == "c7")

  gene case     log2fc symbol
1   g1   c1 0.23629103     GG
2   g2   c2 0.07385448     GG
3   g3   c6 0.72292150     GG
4   g4   c7 0.00000000     GG
5   g5   c8 0.92569133     GG
6   g1   c3 0.41209729     HH
7   g2   c4 0.98899995     HH
8   g3   c5 0.49413872     HH
9   g4   c7 0.99652394     HH
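
For the boxplot step mentioned in the question, one possible follow-up is sketched below in base R. It assumes one box per `symbol` and marks the c7 values in red; that layout, the colours and the helper names `kept`, `c7` and `x_pos` are my own choices for illustration, not something the question specifies.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  gene   = rep(c("g1", "g2", "g3", "g4", "g5"), 2),
  case   = c("c1", "c2", "c6", "c7", "c8", "c3", "c4", "c5", "c7", "c9"),
  log2fc = c(0.236291026, 0.073854478, 0.722921499, 0, 0.925691334,
             0.412097286, 0.98899995, 0.494138717, 0.996523937, 0),
  symbol = rep(c("GG", "HH"), each = 5)
)

# Drop zero rows unless they belong to case c7
kept <- df[df$log2fc != 0 | df$case == "c7", ]

# One box per symbol; boxes are drawn at x = 1, 2, ... in sorted symbol order
boxplot(log2fc ~ symbol, data = kept, ylab = "log2fc")

# Overlay the c7 values so they stand out against the rest
c7 <- kept[kept$case == "c7", ]
x_pos <- match(c7$symbol, sort(unique(kept$symbol)))
points(x_pos, c7$log2fc, col = "red", pch = 19)
```

With only a handful of values per symbol the boxes are rough, but the red points show where c7 falls relative to the other cases in each group.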