Logical argument applied to multiple cells in a row using dplyr

Time:07-21

I have a data frame with 7 columns and lots of rows e.g.

df <- structure(list(Suggested.Symbol = c("CCT4", "DHRS2", "PMS2",
"FARSB", "RPL31", "ASNS"), p = c(0.0228515901406638, 0.0334943667503674, 
0.0380265628484489, 0.0479201571393373, 0.052163360517758, 0.0536304612182764
), p.10 = c(0.000166442958356447, 0.000401441243282832, 0.000537687151637518, 
0.000915758490675558, 0.00111333295283486, 0.00118675736050892
), p_onset = c(0.9378, 0.5983, 7.674e-10, 0.09781, 0.5495, 0.7841
), p_dc14 = c(0.3975, 0.3707, 6.117e-17, 0.2975, 0.4443, 0.7661
), p_tfc6 = c(0.2078, 0.896, 7.388e-19, 0.5896, 0.3043, 0.6696
), p_tms30 = c(0.5724, 0.3409, 4.594e-13, 0.2403, 0.1357, 0.3422
)), row.names = c(NA, 6L), class = "data.frame")

I'd like to add a new column called 'summary' that takes the value 'significant' when any of "p_onset", "p_dc14", "p_tfc6" or "p_tms30" is < 0.05.

How can I do this with dplyr?

CodePudding user response:

What you need is if_any(), which takes a <tidy-select> expression to pick several columns at once. It returns a logical vector that is TRUE for each row where the condition holds in at least one of those columns, so you can pass it straight to ifelse().

library(dplyr)

# starts_with("p_") selects p_onset, p_dc14, p_tfc6 and p_tms30, but not p or p.10
df %>%
  mutate(summary = ifelse(if_any(starts_with("p_"), `<`, 0.05), 'significant', 'no'))

#   Suggested.Symbol          p         p.10   p_onset    p_dc14    p_tfc6   p_tms30     summary
# 1             CCT4 0.02285159 0.0001664430 9.378e-01 3.975e-01 2.078e-01 5.724e-01          no
# 2            DHRS2 0.03349437 0.0004014412 5.983e-01 3.707e-01 8.960e-01 3.409e-01          no
# 3             PMS2 0.03802656 0.0005376872 7.674e-10 6.117e-17 7.388e-19 4.594e-13 significant
# 4            FARSB 0.04792016 0.0009157585 9.781e-02 2.975e-01 5.896e-01 2.403e-01          no
# 5            RPL31 0.05216336 0.0011133330 5.495e-01 4.443e-01 3.043e-01 1.357e-01          no
# 6             ASNS 0.05363046 0.0011867574 7.841e-01 7.661e-01 6.696e-01 3.422e-01          no
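
The difference between "any column" and "every column" is easy to miss on the example data, because the only significant row is below 0.05 in all four columns. Here is a minimal sketch on a small made-up data frame (the `toy` data and its column names are assumptions for illustration, not the asker's data) comparing if_any() with its sibling if_all():

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  id  = c("a", "b", "c"),
  p_x = c(0.01, 0.20, 0.01),
  p_y = c(0.30, 0.40, 0.02)
)

res <- toy %>%
  mutate(
    # TRUE when at least one p_ column is below 0.05
    any_sig = if_any(starts_with("p_"), ~ .x < 0.05),
    # TRUE only when every p_ column is below 0.05
    all_sig = if_all(starts_with("p_"), ~ .x < 0.05)
  )

print(res)
```

Row "a" is flagged by if_any() but not by if_all(), since only p_x is below the threshold there.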

CodePudding user response:

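Another possible solution, which uses rowSums() to count how many of the four columns are below 0.05 in each row:

library(dplyr)

# > 0 flags rows where at least one column is below 0.05;
# use == 4 instead to require all four columns to be below 0.05
df %>% 
  mutate(new = if_else(rowSums(across(p_onset:p_tms30) < 0.05) > 0,
    "significant", NA_character_))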

#>   Suggested.Symbol          p         p.10   p_onset    p_dc14    p_tfc6
#> 1             CCT4 0.02285159 0.0001664430 9.378e-01 3.975e-01 2.078e-01
#> 2            DHRS2 0.03349437 0.0004014412 5.983e-01 3.707e-01 8.960e-01
#> 3             PMS2 0.03802656 0.0005376872 7.674e-10 6.117e-17 7.388e-19
#> 4            FARSB 0.04792016 0.0009157585 9.781e-02 2.975e-01 5.896e-01
#> 5            RPL31 0.05216336 0.0011133330 5.495e-01 4.443e-01 3.043e-01
#> 6             ASNS 0.05363046 0.0011867574 7.841e-01 7.661e-01 6.696e-01
#>     p_tms30         new
#> 1 5.724e-01        <NA>
#> 2 3.409e-01        <NA>
#> 3 4.594e-13 significant
#> 4 2.403e-01        <NA>
#> 5 1.357e-01        <NA>
#> 6 3.422e-01        <NA>
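
Because rowSums() returns a count, the same idea can also record how many columns pass the threshold, and `na.rm = TRUE` keeps a missing p-value from turning the whole row into NA. The following is a rough sketch on made-up data (the `toy` data frame and the `n_sig` column name are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data with one missing p-value
toy <- data.frame(
  id  = c("a", "b", "c"),
  p_x = c(0.01, 0.20, NA),
  p_y = c(0.30, 0.40, 0.02)
)

res <- toy %>%
  mutate(
    # number of p_ columns below 0.05 in each row, ignoring NAs
    n_sig = rowSums(across(p_x:p_y) < 0.05, na.rm = TRUE),
    # label rows with at least one significant column
    summary = if_else(n_sig > 0, "significant", NA_character_)
  )

print(res)
```

Whether a missing value should count as "not significant" is a judgment call; dropping `na.rm = TRUE` leaves such rows as NA instead.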