Home > Mobile >  R - New Rowwise-Column Based on Condition of Grouped-Columns
R - New Rowwise-Column Based on Condition of Grouped-Columns

Time:10-29

I have a large dataframe (400,000 rows). I am trying to create a new column based on a condition. Grouping by Gene and Group before evaluating the condition is necessary, but the new column should be generated independently for each row (if condition is met).

The condition: two or more Gene-Group must have a Counts value >=10. For instance, Gene1-GroupB does not pass this condition, while both Gene2-GroupB and Gene3-GroupB pass. I haven't been able to figure out how to code this condition.

If the Gene-Group passes the condition, I want to create a new column with the value of (NascentCountsCPM / TotalCountsCPM) for each row. If the condition is not met, I want to return NA values. I believe a combination of dplyr mutate() and case_when() can be used, but I simply cannot wrap my head around how to code the described condition.

Thanks for any help!

#Starting DataFrame
df>
| Gene  | Group | Sample   | Counts | TotalCountsCPM | NascentCountsCPM |
|-------|-------|----------|--------|----------------|-----------------|
| Gene1 | B     | SampleB1 | 7      | 1.36           | 0               |
| Gene2 | B     | SampleB1 | 269    | 52.29          | 2.92            |
| Gene3 | B     | SampleB1 | 25     | 4.86           | 0.19            |
| Gene1 | B     | SampleB2 | 2      | 0.49           | 0               |
| Gene2 | B     | SampleB2 | 212    | 52.45          | 0.99            |
| Gene3 | B     | SampleB2 | 16     | 3.96           | 0               |
| Gene1 | B     | SampleB3 | 3      | 0.64           | 0               |
| Gene2 | B     | SampleB3 | 219    | 46.58          | 1.7             |
| Gene3 | B     | SampleB3 | 41     | 8.72           | 0.21            |

df_new <- df %>%
group_by(Gene, Group) %>%
mutate(CountsRatio = case_when((CONDITION) ~ (NascentCountsCPM / TotalCountsCPM),
TRUE ~ NA_real_)


>df_new
| Gene  | Group | Sample   | Counts | TotalCountsCPM | NascentCountsCPM | CountsRatio |
|-------|-------|----------|--------|----------------|------------------|-------------|
| Gene1 | B     | SampleB1 | 7      | 1.36           | 0                | NA         |
| Gene2 | B     | SampleB1 | 269    | 52.29          | 2.92             | 0.056       |
| Gene3 | B     | SampleB1 | 25     | 4.86           | 0.19             | 0.4         |
| Gene1 | B     | SampleB2 | 2      | 0.49           | 0                | NA         |
| Gene2 | B     | SampleB2 | 212    | 52.45          | 0.99             | 0.019       |
| Gene3 | B     | SampleB2 | 16     | 3.96           | 0                | 0           |
| Gene1 | B     | SampleB3 | 3      | 0.64           | 0                | NA         |
| Gene2 | B     | SampleB3 | 219    | 46.58          | 1.7              | 0.056       |
| Gene3 | B     | SampleB3 | 41     | 8.72           | 0.21             | 0.024       |


df <- data.frame(
  stringsAsFactors = FALSE,
              Gene = c("Gene1","Gene2","Gene3",
                       "Gene1","Gene2","Gene3","Gene1","Gene2","Gene3"),
             Group = c("B", "B", "B", "B", "B", "B", "B", "B", "B"),
            Sample = c("SampleB1","SampleB1",
                       "SampleB1","SampleB2","SampleB2","SampleB2","SampleB3",
                       "SampleB3","SampleB3"),
            Counts = c(7L, 269L, 25L, 2L, 212L, 16L, 3L, 219L, 41L),
    TotalCountsCPM = c(1.36, 52.29, 4.86, 0.49, 52.45, 3.96, 0.64, 46.58, 8.72),
  NascentCountsCPM = c(0, 2.92, 0.19, 0, 0.99, 0, 0, 1.7, 0.21)
)

CodePudding user response:

df %>%
  add_count(Gene, Group, wt = Counts > 10) %>%
  mutate(CountsRatio = if_else(n >= 2, NascentCountsCPM/TotalCountsCPM, NA_real_))
  • Related