Home > Software design >  Make a new column with values according to other columns - in R
Make a new column with values according to other columns - in R

Time:06-21

Taking this dummy data for example

structure(list(Metastasis_Brain = c("1", "1", "0", "1", "0", 
"0"), Metastasis_Liver = c("0", "0", "1", "1", "1", "0"), Metastasis_Bone = c("1", 
"1", "0", "1", "1", "0")), class = "data.frame", row.names = c("Patient_1", 
"Patient_2", "Patient_3", "Patient_4", "Patient_5", "Patient_6"
))

Example of what I'm searching for: If there is 1 in columns Metastasis_Brain and Metastasis_Liver, the new column will contain "Brain, Liver".

If all three tissues are 1, then that row in the new column will contain "Brain, Liver, Bone". If all are 0, then it doesn't matter, NA would be fine.

CodePudding user response:

Using tidyverse:

library(tidyverse)
df  %>%
  rownames_to_column() %>%
  left_join(pivot_longer(.,-rowname, names_prefix = '.*_') %>%
  filter(value>0) %>%
  group_by(rowname) %>%
  summarise(nm = toString(name)))

   rowname Metastasis_Brain Metastasis_Liver Metastasis_Bone                 nm
1 Patient_1                1                0               1        Brain, Bone
2 Patient_2                1                0               1        Brain, Bone
3 Patient_3                0                1               0              Liver
4 Patient_4                1                1               1 Brain, Liver, Bone
5 Patient_5                0                1               1        Liver, Bone
6 Patient_6                0                0               0               <NA>

in Base R you could do:

aggregate(ind~rn, subset(transform(stack(df), 
         ind = sub('.*_', '', ind), rn = rownames(df)), values>0), toString)

         rn                ind
1 Patient_1        Brain, Bone
2 Patient_2        Brain, Bone
3 Patient_3              Liver
4 Patient_4 Brain, Liver, Bone
5 Patient_5        Liver, Bone

CodePudding user response:

base

df <- data.frame(
  stringsAsFactors = FALSE,
  row.names = c("Patient_1","Patient_2","Patient_3","Patient_4","Patient_5","Patient_6"),
  Metastasis_Brain = c("1", "1", "0", "1", "0", "0"),
  Metastasis_Liver = c("0", "0", "1", "1", "1", "0"),
  Metastasis_Bone = c("1", "1", "0", "1", "1", "0"),
  res = c("Brain, Bone","Brain, Bone",
          "Liver","Brain, Liver, Bone","Liver, Bone",NA)
)

df$res <- sapply(apply(df, 1, function(x) gsub("Metastasis_", "", names(df)[x == 1])), toString)            
df
#>           Metastasis_Brain Metastasis_Liver Metastasis_Bone                res
#> Patient_1                1                0               1        Brain, Bone
#> Patient_2                1                0               1        Brain, Bone
#> Patient_3                0                1               0              Liver
#> Patient_4                1                1               1 Brain, Liver, Bone
#> Patient_5                0                1               1        Liver, Bone
#> Patient_6                0                0               0                 NA

Created on 2022-06-20 by the reprex package (v2.0.1)

  • Related