How do I extract the first letter of a dataframe value based on another column?

Time:03-16

In the res.sig data.table, for all rows where Group1 is "Responder", I want to extract the first character of the n_mutated_group1 column. For example, if the value of n_mutated_group1 is "0 of 25", I want to extract only 0. Eventually, I want to create a list x for a scatter plot.

for (i in res.sig$Group1){
  for (j in res.sig$n_mutated_group1){
    if (i=="Responder"){
      x <- res.sig[i, j:= substr(j, 1, 1)]
      print(x)
    }
  }
}

Traceback:

Error in `[.data.table`(res.sig, i, `:=`(j, substr(j, 1, 1))) : 
  When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.

res.sig

> dput(res.sig)
structure(list(Hugo_Symbol = c("ERCC2", "ERCC2", "AKAP9", "AKAP9", 
"HERC1", "HERC1", "HECTD1", "HECTD1", "MACF1", "MACF1", "MROH2B", 
"MROH2B", "KMT2C", "KMT2C"), Group1 = c("Non-Responder", "Responder", 
"Non-Responder", "Responder", "Non-Responder", "Responder", "Non-Responder", 
"Responder", "Non-Responder", "Responder", "Non-Responder", "Responder", 
"Non-Responder", "Responder"), Group2 = c("Rest", "Rest", "Rest", 
"Rest", "Rest", "Rest", "Rest", "Rest", "Rest", "Rest", "Rest", 
"Rest", "Rest", "Rest"), n_mutated_group1 = c("0 of 25", "9 of 25", 
"0 of 25", "6 of 25", "0 of 25", "6 of 25", "0 of 25", "6 of 25", 
"0 of 25", "6 of 25", "0 of 25", "6 of 25", "1 of 25", "7 of 25"
), n_mutated_group2 = c("9 of 25", "0 of 25", "6 of 25", "0 of 25", 
"6 of 25", "0 of 25", "6 of 25", "0 of 25", "6 of 25", "0 of 25", 
"6 of 25", "0 of 25", "7 of 25", "1 of 25"), p_value = c(0.00163083541184905, 
0.00163083541184905, 0.022289766970618, 0.022289766970618, 0.022289766970618, 
0.022289766970618, 0.022289766970618, 0.022289766970618, 0.022289766970618, 
0.022289766970618, 0.022289766970618, 0.022289766970618, 0.0487971536957187, 
0.0487971536957187), OR = c(0, Inf, 0, Inf, 0, Inf, 0, Inf, 0, 
Inf, 0, Inf, 0.111488645279478, 8.96952328636894), OR_low = c(0, 
2.56647319276964, 0, 1.33358819424024, 0, 1.33358819424024, 0, 
1.33358819424024, 0, 1.33358819424024, 0, 1.33358819424024, 0.00228988507629356, 
1.0079479819766), OR_high = c(0.38963976043749, Inf, 0.749856668137133, 
Inf, 0.749856668137133, Inf, 0.749856668137133, Inf, 0.749856668137133, 
Inf, 0.749856668137133, Inf, 0.992114690322592, 436.703138665198
), fdr = c(0.109265972593886, 0.109265972593886, 0.248902397838568, 
0.248902397838568, 0.248902397838568, 0.248902397838568, 0.248902397838568, 
0.248902397838568, 0.248902397838568, 0.248902397838568, 0.248902397838568, 
0.248902397838568, 0.467058471087594, 0.467058471087594)), row.names = c(NA, 
-14L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000002adab171ef0>)

CodePudding user response:

I'm not sure if I understand your question, but I believe the below works based on my understanding:

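library(purrr); library(stringr)  # map(), str_extract(), and the %>% pipe
res.sig$n_mutated_group1 %>% map(~str_extract(.x, pattern = "."))  # "." matches the first character; map() returns a list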

CodePudding user response:

If you want to extract only the 1st character, you may use the substr() function from base:

ch_1st <- substr(res.sig$n_mutated_group1, start=1, stop=1)

This requires a bit of care: an empty string "" stays "" and NA stays NA, and you haven't yet said how you want to handle those cases.
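
The edge cases mentioned above can be checked directly. This is only a minimal sketch on a small made-up vector x (hypothetical, not taken from res.sig). The sub() variant is an assumption about what is wanted if a count can exceed 9, since substr() would return only "1" for a value like "10 of 25".

```r
# Hypothetical input covering the edge cases; not taken from res.sig
x <- c("9 of 25", "10 of 25", "", NA)

# First character only: "" stays "" and NA stays NA
first_char <- substr(x, start = 1, stop = 1)

# Alternative (assumption: the whole leading count is wanted):
# keep only the run of digits at the start of each string
leading_count <- sub("^([0-9]*).*$", "\\1", x)

# Convert to numeric for plotting; "" and NA both become NA
counts <- as.numeric(leading_count)
```

Whether missing values should be dropped or kept as NA before plotting depends on the data, so that choice is left open here.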

CodePudding user response:

Not entirely sure if I understood your question. If you want to obtain a vector of the first letters of n_mutated_group1 for (only) Responders in Group1, this would be a tidy-style approach (in addition to the solutions already posted):

library(dplyr)

res.sig %>%
    filter(Group1 == 'Responder') %>%
    mutate(first_letter = substr(n_mutated_group1, 1, 1)) %>%
    pull(first_letter)
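
Since res.sig is a data.table, the same filter-and-extract step can probably also be written in data.table's own DT[i, j] form. The loop in the question passed a character string as i, which data.table treats as a join, and that appears to be what triggered the error. A sketch, assuming a small stand-in table dt (a hypothetical name) built from the first four rows of the dput output, and assuming numeric values are wanted for the scatter plot:

```r
library(data.table)

# Stand-in for res.sig: first four rows of the two relevant columns
dt <- data.table(
  Group1 = c("Non-Responder", "Responder", "Non-Responder", "Responder"),
  n_mutated_group1 = c("0 of 25", "9 of 25", "0 of 25", "6 of 25")
)

# i filters the rows, j computes on the filtered column; no loop is needed
x <- dt[Group1 == "Responder", as.numeric(substr(n_mutated_group1, 1, 1))]
x
```

Here i is a logical condition rather than a character value, so data.table subsets rows instead of attempting a join.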