I have a data frame of healthy and patient values. For healthy samples, the names start with HC followed by a number, like HC938 PBMC (the PBMC refers to a tissue type). For patient samples, names start with AS followed by a number, like AS345 SFMC (the SFMC denotes a different tissue type).
I am trying to encode a new variable called SAMPLETYPE, where samples are coded as "HC" if they have HC in their sample name and "AS" if they do not. Similarly, I am trying to encode another variable called TISSUETYPE, where samples are coded as "Blood" if their names contain PBMC and "SF" otherwise.
Naturally, I turned to ifelse statements. However, I think my code compares against the entire sample name rather than part of it, which is why I'm having trouble.
My code below:
#data structure
SAMPLE<- c("HC1374 PBMC","HC462 PBMC","AS234 SFMC","AS958 PBMC","HC73 PBMC","AS09 SFMC")
VALUES<- c(46,749,62,84,888,52)
data<- data.frame(SAMPLE,VALUES)
#adding sample type variable (coding the sample as HC or AS depending on whether the letters "HC" are in the sample name)
data$SAMPLETYPE<- ifelse(data$SAMPLE=="HC","HC","AS")
#adding tissue type variable (coding the sample as Blood or SF depending on whether the sample name contains the words PBMC or not)
data$TISSUETYPE<- ifelse(data$SAMPLE=="PBMC","Blood","SF")
This is the output I get: every row is coded as "AS" in SAMPLETYPE and "SF" in TISSUETYPE.
Here, HC1374 should be coded as HC in the "SAMPLETYPE" variable, and Blood in the "TISSUETYPE", but it doesn't seem to have worked out.
Any tips would be appreciated! Many thanks!
Answer:
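The == operator tests whether the whole string is exactly "HC" (or "PBMC"), which is never true for names like "HC1374 PBMC", so every row falls through to the else value. I think you want to use grepl, which checks whether a string contains a pattern.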
data$SAMPLETYPE <- ifelse(grepl("HC", data$SAMPLE), "HC", "AS")
data$TISSUETYPE <- ifelse(grepl("PBMC", data$SAMPLE), "Blood", "SF")
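For reference, here is a self-contained sketch that runs on the sample data from the question. It is only one way to do it: startsWith() and fixed = TRUE are my own variations rather than part of the lines above, and stringsAsFactors = FALSE is added on the assumption that SAMPLE should stay a character column on R versions before 4.0.

```r
# Same sample data as in the question
SAMPLE <- c("HC1374 PBMC", "HC462 PBMC", "AS234 SFMC",
            "AS958 PBMC", "HC73 PBMC", "AS09 SFMC")
VALUES <- c(46, 749, 62, 84, 888, 52)
# stringsAsFactors = FALSE keeps SAMPLE as character (assumption: needed on R < 4.0)
data <- data.frame(SAMPLE, VALUES, stringsAsFactors = FALSE)

# `==` compares the whole string, so no sample name is ever exactly "HC"
any(data$SAMPLE == "HC")

# startsWith() checks only the prefix of each name
data$SAMPLETYPE <- ifelse(startsWith(data$SAMPLE, "HC"), "HC", "AS")

# fixed = TRUE makes grepl() match "PBMC" as literal text, not as a regex
data$TISSUETYPE <- ifelse(grepl("PBMC", data$SAMPLE, fixed = TRUE), "Blood", "SF")

print(data)
```

Using startsWith() for the sample type avoids a false match if "HC" ever appears later in a name; plain grepl("HC", ...) works just as well for the names shown here.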