I feel like this question is asked a lot, but none of the solutions I found work for me.
I have a data frame with a column (called ID) that holds a string of numbers and letters (e.g. Q8A203). In a few rows there are two of these IDs separated by a vertical bar (e.g. Q8AA66|Q8AAT5). For my analysis it doesn't matter which one I keep, so I want to split the string at | and store the first part in a new column named NewColumn.
I know that the vertical bar must be treated differently and that I have to put \ in front. I tried strsplit and unlist:
df$NewColumn <- strsplit(df$ID,split='\\|',fixed=TRUE)
df$NewColumn <- unlist(strsplit(df$ID, " \\| ", fixed=TRUE))
Both options put exactly the same content I have in column ID into NewColumn.
I would very much appreciate the help.
CodePudding user response:
Rather than splitting, you can simply replace everything from the vertical bar onward with an empty string, which keeps the first ID.
df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"))
df$NewColumn <- sub("\\|.*$", "", df$ID)
df
#              ID NewColumn
# 1        Q8A203    Q8A203
# 2 Q8AA66|Q8AAT5    Q8AA66
Next time, please add a minimal reproducible example (your df here) to speed up answers ;)
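strsplit can also work, but not with fixed=TRUE combined with an escaped pattern. With fixed=TRUE the pattern '\\|' is taken literally (a backslash followed by a bar), which never occurs in your IDs, so nothing is split. Either drop fixed=TRUE and keep the regex '\\|', or keep fixed=TRUE and use a plain '|'. Your second attempt also has spaces around the bar, which your data does not contain. Note that strsplit returns a list, so you then have to pull the first element out of each entry, which is more complex.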
# Working with a list
unlist(lapply(strsplit(df$ID, split='\\|'), "[[", 1))
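For comparison, here is a minimal sketch of the fixed=TRUE route, assuming the same two-row example data as above (stringsAsFactors = FALSE is added only so it also runs on R versions before 4.0). It first shows why the escaped pattern finds nothing to split, then uses a literal bar and vapply to get a plain character vector back. The variable names no_split and parts are just for illustration.

```r
# Example data mirroring the question (assumed, not the asker's real df)
df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"), stringsAsFactors = FALSE)

# With fixed = TRUE the pattern is literal: a backslash followed by a bar.
# That never occurs in ID, so every element comes back as a single piece.
no_split <- strsplit(df$ID, split = "\\|", fixed = TRUE)
lengths(no_split)
# [1] 1 1

# A plain "|" with fixed = TRUE splits at the bar without any regex escaping
parts <- strsplit(df$ID, split = "|", fixed = TRUE)

# Keep the first piece of each element; vapply enforces one string per row
df$NewColumn <- vapply(parts, `[`, character(1), 1)
df$NewColumn
# [1] "Q8A203" "Q8AA66"
```

Compared with unlist(lapply(...)), vapply fails loudly if an element does not yield exactly one string, which can be handy when assigning straight into a column.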