Split string at a vertical bar character "|" in R

Time:03-16

I feel like this question is asked a lot, but none of the solutions I found work for me.

I have a dataframe with a column (called ID) that holds a string of numbers and letters (e.g. Q8A203). In a few rows there are two of those constructs separated by a vertical bar (e.g. Q8AA66|Q8AAT5). For my analysis it doesn't matter which one I keep, so I wanted to split the string at | and put the first part into a new column named NewColumn.

I know that the vertical bar must be treated differently and that I have to put \ in front. I tried strsplit and unlist:

df$NewColumn <- strsplit(df$ID,split='\\|',fixed=TRUE)

df$NewColumn <- unlist(strsplit(df$ID, " \\| ", fixed=TRUE))

Both options put exactly the same content I have in column ID into NewColumn.

I would very much appreciate the help.

CodePudding user response:

Rather than splitting, you can use sub to replace everything from the first | onward with an empty string, which keeps only the first ID.

df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"))
df$NewColumn <- sub("\\|.*$", "", df$ID)
df
#              ID NewColumn
# 1        Q8A203    Q8A203
# 2 Q8AA66|Q8AAT5    Q8AA66

Next time, please add a minimal reproducible example (your df here) to speed up answers ;)

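strsplit works if you remove the fixed option, because '\\|' is then read as a regular expression that matches a literal bar. With fixed=TRUE the pattern is taken literally as a backslash followed by a bar, which never occurs in ID, so nothing gets split (alternatively, keep fixed=TRUE and pass split='|'). Either way, strsplit returns a list, so you need an extra step to pull out the first element, which is more complex.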

# Working with a list
unlist(lapply(strsplit(df$ID, split='\\|'), "[[", 1))
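For comparison, here is a minimal sketch of the fixed=TRUE route, reusing the example df from above. The names parts and unsplit are only illustrative, and stringsAsFactors = FALSE is added on the assumption that you may be running an R version older than 4.0, where ID would otherwise become a factor.

```r
# Example data taken from the answer above
df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"), stringsAsFactors = FALSE)

# With fixed = TRUE the pattern is a literal string, so the bar needs no escaping
parts <- strsplit(df$ID, split = "|", fixed = TRUE)

# Keep the first piece of each row; vapply guarantees a character vector back
df$NewColumn <- vapply(parts, function(p) p[1], character(1))

# The attempt from the question: "\\|" with fixed = TRUE looks for a literal
# backslash followed by a bar, finds none, and leaves every string whole
unsplit <- strsplit(df$ID, split = "\\|", fixed = TRUE)
```

vapply is used here instead of unlist(lapply(...)) only because it checks that each row yields exactly one string; both should give the same result on this data.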