I'm working in R and would like to trim my column so that it keeps only the text between the 3rd and 4th commas. Here is a minimal example:
```r
Col1 <- c("Sample1")
Col2 <- c("1A00318:268:H27G3DSX3:4:1101:20989:1047KJ758397.1.1794_U;tax=k:Eukaryota,d:Stramenopiles,p:Ochrophyta,c:Bacillariophyta,o:Bacillariophyta_X,f:Raphid-pennate,g:Raphid-pennate_X")
df <- data.frame(Col1, Col2)
```
Col1 | Col2 |
---|---|
Sample1 | 1A00318:268:H27G3DSX3:4:1101:20989:1047KJ758397.1.1794_U;tax=k:Eukaryota,d:Stramenopiles,p:Ochrophyta,c:Bacillariophyta,o:Bacillariophyta_X,f:Raphid-pennate,g:Raphid-pennate_X |
From this table, I would like to get:
Col1 | Col2 |
---|---|
Sample1 | Bacillariophyta |
My dataset is very large; does anyone know how I can do this efficiently?
Answer:
You can use `strsplit` to split each string on commas, then `sapply` to extract the 4th element:
```r
# strsplit() is vectorised, so split the whole column on commas at once;
# sapply() then picks the 4th piece of each result.
# (On R < 4.0.0, data.frame() stores Col2 as a factor; wrap it in as.character() there.)
df$Col3 <- sapply(strsplit(df$Col2, ","), `[`, 4)
df$Col3
# [1] "c:Bacillariophyta"
```
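The result above still carries the `c:` rank label, whereas the desired table shows just `Bacillariophyta`. A minimal sketch of removing it, assuming every comma-separated field has the form `rank:name` with the label ending at the first colon. It uses `vapply` (a type-safe variant of `sapply`) and `fixed = TRUE` so the comma is matched literally rather than as a regex:

```r
# Example data from the question (stringsAsFactors = FALSE keeps Col2 as
# character on R versions older than 4.0.0)
Col1 <- c("Sample1")
Col2 <- c("1A00318:268:H27G3DSX3:4:1101:20989:1047KJ758397.1.1794_U;tax=k:Eukaryota,d:Stramenopiles,p:Ochrophyta,c:Bacillariophyta,o:Bacillariophyta_X,f:Raphid-pennate,g:Raphid-pennate_X")
df <- data.frame(Col1, Col2, stringsAsFactors = FALSE)

# Split every string on literal commas in one vectorised call, then take
# the 4th piece (the text between the 3rd and 4th comma); rows with fewer
# than four pieces give NA
fourth <- vapply(strsplit(df$Col2, ",", fixed = TRUE), `[`, character(1), 4)

# Remove the leading "rank:" label, i.e. everything up to the first colon
df$Col2 <- sub("^[^:]*:", "", fourth)
df
```

This leaves `Col2` as `Bacillariophyta`, matching the table in the question.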
Answer:
An alternative is to use `sub` with a regular expression that skips the first three comma-terminated fields and captures the fourth:
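```r
# Skip three comma-terminated fields, capture the 4th, drop the rest (PCRE via perl = TRUE)
df$Col2 <- sub("^(?:[^,]+,){3}([^,]+).*", "\\1", df$Col2, perl = TRUE)
df
#      Col1              Col2
# 1 Sample1 c:Bacillariophyta
```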
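Both answers count commas, so they assume the class is always the 4th field. If some rows might lack an earlier rank, matching on the `c:` label instead could be safer. This is a minimal sketch assuming the taxonomy part always consists of `rank:name` pairs separated by commas after `tax=`; the second and third rows below are made up for illustration:

```r
# Hypothetical example rows: the first is the string from the question,
# the second lacks the d: rank, the third has no c: rank at all
taxa <- c(
  "1A00318:268:H27G3DSX3:4:1101:20989:1047KJ758397.1.1794_U;tax=k:Eukaryota,d:Stramenopiles,p:Ochrophyta,c:Bacillariophyta,o:Bacillariophyta_X,f:Raphid-pennate,g:Raphid-pennate_X",
  "read2;tax=k:Eukaryota,p:Ochrophyta,c:Bacillariophyta,o:Bacillariophyta_X",
  "read3;tax=k:Eukaryota,d:Stramenopiles"
)

# "c:" must follow "=" or "," so only the class label is matched;
# ([^,]*) captures the class name up to the next comma or end of string
pattern <- "^.*[=,]c:([^,]*).*$"

# sub() returns non-matching strings unchanged, so use NA for those rows
class_name <- ifelse(grepl(pattern, taxa),
                     sub(pattern, "\\1", taxa),
                     NA_character_)
class_name
```

Because `grepl` and `sub` each work on the whole vector in one call, this should also scale to a large column without a per-row R function.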