I have a table whose column names are strings like this:
d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Paludibacteraceae;g__uncultured;s__uncultured_bacterium
I would like each column name to keep only the name that follows "p__". For example, the string above should become: Bacteroidota. I have been using the following code to trim the names, but it only removes the text up to and including "p__"; everything after the phylum name is left in place.
nivel7_especie <- as.data.frame(read_csv("/Users/lorenzo/Documents/FIL - Lab ECyN/Proyecto FATZEIMER/Microbiota/Vegan_Diversity/Tablas/nivel7-especie_con_grupos.csv"))
# Simplify the column names
colnames(nivel7_especie) <- gsub(colnames(nivel7_especie),pattern = '.*p__', replacement = "")
Thanks!
CodePudding user response:
If you are trying to rename the column(s) that start with p__, then you can do this:
colnames(nivel7_especie) <- gsub("^p__","",colnames(nivel7_especie))
If you are trying to retain only the columns that start with p__, then you can do this:
nivel7_especie[, grepl("^p__", colnames(nivel7_especie)), drop = FALSE]
CodePudding user response:
If I understand you correctly, you want to reduce certain strings, such as this one, to only the alphanumeric string that follows "p__".
Data:
x <- "d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Paludibacteraceae;g__uncultured;s__uncultured_bacterium"
If this is correct, you can do it by using p__ as a positive lookbehind, (?<=p__), and matching one or more word characters (letters, digits, or underscore), \\w+, occurring right after it:
library(tidyverse)
data.frame(x) %>%
  mutate(p = str_extract(x, "(?<=p__)\\w+"))
1 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Paludibacteraceae;g__uncultured;s__uncultured_bacterium
p
1 Bacteroidota
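Since the question is about column names rather than a column of values, the same idea can be applied to colnames() directly. Below is a minimal base R sketch, assuming every phylum name ends at the next ";" and that "p__" appears at most once per name. The toy data frame and its column names are made up for illustration and are not the asker's real data.

```r
# Toy data frame with hypothetical taxonomy-style column names
df <- data.frame(a = 1:2, b = 3:4)
colnames(df) <- c(
  "d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales",
  "index"
)

# Capture the text between "p__" and the next ";" (or the end of the string)
# and replace the whole name with that capture group.
# Names that do not contain "p__" do not match and are left unchanged.
colnames(df) <- sub(".*p__([^;]*).*", "\\1", colnames(df))

print(colnames(df))
```

Note that several columns can share the same phylum, so the simplified names may no longer be unique; wrapping the result in make.unique() is one way to handle that if it matters for later steps.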