I have a dataframe like this:
import pandas as pd
data = pd.DataFrame({
"k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Aeromonadales.f__Aeromonadaceae.g__Aeromonas.s__Aeromonas_dhakensis": [123, 1234, 543, 2133],
"k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Lachnospiraceae.g__Faecalicatena.s__Faecalicatena_orotica": [543, 324, 234, 652]
})
Each column has a long name like this:
k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Aeromonadales.f__Aeromonadaceae.g__Aeromonas.s__Aeromonas_dhakensis
I would like to keep only the last part, the text after .s__, so that the columns are renamed to Aeromonas_dhakensis and Faecalicatena_orotica.
Does anyone know how to do it?
CodePudding user response:
You can split each name on s__ and take the last part, or use .str.extract:
data = data.rename(columns=lambda col: col.split('s__')[1])
# or
data.columns = data.columns.str.extract('s__(.*)')[0]
print(data)
   Aeromonas_dhakensis  Faecalicatena_orotica
0                  123                    543
1                 1234                    324
2                  543                    234
3                 2133                    652
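As a side note on the .str.extract variant: with the default expand=True it returns a DataFrame, which is why the [0] is needed. A minimal sketch of the same idea with expand=False, which should give back an Index directly when there is a single capture group (the variable name new_cols is only for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    "k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Aeromonadales.f__Aeromonadaceae.g__Aeromonas.s__Aeromonas_dhakensis": [123, 1234, 543, 2133],
    "k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Lachnospiraceae.g__Faecalicatena.s__Faecalicatena_orotica": [543, 324, 234, 652],
})

# One capture group + expand=False: the result is an Index, not a DataFrame,
# so it can be assigned to data.columns without the trailing [0].
new_cols = data.columns.str.extract(r"s__(.*)", expand=False)
data.columns = new_cols
print(data)
```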
CodePudding user response:
You can use str.extract, anchoring the pattern on the literal .s__ at the end of the name:
data.columns = data.columns.str.extract(r'\.s__(.*)$')[0]
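If some columns might not contain .s__ at all (an assumption, the question only shows columns that do), extract would give NaN for them. A rough sketch that falls back to the original name in that case; the helper species_name and the extra sample_id column are hypothetical and only there to show the fallback:

```python
import re
import pandas as pd

def species_name(col):
    """Return the text after '.s__', or the unchanged name if the marker is absent."""
    match = re.search(r"\.s__(.*)$", col)
    return match.group(1) if match else col

data = pd.DataFrame({
    "k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Aeromonadales.f__Aeromonadaceae.g__Aeromonas.s__Aeromonas_dhakensis": [123, 1234, 543, 2133],
    "k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Lachnospiraceae.g__Faecalicatena.s__Faecalicatena_orotica": [543, 324, 234, 652],
    "sample_id": [1, 2, 3, 4],  # hypothetical column without a species part
})

# rename accepts a function and applies it to every column label
data = data.rename(columns=species_name)
print(list(data.columns))
```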