Rename columns in pandas by removing a regex-matched prefix

Time:07-02

I have a dataframe like this:

import pandas as pd

data = pd.DataFrame({
    "k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Aeromonadales.f__Aeromonadaceae.g__Aeromonas.s__Aeromonas_dhakensis": [123, 1234, 543, 2133],
    "k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Lachnospiraceae.g__Faecalicatena.s__Faecalicatena_orotica": [543, 324, 234, 652]
})

Each column has a long name like this:

k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Aeromonadales.f__Aeromonadaceae.g__Aeromonas.s__Aeromonas_dhakensis

I would like to keep only the last part, after .s__, so that the columns are renamed to Aeromonas_dhakensis and Faecalicatena_orotica.

Does anyone know how to do it?

CodePudding user response:

You can split each column name on s__ and keep the last part, or use .str.extract:

data = data.rename(columns=lambda col: col.split('s__')[1])
# or, as an alternative (run only one of the two, not both)
data.columns = data.columns.str.extract('s__(.*)')[0]
print(data)

   Aeromonas_dhakensis  Faecalicatena_orotica
0                  123                    543
1                 1234                    324
2                  543                    234
3                 2133                    652
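
As a rough, self-contained sketch of the two options above, each one can be applied to its own copy of the frame. They are alternatives: once the columns are renamed, the s__ marker is gone and the second call would no longer match. The names renamed_split and renamed_extract are illustrative only, and expand=False is used here (an assumption, not part of the answer) so that extract returns an Index directly.

```python
import pandas as pd

long_a = ("k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Aeromonadales."
          "f__Aeromonadaceae.g__Aeromonas.s__Aeromonas_dhakensis")
long_b = ("k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales."
          "f__Lachnospiraceae.g__Faecalicatena.s__Faecalicatena_orotica")

data = pd.DataFrame({long_a: [123, 1234, 543, 2133],
                     long_b: [543, 324, 234, 652]})

# Option 1: split each name on "s__" and keep the last piece.
renamed_split = data.rename(columns=lambda col: col.split("s__")[-1])

# Option 2: capture everything after "s__" with a regex, on a separate copy.
renamed_extract = data.copy()
renamed_extract.columns = data.columns.str.extract(r"s__(.*)", expand=False)

print(list(renamed_split.columns))
print(list(renamed_extract.columns))
```

Both copies should end up with the same two short column names, and the original data frame keeps its long names.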

CodePudding user response:

You can use extract with a pattern anchored on the literal .s__ separator:

data.columns = data.columns.str.extract(r'\.s__(.*)$')[0]
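
One caveat with extract is that a column without the .s__ marker comes back as a missing value. A minimal sketch of a regex-free variant that leaves such columns untouched is shown below; the helper species_name, the shortened taxonomy string, and the sample_id column are all hypothetical and only serve the illustration.

```python
import pandas as pd

def species_name(col: str) -> str:
    """Return the text after the last '.s__', or the name unchanged if absent."""
    _head, sep, tail = col.rpartition(".s__")
    return tail if sep else col

# Hypothetical frame: one taxonomy-style column and one ordinary column.
df = pd.DataFrame({
    "k__Bacteria.g__Aeromonas.s__Aeromonas_dhakensis": [123, 1234],
    "sample_id": [1, 2],
})

# With extract, the column lacking ".s__" has no match and becomes missing.
extracted = df.columns.str.extract(r"\.s__(.*)$", expand=False)

# With the helper, the unmatched name is kept as-is.
df = df.rename(columns=species_name)
print(list(df.columns))
```

Whether to keep or drop unmatched columns depends on the data; the helper simply makes the fallback explicit.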