Python Trimming a few column names but not all in a dataframe

Time:02-27

I have a dataframe with many columns, and I want to shorten the names of a few of them (but not all) to reduce the text length.

Code:

import pandas as pd

xdf = pd.DataFrame({'Column1':[10,25],'Column2':[10,25],'Fix_col':[10,25]})

## Rename `Column1` to `C1`, and `Column2` to `C2` as well
req_cols = ['Column1','Column2']

xdf[req_cols].columns = [x[0]+y for name in xdf[req_cols].columns.str.findall(r'([A-Za-z]+)(\d+)') for x,y in name]

Current result:

print([x[0]+y for name in xdf[req_cols].columns.str.findall(r'([A-Za-z]+)(\d+)') for x,y in name])
['C1','C2']
print(xdf[req_cols].columns)
['Column1','Column2']

The column names did not change, and I don't know why.

Expected Answer:

xdf.columns = ['C1','C2','Fix_col']

CodePudding user response:

You can use

import pandas as pd
import re

xdf = pd.DataFrame({'Column1':[10,25],'Column2':[10,25],'Fix_col':[10,25]})
req_cols = ['Column1','Column2']

xdf.rename(columns=lambda x : x if x not in req_cols else re.sub(r'^(\D?)\D*(\d*)', r'\1\2', x), inplace=True)

Output of xdf.columns:

Index(['C1', 'C2', 'Fix_col'], dtype='object')
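
As for why the original assignment had no effect: `xdf[req_cols]` builds a new DataFrame, so setting `.columns` on it renames only that temporary object and never touches `xdf`. Here is a minimal sketch of that behaviour, followed by an alternative that builds an explicit old-to-new mapping first. The names `tmp`, `unchanged`, `mapping` and `renamed` are only illustrative and are not part of the original code.

```python
import re

import pandas as pd

xdf = pd.DataFrame({'Column1': [10, 25], 'Column2': [10, 25], 'Fix_col': [10, 25]})
req_cols = ['Column1', 'Column2']

# Selecting with a list of labels returns a new DataFrame,
# so renaming its columns does not affect xdf.
tmp = xdf[req_cols]
tmp.columns = ['C1', 'C2']
unchanged = list(xdf.columns)  # xdf still carries the original names

# Illustrative alternative: compute the new names only for req_cols,
# then pass the mapping to rename (which returns a renamed copy).
mapping = {name: re.sub(r'^(\D?)\D*(\d*)', r'\1\2', name) for name in req_cols}
renamed = xdf.rename(columns=mapping)

print(unchanged)
print(list(renamed.columns))
```

The dict version can be easier to inspect than the lambda, since you can print `mapping` before applying it. Columns missing from the mapping, such as `Fix_col`, are left as they are.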

Regex details:

  • ^ - start of string
  • (\D?) - Group 1 (\1): an optional non-digit char
  • \D* - zero or more non-digit chars
  • (\d*) - Group 2 (\2): zero or more digits.
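
To see what the pattern does on its own, here is a small sketch that applies it to plain strings, outside pandas. The helper name `shorten` is made up for this illustration. It also shows why the `x not in req_cols` guard matters: applied to `Fix_col`, the same substitution would keep only the first character.

```python
import re

pattern = r'^(\D?)\D*(\d*)'

def shorten(name):
    # Keep the first non-digit character (group 1) and the digits that
    # follow the leading non-digit run (group 2); drop everything between.
    return re.sub(pattern, r'\1\2', name)

results = {name: shorten(name) for name in ['Column1', 'Column2', 'Fix_col']}
print(results)
```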