I have a dataframe with many columns, and I want to shorten a few of the column names.
Code:
xdf = pd.DataFrame({'Column1':[10,25],'Column2':[10,25],'Fix_col':[10,25]})
# Rename `Column1` to `C1` and `Column2` to `C2`
req_cols = ['Column1','Column2']
xdf[req_cols].columns = [x[0]+y for name in xdf[req_cols].columns.str.findall(r'([A-Za-z]+)(\d+)') for x,y in name]
Present solution:
print([x[0]+y for name in xdf[req_cols].columns.str.findall(r'([A-Za-z]+)(\d+)') for x,y in name])
['C1','C2']
print(xdf[req_cols].columns)
['Column1','Column2']
The column names did not change, and I don't know why.
Expected Answer:
xdf.columns = ['C1','C2','Fix_col']
CodePudding user response:
You can use
import pandas as pd
import re
xdf = pd.DataFrame({'Column1':[10,25],'Column2':[10,25],'Fix_col':[10,25]})
req_cols = ['Column1','Column2']
xdf.rename(columns=lambda x : x if x not in req_cols else re.sub(r'^(\D?)\D*(\d*)', r'\1\2', x), inplace=True)
Output of xdf.columns:
Index(['C1', 'C2', 'Fix_col'], dtype='object')
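As for why the assignment in the question had no effect: xdf[req_cols] builds a new DataFrame, so setting .columns on it never reaches xdf. The sketch below illustrates this and shows an equivalent rename through an explicit dict. The names tmp, mapping and renamed are only illustrative and do not come from the question.

```python
import re
import pandas as pd

xdf = pd.DataFrame({'Column1': [10, 25], 'Column2': [10, 25], 'Fix_col': [10, 25]})
req_cols = ['Column1', 'Column2']

# Selecting with a list of labels returns a new DataFrame, so renaming its
# columns only changes that temporary object, not xdf.
tmp = xdf[req_cols]
tmp.columns = ['C1', 'C2']
unchanged = list(xdf.columns)  # still ['Column1', 'Column2', 'Fix_col']

# Build an explicit old-name -> new-name mapping for the selected columns
# and rename xdf itself; columns missing from the mapping are left alone.
mapping = {col: re.sub(r'^(\D?)\D*(\d*)', r'\1\2', col) for col in req_cols}
renamed = xdf.rename(columns=mapping)
print(list(renamed.columns))  # ['C1', 'C2', 'Fix_col']
```

A dict makes the old-to-new pairs easy to inspect before applying them, which can help when more columns are involved.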
Details of the regex ^(\D?)\D*(\d*):
^ - start of string
(\D?) - Group 1 (\1): an optional non-digit char
\D* - zero or more non-digit chars
(\d*) - Group 2 (\2): zero or more digits.
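To check the pattern quickly, here is a small sketch that applies it to the three column names from the question (the pattern and results names are just for illustration). It also shows that the regex alone would shorten Fix_col to F, which is why the lambda leaves anything outside req_cols untouched.

```python
import re

pattern = re.compile(r'^(\D?)\D*(\d*)')

results = {}
for name in ['Column1', 'Column2', 'Fix_col']:
    m = pattern.match(name)
    # Store the two captured groups and the result of keeping only \1\2.
    results[name] = (m.groups(), pattern.sub(r'\1\2', name))
    print(name, results[name])

# Column1 (('C', '1'), 'C1')
# Column2 (('C', '2'), 'C2')
# Fix_col (('F', ''), 'F')
```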