pandas rename multiple columns using regex pattern

Time:03-21

I have a dataframe like the one shown below:

ID,US-Test1,US-Test2,US-Test3
1,11,12,13
2,13,16,18
3,15,19,21

I would like to remove the prefix US- from all my column names.

I tried the code below, but there should be a better way to do this:

newNames = {
    'US-Test1':'Test1',
    'US-Test2':'Test2'
}
df.rename(columns=newNames,inplace=True)

But my real data has more than 70 columns, so writing the mapping by hand is not practical.

Is there a regex-based approach to rename the columns, removing the pattern and keeping only the part I want?

I expect my output to look like this:

ID,Test1,Test2,Test3
1,11,12,13
2,13,16,18
3,15,19,21

CodePudding user response:

You could use a regex that matches the "US-" at the beginning like this:

df.columns = df.columns.str.replace("^US-", "", regex=True)

It replaces the matching "US-" with an empty string.
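As a self-contained sketch of the line above, the example below rebuilds the sample data from the question (the DataFrame construction is added here for illustration) and applies the same replacement:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "US-Test1": [11, 13, 15],
    "US-Test2": [12, 16, 19],
    "US-Test3": [13, 18, 21],
})

# "^" anchors the match to the start of the name, so only a leading
# "US-" is removed; "ID" does not match and is left unchanged.
df.columns = df.columns.str.replace(r"^US-", "", regex=True)

print(list(df.columns))
# ['ID', 'Test1', 'Test2', 'Test3']
```

Because the pattern is anchored, a name that merely contains "US-" somewhere in the middle would not be changed.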

Alternatively, if you know that every column name begins with the prefix, you could slice the names to drop the first 3 characters:

df.columns = df.columns.str.slice(3)

Of course, this affects every column, including those that do not begin with "US-"; in your example, ID would become an empty string.
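One way to avoid that side effect is to strip the prefix only where it is present. A minimal sketch, assuming a shortened version of the question's data; the `prefix` variable is a name introduced here for illustration:

```python
import pandas as pd

# Shortened sample data based on the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "US-Test1": [11, 13, 15],
    "US-Test2": [12, 16, 19],
})

prefix = "US-"  # hypothetical name for the prefix to strip

# rename() accepts a callable that is applied to each column name.
# Names without the prefix are returned unchanged.
df = df.rename(
    columns=lambda name: name[len(prefix):] if name.startswith(prefix) else name
)

print(list(df.columns))
# ['ID', 'Test1', 'Test2']
```

This keeps the slicing idea but makes it safe for mixed column names such as ID.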

CodePudding user response:

There are many ways to solve this problem. Besides regular expressions, one option is a list comprehension that builds the new column names without their first 3 characters:

import pandas as pd

df = pd.DataFrame({
    'US-Test1': [11, 13, 15],
    'US-Test2': [13, 16, 18],
    'US-Test3': [13, 18, 21]
})

new_columns = [col[3:] for col in df.columns]
print(new_columns)
# ['Test1', 'Test2', 'Test3']

df.columns = new_columns
print(df)

Output:

   Test1  Test2  Test3
0     11     13     13
1     13     16     18
2     15     18     21
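The example above omits the ID column from the question; with it present, `col[3:]` would turn "ID" into an empty string. A possible variant of the same list comprehension, assuming Python 3.9 or later (where `str.removeprefix` is available), leaves names without the prefix untouched:

```python
import pandas as pd

# Sample data from the question, including the ID column.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "US-Test1": [11, 13, 15],
    "US-Test2": [12, 16, 19],
    "US-Test3": [13, 18, 21],
})

# str.removeprefix (Python 3.9+) returns the string unchanged
# when it does not start with the given prefix.
df.columns = [col.removeprefix("US-") for col in df.columns]

print(list(df.columns))
# ['ID', 'Test1', 'Test2', 'Test3']
```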