pandas rename multiple columns using regex pattern

I have a dataframe like the one shown below:

ID,US-Test1,US-Test2,US-Test3
1,11,12,13
2,13,16,18
3,15,19,21

I would like to remove the prefix "US-" from all my column names.

I tried the code below, but there should be a better way to do this:

newNames = {
    'US-Test1':'Test1',
    'US-Test2':'Test2'
}
df.rename(columns=newNames,inplace=True)

But my real data has 70-plus columns, so writing out the mapping by hand is not practical.

Is there a regex-based way to rename the columns so that the matched pattern is removed and only the rest of each name is kept?

I expect the output to look like this:

ID,Test1,Test2,Test3
1,11,12,13
2,13,16,18
3,15,19,21

Answer 1:

You could use a regex that matches "US-" at the beginning of each column name, like this:

df.columns = df.columns.str.replace("^US-", "", regex=True)

It replaces a leading "US-" with an empty string. Because of the ^ anchor, names that do not start with "US-" (such as ID) are left unchanged.
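For reference, here is a minimal end-to-end sketch, assuming the sample data from the question is read from an inline CSV string with io.StringIO (the csv_text variable is only for illustration):

```python
import io

import pandas as pd

# Sample data from the question, including the ID column
csv_text = """ID,US-Test1,US-Test2,US-Test3
1,11,12,13
2,13,16,18
3,15,19,21
"""
df = pd.read_csv(io.StringIO(csv_text))

# "^US-" only matches at the start of a name, so "ID" is not touched
df.columns = df.columns.str.replace("^US-", "", regex=True)
print(list(df.columns))  # ['ID', 'Test1', 'Test2', 'Test3']
```

This works the same way for 70 or more columns, since the replacement is applied to the whole column Index at once.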

Alternatively, if you know which columns need to be transformed, you could slice their names to remove the first 3 characters:

df.columns = df.columns.str.slice(3)

Of course, this slices every column name, including those that do not begin with "US-"; for example, ID would become an empty string.
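If you still want to slice but the frame also contains ID, one option is to slice only the names that start with "US-". A minimal sketch, assuming the question's columns and using numpy.where to choose between the sliced and the original name for each column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "US-Test1": [11, 13, 15],
    "US-Test2": [12, 16, 19],
    "US-Test3": [13, 18, 21],
})

# Slicing every name would turn "ID" into an empty string
print(list(df.columns.str.slice(3)))  # ['', 'Test1', 'Test2', 'Test3']

# Boolean mask: True only for names that actually carry the prefix
has_prefix = df.columns.str.startswith("US-")

# Take the sliced name where the mask is True, otherwise keep the original
df.columns = np.where(has_prefix, df.columns.str.slice(3), df.columns)
print(list(df.columns))  # ['ID', 'Test1', 'Test2', 'Test3']
```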

Answer 2:

There are many ways to solve the problem. Besides a regular expression, you can use a list comprehension to build new column names without the first 3 characters, as follows:

import pandas as pd

df = pd.DataFrame({
    'US-Test1': [11, 13, 15],
    'US-Test2': [12, 16, 19],
    'US-Test3': [13, 18, 21]
})

# Drop the first 3 characters ("US-") from every column name
new_columns = [col[3:] for col in df.columns]
print(new_columns)
# ['Test1', 'Test2', 'Test3']

df.columns = new_columns
print(df)

Output:

   Test1  Test2  Test3
0     11     12     13
1     13     16     18
2     15     19     21
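The question's real frame also has an ID column, and col[3:] would turn it into an empty string. A hedged variant of the same list comprehension, assuming the prefix is always exactly "US-", strips it only from the names that carry it:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "US-Test1": [11, 13, 15],
    "US-Test2": [12, 16, 19],
    "US-Test3": [13, 18, 21],
})

prefix = "US-"
# Strip the prefix only where it is present; other names (e.g. "ID") pass through
df.columns = [col[len(prefix):] if col.startswith(prefix) else col for col in df.columns]
print(list(df.columns))  # ['ID', 'Test1', 'Test2', 'Test3']
```

Using len(prefix) instead of a hard-coded 3 keeps the slice correct if the prefix ever changes.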