I have a dataframe like the one shown below:
ID,US-Test1,US-Test2,US-Test3
1,11,12,13
2,13,16,18
3,15,19,21
I would like to remove the prefix US- from all my column names.
I tried the following, but there should be a better way to do this:
newNames = {
'US-Test1':'Test1',
'US-Test2':'Test2'
}
df.rename(columns=newNames,inplace=True)
But my real data has more than 70 columns, so writing out the mapping by hand is not practical.
Is there a regex approach that renames the columns by stripping the pattern and keeping only the part I want?
I expect my output to look like this:
ID,Test1,Test2,Test3
1,11,12,13
2,13,16,18
3,15,19,21
CodePudding user response:
You could use a regex that matches the "US-" at the beginning like this:
df.columns = df.columns.str.replace("^US-", "", regex=True)
It replaces the matching "US-" with an empty string.
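As a quick check, here is a minimal sketch that applies the same call to the sample data from the question. The DataFrame construction is reconstructed from the CSV shown there; the `ID` column is left unchanged because it does not start with "US-".

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "US-Test1": [11, 13, 15],
    "US-Test2": [12, 16, 19],
    "US-Test3": [13, 18, 21],
})

# "^" anchors the match to the start of each column name,
# so a "US-" appearing later in a name would not be removed
df.columns = df.columns.str.replace(r"^US-", "", regex=True)

print(list(df.columns))
# ['ID', 'Test1', 'Test2', 'Test3']
```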
Also, if you know that every column you want to transform starts with this prefix, you could slice the names to remove the first 3 characters:
df.columns = df.columns.str.slice(3)
Of course, slicing removes the first 3 characters from every column name, including columns that do not begin with "US-". With the sample data, "ID" would become an empty string.
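To illustrate the difference, the sketch below first shows what slicing does to a column without the prefix, then renames through `df.rename` with a callable so that only matching names change. Passing a function to `columns=` is standard pandas behaviour; the shortened sample data and the variable names `sliced` and `renamed` are just for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "US-Test1": [11, 13, 15],
    "US-Test2": [12, 16, 19],
})

# Slicing is applied to every name, so "ID" loses its characters too
sliced = list(df.columns.str.slice(3))
print(sliced)
# ['', 'Test1', 'Test2']

# rename() calls the function once per column name;
# re.sub only changes names that start with "US-"
renamed = df.rename(columns=lambda name: re.sub(r"^US-", "", name))
print(list(renamed.columns))
# ['ID', 'Test1', 'Test2']
```

Unlike assigning to `df.columns`, `rename` returns a new DataFrame here and leaves `df` itself unchanged.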
CodePudding user response:
There are many ways to solve this problem. Besides a regular expression, one option is a list comprehension that builds the new column names without their first 3 characters:
import pandas as pd
df = pd.DataFrame({
'US-Test1': [11, 13, 15],
'US-Test2': [13, 16, 18],
'US-Test3': [13, 18, 21]
})
new_columns = [col[3:] for col in df.columns]
print(new_columns)
# ['Test1', 'Test2', 'Test3']
df.columns = new_columns
print(df)
Output:
Test1 Test2 Test3
0 11 13 13
1 13 16 18
2 15 18 21
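The example above assumes every column starts with "US-". If the frame also has a column such as ID, as in the question, a guarded variant of the same list comprehension might look like the sketch below. `PREFIX` is a name introduced here for illustration only.

```python
import pandas as pd

PREFIX = "US-"  # illustrative constant, not part of the original answer

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "US-Test1": [11, 13, 15],
    "US-Test2": [12, 16, 19],
    "US-Test3": [13, 18, 21],
})

# Strip the prefix only from names that actually start with it;
# every other name is kept as it is
df.columns = [
    col[len(PREFIX):] if col.startswith(PREFIX) else col
    for col in df.columns
]

print(list(df.columns))
# ['ID', 'Test1', 'Test2', 'Test3']
```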