How to manipulate column header strings in a dataframe

How can I remove the substring "test_" from column headers? Imagine the dataframe has many columns, so df.rename(columns={"test_Stock B":"Stock B"}) is not the solution I am looking for!


import pandas as pd

data = {'Stock A':[1, 1, 1, 1],
           'test_Stock B':[3, 3, 4, 4],
           'Stock C':[4, 4, 3, 2],
           'test_Stock D':[2, 2, 2, 3],
           }

df = pd.DataFrame(data)

# expected result (separate name so the original `data` is not overwritten)
expected_data = {'Stock A':[1, 1, 1, 1],
           'Stock B':[3, 3, 4, 4],
           'Stock C':[4, 4, 3, 2],
           'Stock D':[2, 2, 2, 3],
           }

df_expected = pd.DataFrame(expected_data)

I expect all column headers to be labeled "Stock x" instead of "test_Stock x". Thank you for the ideas!

CodePudding user response：

You can reassign the column names with a list comprehension (iterating over a DataFrame yields its column labels):

df.columns = [x.replace("test_","") for x in df]

This outputs:

   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3
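
One thing to keep in mind: str.replace removes "test_" wherever it occurs in a label, not only at the start. If you only want to strip a leading prefix, a minimal sketch could use str.removeprefix, assuming Python 3.9 or newer. The column "my_test_Stock C" is a hypothetical name added here only to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    "Stock A": [1, 1, 1, 1],
    "test_Stock B": [3, 3, 4, 4],
    "my_test_Stock C": [4, 4, 3, 2],  # hypothetical column, not in the question
})

# removeprefix only strips "test_" when the label starts with it,
# so "my_test_Stock C" is left unchanged (replace would turn it into "my_Stock C").
df.columns = [col.removeprefix("test_") for col in df.columns]

print(list(df.columns))
```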

CodePudding user response：

You can clean your data before converting it to the dataframe using this code:

cleaned_data = {k.replace('test_', ''): v for k,v in data.items()}
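
For completeness, here is a rough end-to-end sketch of that approach using the data from the question. The name df_clean is just an illustrative choice. Note that if two keys collapse to the same name after cleaning, the later one silently overwrites the earlier one in the dict:

```python
import pandas as pd

data = {
    "Stock A": [1, 1, 1, 1],
    "test_Stock B": [3, 3, 4, 4],
    "Stock C": [4, 4, 3, 2],
    "test_Stock D": [2, 2, 2, 3],
}

# Strip "test_" from every key before the DataFrame is built.
cleaned_data = {k.replace("test_", ""): v for k, v in data.items()}

# df_clean is a hypothetical name used for this sketch.
df_clean = pd.DataFrame(cleaned_data)
print(df_clean)
```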

CodePudding user response：

If you need to extract the "Stock x" part, use Series.str.extract:

# if you need an uppercase letter after Stock + whitespace
df.columns = df.columns.str.extract(r'(Stock\s+[A-Z]{1})', expand=False)
# if you need any value after Stock + whitespace
#df.columns = df.columns.str.extract(r'(Stock\s+.*)', expand=False)
print(df)
   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3

Or, if you need to remove test_, use Series.str.replace:

df.columns = df.columns.str.replace('test_', '')
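
As a side note, df.rename also accepts a function for columns, so you do not have to list every header in a dict. A minimal sketch, assuming you want a new DataFrame and want to leave the original untouched (renamed is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "Stock A": [1, 1, 1, 1],
    "test_Stock B": [3, 3, 4, 4],
    "Stock C": [4, 4, 3, 2],
    "test_Stock D": [2, 2, 2, 3],
})

# The callable is applied to each column label; rename returns a new
# DataFrame, so df itself keeps its original headers.
renamed = df.rename(columns=lambda col: col.replace("test_", ""))

print(list(renamed.columns))
```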

CodePudding user response：

import pandas as pd

data = {'Stock A':[1, 1, 1, 1],
           'test_Stock B':[3, 3, 4, 4],
           'Stock C':[4, 4, 3, 2],
           'test_Stock D':[2, 2, 2, 3],
           }

df = pd.DataFrame(data)

df.columns = [x.replace('test_','') for x in df.columns]

Output:

print(df)
   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3