How can I remove the substring "test_" from column headers? Imagine the DataFrame has many columns, so df.rename(columns={"test_Stock B":"Stock B"}) is not the solution I am looking for!
import pandas as pd
data = {'Stock A':[1, 1, 1, 1],
'test_Stock B':[3, 3, 4, 4],
'Stock C':[4, 4, 3, 2],
'test_Stock D':[2, 2, 2, 3],
}
df = pd.DataFrame(data)
# expected result
data = {'Stock A':[1, 1, 1, 1],
'Stock B':[3, 3, 4, 4],
'Stock C':[4, 4, 3, 2],
'Stock D':[2, 2, 2, 3],
}
df_expected = pd.DataFrame(data)
I want every column header to read "Stock x" instead of "test_Stock x". Thanks for any ideas!
Answer:
You can reassign the columns with a list comprehension (iterating over a DataFrame yields its column labels):
df.columns = [x.replace("test_","") for x in df]
Printing df then gives:
   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3
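One caveat: str.replace removes "test_" wherever it occurs in a label, not only at the start. If only a leading prefix should be stripped, a minimal sketch with str.removeprefix (assumes Python 3.9 or later; the label "Stock test_E" is a made-up example) looks like this:

```python
import pandas as pd

df = pd.DataFrame({'Stock A': [1, 1, 1, 1],
                   'test_Stock B': [3, 3, 4, 4],
                   'Stock C': [4, 4, 3, 2],
                   'test_Stock D': [2, 2, 2, 3]})

# removeprefix only strips "test_" when it starts the label (Python 3.9+)
df.columns = [c.removeprefix("test_") for c in df.columns]
print(df.columns.tolist())  # ['Stock A', 'Stock B', 'Stock C', 'Stock D']

# Hypothetical label showing the difference from str.replace
label = "Stock test_E"
print(label.replace("test_", ""))   # Stock E
print(label.removeprefix("test_"))  # Stock test_E
```

For the data in the question both approaches give the same headers; the difference only matters if "test_" can also appear in the middle of a label.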
Answer:
You can also clean the dict keys before building the DataFrame:
cleaned_data = {k.replace('test_', ''): v for k, v in data.items()}
df = pd.DataFrame(cleaned_data)
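For completeness, here is a runnable version of this approach. It re-declares the question's original dict, because the question later rebinds data to the expected values, and then builds the DataFrame from the cleaned keys:

```python
import pandas as pd

data = {'Stock A': [1, 1, 1, 1],
        'test_Stock B': [3, 3, 4, 4],
        'Stock C': [4, 4, 3, 2],
        'test_Stock D': [2, 2, 2, 3]}

# Rename the keys first, then construct the DataFrame once
cleaned_data = {k.replace('test_', ''): v for k, v in data.items()}
df = pd.DataFrame(cleaned_data)
print(df.columns.tolist())  # ['Stock A', 'Stock B', 'Stock C', 'Stock D']
```

If two keys collapse to the same name (say 'Stock B' and 'test_Stock B'), the dict comprehension silently keeps only the later value, so this is worth checking on real data.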
Answer:
If you need to extract the "Stock x" part, use str.extract on the column Index:
# if "Stock" is followed by whitespace and a single uppercase letter
df.columns = df.columns.str.extract(r'(Stock\s+[A-Z])', expand=False)
# if any text may follow "Stock" and whitespace
#df.columns = df.columns.str.extract(r'(Stock\s+.*)', expand=False)
print(df)
   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3
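One thing to watch with str.extract: any label that does not match the pattern becomes NaN instead of staying unchanged. A small sketch (the 'Date' column is hypothetical, added only to show a non-matching label):

```python
import pandas as pd

# 'Date' is a hypothetical extra column that does not follow the "Stock x" pattern
df = pd.DataFrame({'Date': ['2021-01-01', '2021-01-02'],
                   'test_Stock B': [3, 3],
                   'Stock C': [4, 4]})

# Raw string; \s+ matches one or more whitespace characters after "Stock"
extracted = df.columns.str.extract(r'(Stock\s+[A-Z])', expand=False)
print(extracted.tolist())  # [nan, 'Stock B', 'Stock C']
```

str.replace, by contrast, leaves labels without "test_" exactly as they are.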
Or, if you only need to remove test_, use str.replace:
df.columns = df.columns.str.replace('test_', '', regex=False)
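The question rules out rename with a per-column dict, but DataFrame.rename also accepts a function for columns and applies it to every label. Below is a sketch of that, alongside an anchored-regex variant of str.replace where ^ restricts the match to the start of each label (removeprefix assumes Python 3.9+):

```python
import pandas as pd

df = pd.DataFrame({'Stock A': [1, 1, 1, 1],
                   'test_Stock B': [3, 3, 4, 4],
                   'Stock C': [4, 4, 3, 2],
                   'test_Stock D': [2, 2, 2, 3]})

# Option 1: rename() calls the function once per column label, returns a new frame
renamed = df.rename(columns=lambda c: c.removeprefix('test_'))

# Option 2: anchored regex; '^' limits the match to the start of each label
replaced = df.copy()
replaced.columns = replaced.columns.str.replace(r'^test_', '', regex=True)

print(renamed.columns.tolist())   # ['Stock A', 'Stock B', 'Stock C', 'Stock D']
print(replaced.columns.tolist())  # ['Stock A', 'Stock B', 'Stock C', 'Stock D']
```

Both leave the original df untouched, which can be handy when comparing before and after.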
Answer:
import pandas as pd
data = {'Stock A':[1, 1, 1, 1],
'test_Stock B':[3, 3, 4, 4],
'Stock C':[4, 4, 3, 2],
'test_Stock D':[2, 2, 2, 3],
}
df = pd.DataFrame(data)
df.columns = [x.replace('test_','') for x in df.columns]
print(df)
Output:
   Stock A  Stock B  Stock C  Stock D
0        1        3        4        2
1        1        3        4        2
2        1        4        3        2
3        1        4        2        3
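To check the result against the expected frame from the question, pandas.testing.assert_frame_equal raises an AssertionError on any difference in labels, dtypes or values. A short sketch, where expected is a name chosen here to stand in for the question's expected DataFrame:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({'Stock A': [1, 1, 1, 1],
                   'test_Stock B': [3, 3, 4, 4],
                   'Stock C': [4, 4, 3, 2],
                   'test_Stock D': [2, 2, 2, 3]})
expected = pd.DataFrame({'Stock A': [1, 1, 1, 1],
                         'Stock B': [3, 3, 4, 4],
                         'Stock C': [4, 4, 3, 2],
                         'Stock D': [2, 2, 2, 3]})

df.columns = [x.replace('test_', '') for x in df.columns]
# Raises AssertionError if column labels, dtypes or values differ
assert_frame_equal(df, expected)
print("columns match")
```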