I wonder if it is possible to create a loop to remove strings in dataframe column. I have multiple dataframes which look like the structure below.
df = pd.DataFrame({
'xyz CODE': [1,2,3,3,4, 5,6,7,7,8],
'a': [4, 5, 3, 1, 2, 20, 10, 40, 50, 30],
'b': [20, 10, 40, 50, 30, 4, 5, 3, 1, 2],
'c': [25, 20, 5, 15, 10, 25, 20, 5, 15, 10] })
For each dataframe I want to remove string 'CODE' in the first column. I wrote the following
if __name__ == '__main__':
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.xlsx"))
dataframes_list = []
for file in csv_files:
dataframes_list.append(pd.read_excel(file))
for i in dataframes_list:
i.columns[0] = i.columns[0].replace('CODE', '')
print(i.columns[0])
i = dosomethingtoeachdf(i)
i.to_excel(f'{i.columns[0]}' '.xlsx')
I ran into an error TypeError: Index does not support mutable operations
. I know I'm missing some basics here, appreciate any help!
CodePudding user response:
Try to use DataFrame.rename
:
df = df.rename(columns={df.columns[0]: df.columns[0].replace(" CODE", "")})
print(df)
Prints:
xyz a b c
0 1 4 20 25
1 2 5 10 20
2 3 3 40 5
3 3 1 50 15
4 4 2 30 10
5 5 20 4 25
6 6 10 5 20
7 7 40 3 5
8 7 50 1 15
9 8 30 2 10
CodePudding user response:
df.columns = [column.replace(' CODE', '') if index == 0 else column for index, column in enumerate(df.columns)]
Or you can just apply the str replace method to all columns if you don't care about it impacting columns other than the first e.g.
df.columns = df.columns.str.replace(' CODE', '')