Home > Software engineering >  Loop to remove string in selected dataframe column header
Loop to remove string in selected dataframe column header

Time:11-18

I wonder if it is possible to create a loop to remove strings in dataframe column. I have multiple dataframes which look like the structure below.

df = pd.DataFrame({
    'xyz CODE': [1,2,3,3,4, 5,6,7,7,8],
    'a': [4, 5, 3, 1, 2, 20, 10, 40, 50, 30],
    'b': [20, 10, 40, 50, 30, 4, 5, 3, 1, 2],
    'c': [25, 20, 5, 15, 10, 25, 20, 5, 15, 10] })

For each dataframe I want to remove string 'CODE' in the first column. I wrote the following

if __name__ == '__main__':
    path = os.getcwd()
    csv_files = glob.glob(os.path.join(path, "*.xlsx"))
    dataframes_list = []
    for file in csv_files:
        dataframes_list.append(pd.read_excel(file))

    for i in dataframes_list:
        i.columns[0] = i.columns[0].replace('CODE', '')
        print(i.columns[0])
        i = dosomethingtoeachdf(i)
        i.to_excel(f'{i.columns[0]}'   '.xlsx')

I ran into an error TypeError: Index does not support mutable operations. I know I'm missing some basics here, appreciate any help!

CodePudding user response:

Try to use DataFrame.rename:

df = df.rename(columns={df.columns[0]: df.columns[0].replace(" CODE", "")})
print(df)

Prints:

   xyz   a   b   c
0    1   4  20  25
1    2   5  10  20
2    3   3  40   5
3    3   1  50  15
4    4   2  30  10
5    5  20   4  25
6    6  10   5  20
7    7  40   3   5
8    7  50   1  15
9    8  30   2  10

CodePudding user response:

df.columns = [column.replace(' CODE', '') if index == 0 else column for index, column in enumerate(df.columns)]

Or you can just apply the str replace method to all columns if you don't care about it impacting columns other than the first e.g.

df.columns = df.columns.str.replace(' CODE', '')
  • Related