how to drop level 0 in two dataframes using for loop in pandas

Time:05-09

week and month are both dataframes, filtered from a master dataframe on month and week values such as week 18 and "May". I have run the pivot_table function on each of them, and now I want to rename the level 1 column names and then drop the level 0 column names. Both dataframes have gone through the same operations. The statement df = df.droplevel(0,axis=1) does not change the dataframes in any way. What am I doing wrong?

for df in (week, month):
    df.columns.set_levels(['a','b','c','d'],level=1,inplace=True)
    df = df.droplevel(0,axis=1)

CodePudding user response:

You can't change a dataframe in a for loop like that: the assignment only rebinds the loop variable df on each iteration and leaves week and month untouched. There are definitely ways of doing this, but do you need to do it in a loop?
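As a minimal sketch of the difference between rebinding the loop variable and mutating the object it refers to (the toy dataframes and the column names "x" and "y" are made up for illustration, they are not from the question):

```python
import pandas as pd

# Toy stand-ins for the real dataframes (hypothetical data).
week = pd.DataFrame({"x": [1, 2]})
month = pd.DataFrame({"x": [3, 4]})

for df in (week, month):
    # rename() returns a new dataframe; this only rebinds the local name df.
    df = df.rename(columns={"x": "y"})

after_rebinding = list(week.columns)  # week still has its original column

for df in (week, month):
    # Assigning to an attribute mutates the object that df and week share.
    df.columns = ["y"]

after_mutation = list(week.columns)  # week has changed this time

print(after_rebinding, after_mutation)
```

The first loop leaves week and month as they were, the second one changes them, because only the second one touches the original objects.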

Here is one way to do it using globals() (I have renamed df to x and replaced your tuple of dataframes with a list of variable names as strings):

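for x in ["week", "month"]:
    # set_levels returns a new MultiIndex (the inplace argument was removed in pandas 2.0)
    globals()[x].columns = globals()[x].columns.set_levels(['a','b','c','d'], level=1)
    # droplevel returns a new dataframe, so rebind the global name to it
    globals()[x] = globals()[x].droplevel(0, axis=1)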

CodePudding user response:

The problem is that .droplevel() is not in-place: it returns a new dataframe. You have to assign that result back to df, and after the assignment df no longer refers to the object that week or month refers to (apart from a common history, which is irrelevant).

To avoid that, you could put the dataframes in a list, iterate over it to do the work, store each modified dataframe back in the list at the same position, and unpack the list afterwards:

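dfs = [week, month]
for i, df in enumerate(dfs):
    # set_levels returns a new MultiIndex (the inplace argument was removed in pandas 2.0)
    df.columns = df.columns.set_levels(['a', 'b', 'c', 'd'], level=1)
    # store the new dataframe back in the same list slot
    dfs[i] = df.droplevel(0, axis=1)
week, month = dfs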

This is okay for a small number of variables, especially when each dataframe needs a lot of processing. It might be overkill here, since it takes more lines of code than doing it directly.
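If you want to keep the plain loop, one possible alternative is to assign to df.columns instead of calling df.droplevel(): assigning to the columns attribute mutates the existing dataframe, so week and month see the change. This is only a sketch; make_frame() and the "sales" and "w" to "z" labels are hypothetical stand-ins for the pivot_table output in the question:

```python
import pandas as pd

def make_frame():
    # Hypothetical stand-in for a pivot_table result with two column levels.
    cols = pd.MultiIndex.from_product([["sales"], ["w", "x", "y", "z"]])
    return pd.DataFrame([[1, 2, 3, 4]], columns=cols)

week, month = make_frame(), make_frame()

for df in (week, month):
    # Build a new MultiIndex with renamed level 1 values (no inplace needed).
    renamed = df.columns.set_levels(["a", "b", "c", "d"], level=1)
    # Drop level 0 on the index itself and assign it back: this mutates df,
    # which is the same object as week (or month).
    df.columns = renamed.droplevel(0)

print(list(week.columns), list(month.columns))
```

Note that set_levels replaces the values of the level itself, so the new names follow the order of df.columns.levels[1], which may differ from the order in which the columns are displayed.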

If you have a lot of dataframes to handle, then it wouldn't make a lot of sense to manage that with individual variables. You'd use a list or dictionary etc. to organise them, and the issue would resolve itself.
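A rough sketch of the dictionary variant, again with a made-up make_frame() helper in place of the real pivot tables and a hypothetical flatten() function that holds the per-dataframe steps:

```python
import pandas as pd

def make_frame():
    # Hypothetical stand-in for a pivot_table result with two column levels.
    cols = pd.MultiIndex.from_product([["sales"], ["w", "x", "y", "z"]])
    return pd.DataFrame([[1, 2, 3, 4]], columns=cols)

def flatten(df):
    # Work on a copy so the input dataframe is left unchanged.
    out = df.copy()
    renamed = out.columns.set_levels(["a", "b", "c", "d"], level=1)
    out.columns = renamed.droplevel(0)
    return out

frames = {"week": make_frame(), "month": make_frame()}

# Rebuild the dictionary with the processed dataframes under the same keys.
frames = {name: flatten(df) for name, df in frames.items()}

print(list(frames["week"].columns))
```

Because the results are stored under their keys, there is no loop variable to rebind and nothing to unpack afterwards.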
