rename column name index according to list placement with multiple duplicate names

Time:11-02

I just asked a similar question, "rename columns according to list", which has a correct answer for how to add suffixes to column names. But I have a new issue: I want to rename the actual index name of the columns (df.columns.name) per dataframe. I have three lists of data frames. Some of the data frames share the same column index name (and the same data frame names as well, but that's not the issue; the issue is the duplicated original columns.name). I simply want to append a suffix to each dataframe.columns.name within each list, taking the name from the suffix list that matches the list's numeric position.

Here is an example of the data and the output I would like:

import numpy as np
import pandas as pd

# add a suffix to the column index name of each df in a list of dfs

df1, df2, df3, df4 = (pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('a', 'b')), 
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('c', 'd')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('e', 'f')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('g', 'h')))

df1.columns.name = 'abc'
df2.columns.name = 'abc'
df3.columns.name = 'efg'
df4.columns.name = 'abc'

cat_a = [df2, df1]
cat_b = [df3, df2, df1]
cat_c = [df1]


dfs = [cat_a, cat_b, cat_c]
suffix = ['group1', 'group2', 'group3']

# expected output:
# for df in cat_a: df.columns.name = df.columns.name + 'group1'
# for df in cat_b: df.columns.name = df.columns.name + 'group2'
# for df in cat_c: df.columns.name = df.columns.name + 'group3'

And here is some code I have written that doesn't work: where df.columns.name values are duplicated across data frames, multiple suffixes get appended.

for x, df in enumerate(dfs):
    for i in df:
        n = [(i.columns.name + '_' + str(suffix[x])) for out in i.columns.name]
        i.columns.name = n[x]

Thank you for looking, I really appreciate it.

Answer:

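Your current code does not work because your lists hold multiple references to the same DataFrame objects (df1 appears in all three lists and df2 in two of them). Every assignment to columns.name mutates that one shared object, so a change made through one list also shows up in the others, and the final name reflects every pass instead of one suffix per list. You need to work on copies.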
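A minimal sketch of that aliasing, trimmed to a single DataFrame that sits in two lists (the list names mirror the question; the zero-filled data is only a placeholder). The loop mimics the question's approach of appending a suffix in place:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((2, 2), dtype=int), columns=['a', 'b'])
df1.columns.name = 'abc'

cat_a = [df1]
cat_c = [df1]

# Both lists hold a reference to the same object, not two copies
print(cat_a[0] is cat_c[0])  # True

# A suffix added through one list is visible through the other,
# so the second pass appends another suffix to the same name
for x, group in enumerate([cat_a, cat_c]):
    for d in group:
        d.columns.name = d.columns.name + '_group' + str(x + 1)

print(df1.columns.name)  # abc_group1_group2
```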

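Assuming you want to set the column index name of each df in dfs, you can use a nested list comprehension with rename_axis, which returns a new DataFrame instead of modifying the original, so the shared dfs are never touched: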

dfs = [[d.rename_axis(suffix[i], axis=1) for d in group]
       for i, group in enumerate(dfs)]
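The comprehension above replaces each name with the suffix. If, as the question's expected output suggests, the goal is to append the suffix to the existing name, the same comprehension can build the new name from d.columns.name. This is a sketch assuming an underscore separator (as in the question's own loop) and a trimmed version of the question's setup: df4 is unused there, so it is dropped, and the helper make_df is hypothetical, added only for brevity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_df(cols, name):
    # Hypothetical helper: small random frame; only the column-axis name matters
    df = pd.DataFrame(rng.integers(0, 10, size=(3, 2)), columns=list(cols))
    df.columns.name = name
    return df

df1 = make_df('ab', 'abc')
df2 = make_df('cd', 'abc')
df3 = make_df('ef', 'efg')

cat_a = [df2, df1]
cat_b = [df3, df2, df1]
cat_c = [df1]

dfs = [cat_a, cat_b, cat_c]
suffix = ['group1', 'group2', 'group3']

# rename_axis returns a new DataFrame, so each entry gets exactly one suffix
# even though df1 appears in all three lists, and df1/df2 stay unchanged
renamed = [[d.rename_axis(f"{d.columns.name}_{suffix[i]}", axis=1) for d in group]
           for i, group in enumerate(dfs)]

for group in renamed:
    print([d.columns.name for d in group])
# ['abc_group1', 'abc_group1']
# ['efg_group2', 'abc_group2', 'abc_group2']
# ['abc_group3']

print(df1.columns.name)  # abc
```

Building the new name from the original frame inside the comprehension is what keeps the suffixes from stacking: every lookup reads the untouched 'abc' or 'efg', never a previously renamed copy.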

output:

>>> dfs[0][0]
group1  c  d
0       5  0
1       9  3
2       3  9
3       4  2
4       1  0
5       7  6
6       5  2
7       8  0
8       1  2
9       7  2