Home > front end >  Iteratively merge panda columns with new column names
Iteratively merge panda columns with new column names

Time:10-05

Assume I am merging a panda data frame iteratively in a loop but after two or three iterations panda repeat the column name for example consider the following example where I am merging the columns iteratively but without loop for simplicity:

A= {'Name':['A','B','C'],'GPA':[4.0,3.80,3.70], 'School':['U','U','U'], 'Time':[22,26,30]}
A1 = pd.DataFrame(A)
B= {'Name':['D','E','F'],'GPA':[3.50,3.70,3.60], 'School':['S','S','S'],'Time':[34,44,54]}
B1 = pd.DataFrame(B)
C= {'Name':['G','H','I'],'GPA':[3.70,3.50,3.70], 'School':['C','C','C'],'Time':[76,86,96]}
C1 = pd.DataFrame(C)
L= [A1,B1,C1]
comb = A1
for ii in L[1:]:
    comb = pd.concat([comb,ii],ignore_index=True)
comb

enter image description here

B = pd.merge(comb, comb, on=['Name','GPA'])
C = pd.merge(B, comb, on=['Name','GPA'])
D = pd.merge(C, comb, on=['Name','GPA'])

enter image description here

You see Panda is repeating the School_x and School_y name twice, is there anyway to change it to School_x and School_y, School_z and School_t. I am not talking about renaming it afterward but forcing the merge to choose new column names for columns that are not the same. Otherwise how one can distinguish the data frames with 1000 columns and imagine 500 have the same column names.

Update: Above was just an example assume you are merging the multiple data frames in loop like this:

  for ii in list:
      df  = df.merge(A,on = 'some column', how = 'outer')

Then how do you change the column name iteratively it seems to me every time the same columns will be repeated even with the suffix.

CodePudding user response:

Try changing the suffixes argument to a tuple of ('_z', '_t'):

B = pd.merge(comb, comb, on=['Name','GPA'])
C = pd.merge(B, comb, on=['Name','GPA'])
D = pd.merge(C, comb, on=['Name','GPA'], suffixes=('_z', '_t'))

>>> D
  Name  GPA School_x  Time_x School_y  Time_y School_z  Time_z School_t  Time_t
0    A  4.0        U      22        U      22        U      22        U      22
1    B  3.8        U      26        U      26        U      26        U      26
2    C  3.7        U      30        U      30        U      30        U      30
3    D  3.5        S      34        S      34        S      34        S      34
4    E  3.7        S      44        S      44        S      44        S      44
5    F  3.6        S      54        S      54        S      54        S      54
6    G  3.7        C      76        C      76        C      76        C      76
7    H  3.5        C      86        C      86        C      86        C      86
8    I  3.7        C      96        C      96        C      96        C      96
>>> 

As in the pd.merge documentation:

Parameters:
...
...

suffixes: list-like, default is (“_x”, “_y”)

A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.

... ...


Edit:

For the recent update on the question, try creating an iterator and with next.

Much better with functools.reduce:

from functools import reduce
from string import ascii_lowercase
it = iter(ascii_lowercase)
print(reduce(lambda x, y: pd.merge(x, y, on=['Name','GPA'], suffixes=('_'   next(it), '_'   next(it))), [comb for _ in range(4)]))

Output:

  Name  GPA School_a  Time_a School_b  Time_b School_e  Time_e School_f  Time_f
0    A  4.0        U      22        U      22        U      22        U      22
1    B  3.8        U      26        U      26        U      26        U      26
2    C  3.7        U      30        U      30        U      30        U      30
3    D  3.5        S      34        S      34        S      34        S      34
4    E  3.7        S      44        S      44        S      44        S      44
5    F  3.6        S      54        S      54        S      54        S      54
6    G  3.7        C      76        C      76        C      76        C      76
7    H  3.5        C      86        C      86        C      86        C      86
8    I  3.7        C      96        C      96        C      96        C      96

As you can see I create a list comprehension with [comb for _ in range(4)], that would loop and merge 4 times, to change the number of times just change the number i.e. [comb for _ in range(10)].

For a function:

from functools import reduce
from string import ascii_lowercase
def cumulative_merge(df, n):
    it = iter(ascii_lowercase)
    return reduce(lambda x, y: pd.merge(x, y, on=['Name','GPA'], suffixes=('_'   next(it), '_'   next(it))), [comb for _ in range(n)])

Execution:

print(cumulative_merge(df, 4))

Output:

  Name  GPA School_a  Time_a School_b  Time_b School_e  Time_e School_f  Time_f
0    A  4.0        U      22        U      22        U      22        U      22
1    B  3.8        U      26        U      26        U      26        U      26
2    C  3.7        U      30        U      30        U      30        U      30
3    D  3.5        S      34        S      34        S      34        S      34
4    E  3.7        S      44        S      44        S      44        S      44
5    F  3.6        S      54        S      54        S      54        S      54
6    G  3.7        C      76        C      76        C      76        C      76
7    H  3.5        C      86        C      86        C      86        C      86
8    I  3.7        C      96        C      96        C      96        C      96
  • Related