Home > database >  Loop to iterate join over columns in pandas
Loop to iterate join over columns in pandas

Time:11-26

I have a dataframe:

import pandas as pd
data = [('s1', 's2'),
       ('s1', 's3'),
       ('s2', 's4'),
       ('s3', 's5'),
       ('s5', 's6')]

df = pd.DataFrame(data, columns=['start', 'end'])
 ----- --- 
|start|end|
 ----- --- 
|   s1| s2|
|   s1| s3|
|   s2| s4|
|   s3| s5|
|   s5| s6|
 ----- --- 

I want to see if the end column has values from the start column and write their values in a new end2 column

new_df = df
df = df.join(new_df , (df.start== new_df.end))

The result is something like this:

 ----- --- ---- 
|start|end|end2|
 ----- --- ---- 
|   s1| s2|  s4|
|   s1| s3|  s5|
|   s2| s4|null|
|   s3| s5|  s6|
|   s5| s6|null|
 ----- --- ---- 

Then I want to join again to see if end2 has values in start and write their values from the end column to the new end3 column. And so do the join until the last column is filled with all the None values

That is, an iterative join along the columns comes out (there are actually a lot more rows in my dataframe, so writing each join is not very good). But I don't understand how to do it. The result should be something like this

while df[-1].notnull():
     df = df.join... #join ends columns
     new_df[] =  #add new column


 ----- --- ---- ---- ---- 
|start|end|end2|end3|end4|
 ----- --- ---- ---- ---- 
|   s1| s2|  s4|None|None|
|   s1| s3|  s5|  s6|None|
|   s2| s4|None|None|None|
|   s3| s5|  s6|None|None|
|   s5| s6|None|None|None|
 ----- --- ---- ---- ---- 

CodePudding user response:

Try mapping the original dictionary to the end column, using start as the key, and end as the values:

df.assign(end2 = df['end'].map(dict(df.to_records(index=False))))

Output:

  start end end2
0    s1  s2   s4
1    s1  s3   s5
2    s2  s4  NaN
3    s3  s5   s6
4    s5  s6  NaN

To create all possible columns, we can use a while loop:

i = 2
m = dict(df.to_records(index=False))
while df.iloc[:,-1].count() != 0:
    df['end{}'.format(i)] =  df.iloc[:,-1].map(m)
    i  = 1

Output:

  start end end2 end3 end4
0    s1  s2   s4  NaN  NaN
1    s1  s3   s5   s6  NaN
2    s2  s4  NaN  NaN  NaN
3    s3  s5   s6  NaN  NaN
4    s5  s6  NaN  NaN  NaN
  • Related