I have a dataframe:
import pandas as pd
data = [('s1', 's2'),
('s1', 's3'),
('s2', 's4'),
('s3', 's5'),
('s5', 's6')]
df = pd.DataFrame(data, columns=['start', 'end'])
----- ---
|start|end|
----- ---
| s1| s2|
| s1| s3|
| s2| s4|
| s3| s5|
| s5| s6|
----- ---
I want to see if the end column
has values from the start column
and write their values in a new end2 column
new_df = df
df = df.join(new_df , (df.start== new_df.end))
The result is something like this:
----- --- ----
|start|end|end2|
----- --- ----
| s1| s2| s4|
| s1| s3| s5|
| s2| s4|null|
| s3| s5| s6|
| s5| s6|null|
----- --- ----
Then I want to join again to see if end2
has values in start
and write their values from the end
column to the new end3
column. And so do the join until the last column is filled with all the None values
That is, an iterative join along the columns comes out (there are actually a lot more rows in my dataframe, so writing each join is not very good). But I don't understand how to do it. The result should be something like this
while df[-1].notnull():
df = df.join... #join ends columns
new_df[] = #add new column
----- --- ---- ---- ----
|start|end|end2|end3|end4|
----- --- ---- ---- ----
| s1| s2| s4|None|None|
| s1| s3| s5| s6|None|
| s2| s4|None|None|None|
| s3| s5| s6|None|None|
| s5| s6|None|None|None|
----- --- ---- ---- ----
CodePudding user response:
Try mapping the original dictionary to the end column, using start as the key, and end as the values:
df.assign(end2 = df['end'].map(dict(df.to_records(index=False))))
Output:
start end end2
0 s1 s2 s4
1 s1 s3 s5
2 s2 s4 NaN
3 s3 s5 s6
4 s5 s6 NaN
To create all possible columns, we can use a while loop:
i = 2
m = dict(df.to_records(index=False))
while df.iloc[:,-1].count() != 0:
df['end{}'.format(i)] = df.iloc[:,-1].map(m)
i = 1
Output:
start end end2 end3 end4
0 s1 s2 s4 NaN NaN
1 s1 s3 s5 s6 NaN
2 s2 s4 NaN NaN NaN
3 s3 s5 s6 NaN NaN
4 s5 s6 NaN NaN NaN