I have a data frame that looks like this:
df = pd.DataFrame({'id':[1,2,3,4],
'col1':['a','b','c','d'],
'col2':['b','a','d','c']})
df
id col1 col2
0 1 a b
1 2 b a
2 3 c d
3 4 d c
I want the output to look like this:
id col1 col2 col2_id
0 1 a b 2
1 2 b a 1
2 3 c d 4
3 4 d c 3
I tried using pd.merge
but I know there is a more efficient way to do this with bigger samples.
CodePudding user response:
Using index trickery (disclaimer: did not do any timing, so you'll have to determine what works best/fastest yourself)
import pandas as pd
df = pd.DataFrame({'id':[1,2,3,4],
'col1':['a','b','c','d'],
'col2':['b','a','d','c']})
df["col2_id"] = df["id"].set_axis(df["col1"]).reindex(df["col2"]).reset_index(drop=True)
id col1 col2 col2_id
0 1 a b 2
1 2 b a 1
2 3 c d 4
3 4 d c 3
CodePudding user response:
Maybe there is shorter way to do this. In the meantime you could try:
(df.iloc[:, [0, 1]].merge(df.iloc[:, [0, 2]], left_on='col1', right_on='col2', suffixes=('', '_x'))
.rename(columns = {'id_x' :'col2_id'}))
id col1 col2_id col2
0 1 a 2 a
1 2 b 1 b
2 3 c 4 c
3 4 d 3 d