Pandas create column based on values in other rows and columns

I am trying to add a new column to a pandas DataFrame whose values are based on the contents of other rows in the same DataFrame.

For each row x, I want to find the row y whose id equals the id_par of row x, and put the id_real of row y into the new column of row x. See the following example.

id_real  id  id_par
100      1   2
200      2   3
300      3   4

Desired result:

id_real  id  id_par  new_col
100      1   2       200
200      2   3       300
300      3   4       NaN

I have tried a lot of things and the last thing I tried was the following:

df["new_col"] = df[df["id"] == df["id_par"]]["id_real"]

Unfortunately, the new column then only contains NaN entries. Can you help me?

CodePudding user response：

Use Series.map with DataFrame.drop_duplicates, so that only the first row for each id is used for the lookup:

df["new_col"] = df['id_par'].map(df.drop_duplicates('id').set_index('id')['id_real'])
print(df)
   id_real  id  id_par  new_col
0      100   1       2    200.0
1      200   2       3    300.0
2      300   3       4      NaN
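
The same approach as a self-contained sketch, with the DataFrame rebuilt from the question's example. The names lookup and new_col_int are only illustrative and do not appear in the original answer; the conversion to the nullable "Int64" dtype is an optional extra for cases where the float values (200.0 instead of 200) are unwanted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id_real": [100, 200, 300],
    "id": [1, 2, 3],
    "id_par": [2, 3, 4],
})

# Lookup table: the index is "id", the values are "id_real".
# drop_duplicates keeps only the first row for each "id".
lookup = df.drop_duplicates("id").set_index("id")["id_real"]

# For each row, fetch the "id_real" of the row whose "id" equals this row's "id_par".
# Values of "id_par" with no matching "id" become NaN, which turns the column into floats.
df["new_col"] = df["id_par"].map(lookup)

# Optional (illustrative): nullable integers keep 200/300 as ints and mark the gap as <NA>.
df["new_col_int"] = df["new_col"].astype("Int64")

print(df)
```

The float dtype of new_col is a side effect of the missing value in the last row, not of the lookup itself.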

CodePudding user response：

Use map to look up each "id_par" value in a Series that has "id" as its index and "id_real" as its values (this assumes the "id" values are unique):

df['new_col'] = df['id_par'].map(df.set_index('id')['id_real'])

Output:

   id_real  id  id_par  new_col
0      100   1       2    200.0
1      200   2       3    300.0
2      300   3       4      NaN
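
As background on why the boolean-mask attempt from the question yields only NaN, here is a small sketch using the example's column names. The comparison df["id"] == df["id_par"] is evaluated row by row, so it checks whether a row is its own parent rather than searching other rows. The variable names mask and selected are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id_real": [100, 200, 300],
    "id": [1, 2, 3],
    "id_par": [2, 3, 4],
})

# Element-wise comparison: row 0 compares 1 with 2, row 1 compares 2 with 3, and so on.
mask = df["id"] == df["id_par"]
print(mask.tolist())

# No row passes the filter, so the selection is an empty Series.
selected = df[mask]["id_real"]
print(len(selected))

# Assignment aligns on the index; with nothing to align, every row is filled with NaN.
df["new_col"] = selected
print(df["new_col"].isna().all())
```

A lookup across rows therefore needs something like map (as in the answers above) rather than a row-wise mask.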