I am currently trying to add a new column in a pandas dataset whose data is based on the contents of other rows in the dataset.
In the example, for each row x, I want to find the entry from id_real from row y, so that the content of id_par in row x matches the content from id in row y. See the following example.
id_real id id_par
100 1 2
200 2 3
300 3 4
id_real id id_par new_col
100 1 2 200
200 2 3 300
300 3 4 NaN
I have tried a lot of things and the last thing I tried was the following:
df["new_col"] = df[df["id"] == df["id_par"]]["node_id"]
Unfortunately, the new column then only contains NaN entries. Can you help me?
CodePudding user response:
Use Series.map
with DataFrame.drop_duplicates
for match first id
rows:
df["new_col"] = df['id_par'].map(df.drop_duplicates('id').set_index('id')['id_real'])
print (df)
id_real id id_par new_col
0 100 1 2 200.0
1 200 2 3 300.0
2 300 3 4 NaN
CodePudding user response:
Use map
to match the "id_par" using "id" as index and "id_real" as values:
df['new_col'] = df['id_par'].map(df.set_index('id')['id_real'])
output:
id_real id id_par new_col
0 100 1 2 200.0
1 200 2 3 300.0
2 300 3 4 NaN