Suppose I have these two dataframes:
df1=
ID index
087 4
087 5
087 6
df2=
ID index
087 1
087 2
087 3
...
087 10
087 11
087 12
...
And I would like to compare/join them based on the ID and index columns and then create a 'pred' column in the rows that df1 also has. So that the resulting df would look like this:
result_df=
ID index pred
087 1 0
087 2 0
087 3 0
087 4 1
087 5 1
087 6 1
087 7 0
...
087 12 0
...
Does anybody have a neat solution to this?
CodePudding user response:
I would do it following way
import pandas as pd
df1 = pd.DataFrame({"ID":[87,87,87],"index":[4,5,6]})
df2 = pd.DataFrame({"ID":[87,87,87,87,87,87],"index":[1,2,3,10,11,12]})
df1["pred"] = 1 # mark rows of df1
df = df1.merge(df2,"outer",["ID","index"])
df.pred = df.pred.fillna(0.0)
print(df)
output
ID index pred
0 87 4 1.0
1 87 5 1.0
2 87 6 1.0
3 87 1 0.0
4 87 2 0.0
5 87 3 0.0
6 87 10 0.0
7 87 11 0.0
8 87 12 0.0
Explanation: I add pred
column to df1
so after OUTER merge rows from df1
will have set pred value, whilst those from df2
will have NaN
which I then replace using zeros.
CodePudding user response:
If you want to merge them you can use this code:
import pandas as pd
new_df = pd.merge(df1, df2, how='inner', on = 'ID')
new_df