Num | Algo | Distance | Result |
---|---|---|---|
525 | M | 25 | Good |
524 | M | 28 | Good |
523 | M | 30 | Good |
522 | M | 75 | Good |
Num | Algo | Distance | Result |
---|---|---|---|
525 | T | 25 | Good |
524 | T | 28 | Bad |
520 | T | 98 | Good |
df_1 = pd.DataFrame({'Num' : [525, 524, 523, 522], 'Algo' : [M, M, M, M], 'Distance' : [25, 28, 30, 75], 'Result' : ['Good', 'Good', 'Good', 'Good']})
df_2 = pd.DataFrame({'Num' : [525, 524, 520], 'Algo' : [T, T, T], 'Distance' : [25, 28, 98], 'Result' : ['Good', 'Bad', 'Good']})
I have two dataframes as above and I want to join/merge them in the way as below (I tried different joins in pandas but it does not work as I want to):
Num | Algo | Distance | Result |
---|---|---|---|
525 | M, T | 25 | Good |
524 | M, T | 28 | Good |
523 | M | 30 | Good |
522 | M | 75 | Good |
520 | T | 98 | Good |
Also, I want to give priority to the df_1['Result']
while joining as it can be seen I used 'Good'
for 'Num' = 524
.
CodePudding user response:
IIUC, you can use pandas.DataFrame.groupby
after concat
:
df = pd.concat([df_1, df_2])
new_df = df.groupby(["Num", "Distance"],
as_index=False,
sort=False).agg({"Algo" : ", ".join,
"Result" : "first"})
Output:
Num Distance Algo Result
0 525 25 M, T Good
1 524 28 M, T Good
2 523 30 M Good
3 522 75 M Good
4 520 98 T Good
CodePudding user response:
Try with merge
:
output = df_1.merge(df_2, on=["Num", "Distance"], how="outer")
#concat Algo columns from both dfs to a string
output["Algo"] = output["Algo_x"].fillna("").str.cat(output["Algo_y"].fillna(""), sep=", ").str.strip().str.strip(",")
#combine Result column using df_2 data only when df_1 is NaN
output["Result"] = output[["Result_x", "Result_y"]].ffill(axis=0)["Result_x"]
output = output[["Num", "Algo", "Distance", "Result"]]
>>> output
Num Algo Distance Result
0 525 M, T 25 Good
1 524 M, T 28 Good
2 523 M 30 Good
3 522 M 75 Good
4 520 T 98 Good