I would really appreciate assistance in developing Python/ Pandas code to left-join 2 separate CSV files. I'm very new to Python so unclear how to begin. I have used Excel plugins to achieve same but it would normally hang or take hours to complete due to the huge amount of records being processed.
Scenario: joining on the first column.
CSV1
AN DNM OCD TRI
1 343 5656 90
2 344 5657 91
3 345 5658 92
4 346 5659 93
CSV2
AN2 STATE PLAN
4 3 19
3 2 35
7 3 19
8 3 19
Result inclusive of a match status if possible:
AN DNM OCD TRI STATE PLAN Join Status
1 343 5656 90 No_match
2 344 5657 91 No_match
3 345 5658 92 2 35 Match
4 346 5659 93 3 19 Match
All help appreciated.
CodePudding user response:
You can use .merge
with indicator=
parameter:
out = df1.merge(
df2, left_on="AN", right_on="AN2", indicator="Join Status", how="left"
)
out = out.drop(columns="AN2")
out["Join Status"] = out["Join Status"].map(
{"left_only": "No_match", "both": "Match"}
)
print(out)
Prints:
AN DNM OCD TRI STATE PLAN Join Status
0 1 343 5656 90 NaN NaN No_match
1 2 344 5657 91 NaN NaN No_match
2 3 345 5658 92 2.0 35.0 Match
3 4 346 5659 93 3.0 19.0 Match
CodePudding user response:
Let's assume you have df1 and df2 and you want to merge the two dataframe
df = df1.merge(df2, how='left', left_on='AN', right_on='AN2')
I hope this will help you