I have a dataframe where I have my complete dataset, it has 4 columns, but I need to remove all rows that have the customer_id from another dataframe
Here's an example:
df1 = pd.DataFrame({'client_id' : ['123456', '451245', '563234', '753214', '412456', '442312', '123411', '987654'],
'name_fran' : [ 'will', 'bruce', 'john', 'jorge', 'well', 'dani', 'rafa', 'manoel'],
'id_fran' : ['331' , '221', '312', '531', '678', '214', '456', '886']})
df2 = pd.DataFrame({'client_id' : ['123456', '987654']})
df1 :
index | client_id | name_fran | id_fran |
---|---|---|---|
0 | 123456 | will | 331 |
1 | 451245 | bruce | 221 |
2 | 563234 | john | 312 |
3 | 753214 | jorge | 531 |
4 | 412456 | well | 678 |
5 | 442312 | dani | 214 |
6 | 123411 | rafa | 456 |
7 | 987654 | manoel | 886 |
df2:
index | client_id |
---|---|
0 | 123456 |
1 | 987654 |
result I need is this:
index | client_id | name_fran | id_fran |
---|---|---|---|
0 | 451245 | bruce | 221 |
1 | 563234 | john | 312 |
2 | 753214 | jorge | 531 |
3 | 412456 | well | 678 |
4 | 442312 | dani | 214 |
5 | 123411 | rafa | 456 |
I've tried a few ways but got no results, what's the best way to do this?
CodePudding user response:
We could do isin
df1 = df1.loc[~df1['client_id'].isin(df2['client_id'])].reset_index(drop=True)
df1
Out[278]:
client_id name_fran id_fran
0 451245 bruce 221
1 563234 john 312
2 753214 jorge 531
3 412456 well 678
4 442312 dani 214
5 123411 rafa 456