I have 2 dataframes. DF1 is my main dataframe, DF2 is another data set. I need to drop rows from DF1 contain "Security IDs" found in DF2. This new DF will be called DF3. How would I do this?
Sample Dataframe:
df1 = pd.DataFrame({'Security ID': ['10003', '10009', '10010', '10034', '10665'],
'SomeNum': [12,13,13,13,13],
'Color of dogs': ['red','orange','red','red','red'],
'Date1': [20120506, 20120506, 20120506,20120506,20120620],
'Date2': [20120528, 20120507, 20120615,20120629,20120621]})
df2 = pd.DataFrame({'Security ID': ['10003', '10009', '10010', '70011', '70012','70034','70034', '70034'],
'SomeNum': [12,13,13,13,13,13,14,14],
'Num2': [121,122,122,124,125,126,127,128],
'InterestData': [20120506, 20120506, 20120506,20120506,20120620,20120506,20120206,20120405],
'UnreleatedStuff': [20120528, 20120507, 20120615,20120629,20120621,20120629,20120506,20120506]})
Required Resulting Dataframe:
df3 = pd.DataFrame({'Security ID': ['10034', '10665'],
'SomeNum': [13,13],
'Color of dogs': ['red','red'],
'Date1': [20120506,20120620],
'Date2': [20120629,20120621]})
How would I go about this? I tried df3=[~df2.index.isin(df1.index)]
after changing the index by df.set_index('Security ID')
CodePudding user response:
Just negate the condition with the boolean NOT operator ~
:
df3 = df1[~df1['Security ID'].isin(df2['Security ID'])]
output:
Security ID SomeNum Color of dogs Date1 Date2
3 10034 13 red 20120506 20120629
4 10665 13 red 20120620 20120621
Another nice trick to know, to use several columns to drop, is to perform a reverse merge
:
cols = ['Security ID'] # columns to use
(df1
.merge(df2[cols].drop_duplicates(), how='outer', indicator=True)
.query('_merge == "left_only"').drop(columns='_merge')
)
CodePudding user response:
IIUC you want this:
df3 = df1[~df1['Security ID'].isin(df2['Security ID'])]
CodePudding user response:
I have a less cool and more sql like solution:
Left join on Security ID
and using indicator=True
tells you the appearances of the row.
df3 = df1.merge(df2[['Security ID']], on='Security ID', how='left', indicator=True)
10034 and 10665 does not appear in df2
so filter for left_only
in the merge
column
df3 = df3[df3['_merge'] == 'left_only']
And then you can just drop the _merge
column after that
df3.drop('_merge', 1)