I'm working on a Pandas dataframe with transactional data (customer purchases) and want to exclude rows with certain customer numbers contained in a column 'CUSTOMER_ID'.
To achieve this, I created a list with the customer numbers to be excluded:
excluded_customers = ['2000', '2100', '3100', '4000', '4100', '4200', '4300', '4400', '4700', '6802']
Then I used the .isin() function to filter my df accordingly and save it in a new df2:
df2 = df[(df['CUSTOMER_ID'].isin(excluded_customers) == False)]
Then I want to sort the new df2 by column 'CUSTOMER_ID' in ascending order. However, the excluded customer numbers still appear in the dataframe:
df2.sort_values(by=['CUSTOMER_ID'])
I would much appreciate some hints as to why they aren't dropped from the df.
Thank you!
CodePudding user response:
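The column most likely holds integers while your list holds strings, so .isin() finds no matches and nothing is filtered out. Convert the column to strings before comparing, and use ~ to invert the mask: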
df2 = (df[~df['CUSTOMER_ID'].astype(str).isin(excluded_customers)]
.sort_values(by=['CUSTOMER_ID']))
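
To see the effect in isolation, here is a minimal sketch. It assumes that CUSTOMER_ID is stored as integers, which the question does not confirm, and the sample rows and the AMOUNT column are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: CUSTOMER_ID is assumed to be an integer column.
df = pd.DataFrame({
    "CUSTOMER_ID": [2000, 1500, 4100, 9999, 6802],
    "AMOUNT": [10.0, 20.0, 30.0, 40.0, 50.0],
})

excluded_customers = ['2000', '2100', '3100', '4000', '4100',
                      '4200', '4300', '4400', '4700', '6802']

# Check the dtype first: an integer dtype here means the string list cannot match.
print(df["CUSTOMER_ID"].dtype)

# Integer column compared against strings: no value matches, so no row is dropped.
mismatched = df[~df["CUSTOMER_ID"].isin(excluded_customers)]
print(len(mismatched))

# Cast to str only for the comparison; the column itself keeps its original dtype.
df2 = (df[~df["CUSTOMER_ID"].astype(str).isin(excluded_customers)]
       .sort_values(by=["CUSTOMER_ID"]))
print(df2)
```

If the column really is numeric, another option is to convert the list instead, for example with [int(c) for c in excluded_customers], and call .isin() on the unmodified column.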