Home > Enterprise >  Filtering pandas dataframe on 2 columns
Filtering pandas dataframe on 2 columns

Time:11-02

I'm having brain fog with basic pandas filtering, I know this is very basic but my pandas is rusty :( Many thanks in advanced!

I have the below dataframe, I want to filter it to find only unique emails that are in both event years (e.g. 2022 and 2023):

   df_current = pd.DataFrame({'Email':['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
               'Tickets Sold':[1, 2, 1, 1], 'Year':['2022', '2023', '2023', '2023']})


df_desired = pd.DataFrame({'Email':['[email protected]', '[email protected]'],
               'Tickets Sold':[1, 1], 'Year':['2022', '2023']})

CodePudding user response:

Try this out

duplicates = df_current.duplicated(subset=['Email'], keep=False)
df_desired = df_current[duplicates]

CodePudding user response:

Here's another way you could use:

import pandas as pd

result = df_current.loc[df_current.groupby('Email')['Year'].transform('nunique') >= 2, :]

print(result)
# Prints:
#                  Email  Tickets Sold  Year
# 0  [email protected]             1  2022
# 2  [email protected]             1  2023
  • Related