I'm having brain fog with basic pandas filtering, I know this is very basic but my pandas is rusty :( Many thanks in advanced!
I have the below dataframe, I want to filter it to find only unique emails that are in both event years (e.g. 2022 and 2023):
df_current = pd.DataFrame({'Email':['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
'Tickets Sold':[1, 2, 1, 1], 'Year':['2022', '2023', '2023', '2023']})
df_desired = pd.DataFrame({'Email':['[email protected]', '[email protected]'],
'Tickets Sold':[1, 1], 'Year':['2022', '2023']})
CodePudding user response:
Try this out
duplicates = df_current.duplicated(subset=['Email'], keep=False)
df_desired = df_current[duplicates]
CodePudding user response:
Here's another way you could use:
import pandas as pd
result = df_current.loc[df_current.groupby('Email')['Year'].transform('nunique') >= 2, :]
print(result)
# Prints:
# Email Tickets Sold Year
# 0 [email protected] 1 2022
# 2 [email protected] 1 2023