Get only data that are repeated any the given year two or more times in pandas-CodePudding

Below is the Raw Data.

Event   Year    Month
Event1  2011    January
Event1  2012    January
Event1  2013    February
Event1  2014    January
Event1  2015    March
Event2  2011    January
Event2  2014    April
Event3  2012    January
Event3  2015    March
Event4  2013    February

So only get those Event data that are occurred two or more times in given list of year i.e. [2011,2012,2013,2014].

So Output should be.

    Event   Year    Month
Event1  2011    January
Event1  2012    January
Event1  2013    February
Event1  2014    January
Event1  2015    March
Event2  2011    January
Event2  2014    April

CodePudding user response：

Use isin to filter years in the list. Groupby count and filter those greater than or equals to 2

s=df[df['Year'].astype(str).isin(lst)]
s[s.groupby('Event')['Month'].transform('count').ge(2)]

CodePudding user response：

First filter rows by list in Series.isin with boolean indexing and then are filtered duplicated events by DataFrame.duplicated, last filter original column Event:

L = [2011,2012,2013,2014]
df1 = df.loc[df['Year'].isin(L)]

df = df[df['Event'].isin(df1.loc[df1.duplicated(['Event']),'Event'])]

print (df)
    Event  Year     Month
0  Event1  2011   January
1  Event1  2012   January
2  Event1  2013  February
3  Event1  2014   January
4  Event1  2015     March
5  Event2  2011   January
6  Event2  2014     April

Or you can test if greater or equal 2 is number of filtered events:

L = [2011,2012,2013,2014]
df1 = df.loc[df['Year'].isin(L)]

s = df1['Event'].value_counts()
df = df[df['Event'].isin(s.index[s.ge(2)])]

print (df)
    Event  Year     Month
0  Event1  2011   January
1  Event1  2012   January
2  Event1  2013  February
3  Event1  2014   January
4  Event1  2015     March
5  Event2  2011   January
6  Event2  2014     April