Below is the Raw Data.
Event Year Month
Event1 2011 January
Event1 2012 January
Event1 2013 February
Event1 2014 January
Event1 2015 March
Event2 2011 January
Event2 2014 April
Event3 2012 January
Event3 2015 March
Event4 2013 February
So only get those Event data that are occurred two or more times in given list of year i.e. [2011,2012,2013,2014].
So Output should be.
Event Year Month
Event1 2011 January
Event1 2012 January
Event1 2013 February
Event1 2014 January
Event1 2015 March
Event2 2011 January
Event2 2014 April
CodePudding user response:
Use isin to filter years in the list. Groupby count and filter those greater than or equals to 2
s=df[df['Year'].astype(str).isin(lst)]
s[s.groupby('Event')['Month'].transform('count').ge(2)]
CodePudding user response:
First filter rows by list in Series.isin
with boolean indexing
and then are filtered duplicated events by DataFrame.duplicated
, last filter original column Event
:
L = [2011,2012,2013,2014]
df1 = df.loc[df['Year'].isin(L)]
df = df[df['Event'].isin(df1.loc[df1.duplicated(['Event']),'Event'])]
print (df)
Event Year Month
0 Event1 2011 January
1 Event1 2012 January
2 Event1 2013 February
3 Event1 2014 January
4 Event1 2015 March
5 Event2 2011 January
6 Event2 2014 April
Or you can test if greater or equal 2
is number of filtered events
:
L = [2011,2012,2013,2014]
df1 = df.loc[df['Year'].isin(L)]
s = df1['Event'].value_counts()
df = df[df['Event'].isin(s.index[s.ge(2)])]
print (df)
Event Year Month
0 Event1 2011 January
1 Event1 2012 January
2 Event1 2013 February
3 Event1 2014 January
4 Event1 2015 March
5 Event2 2011 January
6 Event2 2014 April