I have the following dataframe:
Date Name Grade Hobby
01/01/2005 Albert 4 Drawing
08/04/1996 Martha 6 Horseback riding
03/03/2003 Jack 5 Singing
07/01/2001 Millie 5 Netflix
24/09/2000 Julie 7 Sleeping
...
I want to filter the df to only contain the rows for repeat dates, so where df['Date'].value_counts()>=2
And then groupby dates sorted in chronological order so that I can have something like:
Date Name Grade Hobby
08/08/1996 Martha 6 Horseback riding
Matt 4 Sleeping
Paul 5 Cooking
24/09/2000 Julie 7 Sleeping
Simone 4 Sleeping
...
I have tried some code, but I get stuck on the first step. I tried something like:
same=df['Date'].value_counts()
same=same.loc[lambda x:x >=2]
mult=same.index.to_list()
for i in df['Date']:
if i not in mult:
df.drop(df[df['Date'==i]].index)
I also tried
new=df.loc[df['Date'].isin(mult)]
plot=pd.pivot_table(new, index=['Date'],columns=['Name'])
But this only gets 1 of the rows per each repeat dates instead of all the rows with the same date
CodePudding user response:
Think this should do the job
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df_new = df[df['Date'].duplicated(keep=False)].sort_values('Date')
CodePudding user response:
Convert Date
to datetimes by to_datetime
, then filter rows in boolean indexing
and last sorting by DataFrame.sort_values
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
same=df['Date'].value_counts()
df1 = df[df['Date'].map(same) >= 2].sort_values('Date')
Or use Series.duplicated
with keep=False
for count 2 and more, what is same like duplicates:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df1 = df[df['Date'].duplicated(keep=False)].sort_values('Date')