Home > other >  Delete entire dataframe rows based on condition and stack remaining rows
Delete entire dataframe rows based on condition and stack remaining rows

Time:11-03

I have the following dataframe:

Date          Name      Grade       Hobby
01/01/2005    Albert    4           Drawing
08/04/1996    Martha    6           Horseback riding
03/03/2003    Jack      5           Singing
07/01/2001    Millie    5           Netflix
24/09/2000    Julie     7           Sleeping
...

I want to filter the df to only contain the rows for repeat dates, so where df['Date'].value_counts()>=2 And then groupby dates sorted in chronological order so that I can have something like:

Date          Name      Grade       Hobby
08/08/1996    Martha    6           Horseback riding
              Matt      4           Sleeping
              Paul      5           Cooking
24/09/2000    Julie     7           Sleeping
              Simone    4           Sleeping
...
    

I have tried some code, but I get stuck on the first step. I tried something like:

same=df['Date'].value_counts()
same=same.loc[lambda x:x >=2]
mult=same.index.to_list()

for i in df['Date']:
    if i not in mult:
        df.drop(df[df['Date'==i]].index)

I also tried

new=df.loc[df['Date'].isin(mult)]
plot=pd.pivot_table(new, index=['Date'],columns=['Name'])

But this only gets 1 of the rows per each repeat dates instead of all the rows with the same date

CodePudding user response:

Think this should do the job

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

df_new = df[df['Date'].duplicated(keep=False)].sort_values('Date')

CodePudding user response:

Convert Date to datetimes by to_datetime, then filter rows in boolean indexing and last sorting by DataFrame.sort_values:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

same=df['Date'].value_counts()
df1 = df[df['Date'].map(same) >= 2].sort_values('Date')

Or use Series.duplicated with keep=False for count 2 and more, what is same like duplicates:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

df1 = df[df['Date'].duplicated(keep=False)].sort_values('Date')
  • Related