How can you drop certain dates in a data frame grouped by day?

I am working on code that groups a data frame by date:

gk = df_HR.groupby(['date'])

I now get a grouped data frame in which the first row for each date looks like this:

2022-05-23  22:18   60  2022-05-23 22:18:00 1653344280  1.000000
2022-05-24  00:00   54  2022-05-24 00:00:00 1653350400  0.900000

....

As an example, I want to drop all the data for the date '2022-05-24'. However, when I use the .drop() function I get the error "'DataFrameGroupBy' object has no attribute 'drop'". How can I still drop all the data from this date?

CodePudding user response：

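Save the result of your group by as a regular DataFrame, df, with the dates as its index, then pass the list of dates you want to drop to drop():

from datetime import datetime

# dates to remove; drop() matches them against the index labels of df
date_list = [datetime(2009, 5, 2),
             datetime(2010, 8, 22)]

df.drop(date_list, inplace=True)

Hope this helps!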

CodePudding user response：

From what I gather, the goal is to group the data frame by date and drop the groups that fall on a certain day.

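import pandas as pd

# ...

DATE_TO_DROP = '2022-05-24'  # the date to remove

gk = df_HR.groupby('date')
good_dfs = []
for date, sub_df in gk:
  # keep every group whose date does not contain the unwanted string
  if DATE_TO_DROP not in str(date):
    good_dfs.append(sub_df)

final_df = pd.concat(good_dfs)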

Alternatively, you can just drop the rows where 'date' contains that string:

df_HR.drop(df_HR[df_HR['date'].astype(str).str.contains(DATE_TO_DROP, regex=False)].index, inplace=True)

The above removes a single date. If you have multiple dates, here are those two options again:

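Option 1:

dates_to_drop = []  # fill in the dates to remove, e.g. ['2022-05-24']
gk = df_HR.groupby('date')
good_dfs = []
for date, sub_df in gk:
  # keep the group only if none of the bad dates appears in its date
  if not any(bad_date in str(date) for bad_date in dates_to_drop):
    good_dfs.append(sub_df)

final_df = pd.concat(good_dfs)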

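Option 2:

dates_to_drop = []  # fill in the dates to remove
for bad_date in dates_to_drop:
  # boolean mask of the rows whose date contains this bad date
  bad_rows = df_HR['date'].astype(str).str.contains(bad_date, regex=False)
  df_HR.drop(df_HR[bad_rows].index, inplace=True)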

We loop because the dates in the data frame may include more than just the string you're looking for. Checking for a substring in Python uses the 'in' operator, but 'in' cannot test a whole list of strings against a string at once, so we loop over the bad dates and remove all rows matching each one.
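If the 'date' column holds exact date strings rather than longer strings that merely contain the date, the loop can probably be avoided with a boolean mask built from Series.isin. This is only a minimal sketch under that assumption: the sample frame and its 'value' column are made up for illustration, and the real df_HR has more columns.

```python
import pandas as pd

# Hypothetical sample data standing in for df_HR.
df_HR = pd.DataFrame({
    "date": ["2022-05-23", "2022-05-24", "2022-05-24", "2022-05-25"],
    "value": [60, 54, 55, 58],
})

dates_to_drop = ["2022-05-24", "2022-05-25"]

# Keep only the rows whose date is NOT in the list, then group what is left.
filtered = df_HR[~df_HR["date"].isin(dates_to_drop)]
gk = filtered.groupby("date")
remaining_dates = sorted(gk.groups)
```

Filtering before grouping means the unwanted dates never become groups, so nothing has to be dropped from the grouped object afterwards.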

CodePudding user response：

See the code below for a fuller explanation:

from datetime import datetime
import pandas as pd

my_date = [datetime(2009, 5, 2),
           datetime(2010, 8, 22),
           datetime(2022, 8, 22),
           datetime(2009, 5, 2),
           datetime(2010, 8, 22)]

df = pd.DataFrame(my_date)
df.columns = ['Date']
df1 = df.groupby('Date').mean()  # aggregating returns a DataFrame indexed by Date
df1  # one row per unique date

df1.drop(pd.Timestamp('2009-05-02'), inplace=True)
# the given date has been dropped from the index
df1
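When only the grouped object is at hand, DataFrameGroupBy.filter may be another route: it keeps the rows of every group for which a function returns True and hands back an ordinary DataFrame. The sketch below assumes the 'date' column holds plain strings; the sample data and the 'value' column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for df_HR.
df_HR = pd.DataFrame({
    "date": ["2022-05-23", "2022-05-23", "2022-05-24", "2022-05-25"],
    "value": [60, 61, 54, 58],
})
DATE_TO_DROP = "2022-05-24"

gk = df_HR.groupby("date")

# Every row in a group shares the same date, so checking the first row is enough.
kept = gk.filter(lambda group: group["date"].iloc[0] != DATE_TO_DROP)
```

The result can be grouped again with kept.groupby('date') if the grouped form is still needed.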