Aggregating the counts on a certain day of the week in python

Time:10-12

I have this data frame:

id  date     count
1   8/31/22  1
1   9/1/22   2
1   9/2/22   8
1   9/3/22   0
1   9/4/22   3
1   9/5/22   5
1   9/6/22   1
1   9/7/22   6
1   9/8/22   5
1   9/9/22   7
1   9/10/22  1
2   8/31/22  0
2   9/1/22   2
2   9/2/22   0
2   9/3/22   5
2   9/4/22   1
2   9/5/22   6
2   9/6/22   1
2   9/7/22   1
2   9/8/22   2
2   9/9/22   2
2   9/10/22  0

I want to aggregate the count by id and date to get the sum of the counts. Details:

Date: all counts in a week should be aggregated on Saturday. A week starts on Sunday and ends on Saturday. The time period (the first and last day of counts) is the same for all ids.

The desired output is given below:

id  date     count
1   9/3/22   11
1   9/10/22  28
2   9/3/22   7
2   9/10/22  13

I already have the following code, and it works, but it is inefficient: it takes a long time to run on a large dataset. I am looking for a much faster and more efficient way to get the output:

new_df['day_name'] = new_df['date'].dt.day_name()

df_week_count = pd.DataFrame(columns=['id', 'date', 'count'])

for id in ids:
    # make a dataframe for each id
    df_id = new_df.loc[new_df['id'] == id].reset_index(drop=True)
    # find the indices of the Saturdays
    saturday_indices = df_id.loc[df_id['day_name'] == 'Saturday'].index
    j = 0
    sat_index = 0
    while j < len(df_id):
        # find sum of count between j and saturday_indices[sat_index]
        sum_count = df_id.loc[j:saturday_indices[sat_index], 'count'].sum()
        # add id, date, sum_count to df_week_count
        temp_df = pd.DataFrame([[id, df_id.loc[saturday_indices[sat_index], 'date'], sum_count]], columns=['id', 'date', 'count'])
        df_week_count = pd.concat([df_week_count, temp_df], ignore_index=True)
        j = saturday_indices[sat_index] + 1
        sat_index += 1
        if sat_index >= len(saturday_indices):
            break
    # rows after the last Saturday form a partial week, labelled with the last date
    if j < len(df_id):
        sum_count = df_id.loc[j:, 'count'].sum()
        temp_df = pd.DataFrame([[id, df_id.loc[len(df_id) - 1, 'date'], sum_count]], columns=['id', 'date', 'count'])
        df_week_count = pd.concat([df_week_count, temp_df], ignore_index=True)

df_final = df_week_count.copy(deep=True)

CodePudding user response:

Create a grouping factor from the dates.

# Sunday-based week of the year (%U) plus two-digit year, used as the grouping key
week = pd.to_datetime(df['date'].to_numpy()).strftime('%U %y')
df.groupby(['id', week]).agg({'date': 'max', 'count': 'sum'}).reset_index()

  id level_1   date count
0  1   35 22 9/3/22    11
1  1   36 22 9/9/22    28
2  2   35 22 9/3/22     7
3  2   36 22 9/9/22    13
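
A possible variation on the above, as a minimal sketch rather than a definitive fix: in the output shown, the second week is labelled 9/9/22 instead of 9/10/22 because max compares the dates as strings. Parsing the column with pd.to_datetime first should avoid that. The sample frame and the key name `week` are assumptions for illustration. Note also that `%U` restarts at the new year, so a week that spans January 1 would be split into two groups.

```python
import pandas as pd

# Sample data in the same shape as the question's table.
dates = ["8/31/22", "9/1/22", "9/2/22", "9/3/22", "9/4/22", "9/5/22",
         "9/6/22", "9/7/22", "9/8/22", "9/9/22", "9/10/22"]
df = pd.DataFrame({
    "id": [1] * 11 + [2] * 11,
    "date": dates * 2,
    "count": [1, 2, 8, 0, 3, 5, 1, 6, 5, 7, 1,
              0, 2, 0, 5, 1, 6, 1, 1, 2, 2, 0],
})

# Parse the strings so that max() compares dates, not text.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")

# Sunday-based week number plus year, used only as a grouping key.
# The name "week" is an illustrative choice.
week = df["date"].dt.strftime("%U %Y").rename("week")

result = (
    df.groupby(["id", week])
      .agg(date=("date", "max"), count=("count", "sum"))
      .reset_index()
)
print(result)
```

With the dates parsed, the `date` column holds the last day present in each week, which for this sample is the Saturday.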

CodePudding user response:

I tried to understand as much as I could :)

Here is my process:

from io import StringIO
import pandas as pd

# reading data (`data` is the question's table as a space-separated string)
df = pd.read_csv(StringIO(data), sep=' ')
# data type fix
df['date'] = pd.to_datetime(df['date'])
# initial grouping
df = df.groupby(['id', 'date'])['count'].sum().to_frame().reset_index()
df.sort_values(by=['date', 'id'], inplace=True)
df.reset_index(drop=True, inplace=True)
# getting name of the day
df['day_name'] = df.date.dt.day_name()
# getting week number (ISO weeks run Monday to Sunday)
df['week'] = df.date.dt.isocalendar().week
# moving Sundays into the next week so that Saturday is the last day of the week
df.loc[df.day_name == 'Sunday', 'week'] += 1

What I think you are looking for:

df.groupby(['id', 'week'])['count'].sum().to_frame().reset_index()

   id  week  count
0   1    35     11
1   1    36     28
2   2    35      7
3   2    36     13
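
If the Saturday date is wanted in the result instead of a week number, one option is to map every row to the Saturday that ends its week and group on that. This is only a sketch under stated assumptions: the sample frame and the names `days_to_saturday`, `week_end`, and `weekly` are made up for illustration. It also sidesteps week numbers, which may help around the turn of the year, where adding 1 to an ISO week number on Sundays could produce a week that does not match the following days.

```python
import pandas as pd

# Sample data in the same shape as the question's table, with parsed dates.
dates = pd.date_range("2022-08-31", "2022-09-10", freq="D")
df = pd.DataFrame({
    "id": [1] * 11 + [2] * 11,
    "date": list(dates) * 2,
    "count": [1, 2, 8, 0, 3, 5, 1, 6, 5, 7, 1,
              0, 2, 0, 5, 1, 6, 1, 1, 2, 2, 0],
})

# dayofweek is Monday=0 ... Saturday=5, Sunday=6, so this is the number
# of days from each date to the Saturday that closes its Sunday-Saturday week.
days_to_saturday = (5 - df["date"].dt.dayofweek) % 7

# Week-ending Saturday for every row (hypothetical helper column name).
week_end = (df["date"] + pd.to_timedelta(days_to_saturday, unit="D")).rename("week_end")

# One vectorized groupby replaces the per-id loop.
weekly = df.groupby(["id", week_end])["count"].sum().reset_index()
print(weekly)
```

One difference from the loop in the question: a trailing partial week is labelled with its Saturday here, even if the data stops earlier, whereas the loop labels it with the last available date. Grouping with `pd.Grouper(key="date", freq="W-SAT")` should give the same bins, though that is worth checking against your pandas version.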