Home > database >  pandas grooup by according to group of days of the week selected
pandas grooup by according to group of days of the week selected

Time:03-29

I have this dataframe:

rng = pd.date_range(start='2018-01-01', end='2018-01-21')
rnd_values = np.random.rand(len(rng)) 3

df = pd.DataFrame({'time':rng.to_list(),'value':rnd_values})

let's say that I want to group it according to the day of the week and compute the mean:

df['span'] = np.where((df['time'].dt.day_of_week <= 2 , 'Th-Sn', 'Mn-Wd')
df['wkno'] = df['time'].dt.isocalendar().week.shift(fill_value=0) 
df.groupby(['wkno','span']).mean()

However, I would like to make this procedure more general.

Let's say that I define the following day is the week:

days=['Monday','Thursday']

Is there any option that allows me to do what I have done by using "days". I imagine that I have to compute the number of day between 'Monday','Thursday' and then I should use that number. What about the case when

days=['Monday','Thursday','Friday']

I was thinking to set-up a dictionary as:

days={'Monday':0,'Thursday':3,'Friday':4}

then

idays = list(days.values())[:]

How can I use now idays inside np.where? Indeed I have three interval.

Thanks

CodePudding user response:

If you want to use more than one threshold you need np.searchsorted the resulting function would look something like

def groupby_daysspan_week(dfc,days):
    df = dfc.copy()
    day_to_dayofweek = {'Monday':0,'Tuesday':1,'Wednesday':2,
                        'Thursday':3,'Friday':4,'Saturday':5,'Sunday':6}
    short_dict = {0:'Mn',1:'Tu',2:'Wd',3:'Th',4:'Fr',5:'St',6:'Sn'}
    day_split = [day_to_dayofweek[d] for d in days]
    df['wkno'] = df['time'].dt.isocalendar().week
    df['dow'] = df['time'].dt.day_of_week
    df['span'] = np.searchsorted(day_split,df['dow'],side='right')
    span_name_dict = {i 1:short_dict[day_split[i]] '-' short_dict[(day_split [6])[i 1]] 
                      for i in range(len(day_split))}
    df_agg = df.groupby(['wkno','span'])['value'].mean()
    df_agg = df_agg.rename(index=span_name_dict,level=1)
    return df_agg
  • Related