I have this dataframe:
rng = pd.date_range(start='2018-01-01', end='2018-01-21')
rnd_values = np.random.rand(len(rng)) 3
df = pd.DataFrame({'time':rng.to_list(),'value':rnd_values})
let's say that I want to group it according to the day of the week and compute the mean:
df['span'] = np.where((df['time'].dt.day_of_week <= 2 , 'Th-Sn', 'Mn-Wd')
df['wkno'] = df['time'].dt.isocalendar().week.shift(fill_value=0)
df.groupby(['wkno','span']).mean()
However, I would like to make this procedure more general.
Let's say that I define the following day is the week:
days=['Monday','Thursday']
Is there any option that allows me to do what I have done by using "days". I imagine that I have to compute the number of day between 'Monday','Thursday' and then I should use that number. What about the case when
days=['Monday','Thursday','Friday']
I was thinking to set-up a dictionary as:
days={'Monday':0,'Thursday':3,'Friday':4}
then
idays = list(days.values())[:]
How can I use now idays inside np.where? Indeed I have three interval.
Thanks
CodePudding user response:
If you want to use more than one threshold you need np.searchsorted
the resulting function would look something like
def groupby_daysspan_week(dfc,days):
df = dfc.copy()
day_to_dayofweek = {'Monday':0,'Tuesday':1,'Wednesday':2,
'Thursday':3,'Friday':4,'Saturday':5,'Sunday':6}
short_dict = {0:'Mn',1:'Tu',2:'Wd',3:'Th',4:'Fr',5:'St',6:'Sn'}
day_split = [day_to_dayofweek[d] for d in days]
df['wkno'] = df['time'].dt.isocalendar().week
df['dow'] = df['time'].dt.day_of_week
df['span'] = np.searchsorted(day_split,df['dow'],side='right')
span_name_dict = {i 1:short_dict[day_split[i]] '-' short_dict[(day_split [6])[i 1]]
for i in range(len(day_split))}
df_agg = df.groupby(['wkno','span'])['value'].mean()
df_agg = df_agg.rename(index=span_name_dict,level=1)
return df_agg