Home > Software design >  Python group by aggregation and function to filter
Python group by aggregation and function to filter

Time:09-28

I have a dataset which I am trying to filter via 2 values that the user can select and display mean, median and n for the selected category. I have a function that I used to calculate the aggregated values for each ID, but I am trying to calculate the values for the selected groups rather than the individual ID. I need some help converting the function from an ID to a group

below is a sample dataset df1

ID Test Group Date
001 A 10 11/01/2020
001 B 10 11/02/2020
001 A 10 11/03/2020
001 B 10 11/04/2020
002 A 10 11/08/2020
002 A 10 11/11/2020
003 A 20 11/01/2020
003 B 20 11/05/2020
003 C 20 11/08/2020

When I select Group 10 and Test A, I want it to filter only for those values and create the following dataset where I am calculating the difference b/w days for each ID.

ID Test Group Date Diff_in_Days
001 A 10 11/01/2020 0
001 A 10 11/03/2020 2
002 A 10 11/08/2020 0
002 A 10 11/11/2020 3

Then I want to calculate mean and median of the diff_in_days column for the selected Group and test.

I am able to do this on an ID level and need help calculating aggregation on group level.

       df['Diff_in_days'] = df['date'].diff().apply(lambda x: x/np.timedelta64(1, 'D')).fillna(0).astype('int64')
       display(df)

       
       df = df.iloc[1:]

     
       df['mean']= df['Diff_in_days'].mean()
       df['median'] = df['Diff_in_days'].median()
       df['Number of Tests'] = len(df)   1

       
       df = df[['group','measure','mean','median','Number of Tests']].drop_duplicates()

       display(df)

CodePudding user response:

This works:

(df
 .assign(Date=lambda x: pd.to_datetime(x.Date))
 .assign(diff=lambda x: x.groupby(['ID', 'Test', 'Group'])['Date'].transform(lambda x: x.diff()))
 .sort_values('Test')
)

CodePudding user response:

if I understood you correctly, you want to slice dataframe by group and test variables which are selected by 'user'. if so, you need to create a function with parameters as per the example below:

def slice_group(df, group_num, test):
    # function slices dataframe by selected group: 'group_num' and test,
    # later creates columns: Diff_in_Days, mean, median, Number of Tests
    df.Date = pd.to_datetime(df.Date)
    df = df[df['Group'] == group_num].copy() # slicing data frame by group_num
    df = df[df['Test'] == test].copy() # slicing data frame by test

    df['Diff_in_Days'] = df.groupby(['ID'])['Date'].diff().fillna(pd.Timedelta(seconds=0)).dt.days.astype('int')
    df['mean'] = df['Diff_in_Days'].mean()
    df['median'] = df['Diff_in_Days'].median()
    df['Number of Tests'] = len(df)
    return df


user_chosen_group = 10 # user selected group
test = 'A' # user selected test

new_df = slice_group(df, user_chosen_group, test) # passing dataframe, group number and test variable to function
  • Related