I have a dataset which I am trying to filter via 2 values that the user can select and display mean, median and n for the selected category. I have a function that I used to calculate the aggregated values for each ID, but I am trying to calculate the values for the selected groups rather than the individual ID. I need some help converting the function from an ID to a group
below is a sample dataset df1
ID | Test | Group | Date |
---|---|---|---|
001 | A | 10 | 11/01/2020 |
001 | B | 10 | 11/02/2020 |
001 | A | 10 | 11/03/2020 |
001 | B | 10 | 11/04/2020 |
002 | A | 10 | 11/08/2020 |
002 | A | 10 | 11/11/2020 |
003 | A | 20 | 11/01/2020 |
003 | B | 20 | 11/05/2020 |
003 | C | 20 | 11/08/2020 |
When I select Group 10 and Test A, I want it to filter only for those values and create the following dataset where I am calculating the difference b/w days for each ID.
ID | Test | Group | Date | Diff_in_Days |
---|---|---|---|---|
001 | A | 10 | 11/01/2020 | 0 |
001 | A | 10 | 11/03/2020 | 2 |
002 | A | 10 | 11/08/2020 | 0 |
002 | A | 10 | 11/11/2020 | 3 |
Then I want to calculate mean and median of the diff_in_days column for the selected Group and test.
I am able to do this on an ID level and need help calculating aggregation on group level.
df['Diff_in_days'] = df['date'].diff().apply(lambda x: x/np.timedelta64(1, 'D')).fillna(0).astype('int64')
display(df)
df = df.iloc[1:]
df['mean']= df['Diff_in_days'].mean()
df['median'] = df['Diff_in_days'].median()
df['Number of Tests'] = len(df) 1
df = df[['group','measure','mean','median','Number of Tests']].drop_duplicates()
display(df)
CodePudding user response:
This works:
(df
.assign(Date=lambda x: pd.to_datetime(x.Date))
.assign(diff=lambda x: x.groupby(['ID', 'Test', 'Group'])['Date'].transform(lambda x: x.diff()))
.sort_values('Test')
)
CodePudding user response:
if I understood you correctly, you want to slice dataframe by group and test variables which are selected by 'user'. if so, you need to create a function with parameters as per the example below:
def slice_group(df, group_num, test):
# function slices dataframe by selected group: 'group_num' and test,
# later creates columns: Diff_in_Days, mean, median, Number of Tests
df.Date = pd.to_datetime(df.Date)
df = df[df['Group'] == group_num].copy() # slicing data frame by group_num
df = df[df['Test'] == test].copy() # slicing data frame by test
df['Diff_in_Days'] = df.groupby(['ID'])['Date'].diff().fillna(pd.Timedelta(seconds=0)).dt.days.astype('int')
df['mean'] = df['Diff_in_Days'].mean()
df['median'] = df['Diff_in_Days'].median()
df['Number of Tests'] = len(df)
return df
user_chosen_group = 10 # user selected group
test = 'A' # user selected test
new_df = slice_group(df, user_chosen_group, test) # passing dataframe, group number and test variable to function