How can I iterate through a variable in dataframe while the same time aggregating by another variabl-CodePudding

timestamp   IFPID   Outcome Share Price Trade Qty   Date
0   10/05/2022 08:04    SP3K4K5 dn  36  100 2022-10-05
1   10/05/2022 08:04    SP3K4K5 up  64  100 2022-10-05
2   10/05/2022 08:04    SP3K4K5 up  65  100 2022-10-05
3   10/05/2022 08:04    SP3K4K5 dn  35  100 2022-10-05
4   11/05/2022 00:54    SP3K4K5 up  57  64  2022-11-05
5   11/05/2022 00:54    SP3K4K5 dn  43  64  2022-11-05

I want to new variable Expected(a binary class with buy or sell) which will be computed based on the aggregation of the the Date and Outcome variables by the share price variable and assign buy when the mean of the of up outcome for a particular date is higher than the mean of the dn outcome.

CodePudding user response：

To iterate through a variable in a DataFrame and aggregate by another variable, you can use a for loop and the groupby method. Here's an example of how you might do this:

    import pandas as pd
    # Assume we have a DataFrame with columns 'var1' and 'group'
    df = pd.DataFrame({'var1': [1, 2, 3, 4, 5], 'group': ['A', 'A', 'B', 'B', 'B']})

    # Create an empty list to store the aggregated values
    agg_values = []

    # Iterate through the unique values in 'var1'
    for var in df['var1'].unique():
      # Aggregate the data for each value of 'var1' by 'group' and store the result
      agg = df[df['var1'] == var].groupby('group').mean()
      agg_values.append(agg)

    # Concatenate all of the aggregated DataFrames into a single DataFrame
    agg_df = pd.concat(agg_values)

    # Add a new column with labels based on the values of 'var1'
    agg_df['label'] = [f'Value {v}' for v in df['var1'].unique()]

    print(agg_df)

This will output a DataFrame with the aggregated values for each unique value of var1, and a new column label with labels based on the values of var1.

           var1      label
    group               
    A       1.5  Value 1
    B       4.0  Value 2
    B       5.0  Value 3

CodePudding user response：

import pandas as pd
import numpy as np

df = pd.DataFrame({'outcome': ['dn', 'up', 'up', 'dn', 'up', 'dn'], 'share price': [36,64,65,35,57,43],'date':['10/5/22', '10/5/22', '10/5/22', '10/5/22', '11/5/22','11/5/22']})
print(df)

df['mean_by_date_by_outcome']  = df.groupby(['date','outcome'])['share price'].transform(np.mean)
df.loc[:,'label'] = 'buy'
df.loc[df['share price']<df['mean_by_date_by_outcome'],'label'] = 'sell'
df