Home > Enterprise >  getting first value of the groups in Pandas
getting first value of the groups in Pandas

Time:02-23

I have a DataFrame and want to generate a report on it.

Sample data

data = {
 'Date': {0: '2021-01-04 10:45:00',
  1: '2021-01-04 10:45:00',
  2: '2021-01-05 11:15:00',
  3: '2021-01-05 11:15:00',
  4: '2021-01-06 12:15:00',
  5: '2021-01-06 12:15:00'},
 'Action': {0: 'A', 1: 'B', 2: 'P', 3: 'Q', 4: 'X', 5: 'Y'},
 'Profit': {0: np.NaN, 1: -2637.93, 2: np.NaN, 3: 11008.4, 4: np.NaN, 5: -2977.49},
 }
df = pd.DataFrame(data)

My normal move is to make a function that will handle all the calculations like this.

def magic(x):
    result = {
        'Action' : x['Action'], #Need help here
        'Profit': x['Profit'].sum()
    }
    return pd.Series(result)

df = df.groupby(['Date']).apply(magic)

What I want to happen:

The Action column should contain the first value of the group.

Wanted output:

Date                | Action     | Profit
2021-01-04 10:45:00 |A           |-2637.93
2021-01-05 11:15:00 |P           |11008.40
2021-01-06 12:15:00 |X           |-2977.49

Actual output:

Date                | Action                                | Profit
2021-01-04 10:45:00 |0 A 1 B Name: Action, dtype: object    |-2637.93
2021-01-05 11:15:00 |2 P 3 Q Name: Action, dtype: object    |11008.40
2021-01-06 12:15:00 |4 X 5 Y Name: Action, dtype: object    |-2977.49

My actual function will have more columns so it'll be preferred to have all the operations done inside the magic function.

CodePudding user response:

I think you want to call the first method on Action:

out = df.groupby('Date').agg({'Action': 'first', 'Profit': 'sum'})

Output:

                    Action    Profit
Date                                
2021-01-04 10:45:00      A  -2637.93
2021-01-05 11:15:00      P  11008.40
2021-01-06 12:15:00      X  -2977.49

CodePudding user response:

If magic function count new columns from multiple input columns then agg is not possible use.

Then select first value by indexing:

def magic(x):
result = {
    'Action' : x['Action'].iat[0]
    'Profit': x['Profit'].sum()
}
return pd.Series(result)
  • Related