I have a DataFrame and want to generate a report on it.
Sample data
data = {
'Date': {0: '2021-01-04 10:45:00',
1: '2021-01-04 10:45:00',
2: '2021-01-05 11:15:00',
3: '2021-01-05 11:15:00',
4: '2021-01-06 12:15:00',
5: '2021-01-06 12:15:00'},
'Action': {0: 'A', 1: 'B', 2: 'P', 3: 'Q', 4: 'X', 5: 'Y'},
'Profit': {0: np.NaN, 1: -2637.93, 2: np.NaN, 3: 11008.4, 4: np.NaN, 5: -2977.49},
}
df = pd.DataFrame(data)
My normal move is to make a function that will handle all the calculations like this.
def magic(x):
result = {
'Action' : x['Action'], #Need help here
'Profit': x['Profit'].sum()
}
return pd.Series(result)
df = df.groupby(['Date']).apply(magic)
What I want to happen:
The Action column should contain the first value of the group.
Wanted output:
Date | Action | Profit
2021-01-04 10:45:00 |A |-2637.93
2021-01-05 11:15:00 |P |11008.40
2021-01-06 12:15:00 |X |-2977.49
Actual output:
Date | Action | Profit
2021-01-04 10:45:00 |0 A 1 B Name: Action, dtype: object |-2637.93
2021-01-05 11:15:00 |2 P 3 Q Name: Action, dtype: object |11008.40
2021-01-06 12:15:00 |4 X 5 Y Name: Action, dtype: object |-2977.49
My actual function will have more columns so it'll be preferred to have all the operations done inside the magic function.
CodePudding user response:
I think you want to call the first
method on Action
:
out = df.groupby('Date').agg({'Action': 'first', 'Profit': 'sum'})
Output:
Action Profit
Date
2021-01-04 10:45:00 A -2637.93
2021-01-05 11:15:00 P 11008.40
2021-01-06 12:15:00 X -2977.49
CodePudding user response:
If magic function count new columns from multiple input columns then agg
is not possible use.
Then select first value by indexing:
def magic(x):
result = {
'Action' : x['Action'].iat[0]
'Profit': x['Profit'].sum()
}
return pd.Series(result)