I just can't seem to connect the dots on this. I'm trying to do something similar to this question.
I have a dataframe that takes the form:
Origination | Orig Balance |
---|---|
Q1 | 3000 |
Q1 | 2000 |
Q1 | 4000 |
Q2 | 3000 |
Q2 | 3000 |
Q3 | 1000 |
Q3 | 4000 |
Q3 | 3000 |
And I'm trying to create a dataframe that looks like this:
Origination | Orig Balance |
---|---|
Q1 | 9000 |
Q2 | 6000 |
Q3 | 8000 |
I don't want to hard-code each quarter, so something like `df.loc[df['Origination'] == 'Q1', 'Orig Balance'].sum()` wouldn't work for me.
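To make the problem reproducible, the sample frame from the first table can be rebuilt directly. This is a sketch: the labels and balances are copied from the table, and the dictionary comprehension is a hypothetical illustration of how the hard-coded `.loc` filter would have to be repeated once per quarter.

```python
import pandas as pd

# Sample data copied from the first table in the question
df = pd.DataFrame({
    "Origination": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q3"],
    "Orig Balance": [3000, 2000, 4000, 3000, 3000, 1000, 4000, 3000],
})

# The hard-coded filter only handles one quarter per call
q1_total = df.loc[df["Origination"] == "Q1", "Orig Balance"].sum()
print(q1_total)  # 9000

# Covering every quarter this way means looping over the labels by hand
totals = {
    q: df.loc[df["Origination"] == q, "Orig Balance"].sum()
    for q in df["Origination"].unique()
}
print(totals)
```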
Answer:
You want to group by `Origination` first, then take the sum of `Orig Balance`:

```python
sums = df.groupby('Origination')['Orig Balance'].sum().reset_index()
```
Output:

```
>>> sums
  Origination  Orig Balance
0          Q1          9000
1          Q2          6000
2          Q3          8000
```
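A self-contained version of this answer, with the sample data typed in from the question's table, might look like the sketch below. It also shows `as_index=False`, a standard `groupby` parameter that keeps `Origination` as a column, so the `reset_index()` call is no longer needed.

```python
import pandas as pd

# Sample data copied from the question's first table
df = pd.DataFrame({
    "Origination": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q3"],
    "Orig Balance": [3000, 2000, 4000, 3000, 3000, 1000, 4000, 3000],
})

# Group, sum one column, then move the group labels back into a column
sums = df.groupby("Origination")["Orig Balance"].sum().reset_index()

# Alternative: tell groupby not to turn the group labels into the index
sums_alt = df.groupby("Origination", as_index=False)["Orig Balance"].sum()

print(sums)
print(sums_alt)
```

Both variants produce the same two columns and totals; which one to use is mostly a matter of taste.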
Answer:
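What about `pandas.DataFrame.groupby`?

```python
df.groupby(by=["Origination"]).sum()
```

This leaves `Origination` as the index of the result; add `.reset_index()` if you want it back as a regular column, as in the desired output.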
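A runnable sketch of this answer using the question's sample data, checking where the quarter labels end up before and after `reset_index()`:

```python
import pandas as pd

# Sample data copied from the question's first table
df = pd.DataFrame({
    "Origination": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q3"],
    "Orig Balance": [3000, 2000, 4000, 3000, 3000, 1000, 4000, 3000],
})

# Summing the whole grouped frame: the quarter labels become the index
by_quarter = df.groupby(by=["Origination"]).sum()
print(by_quarter.index.name)      # Origination
print(list(by_quarter.columns))   # ['Orig Balance']

# reset_index() turns the index back into an ordinary column
flat = by_quarter.reset_index()
print(list(flat.columns))         # ['Origination', 'Orig Balance']
```

Because `.sum()` is called on the whole grouped frame, it sums every non-grouping column. That is harmless here since `Orig Balance` is the only one, but with extra columns you may want to select `['Orig Balance']` first, as in the other answers.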
Answer:
You can also use the aggregate method, `agg`, with `'sum'`:

```python
df.groupby('Origination').agg('sum').reset_index()
```
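`agg` also accepts a dictionary mapping column names to functions, which helps if the real frame has more columns than the sample and only some of them should be summed. A sketch, assuming the question's data plus a hypothetical `Loan ID` column (not part of the question) that should be left out of the result:

```python
import pandas as pd

df = pd.DataFrame({
    "Origination": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q3"],
    "Orig Balance": [3000, 2000, 4000, 3000, 3000, 1000, 4000, 3000],
    # Hypothetical extra column, only here to show column selection
    "Loan ID": ["a", "b", "c", "d", "e", "f", "g", "h"],
})

# Only the columns named in the dict are aggregated; 'Loan ID' is dropped
sums = df.groupby("Origination").agg({"Orig Balance": "sum"}).reset_index()
print(sums)
```

With a plain `agg('sum')`, the string column would also be "summed" (concatenated), which is usually not what you want.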