I am learning Python data analysis with pandas
I have a game selling data frame that would look like this :
(This data is not real, only for question purposes)
Name Year Publisher Total Sales
GTA V 2013 Rockstar 133000
Super Mario Bros 1985 Nintendo 430500
GTA VI 2025 Rockstar 86000
RDR 3 2025 Rockstar 129030
Super Mario Sister 1985 Nintendo 308900
Super Mario End 2000 Nintendo 112100
Then I drop the name and group it by Publisher name with this command :
df.drop(columns='Name', inplace=True)
df.groupby(['Publisher','Year','Total Sales']).sum().reset_index()
The dataframe now look like this :
Publisher Year Total Sales
Nintendo 1985 308900
Nintendo 1985 430500
Nintendo 2000 112100
Rockstar 2013 133000
Rockstar 2025 129030
Rockstar 2025 86000
This is good but I want to sum the total sales of the same year of the same publisher
I want the dataframe to look like this:
Publisher Year Total Sales
Nintendo 1985 739400
Nintendo 2000 86000
Rockstar 2013 129030
Rockstar 2025 215030
Is there a way to do that?
Here is my df code:
data = {'Name':['GTA V','Super Mario Bros','GTA VI','RDR 3','Super Mario Sister','Super Mario End'],'Year':['2013','1985','2025','2025','1985','2000'],
'Publisher':['Rockstar','Nintendo','Rockstar','Rockstar','Nintendo','Nintendo'],'Total Sales':['133000','430500','86000','129030','308900','112100']}
df = pd.DataFrame(data)
df
CodePudding user response:
Use pivot_table
:
>>> df.pivot_table('Total Sales', ['Year', 'Publisher'], aggfunc='sum').reset_index()
Year Publisher Total Sales
0 1985 Nintendo 739400
1 2000 Nintendo 112100
2 2013 Rockstar 133000
3 2025 Rockstar 215030
Note: if Total Sales
column contains strings, convert it to int
(or float
):
>>> df.astype({'Total Sales': int}).pivot_table(...)
CodePudding user response:
import pandas as pd
data = {'Name':['GTA V','Super Mario Bros','GTA VI','RDR 3','Super Mario Sister','Super Mario End'],'Year':['2013','1985','2025','2025','1985','2000'],
'Publisher':['Rockstar','Nintendo','Rockstar','Rockstar','Nintendo','Nintendo'],'Total Sales':['133000','430500','86000','129030','308900','112100']}
df = pd.DataFrame(data)
df['Total Sales'] = df['Total Sales'].astype(int)
df.groupby(['Year', 'Publisher'])['Total Sales'].agg('sum').reset_index()
CodePudding user response:
This is one way to do it:
df.drop(columns='Name', inplace=True)
df['Total Sales'] = pd.to_numeric(df['Total Sales'])
df2 = df.groupby(['Publisher','Year']).sum().reset_index()
df2