Home > front end >  How to sum the total value of the same year? (Pandas)
How to sum the total value of the same year? (Pandas)

Time:01-04

I am learning Python data analysis with pandas

I have a game selling data frame that would look like this :

(This data is not real, only for question purposes)

Name                Year    Publisher   Total Sales
GTA V               2013    Rockstar    133000
Super Mario Bros    1985    Nintendo    430500
GTA VI              2025    Rockstar    86000
RDR 3               2025    Rockstar    129030
Super Mario Sister  1985    Nintendo    308900
Super Mario End     2000    Nintendo    112100

Then I drop the name and group it by Publisher name with this command :

df.drop(columns='Name', inplace=True)
df.groupby(['Publisher','Year','Total Sales']).sum().reset_index()

The dataframe now look like this :

Publisher   Year    Total Sales
Nintendo    1985    308900
Nintendo    1985    430500
Nintendo    2000    112100
Rockstar    2013    133000
Rockstar    2025    129030
Rockstar    2025    86000

This is good but I want to sum the total sales of the same year of the same publisher

I want the dataframe to look like this:

Publisher   Year    Total Sales
Nintendo    1985    739400
Nintendo    2000    86000
Rockstar    2013    129030
Rockstar    2025    215030

Is there a way to do that?

Here is my df code:

data = {'Name':['GTA V','Super Mario Bros','GTA VI','RDR 3','Super Mario Sister','Super Mario End'],'Year':['2013','1985','2025','2025','1985','2000'],
        'Publisher':['Rockstar','Nintendo','Rockstar','Rockstar','Nintendo','Nintendo'],'Total Sales':['133000','430500','86000','129030','308900','112100']}

df = pd.DataFrame(data)
df

CodePudding user response:

Use pivot_table:

>>> df.pivot_table('Total Sales', ['Year', 'Publisher'], aggfunc='sum').reset_index()

   Year Publisher  Total Sales
0  1985  Nintendo       739400
1  2000  Nintendo       112100
2  2013  Rockstar       133000
3  2025  Rockstar       215030

Note: if Total Sales column contains strings, convert it to int (or float):

>>> df.astype({'Total Sales': int}).pivot_table(...)

CodePudding user response:

import pandas as pd

data = {'Name':['GTA V','Super Mario Bros','GTA VI','RDR 3','Super Mario Sister','Super Mario End'],'Year':['2013','1985','2025','2025','1985','2000'],
        'Publisher':['Rockstar','Nintendo','Rockstar','Rockstar','Nintendo','Nintendo'],'Total Sales':['133000','430500','86000','129030','308900','112100']}

df = pd.DataFrame(data)
df['Total Sales'] = df['Total Sales'].astype(int)


df.groupby(['Year', 'Publisher'])['Total Sales'].agg('sum').reset_index()

CodePudding user response:

This is one way to do it:

df.drop(columns='Name', inplace=True)
df['Total Sales'] = pd.to_numeric(df['Total Sales'])
df2 = df.groupby(['Publisher','Year']).sum().reset_index()
df2
  • Related