Home > Net >  Getting row sums for specific columns - Python
Getting row sums for specific columns - Python

Time:09-13

I created a simple dataset df with three columns, Area, Year_2010, Year_2020. The related code:

# Import pandas library

import pandas as pd
  
# initialize list of lists
data = [['Netherlands', 100, 200], ['Belgium', 15, 80], ['Germany', 125, 300]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Area', 'Year_2010', 'Year_2020'])
  
# print dataframe.
df

#prints
          Area  Year_2010  Year_2020
0  Netherlands        100        200
1      Belgium         15         80
2      Germany        125        300

For two of the three columns (being Year_2010 and Year_2020), I would like to obtain the sum of all related rows. The output of the sum should appear in a new row for both columns.

Ideally, the output reads:

                        Area  Year_2010  Year_2020
0                Netherlands        100        200
1                    Belgium         15         80
2                    Germany        125        300
3                                   240        580

However, if I apply

total = df.sum(axis=0)
df = df.append(total, ignore_index=True)

The Area column will also be aggregated, leading to concatenation of country names in the Area column:

                        Area  Year_2010  Year_2020
0                Netherlands        100        200
1                    Belgium         15         80
2                    Germany        125        300
3  NetherlandsBelgiumGermany        240        580

How can I get the row sums for specific columns, while leaving out other columns?

As an alternative to leaving the value NetherlandsBelgiumGermany blank, the value can be replaced by Total.

CodePudding user response:

One solution could be to filter the columns by type and use indexing:

cols = df.select_dtypes('number').columns

df.loc[len(df), cols] = df[cols].sum()

updated DataFrame:

          Area  Year_2010  Year_2020
0  Netherlands      100.0      200.0
1      Belgium       15.0       80.0
2      Germany      125.0      300.0
3          NaN      240.0      580.0

Alternative if you want the empty string:

cols = df.select_dtypes('number').columns

df.loc[len(df)] = df[cols].sum().reindex(df.columns, fill_value='')

output:

          Area  Year_2010  Year_2020
0  Netherlands        100        200
1      Belgium         15         80
2      Germany        125        300
3                     240        580

NB. this assumes a range index, if this is not the case, please update the example for alternative ways (using pandas.concat for example).

CodePudding user response:

You could do that manually:

df.at[3, "Area"] = "" # or whatever else you want
  • Related