Python Pandas: sum() adds int values in data frame as string instead of integers

Time:10-06

My problem seems very simple, but I can't find an answer to it. I am trying to use sum() in pandas to calculate how many women and men attempted suicide in Albania, based on a dataset from Kaggle.

code:

import pandas as pd
pd.options.mode.chained_assignment = None

#Create a dataframe
suicide = pd.read_csv('who_suicide_statistics.csv', header=None)

#Rename the columns because their names were integers
suicide = suicide.rename(columns={0: 'country', 1: 'year', 2: 'sex', 3: 'age', 4:'suicides_no', 5: 'population'})
#Delete first row because it was a duplicate with column names
suicide = suicide.iloc[1: , :]

#Filter values only with Albania
albania_suicide = suicide.loc[(suicide['country'] == 'Albania')]

#Delete rows with NaN values
albania_suicide.dropna(subset=['suicides_no'], inplace=True)

# Do more women or men attempt suicide?
print(albania_suicide.loc[albania_suicide['sex'] == 'female', 'suicides_no'].sum())

And the output is:

"144600185403252701074201010771206420121378222161091122116760232109160191351646350029412262147151411491309620111733300000000000013814093209191750000006612272" 

These numbers are displayed one after another, as if they were treated as strings. The individual values are 14, 4, 6, 0, 0, 18, ...

CodePudding user response:

Your suicides_no column is probably of string type (strings of digits). If you are sure all the values should be numeric, you can try, e.g.:

df['suicides_no'].astype(int).sum()

or permanently convert the column to int before summing it:

df['suicides_no'] = df['suicides_no'].astype(int)

# then sum
df['suicides_no'].sum()
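
To make the difference concrete, here is a minimal sketch on toy data. The values and the variable names (raw, concatenated, total, total_coerced) are made up for illustration and are not taken from the Kaggle file. It assumes the column holds digit strings with object dtype, which is what the output in the question suggests. pd.to_numeric with errors="coerce" is shown as an alternative to astype(int) in case some entries are missing or not numeric.

```python
import pandas as pd

# Toy column (hypothetical values): numbers stored as strings, object dtype.
raw = pd.Series(["14", "4", "6", "0"], dtype=object, name="suicides_no")

# Summing strings joins them, which is the behaviour seen in the question.
concatenated = raw.sum()

# Converting to int first gives an arithmetic sum.
total = raw.astype(int).sum()

# Alternative: to_numeric turns unparseable or missing entries into NaN,
# and sum() skips NaN by default.
messy = pd.Series(["14", "4", None, "n/a", "6"], dtype=object)
total_coerced = pd.to_numeric(messy, errors="coerce").sum()

print(repr(concatenated), total, total_coerced)
```

astype(int) raises an error if any entry cannot be parsed, so it doubles as a check that the column is clean. The to_numeric variant is more forgiving, but it silently drops bad entries from the total.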

CodePudding user response:

Did you try something like this? It's hard to know why you are concatenating instead of summing without that line of code.

total = df['suicides_no'].astype(int).sum()
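
One possible reason for the concatenation, judging only from the code in the question, is the header=None argument. It makes read_csv treat the header line as data, so every column, including suicides_no, ends up holding text. The sketch below uses two made-up rows in the same column layout as the question (csv_text and its values are hypothetical) and reads them both ways.

```python
import io
import pandas as pd

# Two hypothetical rows laid out like the CSV in the question.
csv_text = (
    "country,year,sex,age,suicides_no,population\n"
    "Albania,1985,female,15-24 years,14,277900\n"
    "Albania,1985,female,25-34 years,4,246800\n"
)

# header=None keeps the header line as the first data row, so column 4
# mixes the text "suicides_no" with digit strings and is not numeric.
as_text = pd.read_csv(io.StringIO(csv_text), header=None)

# The default uses the first line as column names and infers a numeric
# dtype for suicides_no.
suicide = pd.read_csv(io.StringIO(csv_text))
female = suicide.loc[suicide["sex"] == "female", "suicides_no"]
total = female.sum()

print(suicide["suicides_no"].dtype, total)
```

If this is what happened, dropping header=None would also make the rename and the iloc[1:, :] step unnecessary. With the real file, missing entries would likely make the column float rather than int, and sum() would still skip them.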