My problem seems very simple, but I can't find an answer to it. I am trying to use sum() in pandas to calculate how many women and men attempted suicide in Albania, based on a dataset from Kaggle.
Code:
import pandas as pd
pd.options.mode.chained_assignment = None
#Create a dataframe
suicide = pd.read_csv('who_suicide_statistics.csv', header=None)
#Rename column names because it was int
suicide = suicide.rename(columns={0: 'country', 1: 'year', 2: 'sex', 3: 'age', 4:'suicides_no', 5: 'population'})
#Delete first row because it was a duplicate with column names
suicide = suicide.iloc[1: , :]
#Filter values only with Albania
albania_suicide = suicide.loc[(suicide['country'] == 'Albania')]
#Delete rows with Nan values
albania_suicide.dropna(subset=['suicides_no'], inplace=True)
# Do more women or men attempt suicide?
print(albania_suicide.loc[albania_suicide['sex'] == 'female', 'suicides_no'].sum())
And the output is:
"144600185403252701074201010771206420121378222161091122116760232109160191351646350029412262147151411491309620111733300000000000013814093209191750000006612272"
The numbers are joined one after another, as if they were treated as strings. The individual values should be 14 4 6 0 0 18 ....
CodePudding user response:
Your suicides_no column is probably of string type (numbers stored as strings), so sum() concatenates the values instead of adding them. If you are sure the values should all be numeric, you can try, e.g.
df['suicides_no'].astype(int).sum()
or convert the column to int permanently before summing it:
df['suicides_no'] = df['suicides_no'].astype(int)
# then sum
df['suicides_no'].sum()
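To see the difference on a small scale, here is a minimal sketch. The four values are made-up sample data, and the object dtype is forced on purpose so the column behaves like the one in the question. pd.to_numeric with errors='coerce' is an alternative to astype(int) that turns anything unparsable into NaN instead of raising; whether that is what you want depends on your data.

```python
import pandas as pd

# Hypothetical sample values stored as strings (object dtype),
# mimicking what the question's column most likely contains.
s = pd.Series(['14', '4', '6', '0'], dtype=object)

# Summing strings uses Python's "+" on str, so the values are joined.
concatenated = s.sum()

# Convert to numbers first; unparsable entries would become NaN.
numeric = pd.to_numeric(s, errors='coerce')
total = numeric.sum()

print(repr(concatenated))
print(total)
```

If a column might contain stray text, errors='coerce' followed by dropna() is usually gentler than astype(int), which raises on the first bad value.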
CodePudding user response:
Did you try something like this? It's hard to know why you are concatenating instead of summing without that line of code.
total = df['suicides_no'].astype(int).sum()
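A likely cause, judging from the code in the question, is header=None: pandas then treats the header row as data, so the text 'suicides_no' lands in the same column as the numbers and the whole column is read as strings. The sketch below assumes that is what happened. The CSV text is a tiny made-up stand-in for the real file, and the figures in it are illustrative only.

```python
import io
import pandas as pd

# Made-up stand-in for who_suicide_statistics.csv (illustrative values only).
csv_text = (
    "country,year,sex,age,suicides_no,population\n"
    "Albania,1987,female,15-24 years,14,289700\n"
    "Albania,1987,male,15-24 years,21,312900\n"
    "Albania,1985,female,15-24 years,,277900\n"
)

# With header=None the header row becomes a data row,
# so column 4 mixes text with numbers and is not numeric.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(pd.api.types.is_numeric_dtype(raw[4]))

# With the default header handling the first row supplies the column
# names and suicides_no is parsed as a numeric column.
suicide = pd.read_csv(io.StringIO(csv_text))
albania = suicide[suicide['country'] == 'Albania'].dropna(subset=['suicides_no'])

# Total per sex in one step.
totals = albania.groupby('sex')['suicides_no'].sum()
print(totals)
```

Reading the file this way also makes the rename and the iloc[1:, :] step unnecessary, since the column names come straight from the file.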