I have a data frame dft:
Date Total Value
02/01/2022 2
03/01/2022 6
03/08/2022 4
03/11/2022
03/15/2022 4
05/01/2022 4
I want to calculate the total value for March, so I used the following code:
Mar22 = dft.loc[dft['Date'].between('03/01/2022', '03/31/2022', inclusive='both'),'Total Value'].sum()
03/11/2022 has a null value, which caused an error. What should I add to my code so that I only sum the values that are not null? Would that be isnull() == False?
CodePudding user response:
The issue is that you have an empty string where there should be a NaN.
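You can make sure the column holds only numbers with pandas.to_numeric; with errors='coerce', anything that cannot be parsed becomes NaN, and sum() skips NaN by default: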
out = (pd.to_numeric(dft['Total Value'], errors='coerce')[dft['Date']
.between('03/01/2022', '03/31/2022', inclusive='both')].sum()
)
Or, if empty strings are the only non-numeric values:
out = (dft.loc[dft['Date'].between('03/01/2022', '03/31/2022', inclusive='both'), 'Total Value']
.replace('', float('nan')).sum()
)
output: 14.0
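For reference, here is a minimal end-to-end sketch of the to_numeric approach. The frame is a hypothetical reconstruction of dft from the question, and it assumes the missing entry on 03/11/2022 is stored as an empty string; the names in_march, values and mar22 are illustrative only.

```python
import pandas as pd

# Hypothetical reconstruction of dft; the blank cell is assumed to be ''.
dft = pd.DataFrame({
    "Date": ["02/01/2022", "03/01/2022", "03/08/2022",
             "03/11/2022", "03/15/2022", "05/01/2022"],
    "Total Value": [2, 6, 4, "", 4, 4],
})

# Boolean mask for the March rows (string comparison, as in the question).
in_march = dft["Date"].between("03/01/2022", "03/31/2022", inclusive="both")

# Coerce the column to numbers; the empty string becomes NaN.
values = pd.to_numeric(dft["Total Value"], errors="coerce")

# sum() skips NaN by default, so only the numeric March values are added.
mar22 = values[in_march].sum()
print(mar22)
```

Note that comparing the dates as strings only works here because every date uses the same zero-padded MM/DD/YYYY format within a single year; converting the column with pd.to_datetime first would be more robust.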
CodePudding user response:
Try the pandas built-in notnull() function.
Mar22 = dft.loc[dft['Total Value'].notnull()].loc[dft['Date'].between('03/01/2022', '03/31/2022', inclusive='both'),'Total Value'].sum()
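The same filter can also be written as a single mask by combining both conditions with &. The sketch below assumes the missing entry is a real NaN rather than an empty string, since notnull() does not flag ''; the frame and the names in_march and mar22 are hypothetical.

```python
import pandas as pd

# Hypothetical reconstruction of dft; here the blank cell is assumed to be NaN.
dft = pd.DataFrame({
    "Date": ["02/01/2022", "03/01/2022", "03/08/2022",
             "03/11/2022", "03/15/2022", "05/01/2022"],
    "Total Value": [2, 6, 4, float("nan"), 4, 4],
})

# Rows that fall in March (string comparison, as in the question).
in_march = dft["Date"].between("03/01/2022", "03/31/2022", inclusive="both")

# Keep rows that are in March AND have a non-null value, then sum them.
mar22 = dft.loc[dft["Total Value"].notnull() & in_march, "Total Value"].sum()
print(mar22)
```

Because sum() already skips NaN by default, the notnull() condition mainly makes the intent explicit when the column is numeric.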