Sum of range of rows in a dataframe column

Time:04-26

For the following csv file

ID,Kernel Time,device__attribute_warp_size,cycles_elapsed,time_duration
,,,cycle,msecond
0,2021-Dec-09 23:04:13,32,175013.666667,0.122208
1,2021-Dec-09 23:04:16,32,2988.833333,0.002592
2,2021-Dec-09 23:04:18,32,2911.666667,0.002624

I want to sum the values of the cycles_elapsed column, but as you can see, the first row after the header holds units rather than numbers. I wrote the following code, but the result is not what I expect: the values are joined together as text instead of being added.

import pandas as pd
import csv
df = pd.read_csv('test.csv', thousands=',', usecols=['ID', 'cycles_elapsed'])
print(df['cycles_elapsed'])
c_sum = df['cycles_elapsed'].loc[1:].sum()
print(c_sum)


$ python3 test.py 
0            cycle
1    175013.666667
2      2988.833333
3      2911.666667
Name: cycles_elapsed, dtype: object
175013.6666672988.8333332911.666667

How can I fix that?

CodePudding user response:

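The problem is the second line of the file (the units row: cycle, msecond). Because it contains text, pandas reads cycles_elapsed as strings rather than numbers, so sum() concatenates the values instead of adding them. Skip that line with the skiprows=[1] parameter; the column is then parsed as numeric and the sum is correct: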

df = pd.read_csv('test.csv', skiprows=[1], usecols=['ID', 'cycles_elapsed'])
print (df)
   ID  cycles_elapsed
0   0   175013.666667
1   1     2988.833333
2   2     2911.666667

print (df.dtypes)
ID                  int64
cycles_elapsed    float64
dtype: object

c_sum = df['cycles_elapsed'].sum()
print(c_sum)
180914.166667
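
If the file cannot be re-read with skiprows, another option is to convert the column after loading. The following is a minimal sketch, not part of the original answer: it assumes the same sample data as in the question (inlined through io.StringIO so it runs without test.csv) and uses pd.to_numeric with errors="coerce", which turns the non-numeric "cycle" entry into NaN. The names CSV_TEXT, raw and cycles are made up for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question, inlined so no file is needed.
CSV_TEXT = """ID,Kernel Time,device__attribute_warp_size,cycles_elapsed,time_duration
,,,cycle,msecond
0,2021-Dec-09 23:04:13,32,175013.666667,0.122208
1,2021-Dec-09 23:04:16,32,2988.833333,0.002592
2,2021-Dec-09 23:04:18,32,2911.666667,0.002624
"""

# Read the file as in the question: the units row keeps the column non-numeric.
raw = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["ID", "cycles_elapsed"])
print(pd.api.types.is_numeric_dtype(raw["cycles_elapsed"]))  # False

# Convert to numbers; anything that cannot be parsed ("cycle") becomes NaN.
cycles = pd.to_numeric(raw["cycles_elapsed"], errors="coerce")

# sum() skips NaN by default, so only the three real measurements are added.
c_sum = cycles.sum()
print(c_sum)  # about 180914.166667, the same total as with skiprows=[1]
```

Compared with skiprows=[1], this keeps the units row in the frame (as NaN), which may or may not be wanted; dropping it with dropna() afterwards is one way to clean up.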