For the following CSV file:
ID,Kernel Time,device__attribute_warp_size,cycles_elapsed,time_duration
,,,cycle,msecond
0,2021-Dec-09 23:04:13,32,175013.666667,0.122208
1,2021-Dec-09 23:04:16,32,2988.833333,0.002592
2,2021-Dec-09 23:04:18,32,2911.666667,0.002624
I want to sum the values of one column, cycles_elapsed, but as you can see, the first row after the header holds units rather than numbers. I wrote the following code, but the result is not what I expect:
import pandas as pd
import csv
df = pd.read_csv('test.csv', thousands=',', usecols=['ID', 'cycles_elapsed'])
print(df['cycles_elapsed'])
c_sum = df['cycles_elapsed'].loc[1:].sum()
print(c_sum)
$ python3 test.py
0 cycle
1 175013.666667
2 2988.833333
3 2911.666667
Name: cycles_elapsed, dtype: object
175013.6666672988.8333332911.666667
How can I fix that?
CodePudding user response:
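The problem is the second line of the file (the units row). Because it contains strings, pandas reads the whole column as object dtype, so sum() concatenates the strings instead of adding numbers. Skip that line with the skiprows=[1] parameter to get a numeric column and the correct sum: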
df = pd.read_csv('test.csv', skiprows=[1], usecols=['ID', 'cycles_elapsed'])
print (df)
ID cycles_elapsed
0 0 175013.666667
1 1 2988.833333
2 2 2911.666667
print (df.dtypes)
ID int64
cycles_elapsed float64
dtype: object
c_sum = df['cycles_elapsed'].sum()
print(c_sum)
180914.166667
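If you would rather not hard-code the position of the units row, one alternative is to read the file as-is and convert the column afterwards with pd.to_numeric(..., errors='coerce'), which turns non-numeric entries such as "cycle" into NaN. The following is only a minimal sketch: the sample data is inlined through io.StringIO (an assumption made so the snippet runs on its own; in practice you would pass 'test.csv'), and CSV_TEXT and cycles are illustrative names.

```python
import io

import pandas as pd

# Sample data copied from the question, inlined so the sketch is self-contained.
# In practice, pass the file path ('test.csv') to read_csv instead.
CSV_TEXT = """ID,Kernel Time,device__attribute_warp_size,cycles_elapsed,time_duration
,,,cycle,msecond
0,2021-Dec-09 23:04:13,32,175013.666667,0.122208
1,2021-Dec-09 23:04:16,32,2988.833333,0.002592
2,2021-Dec-09 23:04:18,32,2911.666667,0.002624
"""

# Read without skipping anything; the units row makes the column non-numeric.
df = pd.read_csv(io.StringIO(CSV_TEXT), usecols=['ID', 'cycles_elapsed'])

# Convert to numbers; anything that cannot be parsed ("cycle") becomes NaN.
cycles = pd.to_numeric(df['cycles_elapsed'], errors='coerce')

# Series.sum() skips NaN by default, so only the numeric rows are added.
c_sum = cycles.sum()
print(c_sum)
```

The trade-off is that errors='coerce' silently drops any value it cannot parse, so skiprows=[1] is the stricter choice when you know exactly where the units row is.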