Previously I posted a question about calculating the average value for every 10 rows (Calculating the average value for every 10 cells in each column by pandas), and Zero's answer worked.
However, I now get an error when calculating the average of the datetime column.
import glob
import numpy as np
import pandas as pd

location2 = 'C:\\Users\\Poon\\Downloads\\20211014_SBS_BEMS\\20211014_SBS_BEMS\\Test1044.csv'
csvfiles2 = glob.glob(location2)
df3 = pd.DataFrame()
for file_new_2 in csvfiles2:
    df3 = pd.read_csv(file_new_2)
df4 = pd.concat([pd.to_datetime(df3.iloc[:, 0]),
                 df3.iloc[:, 1:].apply(pd.to_numeric)], axis=1)
df4.dropna(inplace=True)
df4 = df4.groupby(np.arange(len(df4)) // 10).mean()
print(df4)
The error message is
Unable to parse string "2019-05-19 00:00:00" at position 0
I guess the values produced by pd.to_datetime cannot simply be summed up and then divided by 10?
Here is a sample of my data; in total there are about 100k rows.
19/5/2019 0:00 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:01 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:02 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:03 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:04 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:05 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:06 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:07 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:08 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:09 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:10 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:11 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:12 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:13 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:14 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:15 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:16 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:17 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:18 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:19 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:20 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:21 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:22 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:23 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:24 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:25 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:26 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:27 8840 20 237 64.93 82.35 16.15 46.88
CodePudding user response:
Assuming column 0 holds the timestamps in the provided example, convert them to integers, group by the floor division of the index by 10, and aggregate:
import numpy as np
import pandas as pd

out = pd.to_datetime(pd.to_datetime(df[0])
                     .astype(np.int64)
                     .groupby(df.index // 10)
                     .mean())
Output:
0 2019-05-19 00:04:30
1 2019-05-19 00:14:30
2 2019-05-19 00:23:30
Name: 0, dtype: datetime64[ns]
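To see the whole approach end to end, here is a minimal self-contained sketch using made-up sample data shaped like the question's (a headerless frame whose column 0 holds timestamp strings); the column contents are assumptions, not the asker's actual file:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 30 one-minute timestamps plus two numeric columns,
# mimicking the headerless CSV in the question.
df = pd.DataFrame({
    0: pd.date_range("2019-05-19", periods=30, freq="1min").astype(str),
    1: [8840] * 30,
    2: [64.93] * 30,
})

# Datetimes can't be averaged as strings: parse them, view them as
# nanosecond integers, take the mean of each block of 10 rows, then
# convert the mean back to datetime64.
out = pd.to_datetime(pd.to_datetime(df[0])
                     .astype(np.int64)
                     .groupby(df.index // 10)
                     .mean())
print(out)
```

Each group of 10 consecutive minutes averages to its midpoint, e.g. rows 0-9 (00:00 to 00:09) give 00:04:30.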
CodePudding user response:
You can use resample:
>>> (df.assign(**{'dt': pd.to_datetime(df.iloc[:, 0])})[1:]
...     .resample('10T', on='dt').mean())
1 2 4 5 6
dt
2019-05-19 00:00:00 8840.0 20.0 82.35 16.15 46.88
2019-05-19 00:10:00 8840.0 20.0 82.35 16.15 46.88
2019-05-19 00:20:00 8840.0 20.0 82.35 16.15 46.88
This is easier if your columns have names.
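For instance, with named columns (the names below are hypothetical, not from the asker's file) the resample call reads cleanly; a minimal sketch:

```python
import pandas as pd

# Hypothetical sample with named columns and an already-parsed
# datetime column, mimicking one-minute sensor readings.
df = pd.DataFrame({
    "time": pd.date_range("2019-05-19", periods=30, freq="1min"),
    "power": [8840.0] * 30,
    "temp": [20.0] * 30,
})

# Bucket rows into 10-minute bins keyed on the "time" column and
# average every remaining numeric column within each bin.
out = df.resample("10min", on="time").mean()
print(out)
```

Note that resample bins by clock time rather than by row count, so the two answers agree only when the data has exactly one row per minute with no gaps.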