I am analysing data in CSV files that are sorted by a single date-and-time column (Sdate), as shown below (note: this is all one column):
Sdate
01/01/2016 00:00
01/01/2016 01:00
01/01/2016 02:00
etc.
However, the problem appears when the data to be analysed is split into 15-minute intervals, as in the example below:
Sdate
01/01/2016 00:00
01/01/2016 00:15
01/01/2016 00:30
etc.
The output then seems to be grouped hourly anyway, and data goes missing as the output continues.
Currently I read in all the CSV files in the directory and sort them. I used the pd.to_datetime function, which worked for the hourly intervals but not for the 15-minute ones:
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, header=0, low_memory=False)
    # Parse the combined date-and-time column into datetime values
    df['Sdate'] = pd.to_datetime(df['Sdate'])
    list_.append(df)
Does anyone know if this is an issue with pd.to_datetime, or is it possibly an issue with the way I have grouped the contents hourly (see below)?
hourly = grouped.aggregate(np.sum).reset_index()
Any help would be greatly appreciated. Thank you!
CodePudding user response:
Pandas way of solving this
The pandas.read_csv() function has a keyword argument called parse_dates.
Using this, you can convert strings, floats or integers into datetimes on the fly with the default date_parser (dateutil.parser.parser):
# parse_dates expects a bool, a list of columns or a dict, not a bare string
pd.read_csv(file, header=None, names=headers, dtype=dtypes, parse_dates=['Sdate'])
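Since the question does not show how `grouped` was built, here is a minimal sketch of one possible cause: if the grouping key is hourly, the 15-minute rows will be collapsed into one row per hour no matter how well the dates were parsed. The sample data, the `value` column and the use of `dayfirst=True` (which assumes the dates are written day/month/year) are assumptions for illustration and are not taken from the original files.

```python
import io

import pandas as pd

# Hypothetical sample standing in for one of the CSV files; the "value"
# column is invented for illustration.
csv_text = """Sdate,value
01/01/2016 00:00,1
01/01/2016 00:15,2
01/01/2016 00:30,3
01/01/2016 00:45,4
01/01/2016 01:00,5
01/01/2016 01:15,6
"""

# parse_dates takes a list of column names. dayfirst=True is an assumption
# that the dates are day/month/year; drop it if they are month/day/year.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    parse_dates=["Sdate"],
    dayfirst=True,
)

# Grouping on a 60-minute frequency merges the 15-minute rows of each hour.
hourly = (
    df.groupby(pd.Grouper(key="Sdate", freq="60min"))["value"]
    .sum()
    .reset_index()
)

# Grouping on a 15-minute frequency keeps one row per interval.
quarter_hourly = (
    df.groupby(pd.Grouper(key="Sdate", freq="15min"))["value"]
    .sum()
    .reset_index()
)

print(hourly)
print(quarter_hourly)
```

With this sample, `hourly` has two rows (sums 10 and 11) while `quarter_hourly` keeps all six. If your own grouping key is something like the hour of `Sdate`, switching it to a 15-minute frequency may be all that is needed.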