I have : (dateTime is in string format)
df
dateTime level
2020-10-31T23:18:00.000 slow
2020-10-31T23:25:00.000 moderate
I want to convert this to time series in 1min level by replicating the 'level' for every minute until the next instance where there is a change:
df
dateTime level
2020-10-31 23:18:00 slow
2020-10-31 23:19:00 slow
2022-10-31 23:20:00 slow
......
2020-10-31 23:25:00 moderate
2022-10-31 23:26:00 moderate
How do I, first convert the string dateTime format into datetime dtype, and convert the dataframe into time series?
CodePudding user response:
Create DatetimeIndex
and use DataFrame.asfreq
:
df['dateTime'] = pd.to_datetime(df['dateTime'])
df = df.set_index('dateTime').asfreq('1Min', method='ffill')
print (df)
level
dateTime
2020-10-31 23:18:00 slow
2020-10-31 23:19:00 slow
2020-10-31 23:20:00 slow
2020-10-31 23:21:00 slow
2020-10-31 23:22:00 slow
2020-10-31 23:23:00 slow
2020-10-31 23:24:00 slow
2020-10-31 23:25:00 moderate
CodePudding user response:
I personally like resample
to do that. Resample has the advantage to produce "even" results (e.g. full minutes), while asfreq
would keep seconds. More info on that
More info on fillna methods
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO('''
dateTime level
2020-10-31T23:18:00.000 slow
2020-10-31T23:25:00.000 moderate
'''), sep=" ") # , infer_datetime_format=True, parse_dates=["dateTime"], index_col="dateTime"
df.dateTime = pd.to_datetime(df.dateTime) # string to datetime
df = df.set_index('dateTime') # datetime index needed for resample
df = df.resample("1min").fillna("pad") # use strings like '1h'
Produces:
dateTime | level |
---|---|
2020-10-31 23:18:00 | slow |
2020-10-31 23:19:00 | slow |
2020-10-31 23:20:00 | slow |
2020-10-31 23:21:00 | slow |
2020-10-31 23:22:00 | slow |
2020-10-31 23:23:00 | slow |
2020-10-31 23:24:00 | slow |
2020-10-31 23:25:00 | moderate |
You can also do the conversion of datetime and set index while reading:
df = pd.read_csv(StringIO('''
dateTime level
2020-10-31T23:18:00.000 slow
2020-10-31T23:25:00.000 moderate
'''), sep=" ", infer_datetime_format=True, parse_dates=["dateTime"], index_col="dateTime")