Home > Software engineering >  How to convert data into time series
How to convert data into time series

Time:11-03

I have : (dateTime is in string format)

df
dateTime                  level  
2020-10-31T23:18:00.000   slow     
2020-10-31T23:25:00.000   moderate

I want to convert this to time series in 1min level by replicating the 'level' for every minute until the next instance where there is a change:

df
dateTime                level   
2020-10-31 23:18:00     slow      
2020-10-31 23:19:00     slow
2022-10-31 23:20:00     slow
......
2020-10-31 23:25:00     moderate     
2022-10-31 23:26:00     moderate       

How do I, first convert the string dateTime format into datetime dtype, and convert the dataframe into time series?

CodePudding user response:

Create DatetimeIndex and use DataFrame.asfreq:

df['dateTime'] = pd.to_datetime(df['dateTime'])
df = df.set_index('dateTime').asfreq('1Min', method='ffill')
print (df)
                        level
dateTime                     
2020-10-31 23:18:00      slow
2020-10-31 23:19:00      slow
2020-10-31 23:20:00      slow
2020-10-31 23:21:00      slow
2020-10-31 23:22:00      slow
2020-10-31 23:23:00      slow
2020-10-31 23:24:00      slow
2020-10-31 23:25:00  moderate

CodePudding user response:

I personally like resample to do that. Resample has the advantage to produce "even" results (e.g. full minutes), while asfreq would keep seconds. More info on that

More info on fillna methods

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO('''
dateTime level
2020-10-31T23:18:00.000 slow
2020-10-31T23:25:00.000 moderate
'''), sep=" ") #  , infer_datetime_format=True, parse_dates=["dateTime"], index_col="dateTime"

df.dateTime = pd.to_datetime(df.dateTime) # string to datetime

df = df.set_index('dateTime') # datetime index needed for resample

df = df.resample("1min").fillna("pad") # use strings like '1h' 

Produces:

dateTime level
2020-10-31 23:18:00 slow
2020-10-31 23:19:00 slow
2020-10-31 23:20:00 slow
2020-10-31 23:21:00 slow
2020-10-31 23:22:00 slow
2020-10-31 23:23:00 slow
2020-10-31 23:24:00 slow
2020-10-31 23:25:00 moderate

You can also do the conversion of datetime and set index while reading:

df = pd.read_csv(StringIO('''
dateTime level
2020-10-31T23:18:00.000 slow
2020-10-31T23:25:00.000 moderate
'''), sep=" ", infer_datetime_format=True, parse_dates=["dateTime"], index_col="dateTime")
  • Related