I am new to Python, and this may be a very basic question for many here, so apologies in advance and please bear with me. I have a time series of water level records whose timestamps are not continuous. I want to create a new, continuous time series and assign NaN to the intervals that have no data. I created a continuous time series with NaN values for the level, and I am trying to fill in the observed values using the df.replace function in an iterative way, but I cannot produce what I want. Here is a sample of my data and code:
Input data example:
Time stamp Level
2020-06-18 18:00:00 161.287
2020-06-18 21:00:00 161.286
2020-06-19 12:00:00 161.283
2020-06-19 15:00:00 161.283
import numpy as np
import pandas as pd
dti = pd.date_range("2020-05-01", periods=1224, freq="3H")
dti_df = pd.DataFrame(dti, columns=['Timestamp'])
dti_df["Level"] = np.nan
dti_df
df3 = pd.read_csv(r'C:\Users\krusm\Documents\water Levels Resampled.csv')
for i in dti_df.index:
    for j in df3.index:
        if dti_df['Timestamp'][i] == df3['Timestamp'][j]:
            dti_df['Level'][i].replace(df3['Level'][j], inplace = True)
        else:
            pass
dti_df
This code runs without any error, but it still produces the all-NaN time series. The NaN data frame created by the first part of the code:
Time stamp Level
2020-06-18 18:00:00 NaN
2020-06-18 21:00:00 NaN
2020-06-19 00:00:00 NaN
2020-06-19 03:00:00 NaN
2020-06-19 06:00:00 NaN
2020-06-19 09:00:00 NaN
2020-06-19 12:00:00 NaN
2020-06-19 15:00:00 NaN
Output Expectation:
Time stamp Level
2020-06-18 18:00:00 161.287
2020-06-18 21:00:00 161.286
2020-06-19 00:00:00 NaN
2020-06-19 03:00:00 NaN
2020-06-19 06:00:00 NaN
2020-06-19 09:00:00 NaN
2020-06-19 12:00:00 161.283
2020-06-19 15:00:00 161.283
CodePudding user response:
This answer assumes that the column in your input df, df3, has been renamed to Timestamp and its dtype converted to datetime. In that case, you can just use merge to make your life much easier:
dti = pd.date_range("2020-05-01", periods=1224, freq="3H")
dti_df = pd.DataFrame(dti, columns=['Timestamp'])
new_df = pd.merge(left=dti_df, right=df3, on='Timestamp', how='left')
new_df.iloc[390:400,:]
Timestamp Level
390 2020-06-18 18:00:00 161.287
391 2020-06-18 21:00:00 161.286
392 2020-06-19 00:00:00 NaN
393 2020-06-19 03:00:00 NaN
394 2020-06-19 06:00:00 NaN
395 2020-06-19 09:00:00 NaN
396 2020-06-19 12:00:00 161.283
397 2020-06-19 15:00:00 161.283
398 2020-06-19 18:00:00 NaN
399 2020-06-19 21:00:00 NaN
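The preparation this answer assumes (renaming the column and converting it to datetime) could look roughly like the sketch below. It is only an illustration under stated assumptions: the CSV contents are a hypothetical in-memory stand-in built from the sample rows in the question, and the original header name "Time stamp" is a guess based on how the data was printed, so adjust both to match the real file. The grid is also shortened to the sample period rather than the full 1224 steps.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; the header name
# "Time stamp" is an assumption taken from the printed sample.
csv_text = """Time stamp,Level
2020-06-18 18:00:00,161.287
2020-06-18 21:00:00,161.286
2020-06-19 12:00:00,161.283
2020-06-19 15:00:00,161.283
"""
df3 = pd.read_csv(io.StringIO(csv_text))

# Rename the column so both frames share the same key name.
df3 = df3.rename(columns={"Time stamp": "Timestamp"})

# read_csv loads the timestamps as plain strings; convert them to
# datetime so they can match the values produced by date_range.
df3["Timestamp"] = pd.to_datetime(df3["Timestamp"])

# Continuous 3-hourly grid covering only the sample period.
grid = pd.DataFrame({
    "Timestamp": pd.date_range(
        "2020-06-18 18:00", "2020-06-19 15:00", freq=pd.Timedelta(hours=3)
    )
})

# A left merge keeps every grid timestamp and leaves Level as NaN
# wherever df3 has no matching record.
new_df = pd.merge(left=grid, right=df3, on="Timestamp", how="left")
print(new_df)
```

If the timestamps are left as strings, comparisons against the datetime values from date_range are unlikely to match, which would be consistent with the all-NaN result described in the question.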