Replacing one data frame value from another based on timestamp Criterion

Time:11-23

I am new to Python, and this may be a very basic question for many here, so apologies in advance and please bear with me. I have a time series of water level records where the timestamps are not continuous. I want to create a new time series that is continuous, with NaN assigned to the intervals where there is no data. I created a continuous time series with NaN values for the level, and I am trying to fill in the observed values using the df.replace function in an iterative way, but I cannot produce what I want. Here is a sample of my code:

Input data example:
Time stamp                 Level
2020-06-18 18:00:00        161.287
2020-06-18 21:00:00        161.286
2020-06-19 12:00:00        161.283
2020-06-19 15:00:00        161.283 

dti = pd.date_range("2020-05-01", periods=1224, freq="3H")
dti_df = pd.DataFrame(dti, columns=['Timestamp'])
dti_df["Level"] = np.nan
dti_df
df3 = pd.read_csv(r'C:\Users\krusm\Documents\water Levels Resampled.csv')
for i in dti_df.index:
    for j in df3.index:
        if dti_df['Timestamp'][i] == df3['Timestamp'][j]:
            dti_df['Level'][i].replace(df3['Level'][j], inplace = True)
          
        else:
            pass
dti_df

This code runs without any error, but the result is still the all-NaN time series. The NaN data frame created by the first part of the code:

Time stamp                 Level
2020-06-18 18:00:00        NaN
2020-06-18 21:00:00        NaN
2020-06-19 00:00:00        NaN
2020-06-19 03:00:00        NaN
2020-06-19 06:00:00        NaN
2020-06-19 09:00:00        NaN
2020-06-19 12:00:00        NaN
2020-06-19 15:00:00        NaN

Expected output:
Time stamp                 Level
2020-06-18 18:00:00        161.287
2020-06-18 21:00:00        161.286
2020-06-19 00:00:00        NaN
2020-06-19 03:00:00        NaN
2020-06-19 06:00:00        NaN
2020-06-19 09:00:00        NaN
2020-06-19 12:00:00        161.283
2020-06-19 15:00:00        161.283

Answer:

This answer assumes that the timestamp column of your input data frame, df3, has been renamed to Timestamp and that its dtype has been converted to datetime. In that case, you can just use merge to make your life much easier:

dti = pd.date_range("2020-05-01", periods=1224, freq="3H")
dti_df = pd.DataFrame(dti, columns=['Timestamp'])
new_df=pd.merge(left=dti_df,right=df3, on='Timestamp',how='left')

new_df.iloc[390:400,:]

    Timestamp           Level
390 2020-06-18 18:00:00 161.287
391 2020-06-18 21:00:00 161.286
392 2020-06-19 00:00:00 NaN
393 2020-06-19 03:00:00 NaN
394 2020-06-19 06:00:00 NaN
395 2020-06-19 09:00:00 NaN
396 2020-06-19 12:00:00 161.283
397 2020-06-19 15:00:00 161.283
398 2020-06-19 18:00:00 NaN
399 2020-06-19 21:00:00 NaN
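
For completeness, here is a minimal end-to-end sketch that includes the preparation step the answer takes for granted. It assumes the CSV column is called "Time stamp", as in the sample from the question, and it reads a small inline string (the hypothetical csv_text) instead of the real file. One likely reason the original loop never fills anything is that read_csv leaves the timestamps as plain strings, which do not compare equal to the Timestamp values in the date range, so the if branch is never reached; that is a guess based on the code shown, not something confirmed against the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the column names are assumed
# from the sample data shown in the question.
csv_text = """Time stamp,Level
2020-06-18 18:00:00,161.287
2020-06-18 21:00:00,161.286
2020-06-19 12:00:00,161.283
2020-06-19 15:00:00,161.283
"""

df3 = pd.read_csv(io.StringIO(csv_text))

# Rename the column and convert the strings to real datetimes so they
# can be matched against the entries of the date range below.
df3 = df3.rename(columns={"Time stamp": "Timestamp"})
df3["Timestamp"] = pd.to_datetime(df3["Timestamp"])

# Continuous 3-hourly grid, as in the question. A Timedelta is used for
# the frequency so the sketch does not depend on the "3H"/"3h" alias.
dti = pd.date_range("2020-05-01", periods=1224, freq=pd.Timedelta(hours=3))
dti_df = pd.DataFrame({"Timestamp": dti})

# A left merge keeps every grid timestamp; rows without an observation
# end up with NaN in the Level column.
new_df = pd.merge(dti_df, df3, on="Timestamp", how="left")

print(new_df.iloc[390:398])
```

If the timestamps in df3 are unique, setting Timestamp as the index and calling reindex(dti) on it should give the same result without a merge.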