Home > database >  Add additional timestamp to Pandas DataFrame items based on item timestamp/index
Add additional timestamp to Pandas DataFrame items based on item timestamp/index

Time:07-19

I have a large time-indexed Pandas DataFrame with time-series data of a couple of devices. The structure of this DataFrame (in code below self._combined_data_frame) looks like this:

DateTimeIndex|device_name|col1|...|colN

The DateTimeIndex and device_name are filled for every row, the other columns contain nan values. Sample data is available on Google Drive:

enter image description here

CodePudding user response:

IIUC use merge_asof:

reference_timestamps = ['2017-12-31','2019-01-31','2020-12-31']

df1 = pd.DataFrame({'reference_timestamp': pd.to_datetime(reference_timestamps)})

DataFrame = pd.merge_asof(DataFrame.reset_index(), 
                          df1, 
                          left_on='DateTimeIndex', 
                          right_on='reference_timestamp',
                          direction='forward')
print (DataFrame)
  DateTimeIndex device_name  col reference_timestamp
0    2016-11-30           A  NaN          2017-12-31
1    2017-04-30           A  NaN          2017-12-31
2    2018-01-31           A  NaN          2019-01-31
3    2019-09-30           A  NaN          2020-12-31
4    2020-04-30           A  NaN          2020-12-31
5    2020-11-30           A  NaN          2020-12-31

CodePudding user response:

Using jezrael's solution I modified my code such that it works with the sample data provided. This is my working solution:

Method integrate_trafo_conacq_rawdataneeded to be adjusted:

def integrate_trafo_contacq_rawdata(self, nomen: str, groupby_column: str, drop_columns: list):
    print('opening:', nomen)
    device_dataset: DataFrame = DataFrame(pd.read_hdf(self._input_file_name, key=nomen)).groupby(groupby_column).sum()
    device_dataset = device_dataset.drop(columns=drop_columns)
    for column in device_dataset.columns:
        device_dataset = device_dataset.rename(columns={column: '_'.join([column, 'sum'])})

    self._lastinjection_items = device_dataset.index.to_frame()

    device_dataset['nomen'] = '_'.join([nomen.split('_')[0], 'integral'])
    device_dataset.index = pd.to_datetime(device_dataset.index)

    self._combined_data_frame = pd.concat([self._combined_data_frame, device_dataset])

    return self

Updated method add_lastinjection_markers

def add_lastinjection_markers(self):
    reference_timestamps = self._lastinjection_items.rename(columns={'lastinjection': 'reference_timestamp'})
    frame = self._combined_data_frame.reset_index().rename(columns={'index': 'time'})
    frame['time'] = frame['time'].apply(lambda item: item.value)

    self._combined_data_frame = pd.merge_asof(frame.sort_values(by='time'),
                              reference_timestamps,
                              left_on='time',
                              right_on='reference_timestamp',
                              direction='forward')
  • Related