Home > OS >  Subtract each index row value (which is '<M8[ns]') from column header (which is of Time
Subtract each index row value (which is '<M8[ns]') from column header (which is of Time

Time:09-29

I have a df, which has multiple timestamp columns (the column headers themselves are dates/times) and have an index of time rows (every 2 min per row). Eg:

timestamp          2021-08-20 08:00:00    2021-08-27 08:00:00     2021-10-08 08:00:00
2021-08-17 00:02:00     100                      150                       200
2021-08-17 00:04:00     100                      150                       200
2021-08-17 00:06:00     100                      150                       200

I want to find the number of days (possibly in float) between the column header and the index time for each row. SO the desired output should look like

timestamp          2021-08-20 08:00:00    2021-08-27 08:00:00     2021-10-08 08:00:00
2021-08-17 00:02:00     3.                      10.                       51.
2021-08-17 00:04:00     3.                      10.                       51.
2021-08-17 00:06:00     3.                      10.                       51.

Can you please advise how i can efficiently perform this sort of subtraction on timestamp/datetimes

CodePudding user response:

You can get the datetime difference (timedelta) between column header and the index time. Then, use .total_seconds() and divide by 86400 (seconds in one day) to get the number of days in float, as follows:

# convert to datetime/timestamp if not already in the format
df.index = pd.to_datetime(df.index)
df.columns = pd.to_datetime(df.columns)

df.apply(lambda x: (x.name - x.index).total_seconds() / 86400)

x.name to get the column label (timestamp) and x.index to get the index.

Result:

                     2021-08-20 08:00:00  2021-08-27 08:00:00  2021-10-08 08:00:00
timestamp                                                                         
2021-08-17 00:02:00             3.331944            10.331944            52.331944
2021-08-17 00:04:00             3.330556            10.330556            52.330556
2021-08-17 00:06:00             3.329167            10.329167            52.329167

CodePudding user response:

If performance is important use broadcasting for subtract columns with index values in numpy:

df.columns = pd.to_datetime(df.columns)
df.index = pd.to_datetime(df.index)

arr = (df.columns.to_numpy()- df.index.to_numpy()[:, None]) / np.timedelta64(1, 'D')

df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print (df)
                     2021-08-20 08:00:00  2021-08-27 08:00:00  \
timestamp                                                       
2021-08-17 00:02:00             3.331944            10.331944   
2021-08-17 00:04:00             3.330556            10.330556   
2021-08-17 00:06:00             3.329167            10.329167   

                     2021-10-08 08:00:00  
timestamp                                 
2021-08-17 00:02:00            52.331944  
2021-08-17 00:04:00            52.330556  
2021-08-17 00:06:00            52.329167  
  • Related