Pandas - Datetime Manipulation

Time:11-22

I have a dataframe like so:

              CREATED_AT             COUNT
'1990-01-01'  '2022-01-01 07:30:00'      5
'1990-01-02'  '2022-01-01 07:30:00'     10
...

Where the index is a date and the CREATED_AT column is a datetime that is the same value for all rows.

How can I update the CREATED_AT column so that it takes its date portion from the index? The result should look like:

              CREATED_AT             COUNT
'1990-01-01'  '1990-01-01 07:30:00'      5
'1990-01-02'  '1990-01-02 07:30:00'     10
...

Attempts at this result in errors like:

cannot add DatetimeArray and DatetimeArray
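
The attempt that produced this error is not shown, so the following is only a guess at it: a minimal sketch that assumes the datetime index was added directly to the datetime column. pandas does not define datetime + datetime; only a timedelta can be added to a datetime.

```python
import pandas as pd

# Sample data shaped like the question's frame (values copied from the question).
df = pd.DataFrame(
    {"CREATED_AT": ["2022-01-01 07:30:00", "2022-01-01 07:30:00"], "COUNT": [5, 10]},
    index=pd.to_datetime(["1990-01-01", "1990-01-02"]),
)
df["CREATED_AT"] = pd.to_datetime(df["CREATED_AT"])

# Assumed failing attempt: adding the datetime index to the datetime column.
try:
    df["CREATED_AT"] + df.index
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The time of day as a timedelta, by contrast, can be added to a date.
time_of_day = df["CREATED_AT"] - df["CREATED_AT"].dt.normalize()
```

Here error_message holds the text of the TypeError, and time_of_day is a timedelta Series (07:30:00 for every row).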

CodePudding user response:

You can use df.reset_index() to turn the index into a column and then do a simple manipulation to get the output you want, like this:

# Creating a test df
import pandas as pd
from datetime import datetime, timedelta, date

df = pd.DataFrame.from_dict({
    "CREATED_AT": [datetime.now(), datetime.now() + timedelta(hours=1)],
    "COUNT": [5, 10]
})
df_with_index = df.set_index(pd.Index([date.today() - timedelta(days=10), date.today() - timedelta(days=9)]))

# Creating the column with the result
df_result = df_with_index.reset_index()
df_result["NEW_CREATED_AT"] = pd.to_datetime(df_result["index"].astype(str) + ' ' + df_result["CREATED_AT"].dt.time.astype(str))

Result:

    index       CREATED_AT                  COUNT  NEW_CREATED_AT
0   2022-11-11  2022-11-21 16:15:31.520960      5  2022-11-11 16:15:31.520960
1   2022-11-12  2022-11-21 17:15:31.520965     10  2022-11-12 17:15:31.520965

CodePudding user response:

You can use:

# ensure CREATED_AT is a datetime
s = pd.to_datetime(df['CREATED_AT'])

# subtract the date to only get the time, add to the index
# ensuring the index is of datetime type
df['CREATED_AT'] = s.sub(s.dt.normalize()).add(pd.to_datetime(df.index))

If everything is already of datetime type, this simplifies to:

df['CREATED_AT'] = (df['CREATED_AT']
                    .sub(df['CREATED_AT'].dt.normalize())
                    .add(df.index)
                    )

Output:

                    CREATED_AT  COUNT
1990-01-01 1990-01-01 07:30:00      5
1990-01-02 1990-01-02 07:30:00     10
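
For reference, here is a self-contained sketch of the same subtract-then-add approach, run on the question's sample data. It assumes the index and the column start out as strings, and the helper name replace_date_with_index is hypothetical, introduced only for this illustration.

```python
import pandas as pd


def replace_date_with_index(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy where `column` keeps its time of day but takes its date from the index."""
    out = frame.copy()
    stamps = pd.to_datetime(out[column])            # ensure datetime dtype
    time_of_day = stamps.sub(stamps.dt.normalize())  # timedelta since midnight
    out[column] = time_of_day.add(pd.to_datetime(out.index))
    return out


# Sample data copied from the question, stored as strings (an assumption).
df = pd.DataFrame(
    {"CREATED_AT": ["2022-01-01 07:30:00", "2022-01-01 07:30:00"], "COUNT": [5, 10]},
    index=["1990-01-01", "1990-01-02"],
)

result = replace_date_with_index(df, "CREATED_AT")
```

Working on a copy leaves the original frame untouched, which makes it easy to compare the two while checking the result.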