Working with numpy.where when datetime64 could not be promoted by str

import pandas as pd
from datetime import timedelta
import numpy as np

df = pd.DataFrame({
    'open_local_data':['2022-08-24 15:00:00','2022-08-24 18:00:00'],
    'result':['WINNER','']
})
df['open_local_data'] = pd.to_datetime(df['open_local_data'])
df['clock_now'] = np.where(
    df['result'] != '',
    df['open_local_data'] + timedelta(minutes=150),
    ''
)
print(df[['open_local_data','clock_now']])

Since I have to work with conditions and only decide later whether to change values in a column, what should I do when I get this error?

    df['clock_now'] = np.where(
  File "<__array_function__ internals>", line 180, in where
TypeError: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[str_]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[datetime64]'>, <class 'numpy.dtype[str_]'>)
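The error occurs because np.where builds a single NumPy array from both branches, so the two choices must share a common dtype, and datetime64 and str have none. A minimal sketch, assuming a missing time can be represented as NaT (NumPy's "not a time") instead of an empty string, keeps both branches as datetime64 so np.where can promote them:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'open_local_data': ['2022-08-24 15:00:00', '2022-08-24 18:00:00'],
    'result': ['WINNER', ''],
})
df['open_local_data'] = pd.to_datetime(df['open_local_data'])

# Both branches are datetime64 now, so np.where can promote them to one dtype.
# NaT stands in for "no value" where the original code used ''.
df['clock_now'] = np.where(
    df['result'] != '',
    df['open_local_data'] + pd.Timedelta(minutes=150),
    np.datetime64('NaT'),
)
print(df)
```

The resulting column keeps a datetime64 dtype, so later date arithmetic on it still works; use pd.isna() to find the rows that had no result.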

Answer:

You could call .astype(str) on the addition so that NumPy accepts both branches, but then the column would hold strings instead of datetimes. Instead, you can use Series.where:

df["clock_now"] = df["result"].where(df["result"].eq(""),
                                     other=df["open_local_data"].add(pd.Timedelta("150min")))

This keeps the "result" values unchanged where they equal the empty string and puts open_local_data + 150 minutes everywhere else, which gives:

>>> df

      open_local_data  result            clock_now
0 2022-08-24 15:00:00  WINNER  2022-08-24 17:30:00
1 2022-08-24 18:00:00

Here df.at[0, "clock_now"] is an actual Timestamp, not a string (the column itself has object dtype because it mixes Timestamps and empty strings).
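For comparison, the .astype(str) route mentioned at the start of this answer can be sketched as follows. It keeps np.where, but because the datetime branch is converted to text, the column ends up holding plain strings rather than Timestamps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'open_local_data': ['2022-08-24 15:00:00', '2022-08-24 18:00:00'],
    'result': ['WINNER', ''],
})
df['open_local_data'] = pd.to_datetime(df['open_local_data'])

# Converting the datetime branch to str gives np.where two string-like inputs,
# so no promotion error occurs, but the values are now text, not datetimes.
df['clock_now'] = np.where(
    df['result'] != '',
    (df['open_local_data'] + pd.Timedelta(minutes=150)).astype(str),
    '',
)
print(type(df.at[0, 'clock_now']))  # a str, not a pd.Timestamp
```

This is fine for display, but any later comparison or arithmetic would need pd.to_datetime again, which is why Series.where is the better fit here.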

Answer:

Try this:

import pandas as pd
from datetime import timedelta
import numpy as np

df = pd.DataFrame({
    'open_local_data':['2022-08-24 15:00:00','2022-08-24 18:00:00'],
    'result':['WINNER','']
})
df['open_local_data'] = pd.to_datetime(df['open_local_data'])

df['clock_now'] = df.apply(
    lambda row: row.open_local_data + timedelta(minutes=150) if row.result != '' else np.nan,
    axis=1,
)
df
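df.apply(..., axis=1) calls the lambda once per row in Python, which can be slow on large frames. A vectorized sketch of the same logic, assuming NaT is an acceptable marker for rows with an empty result, uses Series.where, which by default fills the rows where the condition is False with NaT for a datetime column:

```python
import pandas as pd

df = pd.DataFrame({
    'open_local_data': ['2022-08-24 15:00:00', '2022-08-24 18:00:00'],
    'result': ['WINNER', ''],
})
df['open_local_data'] = pd.to_datetime(df['open_local_data'])

# Shift every timestamp, then keep it only where result is non-empty;
# the remaining rows become NaT, so the column stays datetime64.
df['clock_now'] = (
    df['open_local_data'] + pd.Timedelta(minutes=150)
).where(df['result'] != '')
print(df)
```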