I have the DataFrame below, and I am trying to build a status column that includes the specific date for each row.
But I get the printed representation of the whole Series object instead of the row's value.
Code:
import pandas as pd
import numpy as np
data = {'local': ['client', 'hub'],
'delivery_date': [pd.to_datetime('2022-04-24'), 'null'],
'estimated_date': [pd.to_datetime('2022-04-24'), pd.to_datetime('2022-04-26')],
'max_date': ['delivery_date', 'estimated_date']
}
df = pd.DataFrame(data)
cond = [(df["max_date"] == "delivery_date") & (df["local"] == "client"),
(df["max_date"] == "estimated_date") & (df["local"] == "hub")]
choices = ["It was delivered to the customer on the date {}".format(df["delivery_date"]),
"delivery forecast for {}".format(df["estimated_date"])]
df["status"] = np.select(cond, choices, default = np.nan)
CodePudding user response:
You could use string concatenation:
choices = ["It was delivered to the customer on the date " + df["delivery_date"].astype(str),
"delivery forecast for " + df["estimated_date"].astype(str)]
df["status"] = np.select(cond, choices, default = np.nan)
As @Parfait notes, if you have pandas>=1.0.0, you can use astype("string") instead to get the StringDtype introduced in that release:
choices = ["It was delivered to the customer on the date " + df["delivery_date"].astype('string'),
"delivery forecast for " + df["estimated_date"].astype('string')]
Output:
local delivery_date estimated_date max_date status
0 client 2022-04-24 00:00:00 2022-04-24 delivery_date It was delivered to the customer on the date 2...
1 hub null 2022-04-26 estimated_date delivery forecast for 2022-04-26
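For reference, here is a self-contained sketch of the concatenation approach above, rebuilt from the data in the question so it can be run on its own. The only change from the question's data is that estimated_date is built from a list of date strings, which should be equivalent.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "local": ["client", "hub"],
    "delivery_date": [pd.to_datetime("2022-04-24"), "null"],
    "estimated_date": pd.to_datetime(["2022-04-24", "2022-04-26"]),
    "max_date": ["delivery_date", "estimated_date"],
})

cond = [
    (df["max_date"] == "delivery_date") & (df["local"] == "client"),
    (df["max_date"] == "estimated_date") & (df["local"] == "hub"),
]

# Each choice is a Series holding one message per row, so np.select
# can pick the matching row's message instead of one fixed string.
choices = [
    "It was delivered to the customer on the date " + df["delivery_date"].astype(str),
    "delivery forecast for " + df["estimated_date"].astype(str),
]

df["status"] = np.select(cond, choices, default=np.nan)
print(df["status"].tolist())
```

Because delivery_date mixes a Timestamp with the string 'null', it is an object column, and astype(str) keeps the time part for that row.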
CodePudding user response:
Since a Series contains many values and str.format expects a single value, consider Series string concatenation with Series.str.cat.
...
# ADD HELPER COLUMNS
df["delivery_note"] = "It was delivered to the customer on the date "
df["forecast_note"] = "delivery forecast for "
choices = [
# str.cat only accepts string values, so cast the date columns first
df["delivery_note"].str.cat(df["delivery_date"].astype(str)),
df["forecast_note"].str.cat(df["estimated_date"].astype(str))
]
df["status"] = np.select(cond, choices, default = np.nan)
# REMOVE HELPER COLUMNS
df.drop(["delivery_note", "forecast_note"], axis="columns", inplace=True)
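To illustrate the point about str.format, here is a small sketch contrasting it with element-wise concatenation. The Series s and the variable names are made up for the illustration, and dt.strftime is used here only as one possible way to get date strings.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2022-04-24", "2022-04-26"]))

# str.format converts the whole Series to text, so the result is a single
# string containing the Series' printed representation (index, dtype, ...).
whole = "delivery forecast for {}".format(s)

# Concatenating with a Series of strings works row by row instead,
# giving one message per element.
per_row = "delivery forecast for " + s.dt.strftime("%Y-%m-%d")

print(type(whole))
print(per_row.tolist())
```

Here whole is one Python string, which is why every row in the question ended up with the same text, while per_row is a Series that np.select can index row by row.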