I have a pandas df as follows:
Name Cust1 Cust2 Cust3 Cust4
ABC Y N Y 2022-01-01
DEF N N N N
For each row, I want to check whether any of Cust1, Cust2, Cust3 or Cust4 holds a date,
and if so, put that date in a new column.
The output would look like this:
Name Date
ABC 2022-01-01
DEF na
Any ideas on how I can do this?
I am trying to do df.iloc[:,1:].apply(np.where<xxx>
but I'm not sure how to proceed from here.
Thanks!
CodePudding user response:
Convert the values of all columns to datetimes with to_datetime,
using errors='coerce'
so that values that are not datetime-like become missing values (NaT), then take the maximum date per row:
f = lambda x: pd.to_datetime(x, errors='coerce')
df = df.set_index('Name').apply(f).max(axis=1).reset_index(name='Date')
print(df)
Name Date
0 ABC 2022-01-01
1 DEF NaT
Alternative solution:
f = lambda x: pd.to_datetime(x, errors='coerce')
df = df[['Name']].join(df.iloc[:,1:].apply(f).max(axis=1).rename('Date'))
print(df)
Name Date
0 ABC 2022-01-01
1 DEF NaT
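For anyone who wants to run this end to end, here is a minimal self-contained sketch of the first variant. It assumes every cell in the question's frame is stored as a string (the question does not show dtypes), and the helper name to_dates is only illustrative. Depending on the pandas version, to_datetime may emit a format-inference warning for the Y/N columns; the non-date values still become NaT.

```python
import pandas as pd

# Sample frame rebuilt from the question; all values assumed to be strings.
df = pd.DataFrame({
    "Name": ["ABC", "DEF"],
    "Cust1": ["Y", "N"],
    "Cust2": ["N", "N"],
    "Cust3": ["Y", "N"],
    "Cust4": ["2022-01-01", "N"],
})

def to_dates(col):
    # Anything that cannot be parsed as a date becomes NaT.
    return pd.to_datetime(col, errors="coerce")

# Parse every Cust column, then take the row-wise maximum (NaT is skipped).
dates = df.set_index("Name").apply(to_dates)
out = dates.max(axis=1).reset_index(name="Date")
print(out)
```

Since max skips NaT, a row with a single date returns that date, and a row with no dates returns NaT.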
CodePudding user response:
You can melt your dataframe into long format, then keep the most recent date per Name:
to_date = lambda x: pd.to_datetime(x['Date'], errors='coerce')
out = df.melt('Name', value_name='Date').assign(Date=to_date) \
.groupby('Name', as_index=False)['Date'].max()
print(out)
# Output
Name Date
0 ABC 2022-01-01
1 DEF NaT
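The melt approach can be sketched the same way on a rebuilt copy of the question's data (again assuming string cells). The last step is an optional extra, not part of the answer above: it assumes the literal text "na" from the question's expected output is wanted in place of NaT, and the name formatted is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question; all values assumed to be strings.
df = pd.DataFrame({
    "Name": ["ABC", "DEF"],
    "Cust1": ["Y", "N"],
    "Cust2": ["N", "N"],
    "Cust3": ["Y", "N"],
    "Cust4": ["2022-01-01", "N"],
})

# Long format: one row per (Name, column) pair, values in a 'Date' column.
long_df = df.melt("Name", value_name="Date")

# Non-date values become NaT; groupby max ignores them.
long_df["Date"] = pd.to_datetime(long_df["Date"], errors="coerce")
out = long_df.groupby("Name", as_index=False)["Date"].max()
print(out)

# Optional: render dates as text and show 'na' where no date was found.
formatted = out["Date"].dt.strftime("%Y-%m-%d").fillna("na")
print(formatted.tolist())
```

Keeping the Date column as datetimes is usually more convenient for further work; the string conversion only makes sense for display.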