Pandas df, detecting dates

Time:03-16

I have a pandas df as follows:

Name    Cust1   Cust2   Cust3   Cust4
ABC     Y       N       Y       2022-01-01
DEF     N       N       N       N
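To make the example reproducible, the sample frame above can be built like this. This is a sketch that assumes every cell, including the date in Cust4, is stored as a plain string, which is how such data usually arrives from a CSV:

```python
import pandas as pd

# Sample data from the question; every cell is a string,
# so the value in Cust4 is not yet a datetime.
df = pd.DataFrame({
    'Name':  ['ABC', 'DEF'],
    'Cust1': ['Y', 'N'],
    'Cust2': ['N', 'N'],
    'Cust3': ['Y', 'N'],
    'Cust4': ['2022-01-01', 'N'],
})
print(df)
```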

I want to detect whether any of Cust1, Cust2, Cust3 or Cust4 in a row contains a date and, if so, put that date into a new column. The output would look like this:

Name    Date
ABC     2022-01-01
DEF     na

Any ideas on how I can do this? I have been trying df.iloc[:,1:].apply(np.where<xxx> but I am not sure how to approach it from here.

Thanks!

Answer:

Try converting the values of all columns to datetimes with to_datetime, using errors='coerce' so that anything that is not datetime-like becomes a missing value (NaT), and then take the maximum date per row:

import pandas as pd

# Values such as 'Y'/'N' become NaT instead of raising an error
f = lambda x: pd.to_datetime(x, errors='coerce')
out = df.set_index('Name').apply(f).max(axis=1).reset_index(name='Date')
print(out)
  Name       Date
0  ABC 2022-01-01
1  DEF        NaT
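To see why this works, it may help to print the intermediate frame: after coercion every Y/N cell is NaT, and max(axis=1) skips NaT by default (skipna=True), so a row without any date ends up as NaT. A sketch that rebuilds the sample data from the question; the names `converted` and `dates` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name':  ['ABC', 'DEF'],
    'Cust1': ['Y', 'N'],
    'Cust2': ['N', 'N'],
    'Cust3': ['Y', 'N'],
    'Cust4': ['2022-01-01', 'N'],
})

# Convert column by column; anything that is not a date becomes NaT.
f = lambda x: pd.to_datetime(x, errors='coerce')
converted = df.set_index('Name').apply(f)
print(converted)

# Row-wise maximum; NaT is ignored, so only real dates survive.
dates = converted.max(axis=1)
print(dates)
```

If a row could hold several dates, this keeps the latest one; that is a consequence of using max, not something the question specifies.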

Alternative solution:

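import pandas as pd

f = lambda x: pd.to_datetime(x, errors='coerce')
# Convert only the Cust columns, then join the row maxima back onto Name by index
out = df[['Name']].join(df.iloc[:, 1:].apply(f).max(axis=1).rename('Date'))
print(out)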
  Name       Date
0  ABC 2022-01-01
1  DEF        NaT
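Both snippets should produce the same frame: the alternative relies on join aligning the per-row maxima with Name through the shared default index. A quick sanity check, assuming the sample data from the question; `via_index` and `via_join` are hypothetical names for the two results:

```python
import pandas as pd

df = pd.DataFrame({
    'Name':  ['ABC', 'DEF'],
    'Cust1': ['Y', 'N'],
    'Cust2': ['N', 'N'],
    'Cust3': ['Y', 'N'],
    'Cust4': ['2022-01-01', 'N'],
})
f = lambda x: pd.to_datetime(x, errors='coerce')

# First approach: move Name into the index, reduce each row, restore it.
via_index = df.set_index('Name').apply(f).max(axis=1).reset_index(name='Date')

# Alternative: reduce only the Cust columns and join on the shared RangeIndex.
via_join = df[['Name']].join(df.iloc[:, 1:].apply(f).max(axis=1).rename('Date'))

print(via_index.equals(via_join))
```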

Answer:

You can flatten (melt) your dataframe and then keep the most recent date per Name:

import pandas as pd

# Coerce the melted values to datetimes; non-dates become NaT
to_date = lambda x: pd.to_datetime(x['Date'], errors='coerce')
out = df.melt('Name', value_name='Date').assign(Date=to_date) \
        .groupby('Name', as_index=False)['Date'].max()
print(out)

# Output
  Name       Date
0  ABC 2022-01-01
1  DEF        NaT
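If the melt step is unclear, printing the long frame helps: each (Name, CustN) pair becomes one row, so the per-Name maximum is an ordinary groupby. A sketch with the same sample data from the question, split into separate steps; `long` is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'Name':  ['ABC', 'DEF'],
    'Cust1': ['Y', 'N'],
    'Cust2': ['N', 'N'],
    'Cust3': ['Y', 'N'],
    'Cust4': ['2022-01-01', 'N'],
})

# One row per (Name, original column); the cell values land in 'Date'.
long = df.melt('Name', value_name='Date')
print(long)

# Coerce non-dates to NaT, then keep the latest date per Name.
long['Date'] = pd.to_datetime(long['Date'], errors='coerce')
out = long.groupby('Name', as_index=False)['Date'].max()
print(out)
```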