Suppose we had the following dataframe-
How can I create the fourth column 'Invalid dates' as specified below using the first three columns in the dataframe?
Name Date1 Date2 Invalid dates
0 A 01-02-2022 03-04-2000 None
1 B 23 12-12-2012 Date1
2 C 18-04-1993 abc Date2
3 D 45 qcf Date1, Date2
CodePudding user response:
You can select the Dates column with filter
(or any other method, including a manual list), compute a Series of invalid dates by converting to_datetime
and sub-selecting the NaN values (i.e. invalid dates) with isna
,then stack
and join
to the original DataFrame:
s = (df
.filter(like='Date') # keep only "Date" columns
# convert to datetime, NaT will be invalid dates
.apply(lambda s: pd.to_datetime(s, format='%d-%m-%Y', errors='coerce'))
.isna()
# reshape to long format (Series)
.stack()
)
out = (df
.join(s[s].reset_index(level=1) # keep only invalid dates
.groupby(level=0)['level_1'] # for all initial indices
.agg(','.join) # join the column names
.rename('Invalid Dates')
)
)
alternative with melt
to reshape the DataFrame:
cols = df.filter(like='Date').columns
out = df.merge(
df.melt(id_vars='Name', value_vars=cols, var_name='Invalid Dates')
.assign(value=lambda d: pd.to_datetime(d['value'], format='%d-%m-%Y',
errors='coerce'))
.loc[lambda d: d['value'].isna()]
.groupby('Name')['Invalid Dates'].agg(','.join),
left_on='Name', right_index=True, how='left'
)
output:
Name Date1 Date2 Invalid Dates
0 A 01-02-2022 03-04-2000 NaN
1 B 23 12-12-2012 Date1
2 C 18-04-1993 abc Date2
3 D 45 qcf Date1,Date2
CodePudding user response:
Use DataFrame.filter
for filter columns with substring Date
, then convert to datetimes by to_datetime
all columns of df1
with errors='coerce'
for missing values if no match, so possible test them by DataFrame.isna
and by DataFrame.dot
extract columnsnames separated by ,
:
df1 = df.filter(like='Date')
df['Invalid dates']=((df1.apply(lambda x:pd.to_datetime(x,format='%d-%m-%Y',errors='coerce'))
.isna() & df1.notna())
.dot(df1.columns ',')
.str[:-1]
.replace('', np.nan))
print (df)
Name Date1 Date2 Invalid dates
0 A 01-02-2022 03-04-2000 NaN
1 B 23 12-12-2012 Date1
2 C 18-04-1993 abc Date2
3 D 45 qcf Date1,Date2