I have the following:
import pandas as pd
file = pd.DataFrame()
file['CASH RECIEVED DATE'] = ['2018-07-23', '2019-09-26', '2017-05-02']
and I need to create a column called Cash Received Date
file['Cash Received Date']
such as if [CASH_RECIEVED_DATE]
is not null && [CASH RECIEVED_DATE]
<= 2022-09-01 then [Cash Received Date]
will be 2019-09-01
, otherwise it will be the value of [CASH_RECIEVED_DATE]
, so the output would be:
file['Cash Received Date'] = ['2019-09-01', '2019-09-26', '2019-09-01']
How do I achieve this by creating a function?
Many thanks, Rafa
CodePudding user response:
def compare_date(x):
if pd.to_datetime(x) > pd.to_datetime('2019-09-01'):
return pd.to_datetime(x)
else:
return pd.to_datetime('2019-09-01')
file['Cash Received Date'] = file['CASH RECIEVED DATE'].apply(lambda x: compare_date(x))
gives file
as :
CASH RECIEVED DATE Cash Received Date
0 2018-07-23 2019-09-01
1 2019-09-26 2019-09-26
2 2017-05-02 2019-09-01
P.S. It's best practice to give columns and dataframes more distinct names so as to prevent confusion with very similar or vague variable/column names.
CodePudding user response:
using .mask
first step is to ensure your values are datetime values.
df['Cash Received Date'] = pd.to_datetime(df['Cash Received Date'])
df['new_date'] = df['Cash Received Date'].mask(
df['Cash Received Date'].dropna().le('2019-09-01'),'2019-01-01')
Cash Received Date new_date
0 2019-09-01 2019-01-01
1 2019-09-26 2019-09-26
2 2019-09-01 2019-01-01