I have a dataset like this:
date | Condition |
---|---|
20-01-2015 | 1 |
20-02-2015 | 1 |
20-03-2015 | 2 |
20-04-2015 | 2 |
20-05-2015 | 2 |
20-06-2015 | 1 |
20-07-2015 | 1 |
20-08-2015 | 2 |
20-09-2015 | 2 |
20-09-2015 | 1 |
I want a new column, date_new, that depends on the value in the Condition column. If Condition is 1, do nothing. If Condition is 2, add a day to the date and store the result in date_new. One additional requirement: this should only apply when there are three consecutive 2s.
The final output should look like this.
date | Condition | date_new |
---|---|---|
20-01-2015 | 1 | |
20-02-2015 | 1 | |
20-03-2015 | 2 | 21-03-2015 |
20-04-2015 | 2 | |
20-05-2015 | 2 | |
20-06-2015 | 1 | |
20-07-2015 | 1 | |
20-08-2015 | 2 | |
20-09-2015 | 2 | |
20-09-2015 | 1 | |
Any help is appreciated. Thank you.
CodePudding user response:
This solution is a little bit different: if Condition is 1 it puts None, otherwise it adds (Condition - 1) days to the date. Note that it does not check for three consecutive 2s.
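import numpy as np
import pandas as pd

# assumes df['date'] holds dd-mm-YYYY strings
df['date_new'] = np.where(df['Condition'] == 1, None, (pd.to_datetime(df['date'], format='%d-%m-%Y') + pd.to_timedelta(df['Condition'] - 1, unit='D')).dt.strftime('%d-%m-%Y'))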
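For reference, here is a self-contained sketch of that one-liner run on the sample data from the question. It assumes the column is named Condition (capitalised, as in the question) and that date holds dd-mm-YYYY strings; the intermediate names dates and shifted are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["20-01-2015", "20-02-2015", "20-03-2015", "20-04-2015", "20-05-2015",
             "20-06-2015", "20-07-2015", "20-08-2015", "20-09-2015", "20-09-2015"],
    "Condition": [1, 1, 2, 2, 2, 1, 1, 2, 2, 1],
})

# Parse the dd-mm-YYYY strings so that date arithmetic works.
dates = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Shift every date by (Condition - 1) days and format it back to a string.
shifted = (dates + pd.to_timedelta(df["Condition"] - 1, unit="D")).dt.strftime("%d-%m-%Y")

# Condition 1 -> None; any other condition -> the shifted date.
df["date_new"] = np.where(df["Condition"] == 1, None, shifted)
print(df)
```

Every row with Condition 2 gets a value here, including the two 2s near the end that are not part of a run of three.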
CodePudding user response:
OK, so I've edited my answer and turned it into a function:
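import pandas as pd

def newdate(df):
    L = df.Condition
    # collect the value of every window of three identical consecutive conditions
    res = [i for i, j, k in zip(L, L[1:], L[2:]) if i == j == k]
    if 2 in res:
        df['date'] = pd.to_datetime(df['date'], dayfirst=True)
        # adds two days, matching the output below; use days=1 to add a single day
        df['new_date'] = df.apply(lambda x: x["date"] + pd.DateOffset(days=2) if x["Condition"] == 2 else pd.NaT, axis=1)
        df['new_date'] = pd.to_datetime(df['new_date'])
    return df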
#output:
index | date | Condition | new_date |
---|---|---|---|
0 | 2015-01-20 00:00:00 | 1 | NaT |
1 | 2015-02-20 00:00:00 | 1 | NaT |
2 | 2015-03-20 00:00:00 | 2 | 2015-03-22 00:00:00 |
3 | 2015-04-20 00:00:00 | 2 | 2015-04-22 00:00:00 |
4 | 2015-05-20 00:00:00 | 2 | 2015-05-22 00:00:00 |
5 | 2015-06-20 00:00:00 | 1 | NaT |
6 | 2015-07-20 00:00:00 | 1 | NaT |
7 | 2015-08-20 00:00:00 | 2 | 2015-08-22 00:00:00 |
8 | 2015-09-20 00:00:00 | 2 | 2015-09-22 00:00:00 |
9 | 2015-09-20 00:00:00 | 1 | NaT |
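Note that the function above fills new_date for every 2 once a run of three exists anywhere in the column. If the intent is stricter, a possible vectorised sketch is shown below. It rests on one reading of the question, which is an assumption: only the first row of each run of at least three consecutive 2s gets a value, and that value is the date plus one day. The helper names run_id, run_len and first_of_run are made up for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["20-01-2015", "20-02-2015", "20-03-2015", "20-04-2015", "20-05-2015",
             "20-06-2015", "20-07-2015", "20-08-2015", "20-09-2015", "20-09-2015"],
    "Condition": [1, 1, 2, 2, 2, 1, 1, 2, 2, 1],
})

dates = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Give each run of identical consecutive values its own id:
# the id increases whenever the value differs from the previous row.
run_id = df["Condition"].ne(df["Condition"].shift()).cumsum()

# Length of the run each row belongs to.
run_len = df.groupby(run_id)["Condition"].transform("size")

# True only on the first row of each run.
first_of_run = run_id.ne(run_id.shift())

# Assumption: fill only the first row of a run of three or more 2s.
mask = df["Condition"].eq(2) & run_len.ge(3) & first_of_run

# Add one day, format as dd-mm-YYYY, and blank out rows outside the mask.
df["date_new"] = (dates + pd.Timedelta(days=1)).dt.strftime("%d-%m-%Y").where(mask)
print(df)
```

Dropping first_of_run from the mask would instead fill every row inside a qualifying run, while the trailing pair of 2s would still be left empty.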