I have a problem. I am calculating when the next day will arrive. I want to take into account that if today's date is a Saturday, it should subtract -1 from lastindays
, as no parcels arrive on Sundays.
I have tried this, but I don't know how to specify two values in my Apply function: lastindays
which should be manipulated and returned and week
on which the if statement happens.
Dataframe
customerId fromDate otherInformation
0 1 2021-02-22 Cat
1 1 2021-02-20 Dog
2 1 2021-03-18 Elefant
3 1 2021-03-18 Cat
4 1 2021-03-18 Cat
5 1 2021-03-22 Cat
6 1 2021-02-10 Cat
7 1 2021-09-07 Cat
8 1 None Elefant
9 1 2022-01-18 Fish
10 2 2021-05-17 Fish
Code
import pandas as pd
d = {'customerId': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
'fromDate': ['2021-02-22','2021-02-20', '2021-03-18','2021-03-18', '2021-03-18', '2021-03-22',
'2021-02-10', '2021-09-07', None, '2022-01-18', '2021-05-17'],
'otherInformation': ['Cat', 'Dog', 'Elefant', 'Cat', 'Cat','Cat', 'Cat', 'Cat', 'Elefant', 'Fish', 'Fish']
}
df = pd.DataFrame(data=d)
print(df)
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
#df['lastindays'] = df['fromDate'] - df.sort_values('fromDate').groupby('customerId')['fromDate'].shift()
#print(df)
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
df['lastindays'] = df['fromDate'] - df.sort_values('fromDate').drop_duplicates(['customerId','fromDate']).groupby('customerId')['fromDate'].shift()
df['lastindays'] = df.groupby(['customerId','fromDate'])['lastindays'].ffill()
df = df[df['lastindays'].notna()]
df['lastindays_int'] = df['lastindays'].dt.days.astype('int')
df['week'] = df['fromDate'].dt.day_of_week
df['lastindays'] = df['lastindays'].apply(lambda x: x-1 if x == 5 else x)
df
What I have
customerId fromDate otherInformation lastindays lastindays_int week
0 1 2021-02-22 Cat 2 days 2 0
1 1 2021-02-20 Dog 10 days 10 5
2 1 2021-03-18 Elefant 24 days 24 3
3 1 2021-03-18 Cat 24 days 24 3
4 1 2021-03-18 Cat 24 days 24 3
5 1 2021-03-22 Cat 4 days 4 0
7 1 2021-09-07 Cat 169 days 169 1
9 1 2022-01-18 Fish 133 days 133 1
What I want
customerId fromDate otherInformation lastindays lastindays_int week
0 1 2021-02-22 Cat 2 days 2 0
1 1 2021-02-20 Dog 10 days 10 4 # from 5 -> 4
2 1 2021-03-18 Elefant 24 days 24 3
3 1 2021-03-18 Cat 24 days 24 3
4 1 2021-03-18 Cat 24 days 24 3
5 1 2021-03-22 Cat 4 days 4 0
7 1 2021-09-07 Cat 169 days 169 1
9 1 2022-01-18 Fish 133 days 133 1
CodePudding user response:
You can use axis =1 to acess full row at once.
def date_check(cell):
return pd.to_datetime(cell, format='%Y-%m-%d').strftime('%a') == 'Sat'
def claculate(row):
check = date_check(row['fromDate'])
if check:
row['week']-1
return row['week']
df['week'] = df['week'].astype(int)
df['week'] = df.apply(lambda row: claculate(row), axis=1)
CodePudding user response:
df['week'] = df['fromDate'].dt.dayofweek
def func_data(x):
if x == 5:
x -=1
return x
df['week'] = df['week'].apply(func_data)
Output
customerId fromDate otherInformation lastindays lastindays_int week
0 1 2021-02-22 Cat 2 days 2 0
1 1 2021-02-20 Dog 10 days 10 4
2 1 2021-03-18 Elefant 24 days 24 3
3 1 2021-03-18 Cat 24 days 24 3
4 1 2021-03-18 Cat 24 days 24 3
5 1 2021-03-22 Cat 4 days 4 0
7 1 2021-09-07 Cat 169 days 169 1
9 1 2022-01-18 Fish 133 days 133 1
Or use your variant, but instead of 'dt.day_of_week' use 'dt.dayofweek '. And also in apply you need to take the column ' df['week']'.
df['week'] = df['fromDate'].dt.dayofweek
df['week'] = df['week'].apply(lambda x: x-1 if x == 5 else x)