I have a dataframe like as shown below
df = pd.DataFrame(
{'supplier_id':[1,1,1,1],
'prod_id':[123,456,789,342],
'country' : ['UK', 'UK', 'UK','US'],
'transaction_date' : ['13/11/2020', '10/1/2018','11/11/2017', '27/03/2016'],
'industry' : ['STA','STA','PSA','STA'],
'segment' : ['testa','testb','testa','testc'],
'label':[1,1,1,0]})
My objective is to find out answers to the below questions
a) from the current row, How many times prior (previously), the same supplier has succeeded and failed in the same country? (use supplier_id
and country
column). here column label = 1 means success and label=0 means failure
Similarly, I would like to compute the success and failure count based on industry
, country
and segment
as well.
Note that 1st transaction will always starts with 0 because supplier will have no previous transactions associated with that column.
As we are looking at chronological order of business done, we need to first sort the dataframe based on transaction_date
.
So, I tried the below
df.sort_values(by=['supplier_id','transaction_date'],inplace=True)
df['prev_biz_country_success_count'] = df.groupby(['supplier_id', 'country']).cumcount()
df['prev_biz_country_failure_count'] = df.groupby(['supplier_id', 'country']).cumcount()
but as you can see, am not sure how to include the label
column value. Meaning, we need to count based on label=1 and label=0.
I expect my output to be like as shown below
CodePudding user response:
We can group the dataframe by supplier_id
and country
column then apply
transformation function shift
cumsum
on label
column to get the count of rows where the criteria in previous rows
g = df.groupby(['supplier_id', 'country'])
for criteria, label in dict(success=1, failure=0).items():
df[f'prev_biz_country_{criteria}_count'] =\
g['label'].apply(lambda s: s.eq(label).shift(fill_value=0).cumsum())
supplier_id prod_id country transaction_date industry segment label prev_biz_country_success_count prev_biz_country_failure_count
1 1 456 UK 10/1/2018 STA testb 1 0 0
2 1 789 UK 11/11/2017 PSA testa 1 1 0
0 1 123 UK 13/11/2020 STA testa 1 2 0
3 1 342 US 27/03/2016 STA testc 0 0 0