I have written the following code however I am unable to understand how to name the rsuffix and lsuffix parameters
dfs_list = []
for cycle in email_df.cycle_end_date.unique():
temp = email_df[email_df.cycle_end_date == cycle].transpose()\
.join(flash_df[flash_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
.join(sms_df[sms_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
.join(upi_df[upi_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuf', rsuffix='rsuf')\
.join(ivr_df[ivr_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuff', rsuffix='rsuff')
dfs_list.append(temp)
All my dfs have same column names
example:
cycle_end_date | triggered | delivered | cost | payment_value | delivery%
2021-15-01 | 34 | 32 | 4 | 7899 | 5%
2021-31-01 | 45 | 49 | 8 | 1500 | 4%
When I am printing dfs_list[2].reset_index()
I do get my expected output but I am unable to comprehend the suffix names. How do we define it?
output:
**index | 2email | 1lsuff | 2flash | 2 | 1rsuff**
0 absolute_cost 3.00 9.40 9.40 0.00 6.00
1 bill_paid_percent 3.28 0.33 1.87 68139.72 0.28
2 bill_paid_using_reminder 21.20 0.70 9.45 1.78 0.64
3 bounced_email 5018 NaN NaN NaN NaN
4 clicked_email 13385 NaN NaN NaN NaN
5 cycle_end_date 2022-02-28 2022-02-28 2022-02-28 2022-02-28 2022-02-28
Can someone throw light on how to name the suffix to get the exact numbers of the dataframe considered?
CodePudding user response:
lsuffix
and rsuffix
only work when there are overlapping columns between joined dataframes.
Let's look at the continous join
in your script
temp = email_df[email_df.cycle_end_date == cycle].transpose()\
.join(flash_df[flash_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
.join(sms_df[sms_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
.join(upi_df[upi_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuf', rsuffix='rsuf')\
.join(ivr_df[ivr_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuff', rsuffix='rsuff')
Since you do a transpose
to all your joined dataframe, so actually you are using dataframe where column names are original index.
1
cycle_end_date 2021-31-01
triggered 45
delivered 49
cost 8
payment_value 1500
delivery% 4%
After you join email_df
with flash_df
, the 1
might overlap, so the joined df is
1email 1flash
cycle_end_date 2021-15-01 2021-15-01
triggered 34 34
delivered 32 32
cost 4 4
payment_value 7899 7899
delivery% 5% 5%
In the next join with sms_df
whose column name is index doesn't overlap with above joined df, so the output might like
0email 0flash 1
cycle_end_date 2021-15-01 2021-15-01 2021-15-01
triggered 34 34 34
delivered 32 32 32
cost 4 4 4
payment_value 7899 7899 7899
delivery% 5% 5% 5%
And this process goes on...