I have two dataframes. The first one, grouper
contains the mean sales quantity per month by item. Most items have values for all 12 months, because they have been sold for > 1 year. But the items that have been on sale < 1 year do not have values for all months. Example: grouper[grouper['Product'] == 'IT984359570']
Product Month Sales Quantity [QTY]
4190 IT984359570 4 35.0
4191 IT984359570 5 208.0
4192 IT984359570 6 208.0
4193 IT984359570 7 233.0
4194 IT984359570 8 191.0
The second dataframe is a pivot table that displays the cumulative sum of sales by product, pivot_table
. This also takes into account new orders (hence the positive amount in some cells). pivot_table[pivot_table['Product'] == 'IT984359570']
returns:
Date Product 2022-05-01 2022-06-01 2022-07-01 2022-08-01 2022-09-01 2022-10-01 2022-11-01
412 IT984359570 -208.0 -416.0 -649.0 -840.0 2019.0 NaN NaN
I would like to avoid dropping all rows that have NaN values. I want to fill all NaN values with the mean of all entries for the specific product from grouper
. For product IT984359570
: Fill all NaN values in row 412 with 175, as (35 208 208 233 191)/5=175.
I have tried doing that with the code
pivot_table = pivot_table.fillna(lambda row: grouper.loc[grouper['Product'] == row['Product'], 'Sales Quantity [QTY]'].mean())
However, I do not get the desired output. My output:
Date Product 2022-05-01 2022-06-01 2022-07-01 2022-08-01 2022-09-01 2022-10-01 2022-11-01
412 IT984359570 -208.0 -416.0 -649.0 -840.0 2019.0 <function <lambda> at 0x0000023221232320> <function <lambda> at 0x0000023221232320>
What am I doing wrong?
Edit:
pivot_table
uses .cumsum(), so the desired output looks like this:
Date Product 2022-05-01 2022-06-01 2022-07-01 2022-08-01 2022-09-01 2022-10-01 2022-11-01
412 IT984359570 -208.0 -416.0 -649.0 -840.0 2019.0 1844.0 1669.0
CodePudding user response:
Just adding onto the above contributions, I think adding the axis parameter would complete the code. Hope it helps.
pivot_table.apply(lambda x: x.fillna(grouper.loc[grouper['Product'] == x['Product'], 'Sales Quantity'].mean()), axis=1)
CodePudding user response:
To get the desired output:
Date Product 2022-05-01 2022-06-01 2022-07-01 2022-08-01 2022-09-01 2022-10-01 2022-11-01
412 IT984359570 -208.0 -416.0 -649.0 -840.0 2019.0 1844.0 1669.0
I used:
pivot_table = (final_output.groupby(['Product', 'Date'])['Quantity'].sum().reset_index()
.pivot_table(index=['Product'], columns='Date', values='Quantity').reset_index()
)
pivot_table = pivot_table.apply(lambda x: x.fillna(grouper.loc[grouper['Product'] == x['Product'], 'Sales Quantity [QTY]'].mean()), axis=1)
pivot_table.loc[:, ~pivot_table.columns.isin(['Product'])] = pivot_table.loc[:, ~pivot_table.columns.isin(['Product'])].cumsum(axis=1)
While it worked, I am not sure if this is the most pythonic way. Please advise if you have any idea how to achieve this with less code...