Home > Software engineering >  Pandas fillna with custom lamda function - output not displayed correctly
Pandas fillna with custom lamda function - output not displayed correctly

Time:09-15

I have two dataframes. The first one, grouper contains the mean sales quantity per month by item. Most items have values for all 12 months, because they have been sold for > 1 year. But the items that have been on sale < 1 year do not have values for all months. Example: grouper[grouper['Product'] == 'IT984359570']

        Product       Month   Sales Quantity [QTY]
4190    IT984359570   4       35.0
4191    IT984359570   5       208.0
4192    IT984359570   6       208.0
4193    IT984359570   7       233.0
4194    IT984359570   8       191.0

The second dataframe is a pivot table that displays the cumulative sum of sales by product, pivot_table. This also takes into account new orders (hence the positive amount in some cells). pivot_table[pivot_table['Product'] == 'IT984359570'] returns:

Date    Product     2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01  2022-11-01
412     IT984359570 -208.0     -416.0       -649.0      -840.0      2019.0      NaN         NaN  

I would like to avoid dropping all rows that have NaN values. I want to fill all NaN values with the mean of all entries for the specific product from grouper. For product IT984359570: Fill all NaN values in row 412 with 175, as (35 208 208 233 191)/5=175.

I have tried doing that with the code

pivot_table = pivot_table.fillna(lambda row: grouper.loc[grouper['Product'] == row['Product'], 'Sales Quantity [QTY]'].mean())

However, I do not get the desired output. My output:

Date    Product      2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01   2022-11-01    
412     IT984359570  -208.0      -416.0      -649.0      -840.0      2019.0      <function <lambda> at 0x0000023221232320>   <function <lambda> at 0x0000023221232320>

What am I doing wrong?

Edit:

pivot_table uses .cumsum(), so the desired output looks like this:

Date    Product     2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01  2022-11-01
412     IT984359570 -208.0     -416.0       -649.0      -840.0      2019.0      1844.0      1669.0  

CodePudding user response:

Just adding onto the above contributions, I think adding the axis parameter would complete the code. Hope it helps.

pivot_table.apply(lambda x: x.fillna(grouper.loc[grouper['Product'] == x['Product'], 'Sales Quantity'].mean()), axis=1)

CodePudding user response:

To get the desired output:

Date    Product     2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01  2022-11-01
412     IT984359570 -208.0     -416.0       -649.0      -840.0      2019.0      1844.0      1669.0  

I used:

pivot_table = (final_output.groupby(['Product', 'Date'])['Quantity'].sum().reset_index()
                           .pivot_table(index=['Product'], columns='Date', values='Quantity').reset_index()
)
                           
pivot_table = pivot_table.apply(lambda x: x.fillna(grouper.loc[grouper['Product'] == x['Product'], 'Sales Quantity [QTY]'].mean()), axis=1)
pivot_table.loc[:, ~pivot_table.columns.isin(['Product'])] = pivot_table.loc[:, ~pivot_table.columns.isin(['Product'])].cumsum(axis=1)

While it worked, I am not sure if this is the most pythonic way. Please advise if you have any idea how to achieve this with less code...

  • Related