So basically my POS reports don't add up split bills.
If you look at df.Item
there are items with fractions (1/2, 1/3, etc). I want to drop those lines but add the sales to the proper row.
Item Outlet1 Outlet2 Outlet3 Outlet4
2 AIR GIN 162.0 NaN 189.0 54.0
3 AIR GIN 1/3 NaN NaN NaN 9.0
4 AIR VODKA 468.0 NaN 585.0 144.0
5 AIR VODKA 1/2 NaN NaN 18.0 NaN
Example output:
Item Outlet1 Outlet2 Outlet3 Outlet4
2 AIR GIN 162.0 NaN 189.0 63.0
3 AIR VODKA 468.0 NaN 603.0 144.0
I'm not sure where to start, New to python.
CodePudding user response:
I'm assuming the data doesn't contain any duplicate items. It looks like it's total sales over a certain period, but just the itemization is messed up.
In that case, you can simply remove the fractions with .str.replace()
, then group and sum.
df['Item'] = df['Item'].str.replace(r'\s \d /\d $', '', regex=True)
df.groupby('Item').sum(min_count=1) # `min_count` to respect NaNs
Outlet1 Outlet2 Outlet3 Outlet4
Item
AIR GIN 162.0 NaN 189.0 63.0
AIR VODKA 468.0 NaN 603.0 144.0
Then you can .reset_index()
if you want.
P.S. Before your edit, you also had two other columns that weren't really relevant to the problem, but change the process. You could just include them in the groupby:
df.groupby(['Category', 'SubCategory', 'Item'])