df_batches = pd.DataFrame({"Item":['Can','Liquid','Label','Liquid2', 'Can', 'Sugar'],
"Unit_cost": [.14, .45, .17, 1, .16, 23],
"Product":['Prod1', 'Prod1', 'Prod1', 'Prod1', 'Prod1', 'Prod1'],
"Batch":['5,100','5,100','5,100','5,101','5,101','5,101'],
"Total_cost": [452, 789.34, 11115.28, 3220.98, 3542, 512.34],
"Year": ['2019', '2019', '2019', '2019', '2019', '2019'],
})
I would like to create a df using these rules:
If any item in the batch str.contains('Label')
, then get those batches in new groupby df.
Tried this:
label = df['Item'].str.contains('Label')
label_df = df[label].groupby(['Product', 'Batch', 'Item', 'Year'])[['Total_cost']].sum().reset_index()
But this only gets the str.contains('Label')
rows.
I am assuming I need to use .transform()
somewhere, but can't figure it out.
Expected output would be this:
df_output = pd.DataFrame({"Item":['Can','Liquid','Label'],
"Unit_cost": [.14, .45, .17],
"Product":['Prod1', 'Prod1', 'Prod1'],
"Batch":['5,100','5,100','5,100',],
"Total_cost": [452, 789.34, 11115.28],
"Year": ['2019', '2019', '2019'],
})
label_df = df_output.groupby(['Product','Batch', 'Year', 'Item'])[['Total_cost']].sum().reset_index()
Basically, if the 'Batch' doesn't have and item containing label it is filtered out.
CodePudding user response:
IIUC you want to retrieve the Batch number:
s = df.loc[df['Item'].str.contains('Label'), "Batch"].iat[0]
print (df.loc[df["Batch"]==s])
Item Unit_cost Product Batch Total_cost Year
0 Can 0.14 Prod1 5,100 452.00 2019
1 Liquid 0.45 Prod1 5,100 789.34 2019
2 Label 0.17 Prod1 5,100 11115.28 2019
If you have more than 1 Label:
s = df.loc[df['Item'].str.contains('Label'), "Batch"]
print (df.loc[df["Batch"].isin(s)])