I'd like to create a table from a data frame with subtotals per business, totals per business type, and columns summing multiple value columns. Long term is to create a selection tool based on the ingested Excel sheet for whichever month's summary I bring in to compare month summaries (e.g. did minerals item 26 from BA3 disappear the next month) but I believe that is best saved for another question.
For now, I am having trouble figuring out how to summarize the data.
I have a dataframe in Pandas that contains the following:
Business | Business Type | ID | Value-Q1 | Value-Q2 | Value-Q3 | Value-Q4 | Value-FY |
--------- --------------- ---- ---------- ---------- ---------- ---------- ----------
BA1 | Widgets | 1 | 7 | 0 | 0 | 8 | 15 |
BA1 | Widgets | 2 | 7 | 0 | 0 | 8 | 15 |
BA1 | Cups | 3 | 9 | 10 | 0 | 0 | 19 |
BA1 | Cups | 4 | 9 | 10 | 0 | 0 | 19 |
BA1 | Cups | 5 | 9 | 10 | 0 | 0 | 19 |
BA1 | Snorkels | 6 | 0 | 0 | 8 | 8 | 16 |
BA1 | Snorkels | 7 | 0 | 0 | 8 | 8 | 16 |
BA1 | Snorkels | 8 | 0 | 0 | 8 | 8 | 16 |
BA2 | Widgets | 9 | 100 | 0 | 7 | 0 | 107 |
BA2 | Widgets | 10 | 100 | 0 | 7 | 0 | 107 |
BA2 | Widgets | 11 | 100 | 0 | 7 | 0 | 107 |
BA2 | Widgets | 12 | 100 | 0 | 7 | 0 | 107 |
BA2 | Bread | 13 | 0 | 0 | 0 | 1 | 1 |
BA2 | Bread | 14 | 0 | 0 | 0 | 1 | 1 |
BA2 | Bread | 15 | 0 | 0 | 0 | 1 | 1 |
BA2 | Bread | 16 | 0 | 0 | 0 | 1 | 1 |
BA2 | Cat Food | 17 | 504 | 0 | 0 | 500 | 1004 |
BA2 | Cat Food | 18 | 504 | 0 | 0 | 500 | 1004 |
BA2 | Cat Food | 19 | 504 | 0 | 0 | 500 | 1004 |
BA2 | Cat Food | 20 | 504 | 0 | 0 | 500 | 1004 |
BA2 | Cat Food | 21 | 504 | 0 | 0 | 500 | 1004 |
BA3 | Gravel | 22 | 7 | 7 | 7 | 7 | 28 |
BA3 | Gravel | 23 | 7 | 7 | 7 | 7 | 28 |
BA3 | Gravel | 24 | 7 | 7 | 7 | 7 | 28 |
BA3 | Rocks | 25 | 3 | 2 | 0 | 0 | 5 |
BA3 | Minerals | 26 | 1 | 1 | 0 | 1 | 3 |
BA3 | Minerals | 27 | 1 | 1 | 0 | 1 | 3 |
BA4 | Widgets | 28 | 6 | 4 | 0 | 0 | 10 |
BA4 | Widgets | 29 | 6 | 4 | 0 | 0 | 10 |
BA4 | Widgets | 30 | 6 | 4 | 0 | 0 | 10 |
BA4 | Widgets | 31 | 6 | 4 | 0 | 0 | 10 |
BA4 | Widgets | 32 | 6 | 4 | 0 | 0 | 10 |
BA4 | Something | 33 | 1000 | 0 | 0 | 2 | 1002 |
BA5 | Bonbons | 34 | 60 | 40 | 10 | 0 | 110 |
BA5 | Bonbons | 35 | 60 | 40 | 10 | 0 | 110 |
BA5 | Gummy Bears | 36 | 7 | 0 | 0 | 9 | 16 |
(Imagine each ID has different values as well)
My goal is to slice the data to get the total occurrences of a given business type (e.g. BA 1 has 2 widgets, 3 cups, and 3 snorkels which each have a unique ID) as well as the total values:
Occurrence | Q1 Sum | Q2 Sum | Q3 Sum | Q4 Sum | FY Sum |
BA 1 8 | 41 | 30 | 24 | 40 | 135 |
Widgets 2 | 14 | 0 | 0 | 16 | 30 |
Cups 3 | 27 | 30 | 0 | 0 | 57 |
Snorkels 3 | 0 | 0 | 24 | 24 | 48 |
BA 2 Subtotal of BA2 items below
Widgets Repeat Above
Bread Repeat Above
Cat Food Repeat Above
I have more columns that mirror the Q1-FY columns with other fields (e.g. Value 2 Q1-FY) per line as well that I would like to include on the summary but I imagine I could just repeat whatever process is used to grab the current Value cuts.
I have a list of unique Businesses businesses = [BA1, BA2, BA3, BA4, BA5] and a list of unique Business Types [Widgets, Cups, Snorkels, Bread, Cat Food, Gravel, Rocks, Minerals, Something, Bonbons, Gummy Bears] and finally a list of the Values values = [Value-Q1, Value-Q2, Value-Q3, Value-Q4, Value-FY]
and I tried doing a for loop off of the lists
maybe I need to make the dataframe values be on their own individual lines? I tried the following for at least the sum of FY
for b in businesses
for bt in business types
df_sums = df.loc['Business' == b, 'Business Type' == bt, 'Value-FY'].sum()
but it didn't quite give me what I was hoping for
I'm sure there's a better way to at least grab the values (I managed to get FY values per business into a dictionary) for totals but not totals per business per business type (which is also unique per business).
If anyone has any advice or can point me in the right direction I'd really appreciate it!
CodePudding user response:
You should try to use the group_by method for this. Group_by allows for several grouping options. I have attached a link to the documentation on the method. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html