Home > OS >  Pivoting on-sale products
Pivoting on-sale products

Time:05-01

Let's say I have data like:

df = pd.DataFrame(
    {
        "product": np.random.choice(["p1", "p2", "p3"], 1000),
        "sales": np.random.randint(0, 1000, 1000),
        "april": np.random.randint(0, 2, 1000),
        "march": np.random.randint(0, 2, 1000),
        "february": np.random.randint(0, 2, 1000),
    }
)

where:

  • "product" is product category
  • "sales" are total sales for the product
  • "april", "march", "february", ... are flags showing if a specific product was on sale for a specific month.

and I have billions of rows like these.

Then I aggregate data like this through SQL:

df.groupby(["april", "march", "february", "product"]).agg({"sales": "sum"})

and pass it to pandas.

Is there a nice way to pivot data, so that I can display:

  • product names as rows
  • months as columns
  • and active sales as values

without explicit iterating through months and joining results one-by-one?

CodePudding user response:

What you can do is melt your DF (turning it long rather than wide) keeping the product and sales as columns row but having your month column broken down into a separate column that you can then use to pivot accordingly, and filter out only for that month where the product was on sale, eg:

pt = (
    df.melt(id_vars=['product', 'sales'], var_name='month')
    .query('value == 1')
    .pivot_table(index='product', columns='month', values='sales', aggfunc='sum')
)

This given the random sample I ended up having gives you:

month    april  february   march
product                         
p1       83867     76609   80505
p2       74135     67823   64843
p3       94081     84000  107928
  • Related