Pivoting on-sale products-CodePudding

Let's say I have data like:

df = pd.DataFrame(
    {
        "product": np.random.choice(["p1", "p2", "p3"], 1000),
        "sales": np.random.randint(0, 1000, 1000),
        "april": np.random.randint(0, 2, 1000),
        "march": np.random.randint(0, 2, 1000),
        "february": np.random.randint(0, 2, 1000),
    }
)

where:

"product" is product category
"sales" are total sales for the product
"april", "march", "february", ... are flags showing if a specific product was on sale for a specific month.

and I have billions of rows like these.

Then I aggregate data like this through SQL:

df.groupby(["april", "march", "february", "product"]).agg({"sales": "sum"})

and pass it to pandas.

Is there a nice way to pivot data, so that I can display:

product names as rows
months as columns
and active sales as values

without explicit iterating through months and joining results one-by-one?

CodePudding user response：

What you can do is melt your DF (turning it long rather than wide) keeping the product and sales as columns row but having your month column broken down into a separate column that you can then use to pivot accordingly, and filter out only for that month where the product was on sale, eg:

pt = (
    df.melt(id_vars=['product', 'sales'], var_name='month')
    .query('value == 1')
    .pivot_table(index='product', columns='month', values='sales', aggfunc='sum')
)

This given the random sample I ended up having gives you:

month    april  february   march
product                         
p1       83867     76609   80505
p2       74135     67823   64843
p3       94081     84000  107928