Home > OS >  Python Pandas: Transforming dataframe according to a specific column
Python Pandas: Transforming dataframe according to a specific column

Time:05-31

I have a dataframe with offer data that I want to transform into portion data. The column quantity tells how many portions each offer has. If an offer has three portions out of which two were sold, that row should appear three times in the new dataframe, so that the sold column has value 1 on two rows and value 0 on one row.

The actual dataframe contains more columns that should stay intact in the transformed dataframe. Date column works as an example.

My input:

    Offer name     Quantity   Sold   Date
  0 A              3          2      2022-05 
  1 B              2          1      2022-01 
  2 C              1          1      2022-04 
  3 D              1          1      2022-04 
  4 E              1          1      2022-05

The desired output:

    Offer name     Quantity   Sold   Date
  0 A              1          1      2022-05 
  1 A              1          1      2022-05 
  2 A              1          0      2022-05 
  3 B              1          1      2022-01 
  4 B              1          0      2022-01
  and so on...

Thank you!!

CodePudding user response:

Solution

s = df.reindex(df.index.repeat(df['Quantity']))
s['Quantity'] = 1
s['Sold'] = s.groupby(level=0).cumcount().lt(s['Sold']).astype(int)

Explained

Repeat the index of dataframe exactly Quantity times

>>> s = df.reindex(df.index.repeat(df['Quantity']))
>>> s

  Offer name  Quantity  Sold     Date
0          A         3     2  2022-05
0          A         3     2  2022-05
0          A         3     2  2022-05
1          B         2     1  2022-01
1          B         2     1  2022-01
2          C         1     1  2022-04
3          D         1     1  2022-04
4          E         1     1  2022-05

Group the above dataframe on index and use cumcount to create a sequential counter,

>>> s.groupby(level=0).cumcount()
0    0
0    1
0    2
1    0
1    1
2    0
3    0
4    0
dtype: int64

Flag the rows where the sequential counter is less than Sold quantity then change the dtype to int and assign the result to Sold column

>>> s['Sold'] = s.groupby(level=0).cumcount().lt(s['Sold']).astype(int)
>>> s

  Offer name  Quantity  Sold     Date
0          A         1     1  2022-05
0          A         1     1  2022-05
0          A         1     0  2022-05
1          B         1     1  2022-01
1          B         1     0  2022-01
2          C         1     1  2022-04
3          D         1     1  2022-04
4          E         1     1  2022-05

CodePudding user response:

You can try apply on rows

def repeat(row):
    offer = [row['Offer name']] * row['Quantity']
    sold = [1] * row['Sold']   [0] * (row['Quantity']-row['Sold'])
    return offer, sold, 1

df[['Offer name', 'Sold', 'Quantity']] = df.apply(repeat, axis=1, result_type='expand')
df = df.explode(['Offer name', 'Sold'])
print(df)

  Offer name  Quantity Sold     Date
0          A         1    1  2022-05
0          A         1    1  2022-05
0          A         1    0  2022-05
1          B         1    1  2022-01
1          B         1    0  2022-01
2          C         1    1  2022-04
3          D         1    1  2022-04
4          E         1    1  2022-05
  • Related