I have a dataframe with offer data that I want to transform into portion data. The column quantity tells how many portions each offer has. If an offer has three portions out of which two were sold, that row should appear three times in the new dataframe, so that the sold column has value 1 on two rows and value 0 on one row.
The actual dataframe contains more columns that should stay intact in the transformed dataframe. Date column works as an example.
My input:
Offer name Quantity Sold Date
0 A 3 2 2022-05
1 B 2 1 2022-01
2 C 1 1 2022-04
3 D 1 1 2022-04
4 E 1 1 2022-05
The desired output:
Offer name Quantity Sold Date
0 A 1 1 2022-05
1 A 1 1 2022-05
2 A 1 0 2022-05
3 B 1 1 2022-01
4 B 1 0 2022-01
and so on...
Thank you!!
CodePudding user response:
Solution
s = df.reindex(df.index.repeat(df['Quantity']))
s['Quantity'] = 1
s['Sold'] = s.groupby(level=0).cumcount().lt(s['Sold']).astype(int)
Explained
Repeat the index of dataframe exactly Quantity
times
>>> s = df.reindex(df.index.repeat(df['Quantity']))
>>> s
Offer name Quantity Sold Date
0 A 3 2 2022-05
0 A 3 2 2022-05
0 A 3 2 2022-05
1 B 2 1 2022-01
1 B 2 1 2022-01
2 C 1 1 2022-04
3 D 1 1 2022-04
4 E 1 1 2022-05
Group the above dataframe on index and use cumcount
to create a sequential counter,
>>> s.groupby(level=0).cumcount()
0 0
0 1
0 2
1 0
1 1
2 0
3 0
4 0
dtype: int64
Flag the rows where the sequential counter is less than Sold
quantity then change the dtype to int
and assign the result to Sold
column
>>> s['Sold'] = s.groupby(level=0).cumcount().lt(s['Sold']).astype(int)
>>> s
Offer name Quantity Sold Date
0 A 1 1 2022-05
0 A 1 1 2022-05
0 A 1 0 2022-05
1 B 1 1 2022-01
1 B 1 0 2022-01
2 C 1 1 2022-04
3 D 1 1 2022-04
4 E 1 1 2022-05
CodePudding user response:
You can try apply on rows
def repeat(row):
offer = [row['Offer name']] * row['Quantity']
sold = [1] * row['Sold'] [0] * (row['Quantity']-row['Sold'])
return offer, sold, 1
df[['Offer name', 'Sold', 'Quantity']] = df.apply(repeat, axis=1, result_type='expand')
df = df.explode(['Offer name', 'Sold'])
print(df)
Offer name Quantity Sold Date
0 A 1 1 2022-05
0 A 1 1 2022-05
0 A 1 0 2022-05
1 B 1 1 2022-01
1 B 1 0 2022-01
2 C 1 1 2022-04
3 D 1 1 2022-04
4 E 1 1 2022-05