I have a dataframe df:
Date Product value Offer
2022-01-01 x_00_02 0.16 5
2022-01-01 x_00_02 0.16 5
2022-01-01 x_00_02 0.16 5
2022-01-01 x_00_02 0.16 5
2022-01-01 x_00_02 0.18 6
2022-01-01 x_00_02 0.18 6
2022-01-01 x_00_02 0.18 6
2022-01-01 x_00_02 0.18 6
2022-01-01 x_02_04 0.32 10
2022-01-01 x_02_04 0.32 10
2022-01-01 x_02_04 0.32 10
2022-01-01 x_02_04 0.32 10
2022-01-01 x_04_06 0.45 11
2022-01-01 x_04_06 0.45 11
2022-01-01 x_04_06 0.45 11
2022-01-01 x_04_06 0.45 11
2022-01-01 x_04_06 0.47 7
2022-01-01 x_04_06 0.47 7
2022-01-01 x_04_06 0.47 7
2022-01-01 x_04_06 0.47 7
...
Each individual "product" appears 4 times (the duplication comes from earlier code) and can be identified by the value column. df is already sorted by Date, Product and value.
What I want is to add a new column that counts each "product" based on the Product column so that:
- the count begins at: (minimum value * 2) + 1
- the count ends at: maximum value * 2
- the count is denoted as: val_0xx, where xx represents the count value
In the case of the product x_00_02 (minimum value = 00 and maximum value = 02):
- count starts at: (0*2) + 1 = val_001
- count ends at: 2*2 = val_004
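To make the rule concrete, here is a minimal sketch of it for a single Product label. The helper name count_labels is hypothetical, and it assumes the label always has the form x_<min>_<max>:

```python
def count_labels(product: str) -> list:
    """Return the val_0xx labels for a product label such as 'x_00_02'."""
    _, low, high = product.split("_")
    start = int(low) * 2 + 1   # (minimum value * 2) + 1
    stop = int(high) * 2       # maximum value * 2, inclusive
    return [f"val_{n:03d}" for n in range(start, stop + 1)]


print(count_labels("x_00_02"))  # ['val_001', 'val_002', 'val_003', 'val_004']
print(count_labels("x_04_06"))  # ['val_009', 'val_010', 'val_011', 'val_012']
```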
Expected output:
Date Product value Offer Product_2
2022-01-01 x_00_02 0.16 5 val_001
2022-01-01 x_00_02 0.16 5 val_002
2022-01-01 x_00_02 0.16 5 val_003
2022-01-01 x_00_02 0.16 5 val_004
2022-01-01 x_00_02 0.18 6 val_001
2022-01-01 x_00_02 0.18 6 val_002
2022-01-01 x_00_02 0.18 6 val_003
2022-01-01 x_00_02 0.18 6 val_004
2022-01-01 x_02_04 0.32 10 val_005
2022-01-01 x_02_04 0.32 10 val_006
2022-01-01 x_02_04 0.32 10 val_007
2022-01-01 x_02_04 0.32 10 val_008
2022-01-01 x_04_06 0.45 11 val_009
2022-01-01 x_04_06 0.45 11 val_010
2022-01-01 x_04_06 0.45 11 val_011
2022-01-01 x_04_06 0.45 11 val_012
2022-01-01 x_04_06 0.47 7 val_009
2022-01-01 x_04_06 0.47 7 val_010
2022-01-01 x_04_06 0.47 7 val_011
2022-01-01 x_04_06 0.47 7 val_012
CodePudding user response:
You can use:
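# Build one range per unique row ('x_00_02' -> 1..4), then explode it back to one label per row.
# Assumes each unique row repeats as many times as its range is long (4 here).
df['Product_2'] = (df.drop_duplicates()['Product'].str.split('_')
                   .map(lambda x: range(int(x[1])*2 + 1, int(x[2])*2 + 1))
                   .explode().astype(str).str.zfill(3).radd('val_').tolist())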
print(df)
# Output
Date Product value Offer Product_2
0 2022-01-01 x_00_02 0.16 5 val_001
1 2022-01-01 x_00_02 0.16 5 val_002
2 2022-01-01 x_00_02 0.16 5 val_003
3 2022-01-01 x_00_02 0.16 5 val_004
4 2022-01-01 x_00_02 0.18 6 val_001
5 2022-01-01 x_00_02 0.18 6 val_002
6 2022-01-01 x_00_02 0.18 6 val_003
7 2022-01-01 x_00_02 0.18 6 val_004
8 2022-01-01 x_02_04 0.32 10 val_005
9 2022-01-01 x_02_04 0.32 10 val_006
10 2022-01-01 x_02_04 0.32 10 val_007
11 2022-01-01 x_02_04 0.32 10 val_008
12 2022-01-01 x_04_06 0.45 11 val_009
13 2022-01-01 x_04_06 0.45 11 val_010
14 2022-01-01 x_04_06 0.45 11 val_011
15 2022-01-01 x_04_06 0.45 11 val_012
16 2022-01-01 x_04_06 0.47 7 val_009
17 2022-01-01 x_04_06 0.47 7 val_010
18 2022-01-01 x_04_06 0.47 7 val_011
19 2022-01-01 x_04_06 0.47 7 val_012
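As an alternative sketch (not part of the answer above), the same labels can be computed per row: take the lower bound from Product and add the row's position within its (Date, Product, value) block, obtained with groupby(...).cumcount(). This assumes, as in the question, that each block holds exactly (max - min) * 2 rows. The small frame built here is only a shortened stand-in for df:

```python
import pandas as pd

# Shortened stand-in for the question's df: each unique row repeated 4 times.
rows = [("x_00_02", 0.16, 5), ("x_00_02", 0.18, 6), ("x_02_04", 0.32, 10)]
df = pd.DataFrame(
    [("2022-01-01", p, v, o) for p, v, o in rows for _ in range(4)],
    columns=["Date", "Product", "value", "Offer"],
)

# Lower bound encoded in the label: 'x_00_02' -> 0, 'x_02_04' -> 2
low = df["Product"].str.split("_").str[1].astype(int)

# Position of each row inside its (Date, Product, value) block: 0, 1, 2, 3
pos = df.groupby(["Date", "Product", "value"]).cumcount()

# Count = low * 2 + 1 + position, zero-padded to three digits
df["Product_2"] = "val_" + (low * 2 + 1 + pos).astype(str).str.zfill(3)
print(df)
```

Because every label is computed from its own row, this version does not depend on the length of an exploded list matching len(df).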