Adding a column list based on another column values in pandas

Time:01-19

I have a dataframe df:

Date          Product  value    Offer
2022-01-01    x_00_02   0.16    5
2022-01-01    x_00_02   0.16    5
2022-01-01    x_00_02   0.16    5
2022-01-01    x_00_02   0.16    5
2022-01-01    x_00_02   0.18    6
2022-01-01    x_00_02   0.18    6
2022-01-01    x_00_02   0.18    6
2022-01-01    x_00_02   0.18    6
2022-01-01    x_02_04   0.32    10
2022-01-01    x_02_04   0.32    10
2022-01-01    x_02_04   0.32    10
2022-01-01    x_02_04   0.32    10
2022-01-01    x_04_06   0.45    11
2022-01-01    x_04_06   0.45    11
2022-01-01    x_04_06   0.45    11
2022-01-01    x_04_06   0.45    11
2022-01-01    x_04_06   0.47    7
2022-01-01    x_04_06   0.47    7
2022-01-01    x_04_06   0.47    7
2022-01-01    x_04_06   0.47    7
...

Each individual "product" row appears 4 times (the duplication comes from earlier code), and the distinct products can be told apart by the value column. df is already sorted by Date, Product and value.

What I want is to add a new column that counts each "product" based on the Product column so that:

  • the count begins at: (minimum value * 2) + 1
  • the count ends at: maximum value * 2
  • count is denoted as: val_0xx, where xx represent the count values

In the case of the product x_00_02 (minimum value = 00 and maximum value = 02):

  • count starts at: (0*2) + 1 = val_001
  • count ends at: 2*2 = val_004
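
To make the rule concrete, here is a minimal sketch of the arithmetic for a single product name. The helper name `label_range` is hypothetical (it is not part of pandas or of the question), and it assumes every name has the form prefix_min_max:

```python
def label_range(product):
    """Hypothetical helper: 'x_00_02' -> ['val_001', ..., 'val_004']."""
    _, lo, hi = product.split("_")
    start = int(lo) * 2 + 1   # (minimum value * 2) + 1
    stop = int(hi) * 2        # maximum value * 2, inclusive
    return [f"val_{n:03d}" for n in range(start, stop + 1)]

print(label_range("x_00_02"))  # ['val_001', 'val_002', 'val_003', 'val_004']
print(label_range("x_02_04"))  # ['val_005', 'val_006', 'val_007', 'val_008']
```

Each range has four entries here, which matches the four duplicated rows per product in the sample.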

Expected output:

Date          Product  value    Offer  Product_2
2022-01-01    x_00_02   0.16    5      val_001 
2022-01-01    x_00_02   0.16    5      val_002 
2022-01-01    x_00_02   0.16    5      val_003 
2022-01-01    x_00_02   0.16    5      val_004 
2022-01-01    x_00_02   0.18    6      val_001 
2022-01-01    x_00_02   0.18    6      val_002 
2022-01-01    x_00_02   0.18    6      val_003 
2022-01-01    x_00_02   0.18    6      val_004 
2022-01-01    x_02_04   0.32    10     val_005
2022-01-01    x_02_04   0.32    10     val_006
2022-01-01    x_02_04   0.32    10     val_007
2022-01-01    x_02_04   0.32    10     val_008
2022-01-01    x_04_06   0.45    11     val_009
2022-01-01    x_04_06   0.45    11     val_010
2022-01-01    x_04_06   0.45    11     val_011
2022-01-01    x_04_06   0.45    11     val_012
2022-01-01    x_04_06   0.47    7      val_009
2022-01-01    x_04_06   0.47    7      val_010
2022-01-01    x_04_06   0.47    7      val_011
2022-01-01    x_04_06   0.47    7      val_012

CodePudding user response:

You can use:

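# For each distinct row, build the counts (min*2 + 1 .. max*2) from the Product name,
# then explode them back out to one label per original row.
df['Product_2'] = (df.drop_duplicates()['Product'].str.split('_')
                     .map(lambda x: range(int(x[1])*2 + 1, int(x[2])*2 + 1))
                     .explode().astype(str).str.zfill(3).radd('val_').tolist())
print(df)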

# Output
          Date  Product  value  Offer Product_2
0   2022-01-01  x_00_02   0.16      5   val_001
1   2022-01-01  x_00_02   0.16      5   val_002
2   2022-01-01  x_00_02   0.16      5   val_003
3   2022-01-01  x_00_02   0.16      5   val_004
4   2022-01-01  x_00_02   0.18      6   val_001
5   2022-01-01  x_00_02   0.18      6   val_002
6   2022-01-01  x_00_02   0.18      6   val_003
7   2022-01-01  x_00_02   0.18      6   val_004
8   2022-01-01  x_02_04   0.32     10   val_005
9   2022-01-01  x_02_04   0.32     10   val_006
10  2022-01-01  x_02_04   0.32     10   val_007
11  2022-01-01  x_02_04   0.32     10   val_008
12  2022-01-01  x_04_06   0.45     11   val_009
13  2022-01-01  x_04_06   0.45     11   val_010
14  2022-01-01  x_04_06   0.45     11   val_011
15  2022-01-01  x_04_06   0.45     11   val_012
16  2022-01-01  x_04_06   0.47      7   val_009
17  2022-01-01  x_04_06   0.47      7   val_010
18  2022-01-01  x_04_06   0.47      7   val_011
19  2022-01-01  x_04_06   0.47      7   val_012
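
The one-liner above relies on each distinct row exploding into exactly as many labels as it has duplicates, and on the rows staying in sorted order. As a possible alternative (not the answerer's method), the same labels can be assigned row by row with `groupby(...).cumcount()`. This is only a sketch: the small frame below is rebuilt from the sample in the question, and the variable names `pairs`, `start` and `offset` are illustrative.

```python
import pandas as pd

# Reconstruction of the sample: each (Product, value, Offer) row repeated 4 times.
pairs = [("x_00_02", 0.16, 5), ("x_00_02", 0.18, 6), ("x_02_04", 0.32, 10),
         ("x_04_06", 0.45, 11), ("x_04_06", 0.47, 7)]
df = pd.DataFrame(
    [("2022-01-01", p, v, o) for p, v, o in pairs for _ in range(4)],
    columns=["Date", "Product", "value", "Offer"],
)

# First label number for each row: (minimum value * 2) + 1, parsed from Product.
start = df["Product"].str.split("_").str[1].astype(int) * 2 + 1

# Position of each row inside its (Date, Product, value) group: 0, 1, 2, 3.
offset = df.groupby(["Date", "Product", "value"]).cumcount()

df["Product_2"] = "val_" + (start + offset).astype(str).str.zfill(3)
print(df)
```

Because the result is an index-aligned Series, it does not depend on the row order. It does not check that a group stays within maximum value * 2, so a group with more than four rows would run past the intended end of the count.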