Pandas df set dynamic column values by filtered rows


Here is my dataset:

import pandas as pd

data = {
    'xProductNumber': ['0000', '3505', '1056', '3501'],
    'xy_0000': [1, 0, 0, 0],
    'xy_3613': [0, 0, 0, 0],
    'xy_3505': [0, 1, 0, 0],
    'xy_3671': [0, 0, 0, 1],
    'xy_1056': [1, 0, 1, 0],
    'xy_3070': [1, 0, 0, 0],
}

C = ('0000', '3505', '1056', '1182')

df = pd.DataFrame(data)

I want to do something like this:

df.loc[df.apply(lambda x: (x.xProductNumber in C) and (eval('x.xy_' + str(x.xProductNumber)) == 1), axis=1), 'xy_' + str(df['xProductNumber'])] = 11

That will update dynamically the correct columns - and not add new columns.

The output result should be:

  xProductNumber  xy_0000  xy_3613  xy_3505  xy_3671  xy_1056  xy_3070
0           0000       11        0        0        0        1        1
1           3505        0        0       11        0        0        0
2           1056        0        0        0        0       11        0
3           3501        0        0        0        1        0        0

I tried many combinations, but to no avail. Any help would be much appreciated!

CodePudding user response:

Not sure of a fully vectorized approach, but we can use stack and unstack and work on one row at a time, which is what apply does anyway.

stacked = df.stack()

for n in range(len(df)):
    # product number of row n
    pnum = stacked.loc[(n, 'xProductNumber')]
    if pnum not in C:
        continue
    key = f"xy_{pnum}"
    # the matching dummy column may not exist (e.g. '1182'), so check first
    if (n, key) in stacked.index and stacked.loc[(n, key)] == 1:
        stacked.loc[(n, key)] = 11

stacked.unstack()

  xProductNumber xy_0000 xy_3613 xy_3505 xy_3671 xy_1056 xy_3070
0           0000      11       0       0       0       1       1
1           3505       0       0      11       0       0       0
2           1056       0       0       0       0      11       0
3           3501       0       0       0       1       0       0
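Since C is small, another option is to loop over the candidate product numbers rather than the rows, and let a boolean mask select the matching rows in one shot. This is a sketch of that idea (not part of the answer above), shown here on a trimmed-down copy of the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'xProductNumber': ['0000', '3505', '1056', '3501'],
    'xy_0000': [1, 0, 0, 0],
    'xy_3505': [0, 1, 0, 0],
    'xy_1056': [1, 0, 1, 0],
})
C = ('0000', '3505', '1056', '1182')

for p in C:
    col = f"xy_{p}"
    if col not in df.columns:  # skip products like '1182' with no dummy column
        continue
    # rows whose product number is p AND whose own dummy flag is 1
    mask = (df['xProductNumber'] == p) & (df[col] == 1)
    df.loc[mask, col] = 11
```

This updates only existing columns, and only cells where the row's own dummy is 1, matching the condition in the question.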

CodePudding user response:

You can make a MultiIndex, and stack your df temporarily to select the area where you want to apply your operation. No explicit loops needed:

# index where we want to modify df
idx = pd.MultiIndex.from_tuples([(i, f'xy_{i}') for i in C])

# or, alternatively:
a = pd.Index(C)
idx = pd.MultiIndex.from_arrays([a, 'xy_' + a])


# temporary stacked df, to make the operation
tmp = df.set_index('xProductNumber').stack()

# do the operation
tmp.loc[idx.intersection(tmp.index)] = 11

# unstack back to df
df = tmp.unstack().reset_index()

And now:

>>> df
  xProductNumber  xy_0000  xy_3613  xy_3505  xy_3671  xy_1056  xy_3070
0           0000       11        0        0        0        1        1
1           3505        0        0       11        0        0        0
2           1056        0        0        0        0       11        0
3           3501        0        0        0        1        0        0

For the sake of understanding, it's worth taking a look at tmp and idx:

>>> tmp.head(10)
xProductNumber         
0000            xy_0000    11
                xy_3613     0
                xy_3505     0
                xy_3671     0
                xy_1056     1
                xy_3070     1
3505            xy_0000     0
                xy_3613     0
                xy_3505    11
                xy_3671     0
dtype: int64

>>> idx
MultiIndex([('0000', 'xy_0000'),
            ('3505', 'xy_3505'),
            ('1056', 'xy_1056'),
            ('1182', 'xy_1182')],
           )
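The intersection with tmp.index matters because idx includes ('1182', 'xy_1182'), a product from C that has no matching column, and indexing tmp with a missing label would raise a KeyError. A minimal, self-contained illustration of this (the indexes here are invented for the demo):

```python
import pandas as pd

# pairs that actually exist in the stacked frame's index
present = pd.MultiIndex.from_tuples([('0000', 'xy_0000'), ('3505', 'xy_3505')])
# pairs we want to update, including one that is absent
wanted = pd.MultiIndex.from_tuples([('0000', 'xy_0000'), ('1182', 'xy_1182')])

# only pairs present in both survive, so .loc never sees a missing label
safe = wanted.intersection(present)
print(safe.tolist())
```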