Here is my dataset:
import pandas as pd
data = {
'xProductNumber': ['0000',
'3505',
'1056',
'3501'],
'xy_0000': [1,
0,
0,
0],
'xy_3613': [0,
0,
0,
0],
'xy_3505': [0,
1,
0,
0],
'xy_3671': [0,
0,
0,
1],
'xy_1056': [1,
0,
1,
0],
'xy_3070': [1,
0,
0,
0],
}
C =('0000', '3505', '1056', '1182')
df = pd.DataFrame(data)
I want to do something like this:
df.loc[df.apply(lambda x: (x.xProductNumber in C) and (eval('x.xy_' str(x.xProductNumber)) == 1), axis=1 ) , 'xy_' str(df['xProductNumber'])] = 11
That will update dynamically the correct columns - and not add new columns.
The output result should be:
xProductNumber | xy_0000 | xy_3613 | xy_3505 | xy_3671 | xy_1056 | xy_3070 |
---|---|---|---|---|---|---|
0000 | 11 | 0 | 0 | 0 | 1 | 1 |
3505 | 0 | 0 | 11 | 0 | 0 | 0 |
1056 | 0 | 0 | 0 | 0 | 11 | 0 |
3501 | 0 | 0 | 0 | 1 | 0 | 0 |
I tried many combination, but to no avail. Any help will be very appreciated!
CodePudding user response:
Not sure of a vectorized approach but we can utilize stack
and unstack
to work on each row at a time which is what apply
does anyways.
stacked = df.stack()
for n in range(len(df)):
stack = stacked[n]
pnum = stack.iloc[0]
if pnum not in C:
continue
key = f"xy_{pnum}"
try:
val = stack.loc[key]
except KeyError:
continue
if val == 1:
stack.loc[key] = 11
stacked.unstack()
xProductNumber xy_0000 xy_3613 xy_3505 xy_3671 xy_1056 xy_3070
0 0000 11 0 0 0 1 1
1 3505 0 0 11 0 0 0
2 1056 0 0 0 0 11 0
3 3501 0 0 0 1 0 0
CodePudding user response:
You can make a MultiIndex, and stack your df temporarily to select the area where you want to apply your operation. No explicit loops needed:
# index where we want to modify df
idx = pd.MultiIndex.from_tuples([(i, f'xy_{i}') for i in C])
# or, alternatively:
a = pd.Index(C)
idx = pd.MultiIndex.from_arrays([a, 'xy_' a])
# temporary stacked df, to make the operation
tmp = df.set_index('xProductNumber').stack()
# do the operation
tmp.loc[idx.intersection(tmp.index)] = 11
# unstack back to df
df = tmp.unstack().reset_index()
And now:
>>> df
xProductNumber xy_0000 xy_3613 xy_3505 xy_3671 xy_1056 xy_3070
0 0000 11 0 0 0 1 1
1 3505 0 0 11 0 0 0
2 1056 0 0 0 0 11 0
3 3501 0 0 0 1 0 0
For sake of understanding, it's worth taking a look at tmp
and idx
:
>>> tmp.head(10)
xProductNumber
0000 xy_0000 11
xy_3613 0
xy_3505 0
xy_3671 0
xy_1056 1
xy_3070 1
3505 xy_0000 0
xy_3613 0
xy_3505 11
xy_3671 0
dtype: int64
>>> idx
MultiIndex([('0000', 'xy_0000'),
('3505', 'xy_3505'),
('1056', 'xy_1056'),
('1182', 'xy_1182')],
)