First of all, hi everyone! This is the first time I am actually posting a question on StackOverflow, so if I am too specific or too general, I would appreciate any advice :).
I have a pandas DataFrame containing SAP authorization data, in which one column can contain "placeholder values" that have to be resolved to their corresponding values. At this point, I have really run out of ideas.
For example, the DataFrame shown below has two roles, each containing the (authorization) object F_BKPF_BUK with the fields ACTVT and ABC. The field ABC has the "placeholder value" $EKGRP in LOW.
ROLE OBJECT FIELD LOW HIGH
0 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ACTVT 03 NaN
1 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ABC $EKGRP NaN
2 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ACTVT 03 NaN
3 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ABC $EKGRP NaN
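To make the question reproducible, the table above can be built as follows (it is called df1, matching the answer's code). LOW is kept as a string column so that the leading zero in 03 survives. The flag at the end assumes that every placeholder starts with $, as $EKGRP does; that is an assumption, not something the SAP data guarantees.

```python
import numpy as np
import pandas as pd

# Authorization rows from the table above; HIGH is empty (NaN) in every row
df1 = pd.DataFrame({
    'ROLE': ['D:AS:MY_FANCY_ROLE_A'] * 2 + ['D:AS:MY_FANCY_ROLE_B'] * 2,
    'OBJECT': ['F_BKPF_BUK'] * 4,
    'FIELD': ['ACTVT', 'ABC', 'ACTVT', 'ABC'],
    # LOW holds either a literal value (kept as a string) or a placeholder
    'LOW': ['03', '$EKGRP', '03', '$EKGRP'],
    'HIGH': [np.nan] * 4,
})

# Assumption: placeholders are recognizable by a leading '$'
is_placeholder = df1['LOW'].str.startswith('$')
print(df1)
print(is_placeholder.tolist())
```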
The tricky part is that the placeholder $EKGRP usually resolves to multiple, role-dependent (!) values. The DataFrame for $EKGRP is as follows:
ROLE VARBL LOW HIGH
0 D:AS:MY_FANCY_ROLE_A $EKGRP U01 U99
1 D:AS:MY_FANCY_ROLE_A $EKGRP P01 P99
2 D:AS:MY_FANCY_ROLE_A $EKGRP P01 P29
3 D:AS:MY_FANCY_ROLE_B $EKGRP P01 P00
4 D:AS:MY_FANCY_ROLE_B $EKGRP N01 N99
5 D:AS:MY_FANCY_ROLE_B $EKGRP I01 I99
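The placeholder table can be built the same way (called df2 in the answer's code). Counting rows per role shows why a plain value-to-value mapping is not enough: each role resolves $EKGRP to three LOW/HIGH pairs, so each placeholder row has to turn into three rows.

```python
import pandas as pd

# Placeholder definitions from the table above, one row per resolved LOW/HIGH pair
df2 = pd.DataFrame({
    'ROLE': ['D:AS:MY_FANCY_ROLE_A'] * 3 + ['D:AS:MY_FANCY_ROLE_B'] * 3,
    'VARBL': ['$EKGRP'] * 6,
    'LOW': ['U01', 'P01', 'P01', 'P01', 'N01', 'I01'],
    'HIGH': ['U99', 'P99', 'P29', 'P00', 'N99', 'I99'],
})

# Number of resolved values per (role, placeholder) combination
counts = df2.groupby(['ROLE', 'VARBL']).size()
print(counts)
```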
So the final result I would like to achieve is to replace every occurrence of a placeholder with its corresponding values in both columns, LOW and HIGH:
ROLE OBJECT FIELD LOW HIGH
0 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ACTVT 03 NaN
1 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ABC U01 U99
2 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ABC P01 P99
3 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ABC P01 P29
4 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ACTVT 03 NaN
5 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ABC P01 P00
6 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ABC N01 N99
7 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ABC I01 I99
Having started with pandas only a few weeks ago, I soon ran out of ideas for this particular problem. My latest guess was to use df.apply(...) to check for a placeholder, but that does not solve the real issue: once a placeholder is found, the original row has to be duplicated once per resolved value, with its LOW and HIGH values replaced by the corresponding values.
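On the apply idea specifically: a function passed to apply can return a list per row, and DataFrame.explode then gives each list element its own row, which is exactly the duplication step. The sketch below swaps in that apply + explode technique (it is not what the answer uses); the lookup dict pairs and the helper resolve are hypothetical names, and explode(ignore_index=True) needs pandas 1.1 or later. It still calls a Python function per row, so for large frames a merge is likely the faster option.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'ROLE': ['D:AS:MY_FANCY_ROLE_A'] * 2 + ['D:AS:MY_FANCY_ROLE_B'] * 2,
    'OBJECT': ['F_BKPF_BUK'] * 4,
    'FIELD': ['ACTVT', 'ABC', 'ACTVT', 'ABC'],
    'LOW': ['03', '$EKGRP', '03', '$EKGRP'],
    'HIGH': [np.nan] * 4,
})
df2 = pd.DataFrame({
    'ROLE': ['D:AS:MY_FANCY_ROLE_A'] * 3 + ['D:AS:MY_FANCY_ROLE_B'] * 3,
    'VARBL': ['$EKGRP'] * 6,
    'LOW': ['U01', 'P01', 'P01', 'P01', 'N01', 'I01'],
    'HIGH': ['U99', 'P99', 'P29', 'P00', 'N99', 'I99'],
})

# Hypothetical lookup: (ROLE, placeholder) -> list of (LOW, HIGH) pairs, built once per group
pairs = {key: list(zip(grp['LOW'], grp['HIGH']))
         for key, grp in df2.groupby(['ROLE', 'VARBL'])}

def resolve(row):
    # Placeholder rows get all resolved pairs; other rows keep their own single pair
    return pairs.get((row['ROLE'], row['LOW']), [(row['LOW'], row['HIGH'])])

# apply returns one list per row; explode turns each list element into its own row
out = (df1.assign(PAIRS=df1.apply(resolve, axis=1))
          .explode('PAIRS', ignore_index=True))
out['LOW'] = out['PAIRS'].map(lambda p: p[0])
out['HIGH'] = out['PAIRS'].map(lambda p: p[1])
out = out.drop(columns='PAIRS')
print(out)
```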
Which pandas function would you recommend I take a closer look at? I would like to avoid row-by-row iteration as far as possible and learn the "best practices" for this kind of problem.
Answer:
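If possible, use an outer join on ROLE and on the LOW column of df1: first copy LOW to a new column VARBL with DataFrame.assign so that it can be matched against VARBL in df2, then fill the missing values with DataFrame.fillna (the trailing _ has to be removed from the suffixed column names so that they match), and finally remove the unnecessary helper columns: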
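# copy LOW to VARBL so every row can be matched against the placeholder table df2
df = (df1.assign(VARBL=df1['LOW'])
         .merge(df2, on=['ROLE', 'VARBL'], how='outer', suffixes=('_', '')))
# rows without a placeholder got NaN from df2, so fill them from df1's LOW_/HIGH_
df[['LOW', 'HIGH']] = df[['LOW', 'HIGH']].fillna(
    df[['LOW_', 'HIGH_']].rename(columns=lambda x: x.strip('_')))
# drop the helper columns
df = df.drop(['LOW_', 'HIGH_', 'VARBL'], axis=1)
print(df)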
ROLE OBJECT FIELD LOW HIGH
0 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ACTVT 03 NaN
1 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ABC U01 U99
2 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ABC P01 P99
3 D:AS:MY_FANCY_ROLE_A F_BKPF_BUK ABC P01 P29
4 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ACTVT 03 NaN
5 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ABC P01 P00
6 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ABC N01 N99
7 D:AS:MY_FANCY_ROLE_B F_BKPF_BUK ABC I01 I99
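One thing worth checking with how='outer': as I read the pandas merge documentation, an outer join sorts the join keys lexicographically, so the rows may not come out in the order printed ("$EKGRP" sorts before "03"), and any df2 row whose ROLE does not occur in df1 would be added as an extra row with NaN in OBJECT and FIELD. A minimal sketch of the same steps with how='left', which keeps df1's row order and ignores unused placeholder definitions, is shown here; it rebuilds the sample data so it runs on its own.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'ROLE': ['D:AS:MY_FANCY_ROLE_A'] * 2 + ['D:AS:MY_FANCY_ROLE_B'] * 2,
    'OBJECT': ['F_BKPF_BUK'] * 4,
    'FIELD': ['ACTVT', 'ABC', 'ACTVT', 'ABC'],
    'LOW': ['03', '$EKGRP', '03', '$EKGRP'],
    'HIGH': [np.nan] * 4,
})
df2 = pd.DataFrame({
    'ROLE': ['D:AS:MY_FANCY_ROLE_A'] * 3 + ['D:AS:MY_FANCY_ROLE_B'] * 3,
    'VARBL': ['$EKGRP'] * 6,
    'LOW': ['U01', 'P01', 'P01', 'P01', 'N01', 'I01'],
    'HIGH': ['U99', 'P99', 'P29', 'P00', 'N99', 'I99'],
})

# Same steps as the answer, but a left join preserves df1's row order
df = (df1.assign(VARBL=df1['LOW'])
         .merge(df2, on=['ROLE', 'VARBL'], how='left', suffixes=('_', '')))
# Unmatched (non-placeholder) rows have NaN in LOW/HIGH; take df1's values instead
df[['LOW', 'HIGH']] = df[['LOW', 'HIGH']].fillna(
    df[['LOW_', 'HIGH_']].rename(columns=lambda c: c.strip('_')))
df = df.drop(columns=['LOW_', 'HIGH_', 'VARBL'])
print(df)
```

Each placeholder row is repeated once per matching df2 row, so no explicit loop is needed for the duplication.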