As input I have a table like this example:
ID_EXAMPLE | KRED_NUMMER | SYSTEM | BUKRS |
---|---|---|---|
1 | 100506 | PPL | 0002 (PG 1030);#1025;#1340 |
2 | 1656082 | PASM | 3333 |
3 | 1656099 | DFMG | 0716;#6008;#6104 |
And as you can see last column BUKRS can have different value formats. And I'd like for each code (4 digits) in BUKRS to create a separate row, but with the same values in other columns. Based on 2 conditions:
- duplicate row for first 4 digits in BUKRS (if there is a string in the brackets, ignore it)
- duplicate row for each 4 digits in BUKRS after #
The expected result:
ID_EXAMPLE | KRED_NUMMER | SYSTEM | BUKRS | My Comment |
---|---|---|---|---|
1 | 100506 | PPL | 0002 | always take first 4 digits in BUKRS |
1 | 100506 | PPL | 1025 | 4 digits after each # (after first #) |
1 | 100506 | PPL | 1340 | 4 digits after each # (after second #) |
2 | 1656082 | PASM | 3333 | always take first 4 digits in field |
3 | 1656099 | DFMG | 0716 | always take first 4 digits in field |
3 | 1656099 | DFMG | 6008 | 4 digits after each # (after first #) |
3 | 1656099 | DFMG | 6104 | 4 digits after each # (after second #) |
CodePudding user response:
You can use str.findall
and a short regex to get the numbers, then explode
:
out = (df
.assign(BUKRS=df['BUKRS'].str.findall(r'(?:^|#)(\d{4})'))
.explode('BUKRS')
)
output:
ID_EXAMPLE KRED_NUMMER SYSTEM BUKRS
0 1 100506 PPL 0002
0 1 100506 PPL 1025
0 1 100506 PPL 1340
1 2 1656082 PASM 3333
2 3 1656099 DFMG 0716
2 3 1656099 DFMG 6008
2 3 1656099 DFMG 6104