I have a pandas DataFrame in which each ID has a column holding a string of items separated by a delimiter.
I want to build the pairs of consecutive items from each delimited string, per ID.
The raw data looks like this:
ID paths
1 [ test1 ]--[ test2 ]--[ test3 ]--[ test4 ]--[ test5 ]--[ test6 ]--[ tets7 ]
2 [ test1 ]--[ test2 ]--[ test3 ]--[ test4 ]
I want the output to look like this:
ID combination
1 [ test1 ]--[ test2 ]
1 [ test2 ]--[ test3 ]
1 [ test3 ]--[ test4 ]
1 [ test4 ]--[ test5 ]
1 [ test5 ]--[ test6 ]
1 [ test6 ]--[ tets7 ]
2 [ test1 ]--[ test2 ]
2 [ test2 ]--[ test3 ]
2 [ test3 ]--[ test4 ]
Can anyone help?
TIA
CodePudding user response:
You can use itertools.pairwise to generate the pairs, then explode them into new rows.
NB. pairwise requires Python ≥ 3.10; for older versions, there is a recipe for it in the documentation.
from itertools import pairwise

# split each string on '--', re-join each consecutive pair, then emit one row per pair
(df.assign(paths=[list(map('--'.join, pairwise(s.split('--'))))
                  for s in df['paths']])
   .explode('paths')
)
output:
ID paths
0 1 [ test1 ]--[ test2 ]
0 1 [ test2 ]--[ test3 ]
0 1 [ test3 ]--[ test4 ]
0 1 [ test4 ]--[ test5 ]
0 1 [ test5 ]--[ test6 ]
0 1 [ test6 ]--[ tets7 ]
1 2 [ test1 ]--[ test2 ]
1 2 [ test2 ]--[ test3 ]
1 2 [ test3 ]--[ test4 ]
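For Python older than 3.10, the fallback mentioned above could look roughly like the sketch below. It follows the tee-based recipe from the older itertools documentation; the sample string `s` is made up for illustration and only mirrors the format of the paths column.

```python
from itertools import tee

try:
    from itertools import pairwise  # available on Python >= 3.10
except ImportError:
    def pairwise(iterable):
        """Fallback: yield overlapping pairs (s0, s1), (s1, s2), ..."""
        a, b = tee(iterable)
        next(b, None)  # advance the second iterator by one element
        return zip(a, b)

# Hypothetical sample string in the same format as the 'paths' column
s = "[ test1 ]--[ test2 ]--[ test3 ]"

# Split on the delimiter, then re-join each consecutive pair
pairs = list(map('--'.join, pairwise(s.split('--'))))
print(pairs)
```

With this in place, the df.assign(...).explode('paths') expression above should work unchanged on either Python version.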
Alternative using groupby and shift:
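(df.assign(paths=df['paths'].str.split('--'))
   .explode('paths')
   # prepend the previous item within the same ID to each item
   .assign(paths=lambda d: d['paths'].groupby(d['ID']).shift() + '--' + d['paths'])
   # the first item of each ID has no predecessor, so its row is NaN
   .dropna(subset=['paths'])
)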
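To try the groupby and shift approach end to end, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question's sample data. The final rename to `combination` and the `reset_index` call are optional additions made here to match the desired output layout; they are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 2],
    "paths": [
        "[ test1 ]--[ test2 ]--[ test3 ]--[ test4 ]--[ test5 ]--[ test6 ]--[ tets7 ]",
        "[ test1 ]--[ test2 ]--[ test3 ]--[ test4 ]",
    ],
})

out = (
    df.assign(paths=df["paths"].str.split("--"))  # string -> list of items
      .explode("paths")                           # one row per item
      # pair each item with the previous item of the same ID
      .assign(paths=lambda d: d["paths"].groupby(d["ID"]).shift() + "--" + d["paths"])
      .dropna(subset=["paths"])                   # drop the first item of each ID
      .rename(columns={"paths": "combination"})   # optional: match the requested header
      .reset_index(drop=True)                     # optional: clean 0..n-1 index
)
print(out)
```

Each ID with n items yields n - 1 rows, so the sample data gives 6 rows for ID 1 and 3 rows for ID 2.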