create combination of delimited strings by group in pandas

Time:02-16

I have a pandas DataFrame in which each ID has a column holding a string of items separated by a delimiter.

For each ID, I want to create combinations of consecutive items from that delimited string.

The raw data looks like this:

ID  paths
1   [ test1 ]--[ test2 ]--[ test3 ]--[ test4 ]--[ test5 ]--[ test6 ]--[ tets7 ]
2   [ test1 ]--[ test2 ]--[ test3 ]--[ test4 ]

I want the output to look like this:

ID  combination
1   [ test1 ]--[ test2 ]
1   [ test2 ]--[ test3 ]
1   [ test3 ]--[ test4 ]
1   [ test4 ]--[ test5 ]
1   [ test5 ]--[ test6 ]
1   [ test6 ]--[ tets7 ]
2   [ test1 ]--[ test2 ]
2   [ test2 ]--[ test3 ]
2   [ test3 ]--[ test4 ]

Can anyone help?

TIA

CodePudding user response:

You can use itertools.pairwise to generate the pairs, then explode into new rows.

NB. pairwise requires Python ≥ 3.10; for older versions, there is a recipe for it in the itertools documentation.
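
For Python versions before 3.10, a fallback modeled on the recipe from the itertools documentation could look roughly like this. It is only a sketch: the try/except wrapper is an addition for illustration, so that the built-in version is used whenever it is available.

```python
from itertools import tee

try:
    from itertools import pairwise  # built in for Python >= 3.10
except ImportError:
    def pairwise(iterable):
        # Fallback: pairwise('ABCD') yields ('A', 'B'), ('B', 'C'), ('C', 'D')
        a, b = tee(iterable)
        next(b, None)  # advance the second iterator by one element
        return zip(a, b)

print(list(pairwise(['x', 'y', 'z'])))
```

With either definition in place, code that calls pairwise does not need to change.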

from itertools import pairwise

(df.assign(paths=[list(map('--'.join, pairwise(s.split('--'))))
                  for s in df['paths']])
   .explode('paths')
)

output:

   ID                 paths
0   1  [ test1 ]--[ test2 ]
0   1  [ test2 ]--[ test3 ]
0   1  [ test3 ]--[ test4 ]
0   1  [ test4 ]--[ test5 ]
0   1  [ test5 ]--[ test6 ]
0   1  [ test6 ]--[ tets7 ]
1   2  [ test1 ]--[ test2 ]
1   2  [ test2 ]--[ test3 ]
1   2  [ test3 ]--[ test4 ]
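
To get closer to the layout asked for in the question (a column named combination and a fresh index), something along these lines may work. It is a self-contained sketch that rebuilds the sample data from the question; adjacent_pairs is a hypothetical helper name, and it uses zip(parts, parts[1:]) in place of pairwise so that it also runs on older Python versions.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'ID': [1, 2],
    'paths': [
        '[ test1 ]--[ test2 ]--[ test3 ]--[ test4 ]--[ test5 ]--[ test6 ]--[ tets7 ]',
        '[ test1 ]--[ test2 ]--[ test3 ]--[ test4 ]',
    ],
})

def adjacent_pairs(path, sep='--'):
    # Hypothetical helper: join each element with the one that follows it
    parts = path.split(sep)
    return [sep.join(pair) for pair in zip(parts, parts[1:])]

out = (df.assign(combination=df['paths'].map(adjacent_pairs))
         .drop(columns='paths')
         .explode('combination')      # one row per pair
         .reset_index(drop=True)      # replace the repeated 0/1 index
)
print(out)
```

Note that a path containing a single element produces an empty list, which explode turns into a NaN row; a dropna on combination would remove such rows if they are not wanted.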

Alternative using groupby shift:

(df.assign(paths=df['paths'].str.split('--'))
   .explode('paths')
   .assign(paths=lambda d: d['paths'].groupby(d['ID']).shift() + '--' + d['paths'])
   .dropna(subset=['paths'])
)
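
To see why the dropna step is needed, the groupby shift approach can be broken into its intermediate steps. This is a sketch on a shortened sample (the data below is trimmed for illustration, and the variable names long and prev are not from the answer):

```python
import pandas as pd

# Shortened sample data, for illustration only
df = pd.DataFrame({
    'ID': [1, 2],
    'paths': ['[ test1 ]--[ test2 ]--[ test3 ]',
              '[ test1 ]--[ test2 ]'],
})

# Step 1: one row per path element
long = df.assign(paths=df['paths'].str.split('--')).explode('paths')

# Step 2: previous element within the same ID
# (NaN for the first element of each ID, since it has no predecessor)
prev = long.groupby('ID')['paths'].shift()

# Step 3: join previous and current element, then drop the NaN rows
out = (long.assign(paths=prev + '--' + long['paths'])
           .dropna(subset=['paths'])
           .reset_index(drop=True))
print(out)
```

The first element of each ID has nothing before it, so the concatenation yields NaN there; dropping those rows leaves exactly the consecutive pairs.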