Want to apply a merge function on column A

How can I apply a merge function, or any other method, to column A? For example, in layman's terms, I want to convert the string "(A|B|C,D)|(A,B|C|D)|(B|C|D)" into "(D A|D B|D C)|(A B|A C|A D)|(B|C|D)".

The group (B|C|D) stays the same because it has no comma-separated value to merge. Basically, within each group I want to combine the values separated by a comma with the rest of that group's values.

I have the data frame below.

import pandas as pd

data = {'A': [ '(A|B|C,D)|(A,B|C|D)|(B|C|D)'],
        'B(Expected)': [ '(D A|D B|D C)|(A B|A C|A D)|(B|C|D)']
        }

df = pd.DataFrame(data)

print (df)

My expected result is shown in column B(Expected).

These are the methods I tried:

(1)

df['B(Expected)'] = df['A'].apply(lambda x: x.replace("|", " ").replace(",", "|") if "|" in x and "," in x else x)

(2)

# Split the string by the pipe character
df['string'] = df['string'].str.split('|')
df['string'] = df['string'].apply(lambda x: '|'.join([' '.join(i.split(' ')) for i in x]))

CodePudding user response:

You can use a regex to extract the values in parentheses, then a custom function with itertools.product to reorganize the values:

from itertools import product

def split(s):
    return '|'.join([' '.join(x) for x in product(*[x.split('|') for x in s.split(',')])])

# Match each run of characters that contains no parentheses and rewrite it
df['B'] = df['A'].str.replace(r'([^()]+)', lambda m: split(m.group()), regex=True)

print(df)

Note that this requires non-nested parentheses.
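To make the helper easier to follow, here is a small step-by-step sketch of what it does to a single group, using plain strings rather than the DataFrame. The variable names (group, alternatives, combos, merged) are only illustrative and are not part of the answer's code.

```python
from itertools import product

# One group from the question, without its surrounding parentheses
group = 'A|B|C,D'

# Split on commas first, then split each part on pipes
alternatives = [part.split('|') for part in group.split(',')]
# alternatives is [['A', 'B', 'C'], ['D']]

# product picks one value from each part, in the order the parts appear
combos = list(product(*alternatives))
# combos is [('A', 'D'), ('B', 'D'), ('C', 'D')]

# Join each combination with a space, then join the combinations with pipes
merged = '|'.join(' '.join(combo) for combo in combos)
# merged is 'A D|B D|C D'
```

A group without a comma, such as 'B|C|D', yields a single part, so product returns its values one at a time and the group comes back unchanged.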

Output:

                             A                                    B
0  (A|B|C,D)|(A,B|C|D)|(B|C|D)  (A D|B D|C D)|(A B|A C|A D)|(B|C|D)
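
The output above pairs values in the order they appear, so the first group comes out as A D|B D|C D rather than the D A|D B|D C shown in the question. If the exact order matters, one possible variant is sketched below. It assumes the rule is that the comma-separated part with fewer alternatives goes first, which is a guess based on the single example in the question. The names merge_group and merge_all are hypothetical helpers, not part of the answer.

```python
import re
from itertools import product

def merge_group(group):
    """Combine the comma-separated parts of one group (no parentheses)."""
    parts = [part.split('|') for part in group.split(',')]
    # Assumption: the part with fewer alternatives leads each combination.
    # sort() is stable, so parts of equal length keep their original order.
    parts.sort(key=len)
    return '|'.join(' '.join(combo) for combo in product(*parts))

def merge_all(s):
    """Rewrite every non-nested parenthesised group in the string."""
    return re.sub(r'\(([^()]+)\)',
                  lambda m: '(' + merge_group(m.group(1)) + ')',
                  s)

result = merge_all('(A|B|C,D)|(A,B|C|D)|(B|C|D)')
# result is '(D A|D B|D C)|(A B|A C|A D)|(B|C|D)'
```

With the data frame from the question, this could be applied as df['B'] = df['A'].apply(merge_all). Like the answer's version, it only handles non-nested parentheses.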