How can I apply a merge function, or any other method, to column A? In plain terms, I want to convert the string "(A|B|C,D)|(A,B|C|D)|(B|C|D)" into "(D A|D B|D C)|(A B|A C|A D)|(B|C|D)".
The group (B|C|D) stays the same because it contains no comma-separated value to merge. Basically, within each group I want to combine the value on one side of the comma with each of the values on the other side.
I have the data frame below.
import pandas as pd
data = {'A': [ '(A|B|C,D)|(A,B|C|D)|(B|C|D)'],
'B(Expected)': [ '(D A|D B|D C)|(A B|A C|A D)|(B|C|D)']
}
df = pd.DataFrame(data)
print (df)
My expected result is shown in column B(Expected).
These are the methods I tried:
(1)
df['B(Expected)'] = df['A'].apply(lambda x: x.replace("|", " ").replace(",", "|") if "|" in x and "," in x else x)
(2)
# Split the string by the pipe character
df['string'] = df['string'].str.split('|')
df['string'] = df['string'].apply(lambda x: '|'.join([' '.join(i.split(' ')) for i in x]))
Answer:
You can use a regex to extract the values in parentheses, then a custom function with itertools.product
to reorganize the values:
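from itertools import product
def split(s):
    # Split on commas, split each part on pipes, then join every combination with a space
    return '|'.join([' '.join(x) for x in product(*[x.split('|') for x in s.split(',')])])
# Apply split() to every run of characters that contains no parentheses
df['B'] = df['A'].str.replace(r'([^()]+)', lambda m: split(m.group()), regex=True)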
print(df)
Note that this requires non-nested parentheses.
Output:
                             A                                    B
0  (A|B|C,D)|(A,B|C|D)|(B|C|D)  (A D|B D|C D)|(A B|A C|A D)|(B|C|D)
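Note that the first group comes out as "A D|B D|C D", while the expected column in the question has "D A|D B|D C". If that order matters, one possible variant is sketched below. It assumes the rule is that the comma-separated part with fewer alternatives goes first, which is only inferred from the single example in the question. The names merge_group and merge_all are made up for this sketch, and it uses plain re so it runs without pandas.

```python
import re
from itertools import product

def merge_group(group: str) -> str:
    # Split on commas, then split each part on pipes into its alternatives.
    parts = [part.split('|') for part in group.split(',')]
    # Assumption: parts with fewer alternatives go first (stable sort),
    # so 'A|B|C,D' becomes 'D A|D B|D C' as in the question's expected column.
    parts.sort(key=len)
    # Join every combination with a space, and the combinations with pipes.
    return '|'.join(' '.join(combo) for combo in product(*parts))

def merge_all(text: str) -> str:
    # Rewrite only the text inside each pair of non-nested parentheses.
    return re.sub(r'\(([^()]+)\)',
                  lambda m: '(' + merge_group(m.group(1)) + ')',
                  text)

print(merge_all('(A|B|C,D)|(A,B|C|D)|(B|C|D)'))
# (D A|D B|D C)|(A B|A C|A D)|(B|C|D)
```

With the data frame from the question, this could be applied as df['B'] = df['A'].map(merge_all). Like the answer above, it does not handle nested parentheses.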