Substitute commas after a certain number of pipes

I have the following string

s = 'AAA\nA|A33, 3|BB,C|CC,C|CC555|AVENUE ,STREET ,POTATO ,JOSPH'
s = 'AAA\nA|A33, 3|BB,C|CC,C|STREET ,POTATO ,JOSPH'

What I want to do is take the value after the last pipe and replace all of its commas with '|'. Important details: the value may contain spaces, there may be commas before the last pipe (those must stay as they are), and, as I just noticed, the number of pipes varies.

My earlier attempt:

print(re.sub(r'[|]{5}',"|",s))

CodePudding user response：

You may use this re.sub with a lambda:

import re

s = 'AAAA,LTD|A333|BBC|CCC|CC555|AVENUE ,STREET ,POTATO ,JOSPH'

print(re.sub(r'^((?:[^|]*\|){5})(.*)', lambda m: m[1] + m[2].replace(',', '|'), s))

Output:

AAAA,LTD|A333|BBC|CCC|CC555|AVENUE |STREET |POTATO |JOSPH

RegEx Breakup:

^: Start
(: Start capture group #1
- (?:: Start non-capture group
  - [^|]*: Match 0 or more of any char that is not |
  - \|: Match a |
- ){5}: End non-capture group. Repeat this group 5 times
): End capture group #1
(.*): Match and capture remaining text in capture group #2
In the lambda, we replace , with | in the 2nd capture group only.
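Since the question mentions that the number of pipes varies, the fixed {5} count may not always fit. A minimal sketch of one possible variant, assuming the part to change is always whatever follows the last pipe: anchor the match at the end of the string instead of counting pipes from the start. The function name is made up for illustration.

```python
import re

def replace_commas_in_last_field(text):
    # [^|]+$ matches the run of non-pipe characters at the end of the string,
    # i.e. everything after the last pipe, however many pipes come before it.
    return re.sub(r'[^|]+$', lambda m: m[0].replace(',', '|'), text)

print(replace_commas_in_last_field('AAA\nA|A33, 3|BB,C|CC,C|CC555|AVENUE ,STREET ,POTATO ,JOSPH'))
print(replace_commas_in_last_field('AAA\nA|A33, 3|BB,C|CC,C|STREET ,POTATO ,JOSPH'))
```

Note that if the string contains no pipe at all, the pattern matches the whole string, so every comma would be replaced.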

CodePudding user response：

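You can try this code without regex. Note that it gives you only the last field, not the whole string:

s = 'AAAA|A333|BBC|CCC|CC555|AVENUE ,STREET ,POTATO ,JOSPH'
# Index 5 is the 6th field, i.e. the text after the 5th pipe
print(s.split('|')[5].replace(',', '|'))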

CodePudding user response：

Split the string at the | characters. Do the comma replacements in the 6th element of that list, then join them back together.

fields = s.split('|')
fields[5] = fields[5].replace(',', '|')
s = '|'.join(fields)
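The hard-coded index 5 assumes exactly five pipes. If the pipe count varies, as the question says, one possible adaptation is to split only at the last pipe. The sketch below uses str.rpartition for that; the function name is hypothetical, and it assumes the field to change is always the last one.

```python
def replace_commas_after_last_pipe(text):
    # rpartition splits on the LAST '|' only and returns (head, sep, tail).
    # If there is no pipe, head and sep are empty and tail is the whole string.
    head, sep, tail = text.rpartition('|')
    return head + sep + tail.replace(',', '|')

print(replace_commas_after_last_pipe('AAA\nA|A33, 3|BB,C|CC,C|CC555|AVENUE ,STREET ,POTATO ,JOSPH'))
print(replace_commas_after_last_pipe('AAA\nA|A33, 3|BB,C|CC,C|STREET ,POTATO ,JOSPH'))
```

Commas in the earlier fields (such as 'A33, 3' and 'BB,C') are left alone because only the tail is modified.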

CodePudding user response：

Here is one alternative. I assume you want the first part to remain as it is. This works for any number of commas or spaces between the values. Example strings:

s1= r'AAA\nA|A33, 3|BB,C|CC,C|CC555|AVENUE ,STREET , POTATO ,JOSPH'
s2=r'AAA\nA|A33, 3|BB,C|CC,C|CC555|AVENUE ,STREET ,,,, POTATO ,JOSPH'
s3=r'AAA\nA|A33, 3|BB,C|CC,C|CC555|AVENUE ,STREET ,             POTATO ,JOSPH'

Code :

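import re
s = s1  # the same code works for s2 and s3
# Take the text after the last pipe; collapse each run of commas (plus surrounding spaces) into one |
m = re.sub('[ ]{0,}[,]{1,}[ ]{0,}', r'|', re.search(r'[^|]+$', s)[0])
o = re.search('(.*)[|]', s)[0]
print(o + m)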

Output:

AAA\nA|A33, 3|BB,C|CC,C|CC555|AVENUE|STREET|POTATO|JOSPH
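For comparison, the same clean-up can probably be done without regex. This is only a sketch under the same assumption (everything before the last pipe stays untouched); the function name is invented for the example. Unlike the regex above, it also strips spaces at the very start and end of the last field.

```python
def normalize_last_field(text):
    # Split at the last pipe only, so earlier fields keep their commas.
    head, sep, tail = text.rpartition('|')
    # Split the tail on commas, trim each piece, and drop the empty pieces
    # produced by repeated commas such as ',,,,'.
    parts = [p.strip() for p in tail.split(',')]
    return head + sep + '|'.join(p for p in parts if p)

s2 = r'AAA\nA|A33, 3|BB,C|CC,C|CC555|AVENUE ,STREET ,,,, POTATO ,JOSPH'
print(normalize_last_field(s2))
```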