Home > Software design >  Problem with finding the correct match with regex
Problem with finding the correct match with regex

Time:04-05

I have some data, which I'm trying to process. Basically I want to change all the commas , to semicolon ;, but some fields contain text, usernames or passwords that also contain commas. How do I change all the commas except the ones inclosed in "?

Test data:

Secret Name,URL,Username,Password,Notes,Folder,TOTP Key,TOTP Backup Codes
test1,,username,"pass,word",These are the notes,\Some\Folder,,
test2,,"user1, user2, user3","pass,word","Hello, I'm mr Notes",\Some\Folder,,
test3,http://1.2.3.4/ucsm/ucsm.jnlp,"xxxx\n(use Drop down, select Hello)",password,Use the following\nServer1\nServer2,\Some\Folder,,

What have I tried?

secrets = """Secret Name,URL,Username,Password,Notes,Folder,TOTP Key,TOTP Backup Codes
test1,,username,"pass,word",These are the notes,\Some\Folder,,
test2,,"user1, user2, user3","pass,word","Hello, I'm mr Notes",\Some\Folder,,
test3,http://1.2.3.4/ucsm/ucsm.jnlp,"xxxx\n(use Drop down, select Hello)",password,Use the following\nServer1\nServer2,\Some\Folder,,
"""

test = re.findall(r'(. ?\")(. )(\". )', secrets)

for line in test:
    part1, part2, part3 = line
    processed = "".join([part1.replace(",", ";"), part2, part3.replace(",", ";")])
    print(processed)

Result:

test1;;username;"pass,word";These are the notes;\Some\Folder;;
test2;;"user1, user2, user3","pass,word","Hello, I'm mr Notes";\Some\Folder;;

It works fine, when there's only one occurence of "" in the line and no line breaks, but when there are more or there's a line break within the quotations, it's broken. How can I fix this?

FYI: Notes can contain multiple line breaks.

CodePudding user response:

You don't need a regex here, take advantage of a CSV parser:

import csv, io

inp = csv.reader(io.StringIO(secrets), # or use file as input
                 quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL)
with open('out.csv', 'w') as out:
    csv.writer(out, delimiter=';').writerows(inp)

output file:

Secret Name;URL;Username;Password;Notes;Folder;TOTP Key;TOTP Backup Codes
test1;;username;pass,word;These are the notes;\Some\Folder;;
test2;;user1, user2, user3;pass,word;Hello, I'm mr Notes;\Some\Folder;;
test3;http://1.2.3.4/ucsm/ucsm.jnlp;"xxxx
(use Drop down, select Hello)";password;Use the following
Server1
Server2;\Some\Folder;;

Optionally, use the quoting=csv.QUOTE_ALL parameter in csv.writer.

CodePudding user response:

This should do I believe:

import re
print( re.sub(r'("[^"]*")|,', lambda x: x.group(1) if x.group(1) else x.group().replace(",", ";"), secrets))
  • Related