I have some data, which I'm trying to process. Basically I want to change all the commas ,
to semicolon ;
, but some fields contain text, usernames or passwords that also contain commas. How do I change all the commas except the ones inclosed in "
?
Test data:
Secret Name,URL,Username,Password,Notes,Folder,TOTP Key,TOTP Backup Codes
test1,,username,"pass,word",These are the notes,\Some\Folder,,
test2,,"user1, user2, user3","pass,word","Hello, I'm mr Notes",\Some\Folder,,
test3,http://1.2.3.4/ucsm/ucsm.jnlp,"xxxx\n(use Drop down, select Hello)",password,Use the following\nServer1\nServer2,\Some\Folder,,
What have I tried?
secrets = """Secret Name,URL,Username,Password,Notes,Folder,TOTP Key,TOTP Backup Codes
test1,,username,"pass,word",These are the notes,\Some\Folder,,
test2,,"user1, user2, user3","pass,word","Hello, I'm mr Notes",\Some\Folder,,
test3,http://1.2.3.4/ucsm/ucsm.jnlp,"xxxx\n(use Drop down, select Hello)",password,Use the following\nServer1\nServer2,\Some\Folder,,
"""
test = re.findall(r'(. ?\")(. )(\". )', secrets)
for line in test:
part1, part2, part3 = line
processed = "".join([part1.replace(",", ";"), part2, part3.replace(",", ";")])
print(processed)
Result:
test1;;username;"pass,word";These are the notes;\Some\Folder;;
test2;;"user1, user2, user3","pass,word","Hello, I'm mr Notes";\Some\Folder;;
It works fine, when there's only one occurence of ""
in the line and no line breaks, but when there are more or there's a line break within the quotations, it's broken. How can I fix this?
FYI: Notes can contain multiple line breaks.
CodePudding user response:
You don't need a regex here, take advantage of a CSV parser:
import csv, io
inp = csv.reader(io.StringIO(secrets), # or use file as input
quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL)
with open('out.csv', 'w') as out:
csv.writer(out, delimiter=';').writerows(inp)
output file:
Secret Name;URL;Username;Password;Notes;Folder;TOTP Key;TOTP Backup Codes
test1;;username;pass,word;These are the notes;\Some\Folder;;
test2;;user1, user2, user3;pass,word;Hello, I'm mr Notes;\Some\Folder;;
test3;http://1.2.3.4/ucsm/ucsm.jnlp;"xxxx
(use Drop down, select Hello)";password;Use the following
Server1
Server2;\Some\Folder;;
Optionally, use the quoting=csv.QUOTE_ALL
parameter in csv.writer
.
CodePudding user response:
This should do I believe:
import re
print( re.sub(r'("[^"]*")|,', lambda x: x.group(1) if x.group(1) else x.group().replace(",", ";"), secrets))