I have a CSV file which contains lines of sql entries. Each sql entry, contains message which are SOH ("\x01"
) delimited and are tag=value pairs.
8=Fix1.1<SOH>9=70<SOH>35=AE<SOH>10=237
8=Fix1.1<SOH>9=71<SOH>35=AE<SOH>10=238
8=Fix1.1<SOH>9=72<SOH>35=AE<SOH>10=239
8=Fix1.1<SOH>9=73<SOH>35=AE<SOH>10=240
(<SOH>
is a placeholder for the actual character because Stack Overflow wouldn't let me include the \x01
character in the text)
Issue:
- The below code snippet removes SOH to "," as expected, however, having trouble removing the tag part from the lines.
# Read in the file
with open('file.txt', 'r') as file :
filedata = file.read()
# Replace the target string
filedata = filedata.replace('x01', ',')
filedata2 = filedata.replace("=", ",")
# Write the file out again
with open('file.txt', 'w') as file:
file.write(filedata2)
Output:
8,Fix1.1,9,70,35,AE,10,237
8,Fix1.1,9,71,35,AE,10,238
8,Fix1.1,9,72,35,AE,10,239
8,Fix1.1,9,73,35,AE,10,240
I've also tried regex = re.compile ("[=]") and then loop into line reader and modify, but just returns all [=] in the print.
Desired output:
Fix1.1,70,AE,237
Fix1.1,71,AE,238
Fix1.1,72,AE,239
Fix1.1,73,AE,240
CodePudding user response:
Use regex re.sub to replace all occurrences of e.g. <SOH>9=
with a comma ,
import re
# Other patterns: r"(^|<SOH>)\d =", r"(^|<.*?>).*?="
to_replace = re.compile(r"[^\w]*\w =")
with open('file.txt', 'r') as file:
for line in file:
line_updated = to_replace.sub(",", line).lstrip(",")
print(line_updated)
Output
Fix1.1,70,AE,237
Fix1.1,71,AE,238
Fix1.1,72,AE,239
Fix1.1,73,AE,240
CodePudding user response:
Use csv.reader
with delimiter="\x01"
to split by the SOH
character.
Then, as you read each line, split each element by "="
and keep only the values.
import csv
filedata = []
with open('file.txt', 'r') as file:
reader = csv.reader(file, delimiter="\x01")
for row in reader:
# Split each item in row
# Keep only second element of each split
values = [item.split("=", 1)[1] for item in row]
filedata.append(values)
print(filedata)
which gives
[['Fix1.1', '70', 'AE', '237'],
['Fix1.1', '71', 'AE', '238'],
['Fix1.1', '72', 'AE', '239'],
['Fix1.1', '73', 'AE', '240']]