How to replace string occurrences in python file?-CodePudding

I have a CSV file which contains lines of sql entries. Each sql entry, contains message which are SOH ("\x01") delimited and are tag=value pairs.

8=Fix1.1<SOH>9=70<SOH>35=AE<SOH>10=237
8=Fix1.1<SOH>9=71<SOH>35=AE<SOH>10=238
8=Fix1.1<SOH>9=72<SOH>35=AE<SOH>10=239
8=Fix1.1<SOH>9=73<SOH>35=AE<SOH>10=240

(<SOH> is a placeholder for the actual character because Stack Overflow wouldn't let me include the \x01 character in the text)

Issue:

The below code snippet removes SOH to "," as expected, however, having trouble removing the tag part from the lines.

# Read in the file
with open('file.txt', 'r') as file :
  filedata = file.read()

# Replace the target string
filedata = filedata.replace('x01', ',') 
filedata2 = filedata.replace("=", ",") 
# Write the file out again
with open('file.txt', 'w') as file:
  file.write(filedata2)

Output:

8,Fix1.1,9,70,35,AE,10,237
8,Fix1.1,9,71,35,AE,10,238
8,Fix1.1,9,72,35,AE,10,239
8,Fix1.1,9,73,35,AE,10,240

I've also tried regex = re.compile ("[=]") and then loop into line reader and modify, but just returns all [=] in the print.

Desired output:

Fix1.1,70,AE,237

Fix1.1,71,AE,238

Fix1.1,72,AE,239

Fix1.1,73,AE,240

CodePudding user response：

Use regex re.sub to replace all occurrences of e.g. <SOH>9= with a comma ,

import re

# Other patterns: r"(^|<SOH>)\d =", r"(^|<.*?>).*?="
to_replace = re.compile(r"[^\w]*\w =")

with open('file.txt', 'r') as file:
    for line in file:
        line_updated = to_replace.sub(",", line).lstrip(",")
        print(line_updated)

Output

Fix1.1,70,AE,237
Fix1.1,71,AE,238
Fix1.1,72,AE,239
Fix1.1,73,AE,240

CodePudding user response：

Use csv.reader with delimiter="\x01" to split by the SOH character. Then, as you read each line, split each element by "=" and keep only the values.

import csv

filedata = []
with open('file.txt', 'r') as file:
    reader = csv.reader(file, delimiter="\x01")
    for row in reader:
        # Split each item in row
        # Keep only second element of each split
        values = [item.split("=", 1)[1] for item in row]
        filedata.append(values)

print(filedata)

which gives

[['Fix1.1', '70', 'AE', '237'], 
 ['Fix1.1', '71', 'AE', '238'], 
 ['Fix1.1', '72', 'AE', '239'], 
 ['Fix1.1', '73', 'AE', '240']]