Remove lines and variables in file using a regex (python)

Time:09-17

I have a file that I need to read. The file I'm reading is as follows:

module first(x,y,z);
   output x;
   input y;
   input [0:7] z;
# z is later used in the file and occurs more than twice
endmodule
.....

After reading the file above, I need to find every line of the form "input identifier;" or "input [digit:digit] identifier;". For that I wrote the following regular expression:

pattern = re.compile(r'(input)(\s*)(\[*\d*\d*\d*:*\d*\d*\d*\]*)(\s+)(\w+)')

I then need to find the variable name (the identifier). In the example above, for the line "input y;" I would need to find y. If this variable occurs twice or less in the file, I need to remove it. So, if y occurred only twice in the file, I would remove the "input y;" line and the y in the parentheses. The file would then change to:

module first(x,z);
   output x;
   input [0:7] z;
# z occurs more than twice in the file
endmodule
.....

The code I have written is as follows:

import re

pattern = re.compile(r'(input)(\s*)(\[*\d*\d*\d*:*\d*\d*\d*\]*)(\s+)(\w+)')
with open('filename', 'r') as f:
   data = f.read()
# re.search returns only the first match in the file
result = re.search(pattern, data)
identifier = result.group(5)

The code above only gives me the first variable name declared as an input in the file. I tried putting it in a for loop, but that didn't work either.

After I find all the variable names declared as input, I want to check whether each name occurs twice or less in the file. If it does, I have to remove both its declaration line (as in "input y;") and its entry in the parentheses.

How would I go about this?

CodePudding user response:

pattern = re.compile(r'(input)(\s*)([\d\d*\d*:\d\d*\d*]*)(\s+)(\w+)')

If all we need is the variable name at the end of the line, we can skip matching the brackets altogether and match only the end of the line. We also need a second regex to match the module lines.
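As a minimal sketch of that idea on a few sample lines: the names input_re and module_re, the sample text, and the use of re.MULTILINE (so that ^ and $ anchor at every line of a single string) are assumptions for illustration, not part of the question.

```python
import re

# Sample text, assumed for illustration only.
sample = """module first(x,y,z);
   output x;
   input y;
   input [0:7] z;
endmodule
"""

# Capture only the trailing identifier of an input declaration. The optional
# [a:b] range is skipped by the lazy ".*?" instead of being matched explicitly.
input_re = re.compile(r"^\s*input\b.*?(\w+);$", re.MULTILINE)
# Capture the module name and the comma-separated port list.
module_re = re.compile(r"^module\s+(\w+)\((.*)\);$", re.MULTILINE)

# finditer yields every match, whereas re.search stops at the first one.
input_names = [m.group(1) for m in input_re.finditer(sample)]
module_ports = [m.group(2).split(",") for m in module_re.finditer(sample)]

print(input_names)   # ['y', 'z']
print(module_ports)  # [['x', 'y', 'z']]
```

Using finditer (or findall) instead of re.search is what returns every declared input rather than just the first.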

You can iterate over the file contents twice:

  • the first iteration counts the occurrences of the variable names on the "input" and "module" lines
  • the second iteration deletes or reconstructs every line that contains a variable name whose count is less than or equal to 2

from collections import Counter
import re

# Read the input file
with open('input.txt') as file_input:
    contents = file_input.readlines()

# Initialize the counters for the variable names declared in both the <input> and <module> lines
input_names = Counter()
module_names = Counter()

# Prepare the regex matchers to capture the variable names declared in both the <input> and <module> lines
input_re = re.compile(r"^.*input.*?(\w+);$")
module_re = re.compile(r"^module(.*?)\((.*)\);$")

# Iterate over each line in the file content to count all the variable names. No deletions yet, just counting.
for line in contents:
    if match := input_re.match(line):
        # If the line is an <input> line, extract and count the variable name.
        input_names[match.group(1)] += 1
    elif match := module_re.match(line):
        # If the line is a <module> line, extract and count the variable names separated by a comma.
        splits = match.group(2).split(',')
        module_names += Counter(splits)

# Get the names to delete, which are the variables in <input> that are only used once or twice.
names_to_delete = set()
for key in input_names:
    if input_names[key] + module_names.get(key, 0) <= 2:
        # So if the variable in <input> is only used twice or less, mark it for deletion.
        names_to_delete.add(key)

# Re-iterate over each line. This time, delete the variable names to be deleted.
contents_updated = ""
for line in contents:
    if match := input_re.match(line):
        if match.group(1) in names_to_delete:
            # If the line is an <input> line and contains the variable to delete, remove the whole line.
            line = ""
    elif match := module_re.match(line):
        # If the line is a <module> line, remove the variables that are to be deleted. Reconstruct the line without those variables.
        splits = match.group(2).split(',')
        line = module_re.sub(
            r'module\1(' + ",".join(filter(lambda value: value not in names_to_delete, splits)) + ");",
            line,
        )

    contents_updated += line

# Write the updated contents to a new file
with open('output.txt', 'w') as file_output:
    file_output.write(contents_updated)

print("Deleted names:", names_to_delete)

input.txt

module first(x,y,z);
    output x;
    input y;
    input [0:7] z;
input [1:2] abc;
input [3:4] jkl;
module second(abc,def,ghi,jkl,mno);
    input mno;
module third(z,jkl);
module fourth(pq);
    input pq;
module fourth(rstu);
module fifth(def,x);
input v;

output.txt

module first(x,z);
    output x;
    input [0:7] z;
input [3:4] jkl;
module second(def,ghi,jkl);
module third(z,jkl);
module fourth();
module fourth(rstu);
module fifth(def,x);

Execution output:

Deleted names: {'pq', 'y', 'v', 'abc', 'mno'}
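
Note that the script above counts a name only on "input" and "module" lines. If "occurs in the file" is meant as every whole-word occurrence anywhere in the text (comments included), a variation could look like the sketch below. The function name prune_rare_inputs, its max_count parameter, and both patterns are hypothetical and untested against real Verilog sources.

```python
import re

# Hypothetical patterns: an input declaration line and a module header line.
INPUT_RE = re.compile(r"^\s*input\b.*?(\w+);\s*$")
MODULE_RE = re.compile(r"^(\s*module\s+\w+\s*\()(.*)(\);\s*)$")


def prune_rare_inputs(text, max_count=2):
    """Drop inputs whose name occurs max_count times or fewer in text."""
    lines = text.split("\n")
    # Names declared on input lines.
    declared = {m.group(1) for line in lines if (m := INPUT_RE.match(line))}
    # Count whole-word occurrences anywhere in the text.
    rare = {
        name for name in declared
        if len(re.findall(rf"\b{re.escape(name)}\b", text)) <= max_count
    }
    kept = []
    for line in lines:
        if (m := INPUT_RE.match(line)) and m.group(1) in rare:
            continue  # remove the whole declaration line
        if m := MODULE_RE.match(line):
            # Rebuild the port list without the rare names.
            ports = [p for p in m.group(2).split(",") if p.strip() not in rare]
            line = m.group(1) + ",".join(ports) + m.group(3)
        kept.append(line)
    return "\n".join(kept), rare


sample = """module first(x,y,z);
   output x;
   input y;
   input [0:7] z;
# z is later used in the file and occurs more than twice
endmodule
"""

result, rare = prune_rare_inputs(sample)
print(rare)  # {'y'}
print(result)
```

With the sample from the question, y occurs twice (module header and declaration) and is removed, while z occurs three times and is kept.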
