Filter specific rows where blank spaces are found in two consecutive rows

Time:02-12

I have an input file (File A) as shown below:

```
1015090    3919032
2115090    3919032      3919032
3215090    3919032      3919032
4315090    3919032      3919032
5415090    3919032      3919032
6515090    3919032      3919032
7615090    3919032      3919032
8715090    3919032      3919032
9815069    3919032              <----- This row needs to be found
2015089    3919032              <----- This row needs to be found
2115089    3919032      3919032
2215069    3919032      3919032
2315069    3919032      3919032
2415089    3919032      3919032
2515089    3919032      3919032
2615089    3919032      3919032
2715069    3919032      3919032
2815069    3919032      3919032
2915069    3919032
3015069    3919032      3919032
3115069    3919032      3919032
```

I need to find rows where the 3rd column is blank, but only when this happens in 2 consecutive rows. When two such rows are found together, I need to write those rows (column 1 and column 2) to an output file.

**Output:**

```
9815069    3919032
2015089    3919032
```

I am new to Python. Could anyone tell me how this can be done in Python? Thanks in advance.
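The rule in the question (a blank 3rd column in two consecutive rows) can also be checked without regular expressions. Below is a minimal sketch that assumes the columns are whitespace-separated, so a row with a blank 3rd column splits into exactly two fields. It uses `itertools.groupby` to group consecutive rows, and it assumes that a run of three or more such rows should be output in full (the question only mentions two). The function name `find_consecutive_blank_rows` and the `sample` rows are hypothetical.

```python
from itertools import groupby

def find_consecutive_blank_rows(lines):
    """Yield (col1, col2) for every row that belongs to a run of two or
    more consecutive rows whose 3rd column is blank."""
    # Assumption: a row with a blank 3rd column splits into exactly two fields
    for is_blank_third, group in groupby(lines, key=lambda ln: len(ln.split()) == 2):
        if not is_blank_third:
            continue  # skip runs of full rows without storing them
        rows = [tuple(ln.split()) for ln in group]
        if len(rows) >= 2:
            yield from rows

# Hypothetical excerpt of File A
sample = [
    "1015090    3919032",
    "2115090    3919032      3919032",
    "9815069    3919032",
    "2015089    3919032",
    "2115089    3919032      3919032",
    "2915069    3919032",
    "3015069    3919032      3919032",
]

for col1, col2 in find_consecutive_blank_rows(sample):
    print(col1, col2)
```

A file object is itself an iterable of lines, so an open input file can be passed to the function directly. Each yielded pair can then be written to the output file. Only one run of blank-column rows is held in memory at a time.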

**Answer:**

I prefer reading the file one line at a time with `readline()` and keeping a small stack, so the approach still works if the file is very large.

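```python
import re

# A line with exactly two numeric columns (3rd column blank)
pat = re.compile(r'^\d+\s+\d+\s*$')

with open('File_A.txt', 'r') as fr:
    with open('File_A_Out.txt', 'w') as fw:  # 'w' overwrites any previous output
        stack = []
        line = True
        while line:
            line = fr.readline()  # returns '' at end of file
            if pat.search(line):
                if stack:
                    # Second consecutive two-column row: write the pair
                    fw.write(stack.pop() + line)
                else:
                    # First two-column row: hold it until the next line
                    stack.append(line)
            else:
                # A row with a 3rd column breaks the run
                stack = []
```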
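To try the readline-and-stack logic without touching real files, the same loop can be wrapped in a function that takes any file-like objects. In this sketch `io.StringIO` stands in for `File_A.txt` and `File_A_Out.txt`. The function name `pair_blank_rows` and the sample rows are hypothetical.

```python
import io
import re

# Two numeric columns followed only by optional trailing whitespace
pat = re.compile(r'^\d+\s+\d+\s*$')

def pair_blank_rows(fr, fw):
    """Copy pairs of consecutive two-column rows from fr to fw."""
    stack = []
    for line in iter(fr.readline, ''):  # readline() returns '' at EOF
        if pat.search(line):
            if stack:
                fw.write(stack.pop() + line)  # second row of a pair
            else:
                stack.append(line)  # first row; wait for the next one
        else:
            stack = []  # a row with a 3rd column breaks the run

src = io.StringIO(
    "8715090    3919032      3919032\n"
    "9815069    3919032\n"
    "2015089    3919032\n"
    "2115089    3919032      3919032\n"
    "2915069    3919032\n"
    "3015069    3919032      3919032\n"
)
dst = io.StringIO()
pair_blank_rows(src, dst)
print(dst.getvalue(), end='')
```

With real files, call `pair_blank_rows(fr, fw)` inside the two `with` blocks. The stack pairs rows two at a time. In a run of three blank-column rows, only the first two are written, and the third waits for a partner that never comes.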

**Answer:**

One regex approach would be to read the whole file into a string, then search for pairs of consecutive lines that each match the pattern `^\d+\s+\d+\s*$`:

```python
import re

with open('input.txt') as f:
    lines = f.read()

# Two consecutive lines that each contain only two columns
matches = re.findall(r'^\d+\s+\d+\s*?\r?\n\d+\s+\d+\s*$', lines, flags=re.M)
```
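`re.findall` only returns the matched text, and each match spans two lines. To produce the requested output, a minimal sketch could split every match back into rows and keep columns 1 and 2. The inline `text` below is a hypothetical excerpt that stands in for the file contents read into `lines`.

```python
import re

# Same pattern as above: two consecutive lines that contain only two columns
pair_pat = re.compile(r'^\d+\s+\d+\s*?\r?\n\d+\s+\d+\s*$', flags=re.M)

# Hypothetical excerpt of the input file
text = (
    "8715090    3919032      3919032\n"
    "9815069    3919032\n"
    "2015089    3919032\n"
    "2115089    3919032      3919032\n"
    "2915069    3919032\n"
    "3015069    3919032      3919032\n"
)

output_lines = []
for match in pair_pat.findall(text):
    # Each match covers two rows; keep only columns 1 and 2 of each
    for row in match.splitlines():
        col1, col2 = row.split()
        output_lines.append(f"{col1}    {col2}")

print("\n".join(output_lines))
```

To save the result instead of printing it, write `"\n".join(output_lines)` to the output file. Unlike the readline approach, this one keeps the whole input in memory.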