Python: While reading the file and counting the words in a line, I want to count words coming between...

I have a file in which I have to count the number of words in each line, but there is a catch: whatever appears between ' ' or " " should be counted as a single word.

Example file:

TopLevel  
    DISPLAY "In TopLevel. Starting to run program"  
    PERFORM OneLevelDown  
    DISPLAY "Back in TopLevel."  
    STOP RUN.

For the file above, the word count for each line should be:

Line: 1 has: 1 words  
Line: 2 has: 2 words  
Line: 3 has: 2 words  
Line: 4 has: 2 words  
Line: 5 has: 2 words

But this is what I am getting instead:

Line: 1 has: 1 words  
Line: 2 has: 7 words  
Line: 3 has: 2 words  
Line: 4 has: 4 words  
Line: 5 has: 2 words

from os import listdir
from os.path import isfile, join

srch_dir = r'C:\Users\sagrawal\Desktop\File'

onlyfiles = [srch_dir + '\\' + f for f in listdir(srch_dir) if isfile(join(srch_dir, f))]

for i in onlyfiles:
    index = 0
    with open(i,mode='r') as file:
        lst = file.readlines()
        for line in lst:
            cnt = 0
            index += 1
            linewrds=line.split()
            for lwrd in linewrds:
                if lwrd:
                    cnt = cnt + 1
            print('Line:',index,'has:',cnt,' words')

CodePudding user response：

If you only have this simple format (no nested quotes or escaped quotes), you could use a simple regex:

lines = '''TopLevel  
    DISPLAY "In TopLevel. Starting to run program"  
    PERFORM OneLevelDown  
    DISPLAY "Back in TopLevel."  
    STOP RUN.'''.split('\n')

import re
counts = [len(re.findall(r'\'.*?\'|".*?"|\w+', l))
          for l in lines]
# [1, 2, 2, 2, 2]

If not (that is, if quotes can be nested or escaped), you will have to write a parser.
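
For what it's worth, such a parser does not have to be large. Below is a minimal sketch of a character-by-character state machine. The function name `count_words` is made up for this example, and the rules it follows are assumptions rather than anything stated in the question: a backslash escapes the next character, a quote is closed only by the same quote character, a quoted run glued to other characters (like `a"b c"`) counts as part of the same word, and an unterminated quote swallows the rest of the line.

```python
def count_words(line):
    """Count words in a line, treating a quoted run as a single word."""
    count = 0
    in_word = False   # True while we are inside a word (quoted or not)
    quote = None      # the currently open quote character, or None
    escaped = False   # True if the previous character was a backslash

    for ch in line:
        if quote is None and not escaped and ch.isspace():
            in_word = False      # unquoted whitespace ends the current word
            continue
        if not in_word:
            in_word = True       # first character of a new word
            count += 1
        if escaped:
            escaped = False      # escaped character: take it literally
        elif ch == "\\":
            escaped = True       # next character is escaped
        elif quote is None and ch in "'\"":
            quote = ch           # opening quote
        elif ch == quote:
            quote = None         # matching closing quote
    return count


sample = [
    'TopLevel  ',
    '    DISPLAY "In TopLevel. Starting to run program"  ',
    '    PERFORM OneLevelDown  ',
    '    DISPLAY "Back in TopLevel."  ',
    '    STOP RUN.',
]
for index, line in enumerate(sample, start=1):
    print('Line:', index, 'has:', count_words(line), 'words')
```

On the sample lines from the question this prints 1, 2, 2, 2, 2. If the real files use different escaping rules (for example doubled quotes instead of backslashes), the branches inside the loop would need to be adjusted accordingly.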

CodePudding user response：

The code above doesn't treat ' or " specially, because str.split() only looks at whitespace. Here is how the Python documentation describes str.split when no separator is given:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
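
A quick way to see this behaviour is to run split() on the second line of the example file. This is only an illustration; the variable names are made up for this snippet.

```python
line = '    DISPLAY "In TopLevel. Starting to run program"  '

# No separator: runs of whitespace act as one separator and
# leading/trailing whitespace produces no empty strings.
tokens = line.split()
print(tokens)       # ['DISPLAY', '"In', 'TopLevel.', 'Starting', 'to', 'run', 'program"']
print(len(tokens))  # 7

# A whitespace-only string splits to an empty list.
print('   '.split())      # []

# With an explicit separator, consecutive separators are NOT merged.
print('a  b'.split(' '))  # ['a', '', 'b']
```

The quote characters simply stay attached to the first and last pieces, which is why line 2 is reported as 7 words instead of 2.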