I have a file in which I have to count the number of words in each line, but there is a catch: whatever comes between ' ' or " " should be counted as a single word.
Example file:
TopLevel
DISPLAY "In TopLevel. Starting to run program"
PERFORM OneLevelDown
DISPLAY "Back in TopLevel."
STOP RUN.
For the above file, the word count for each line should be as below:
Line: 1 has: 1 words
Line: 2 has: 2 words
Line: 3 has: 2 words
Line: 4 has: 2 words
Line: 5 has: 2 words
But I am getting as below:
Line: 1 has: 1 words
Line: 2 has: 7 words
Line: 3 has: 2 words
Line: 4 has: 4 words
Line: 5 has: 2 words
from os import listdir
from os.path import isfile, join

srch_dir = r'C:\Users\sagrawal\Desktop\File'
# Full paths of the regular files in srch_dir
onlyfiles = [join(srch_dir, f) for f in listdir(srch_dir) if isfile(join(srch_dir, f))]

for i in onlyfiles:
    index = 0
    with open(i, mode='r') as file:
        lst = file.readlines()
        for line in lst:
            cnt = 0
            index += 1
            # split() breaks on whitespace only, so quoted text is split too
            linewrds = line.split()
            for lwrd in linewrds:
                if lwrd:
                    cnt = cnt + 1
            print('Line:', index, 'has:', cnt, ' words')
Answer:
If you only have this simple format (no nested quotes or escaped quotes), you could use a simple regex:
import re

lines = '''TopLevel
DISPLAY "In TopLevel. Starting to run program"
PERFORM OneLevelDown
DISPLAY "Back in TopLevel."
STOP RUN.'''.split('\n')

# A word is a '...' span, a "..." span, or a run of word characters
counts = [len(re.findall(r"""'.*?'|".*?"|\w+""", l))
          for l in lines]
# [1, 2, 2, 2, 2]
If not (for example, if quotes can be escaped or nested), you will have to write a parser.
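As a rough idea of what such a parser could look like, here is a minimal sketch of a character-by-character state machine. The function name count_words and the backslash-escape rule are assumptions made for illustration; they are not part of the question, and the sketch does not handle nested quotes. Note that an apostrophe inside an ordinary word (as in don't) would be read as an opening quote.

```python
def count_words(line):
    """Count words in a line, treating a quoted span as part of one word.

    Assumption (not from the question): a backslash escapes the next
    character, so \" inside a quoted span does not close it.
    """
    count = 0
    in_word = False   # True while we are inside a word
    quote = None      # the quote character that is currently open, if any
    escaped = False   # True if the previous character was a backslash

    for ch in line:
        if escaped:
            # This character is literal; it already belongs to the current word.
            escaped = False
        elif ch == '\\':
            escaped = True
            if not in_word:
                in_word = True
                count += 1
        elif quote:
            # Inside quotes only the matching quote character matters.
            if ch == quote:
                quote = None
        elif ch.isspace():
            in_word = False
        else:
            if ch in '\'"':
                quote = ch
            if not in_word:
                in_word = True
                count += 1
    return count


sample = [
    'TopLevel',
    'DISPLAY "In TopLevel. Starting to run program"',
    'PERFORM OneLevelDown',
    'DISPLAY "Back in TopLevel."',
    'STOP RUN.',
]
for number, text in enumerate(sample, start=1):
    print('Line:', number, 'has:', count_words(text), 'words')
```

For the example file this gives 1, 2, 2, 2, 2, the same counts as the regex, but it also keeps a line such as DISPLAY "a \"b\" c" at 2 words.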
Answer:
It seems that the code in the question doesn't treat ' or " specially. Here is how the Python documentation defines str.split:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
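To make the quoted behavior concrete, the short sketch below splits one line from the example file with str.split. For comparison it also uses shlex.split from the standard library, which is not mentioned in the thread and is shown here only as one possible quote-aware alternative.

```python
import shlex

line = 'DISPLAY "Back in TopLevel."'

# str.split() only looks at whitespace, so the quoted text is broken up.
naive = line.split()
print(naive)          # ['DISPLAY', '"Back', 'in', 'TopLevel."']

# shlex.split() follows shell-like quoting rules and keeps the span together.
quote_aware = shlex.split(line)
print(quote_aware)    # ['DISPLAY', 'Back in TopLevel.']

# A whitespace-only string yields an empty list, as the documentation says.
print('   '.split())  # []
```

Be aware that shlex.split raises ValueError on an unbalanced quote, so a line with a stray apostrophe would need separate handling.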