I would like my code to treat [colour's] as 2 words, [colour] and [s], and include both in the word count in Python. I tried doing it this way, but it causes many errors:
import sys
from pathlib import Path
import re
text_file = Path(sys.argv[1])
if text_file.exists() and text_file.is_file():
    read = text_file.read_text()
    length = len(read.split())
    addi = len(re.search(r'*.[["a-zA-Z"]]', text_file))
    length = addi
    print(f'{text_file} has', length, 'words')
else:
    print(f'File not found: {text_file}')
CodePudding user response:
Perhaps you could use str.split() and re.findall for your purpose. With the latter, you can count the number of words directly (with [color's] as 2 words) instead of looking for the individual words in groups. For example:
import re
read = "today is Color's birthday"
print(read.split())
print(len(read.split()))
read2 = re.findall(r'[a-zA-Z]+', read)
print(read2)
print(len(read2))
Output:
['today', 'is', "Color's", 'birthday']
4
['today', 'is', 'Color', 's', 'birthday']
5
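Applied to the file from the question, the same idea might look like the sketch below. The helper name count_words is hypothetical, and the pattern assumes that words consist only of ASCII letters, so digits and accented characters would not be counted as part of a word.

```python
import re
import sys
from pathlib import Path


def count_words(text):
    # Each maximal run of ASCII letters counts as one word,
    # so "colour's" contributes two words: "colour" and "s".
    return len(re.findall(r"[a-zA-Z]+", text))


if __name__ == "__main__" and len(sys.argv) > 1:
    text_file = Path(sys.argv[1])
    if text_file.is_file():
        print(f"{text_file} has {count_words(text_file.read_text())} words")
    else:
        print(f"File not found: {text_file}")
```

Because re.findall returns a list of all matches, len() can be applied to it directly, unlike the match object returned by re.search.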
CodePudding user response:
You can replace the apostrophe with a whitespace character and then count the length of the list created by str.split().
However, you may not want to replace all apostrophes. You almost certainly only want to replace apostrophes that are bounded by letters.
Therefore, with a combination of re and str.split(), you could do this:
import re
import sys

def word_count(filename):
    with open(filename) as infile:
        text = infile.read()
    # Replace apostrophes only when they have a letter on both sides
    data = re.sub(r"(?<=[A-Za-z])[']+(?=[A-Za-z])", ' ', text)
    return len(data.split())

if len(sys.argv) > 1:
    print(word_count(sys.argv[1]))
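To see what the lookarounds change, here is a small sketch that compares a blanket replacement with the letter-bounded one on a string instead of a file. The sample sentence and the helper names count_naive and count_bounded are made up for illustration.

```python
import re


def count_naive(text):
    # Replaces every apostrophe, including ones that sit between digits.
    return len(text.replace("'", " ").split())


def count_bounded(text):
    # Replaces only apostrophes that have a letter on both sides.
    return len(re.sub(r"(?<=[A-Za-z])'+(?=[A-Za-z])", " ", text).split())


sample = "He is 5'11 and it's fine"
print(count_naive(sample))    # 8: "5'11" is split into "5" and "11"
print(count_bounded(sample))  # 7: "5'11" stays whole, "it's" becomes "it" and "s"
```

Which behaviour is preferable depends on whether tokens such as 5'11 should count as one word or two.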