Program for considering a word such as colour's as 2 words

I would like my Python code to treat [colour's] as 2 words, [colour] and [s], and count it that way. I tried the approach below, but it causes many errors:

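```python
import sys
from pathlib import Path
import re

text_file = Path(sys.argv[1])

if text_file.exists() and text_file.is_file():
    read = text_file.read_text()
    length = len(read.split())
    # This line is where the errors come from: the pattern starts with "*",
    # so re raises "nothing to repeat"; re.search() expects a string, not a
    # Path; and a match object has no len().
    addi = len(re.search(r'*.[["a-zA-Z"]]', text_file))
    length += addi
    print(f'{text_file} has', length, 'words')
else:
    print(f'File not found: {text_file}')
```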
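The code above appears to be trying to add a count of in-word apostrophes (addi) to the split() count. A minimal sketch of that idea, assuming an apostrophe with an ASCII letter on each side marks one extra word; the function name count_words and the constant IN_WORD_APOSTROPHE are hypothetical:

```python
import re
import sys
from pathlib import Path

# An apostrophe with an ASCII letter on both sides, e.g. the ' in "colour's".
# Assumption: each such apostrophe splits one token into two words.
IN_WORD_APOSTROPHE = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens, plus one extra per in-word apostrophe."""
    base = len(text.split())
    extra = len(IN_WORD_APOSTROPHE.findall(text))
    return base + extra


if __name__ == "__main__":
    if len(sys.argv) > 1:
        text_file = Path(sys.argv[1])
        # is_file() is False for a missing path, so no separate exists() check.
        if text_file.is_file():
            print(f"{text_file} has {count_words(text_file.read_text())} words")
        else:
            print(f"File not found: {text_file}")
```

For example, count_words("today is colour's birthday") gives 5: four whitespace-separated tokens plus one for the apostrophe in "colour's".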

Answer:

Perhaps you could use str.split() and re.findall() for this. With re.findall() you can count the words directly (with [color's] counted as 2 words) instead of using re.search(), which only returns the first match. For example:

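```python
import re

read = "today is Color's birthday"

# str.split() only breaks on whitespace, so "Color's" stays a single item.
print(read.split())
print(len(read.split()))

# [a-zA-Z]+ matches runs of letters, so the apostrophe separates "Color" and "s".
read2 = re.findall(r'[a-zA-Z]+', read)
print(read2)
print(len(read2))
```

Output:

```
['today', 'is', "Color's", 'birthday']
4
['today', 'is', 'Color', 's', 'birthday']
5
```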
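One caveat: [a-zA-Z]+ only matches ASCII letters, so a word such as "naïve" is split into "na" and "ve" and counted twice. If the text may contain accented letters, one option (an assumption about your input, not part of the answer above) is the class [^\W\d_], i.e. word characters other than digits and underscore, which for Python 3 str patterns is close to "any Unicode letter":

```python
import re

# ASCII-only letter runs, as in the answer above.
ASCII_WORDS = re.compile(r"[a-zA-Z]+")
# \w minus digits and underscore: roughly "any Unicode letter" for str
# patterns in Python 3, so accented words stay in one piece.
UNICODE_WORDS = re.compile(r"[^\W\d_]+")

text = "the naïve café's colour's"
print(ASCII_WORDS.findall(text))
# ['the', 'na', 've', 'caf', 's', 'colour', 's']
print(UNICODE_WORDS.findall(text))
# ['the', 'naïve', 'café', 's', 'colour', 's']
```

Both patterns still split on the apostrophe, so "café's" and "colour's" each count as 2 words; only the handling of non-ASCII letters differs.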

Answer:

You can replace the apostrophe with a space and then count the items in the list returned by str.split().

However, you probably don't want to replace every apostrophe. You almost certainly want to replace only apostrophes that have a letter on both sides.

Therefore, with a combination of re.sub() and str.split(), you could do this:

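```python
import re
import sys

def word_count(filename):
    with open(filename) as infile:
        text = infile.read()
    # Replace one or more apostrophes that have a letter on each side with a
    # space, so "colour's" becomes "colour s" (two words).
    data = re.sub(r"(?<=[A-Za-z])[']+(?=[A-Za-z])", ' ', text)
    return len(data.split())

if len(sys.argv) > 1:
    print(word_count(sys.argv[1]))
```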
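To see which apostrophes the substitution touches, here is the same idea applied to an in-memory string instead of a file. The helper name count_words_in_text, the constant name, and the sample sentence are illustrative only:

```python
import re

# Same idea as word_count() above: replace apostrophes that have a letter
# on both sides with a space, then count whitespace-separated tokens.
LETTER_BOUNDED_APOSTROPHES = re.compile(r"(?<=[A-Za-z])'+(?=[A-Za-z])")


def count_words_in_text(text: str) -> int:
    return len(LETTER_BOUNDED_APOSTROPHES.sub(" ", text).split())


sample = "The colour's fine, 'quoted' and the dogs' bowls"
print(LETTER_BOUNDED_APOSTROPHES.sub(" ", sample))
# The colour s fine, 'quoted' and the dogs' bowls
print(count_words_in_text(sample))  # 9
print(len(sample.split()))          # 8
```

Only the apostrophe in "colour's" is replaced. The quote marks around 'quoted' and the trailing apostrophe in "dogs'" are left alone because they do not have a letter on both sides.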