I've embarked on a reasonably dumb linguistics project to learn regular expressions in Python. I'm pretty sure I could avoid making multiple passes over the same string and find a more compact, "Pythonic" way to do what I'm trying to do, which is to use regex to determine whether each 'Y' or 'y' in a word is a vowel or a consonant. At the bottom of the code segment, I've put a comment block listing 20 words that contain 12 vowel y's and 9 consonant y's. It seems like the code could be simplified and the re.compile lines merged together.
import re
vowelRegex = re.compile(r'[aeiouAEIOU]')
consoRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]')
yconsRegex = re.compile(r'[aeiou]y[aeiou]')
ycon2Regex = re.compile(r'\bY')
yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]')
yVow2Regex = re.compile(r'y\b')
#thestring = 'Sky Family Yurt Germany Crypt Day New York Pennsylvania Myth Hungry Yolk Year Bayou Yak Silly Beyond Dynamite Mystery Yacht Yoda'
#thestring = 'Crypt Pennsylva Myth Dynamite Mystery'
#thestring='RoboCop eats baby food. Pennsylvania Baby Food in the bayou. And, New York is where I\'d Rather be!'
thestring='violent irrational intolerant allied to racism and ' \
'tribalism bigotry invested in ignorance and hostile to free '\
'inquiry contemptuous of women and coercive towards children ' \
'organized religion ought to have a great deal on its conscience ' \
'Yak yacht beyond mystery'
fun=vowelRegex.findall(thestring)
nofun=consoRegex.findall(thestring)
funny = yVowlRegex.findall(thestring)
foony = []
for f in funny:
    foony.append(f[1])  # keep only the y from each three-character match
fun += foony
fun += yVow2Regex.findall(thestring)
notfunny = yconsRegex.findall(thestring)
foony = []
for f in notfunny:
    foony.append(f[1])  # keep only the y from each three-character match
nofun += foony
nofun += ycon2Regex.findall(thestring)
print(thestring)
print('Vowels:',''.join(fun), len(''.join(fun)))
print('Consos:',''.join(nofun), len(''.join(nofun)))
'''
Sky Vowel; endswith 1
Family Vowel; endswith 2
Yurt Consonant; begswith 1
Germany Vowel; endswith 3
Crypt Vowel; sandwiched 1
Day Vowel; endswith 4
New York Consonant; begswith 2
Pennsylva Vowel; sandwiched 2
Myth Vowel; sandwiched 3
Hungry Vowel; endswith 5
Yolk Consonant; begswith 3
Year Consonant; begswith 4
Bayou Consonant; sandwiched 1
Yak Consonant; begswith 5
Silly Vowel; endswith 6
Beyond Consonant; sandwiched 2
Dynamite Vowel; sandwiched 4
Mystery Vowel; sandwiched, Vowel; endswith!
Yacht Consonant; begswith 6
Yoda Consonant; begswith 7
'''
CodePudding user response:
You can use the alternation operator (|) in a regex, which shortens the code a bit. For example:
yVowlRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')
This single pattern now covers both yVowlRegex and yVow2Regex.
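Here is a minimal sketch of how that combined pattern behaves. The sample string and the vowel_ys name are made up for illustration and are not from the question. Note that findall returns the whole match, so a sandwiched y comes back together with its neighbouring consonants and still has to be picked out:

```python
import re

# Combined pattern: consonant-y-consonant OR word-final y.
yVowlRegex = re.compile(
    r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b'
)

sample = 'Crypt Sky Myth Yak'  # hypothetical test string
matches = yVowlRegex.findall(sample)
print(matches)  # ['ryp', 'y', 'Myt']

# Sandwiched matches are three characters long; word-final matches are just 'y'.
vowel_ys = [m[1] if len(m) == 3 else m for m in matches]
print(vowel_ys)  # ['y', 'y', 'y']
```

The capital Y in Yak is not matched because neither alternative covers an uppercase, word-initial Y.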
CodePudding user response:
@Joshua-Lewis's answer led me to the following way to streamline the code above:
import re
vowelRegex = re.compile(r'[aeiouAEIOU]|[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]y[b-df-hj-np-tv-xz]|y\b')
consoRegex = re.compile(r'[b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]|[aeiou]y[aeiou]|\bY')
vowelRescan = re.compile(r'[aeiouyAEIOUY]')
consoRescan = re.compile(r'[b-df-hj-np-tv-xyzB-DF-HJ-NP-TV-XYZ]')
thestring='any and every religion is violent irrational intolerant '\
'allied to racism and tribalism bigotry invested in ignorance and '\
'hostile to free inquiry contemptuous of women and coercive towards '\
'children organized religion ought to have a great deal on its '\
'conscience why it continues toward the 22nd century ACE is a mystery '\
'known only to New Yorkers and lovers of the bayou'
fun=vowelRegex.findall(thestring)
funn=''.join(fun)
fun = ''.join(vowelRescan.findall(funn))
nofun=consoRegex.findall(thestring)
nofunn=''.join(nofun)
nofun=''.join(consoRescan.findall(nofunn))
print(thestring)
print('Vowels:',fun, len(fun))
print('Consos:',nofun, len(nofun))
'''
Sky Vowel; endswith 1
Family Vowel; endswith 2
Yurt Consonant; begswith 1
Germany Vowel; endswith 3
Crypt Vowel; sandwiched 1
Day Vowel; endswith 4
New York Consonant; begswith 2
Pennsylva Vowel; sandwiched 2
Myth Vowel; sandwiched 3
Hungry Vowel; endswith 5
Yolk Consonant; begswith 3
Year Consonant; begswith 4
Bayou Consonant; sandwiched 1
Yak Consonant; begswith 5
Silly Vowel; endswith 6
Beyond Consonant; sandwiched 2
Dynamite Vowel; sandwiched 4
Mystery Vowel; sandwiched, Vowel; endswith!
Yacht Consonant; begswith 6
Yoda Consonant; begswith 7
'''
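For what it's worth, the rescan step could possibly be avoided by classifying every y in one pass with lookarounds and named groups. This is only a sketch, not a tested replacement for the code above: the names Y_RE and classify_ys are made up, and it assumes case-insensitive matching (re.IGNORECASE), which the original patterns do not use. It applies the same four rules as the comment block: word-initial or vowel-sandwiched y counts as a consonant, word-final or consonant-sandwiched y counts as a vowel.

```python
import re

CONS = 'b-df-hj-np-tv-xz'  # consonants other than y

# Lookarounds check the neighbours without consuming them,
# so every match is exactly one 'y' and no rescan is needed.
Y_RE = re.compile(
    rf'''
    (?P<consonant> \by                          # word-initial y
                 | (?<=[aeiou])y(?=[aeiou])     # y between two vowels
    )
    |
    (?P<vowel>     y\b                          # word-final y
                 | (?<=[{CONS}])y(?=[{CONS}])   # y between two consonants
    )
    ''',
    re.IGNORECASE | re.VERBOSE,  # assumption: upper and lower case treated alike
)

def classify_ys(text):
    """Return (vowel_y_count, consonant_y_count) for the given text."""
    vowels = consonants = 0
    for m in Y_RE.finditer(text):
        if m.group('vowel') is not None:
            vowels += 1
        else:
            consonants += 1
    return vowels, consonants

words = ('Sky Family Yurt Germany Crypt Day New York Pennsylvania Myth '
         'Hungry Yolk Year Bayou Yak Silly Beyond Dynamite Mystery Yacht Yoda')
print(classify_ys(words))  # (12, 9)
```

A y that fits none of the four rules (for example one that follows a vowel and precedes a consonant) is simply not matched, so it is left unclassified rather than guessed at.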