I applied for a data engineer job not long ago and got a Python question where my solution didn’t handle all the edge cases. It has been haunting me since. I used .endswith() at the time, and I feel like that’s what failed in my code.
I have been trying to rewrite it, and here is what I have so far:
x = 'cars that ran up and opened a tattooaged car dealership educated'
# create a program to remove 'ed' from
# any word that ends with ed but not
# the word 'opened'
# also, every word must be less than
# 8 letters long
suffix = 'ed'
def check_ed_lt8(x):
    x_list = x.split(" ")
    for index, var in enumerate(x_list):
        if suffix in var != 'opened':
            new_word = var[:-len(suffix)].strip('suffix')
            x_list[index] = new_word
        elif len(var) >= 8:
            shorter_word = var[:8]
            x_list[index] = shorter_word
    return ' '.join(x_list)
print(check_ed_lt8(x))
I get the desired output:
cars that ran up and opened a tatooag car dealersh educat
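Two details in this version only happen to give the desired output for this particular sentence. `suffix in var != 'opened'` is a chained comparison that Python evaluates as `suffix in var and var != 'opened'`, and `in` looks for the substring anywhere in the word, not only at the end. Also, `str.strip('suffix')` treats its argument as a set of characters (s, u, f, i, x) to trim from both ends, not as the literal string 'suffix'. A short check, using 'educator' and 'fussy' as made-up example words:

```python
suffix = 'ed'
word = 'educator'  # hypothetical word: contains 'ed' but does not end with it

# Chained comparison: same as ('ed' in word) and (word != 'opened')
matched_by_in = suffix in word != 'opened'
matched_by_endswith = word.endswith(suffix) and word != 'opened'

print(matched_by_in)        # True: 'ed' appears at the start of the word
print(matched_by_endswith)  # False: the word does not end with 'ed'

# strip() removes any of the characters s, u, f, i, x from both ends
print('fussy'.strip('suffix'))  # y
```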
But the technical question had examples before it, such as some words ending in ‘ly’, and I started wondering whether I was supposed to loop through a list of suffixes and whether that’s why my code didn’t pass the edge cases. So I modified it, but now every time I add to the list, one of the last words in the sentence stops being modified:
suffixes = ['ed', 'an']
def check_ed_lt8(x):
    x_list = x.split(" ")
    for index, var in enumerate(x_list):
        for suffix in suffixes:
            if suffix in var != 'opened':
                new_word = var[:-len(suffix)].strip('suffix')
                x_list[index] = new_word
            elif len(var) >= 8:
                shorter_word = var[:8]
                x_list[index] = shorter_word
    return ' '.join(x_list)
print(check_ed_lt8(x))
Returns:
cars that r up a opened a tattoag car dealersh educated
In this output, the last word is no longer modified, AND I didn’t mean for “and” to lose “nd”. I know it lost them because of some combination of the “d” and “n” from each suffix, but I don’t know why.
The more suffixes I put in the list, the more of the last few words stop being modified. For example, if I add “ars” to the suffixes, the output becomes:
c that r up a opened a tattoag car dealership educated
What am I doing wrong?
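Running the inner loop on the single word “educated”, with a `print` added only for tracing, shows where the change is lost. The loop does not stop when a suffix matches, and every pass still tests the original `var`, so a later suffix that does not match falls into the `elif` branch and overwrites the stripped word with `var[:8]`. The same `in` test explains “and”: 'an' occurs inside “and”, so `var[:-len(suffix)]` cuts off its last two characters.

```python
suffixes = ['ed', 'an']
x_list = ['educated']

for index, var in enumerate(x_list):
    for suffix in suffixes:
        if suffix in var != 'opened':
            x_list[index] = var[:-len(suffix)].strip('suffix')
        elif len(var) >= 8:
            # var is still the original word here, so this undoes the earlier strip
            x_list[index] = var[:8]
        print(suffix, '->', x_list[index])

# Output:
# ed -> educat
# an -> educated
```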
Answer:
I would suggest using re.sub to remove the ed at the end of each word. Here is a one-liner:
import re
x = 'cars that ran up and opened a tattoo aged car dealership educated'
y = ' '.join([w if w == "opened" else re.sub(r'ed$', '', w)[:8] for w in x.split(' ')])
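Since the question mentions `.endswith()`, the same rule can also be written without regular expressions. This is a sketch assuming the rules from the question’s comments (strip a trailing “ed” except from “opened”, then keep at most 8 characters, as the code above does); the function name `trim_word` is hypothetical:

```python
x = 'cars that ran up and opened a tattoo aged car dealership educated'

def trim_word(word, suffix='ed', keep='opened', max_len=8):
    """Strip a trailing suffix (except from `keep`), then cut to max_len characters."""
    if word == keep:
        return word
    if word.endswith(suffix):
        word = word[:-len(suffix)]
    return word[:max_len]

y = ' '.join(trim_word(w) for w in x.split(' '))
print(y)  # cars that ran up and opened a tattoo ag car dealersh educat
```

So `.endswith()` itself is fine; what matters is applying the truncation after the suffix has been removed.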
If you want to remove multiple suffixes, extend your regexp accordingly:
y = ' '.join([w if w == "opened" else re.sub(r'(ed|an)$', '', w)[:8] for w in x.split(' ')])
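The `$` anchor is what keeps words like “and” intact: the alternation only matches “ed” or “an” at the very end of the word, whereas the `in` test in the question matches them anywhere. A quick check on individual words (“opened” is still stripped here because the exclusion lives in the one-liner’s conditional, not in the pattern):

```python
import re

pattern = r'(ed|an)$'  # same pattern as the one-liner above
words = ['and', 'ran', 'educated', 'opened']
results = {word: re.sub(pattern, '', word) for word in words}
for word, trimmed in results.items():
    print(word, '->', trimmed)
# and -> and
# ran -> r
# educated -> educat
# opened -> open
```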
Of course you can also build the regexp based on a list of suffixes:
suffixes = ['ed','an']
# Escape each suffix in case it contains regex metacharacters
pattern = re.compile('(' + '|'.join(map(re.escape, suffixes)) + ')$')
y = ' '.join([w if w == "opened" else pattern.sub('', w)[:8] for w in x.split(' ')])
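If you prefer to keep the loop structure from the question, the two changes that matter are testing with `str.endswith` instead of `in`, and stopping at the first suffix that matches so that later suffixes cannot overwrite the result; truncation is then applied once, after stripping. A minimal sketch, assuming the same rules as above (the function name `trim_words` is hypothetical):

```python
def trim_words(text, suffixes, keep='opened', max_len=8):
    words = text.split(' ')
    for index, word in enumerate(words):
        if word != keep:
            for suffix in suffixes:
                # skip empty suffixes, since word[:-0] would be ''
                if suffix and word.endswith(suffix):
                    word = word[:-len(suffix)]
                    break  # strip at most one suffix per word
        words[index] = word[:max_len]
    return ' '.join(words)


x = 'cars that ran up and opened a tattoo aged car dealership educated'
print(trim_words(x, ['ed', 'an']))
# cars that r up and opened a tattoo ag car dealersh educat
```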