The break here is bothering me. After extensive research, I'd like to ask whether there is a Pythonic way to convert this to a while loop:
```python
import re

with open('parse.txt', 'r') as file:
    html = file.readlines()

def cleanup():
    result = []
    for line in html:
        # Keep only list items: strip their tags and wrap them in quotes
        if "<li" in line and "</li>" in line:
            stripped = re.sub(r'[\n\t]*<[^<]+?>', '', line).rstrip()
            quoted = f'"{stripped}"'
            result.append(quoted)
        elif "INSTRUCTIONS" in line:
            break  # ignore everything after the INSTRUCTIONS heading
    return ",\n".join(result)
```
I really am trying to practice designing more efficient loops.
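Separately from the break question, here is one pitfall in this kind of loop. A condition written as "<li" and "</li>" in line does not test both substrings. Python parses it as "<li" and ("</li>" in line), and a non-empty string is always truthy, so only "</li>" is checked. A minimal demonstration (the sample line is made up):

```python
def both_tags_buggy(line):
    # Parsed as: "<li" and ("</li>" in line); "<li" is always truthy
    return bool("<li" and "</li>" in line)

def both_tags(line):
    # Each substring gets its own membership test
    return "<li" in line and "</li>" in line

sample = "<span>text</span></li>"  # hypothetical line: has "</li>" but no "<li"
print(both_tags_buggy(sample))  # True
print(both_tags(sample))        # False
```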
Added parse.txt:
```html
<p style="text-align:justify"><strong><span style="background-color:#ecf0f1">INGREDIENTS</span></strong></p>
<li style="text-align:justify"><span style="background-color:#ecf0f1">3 lb ground beef (80/20)</span></li>
<ul>
<li style="text-align:justify"><span style="background-color:#ecf0f1">1 large onion, chopped</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">2-3 cloves garlic, minced</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">2 jalapeño peppers, roasted, peeled, de-seeded, chopped</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">4-5 roma tomatoes, roasted peeled, chopped</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">1 15 oz can kidney beans, strained and washed</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp salt</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp black pepper</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp cumin</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">¼ - ½ tsp cayenne pepper</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">1 tsp garlic powder</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">1 tsp Mexican oregano</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">1 tsp paprika</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">1 tsp smoked paprika</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">3 cups chicken broth</span></li>
<li style="text-align:justify"><span style="background-color:#ecf0f1">2 tbsp tomato paste</span></li>
</ul>
<p style="text-align:justify"><strong>INSTRUCTIONS</strong></p>
<ol>
<li style="text-align:justify">Heat a large pot or Dutch oven over medium-high heat and brown the beef, while stirring to break it up. Cook until no longer pink. Drain out the liquid.</li>
<li style="text-align:justify">Stir in onions and cook for about 5 minutes until they are pale and soft. Add in minced garlic and jalapeño peppers, stirring for another minute.</li>
<li style="text-align:justify">Stir in the chopped tomatoes, all the spices, and tomato paste until well-distributed and tomato paste has broken up, then follow with the broth. Allow the pot to come to a gentle boil over medium heat, uncovered for about 20 minutes.</li>
<li style="text-align:justify">Reduce heat to low, cover and simmer for at least 3 hours, until liquid has reduced.</li>
<li style="text-align:justify">During the last 20-30 minutes of cook time, add in the kidney beans; uncover and allow liquid to reduce further during this time.</li>
<li style="text-align:justify">Serve hot with jalapeño cornbread muffins, shredded cheese, avocado chunks, chopped cilantro, chopped green onion, tortilla chips.</li>
</ol>
```
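For reference, here is what the tag-stripping step does to one line of parse.txt. This sketch assumes the pattern [\n\t]*<[^<]+?>. That pattern removes any leading newlines or tabs together with each tag, from "<" to the nearest ">":

```python
import re

# Assumed tag-stripping pattern: optional newlines/tabs, then "<",
# then the shortest run of non-"<" characters up to ">"
TAG = re.compile(r"[\n\t]*<[^<]+?>")

line = '<li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp salt</span></li>\n'
stripped = TAG.sub("", line).rstrip()  # rstrip() drops the trailing newline
print(f'"{stripped}"')  # "2 tsp salt"
```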
Answer:
We could move the break into another function, if you feel it improves the clarity of the top-level function. The core notion is that we've no need to lavish attention on a few million irrelevant lines that might follow an occurrence of the "INSTRUCTIONS" terminator.

Two observations on the topic of "laziness":
- Notice that we compile the regex just once.
- We invoke the regex only on the subset of lines where it's needed.
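Here is a small sketch of that laziness. It uses a variant of the generator that takes a list of lines. The names up_through and CountingReader are hypothetical, and io.StringIO with made-up lines stands in for parse.txt. The sketch counts how many lines are actually pulled from the source; nothing after the INSTRUCTIONS line is read:

```python
import io

def up_through(lines):
    # Same shape as up_through_instructions, but takes any iterable of lines
    for line in lines:
        yield line
        if "INSTRUCTIONS" in line:
            break

class CountingReader:
    """Wraps an iterable of lines and counts how many were consumed."""

    def __init__(self, lines):
        self._it = iter(lines)
        self.consumed = 0

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._it)  # StopIteration here simply ends the caller's for loop
        self.consumed += 1
        return line

text = "<li>a</li>\n<li>b</li>\nINSTRUCTIONS\n" + "<li>step</li>\n" * 1000
reader = CountingReader(io.StringIO(text))
kept = list(up_through(reader))
print(len(kept), reader.consumed)  # 3 3
```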
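```python
import re

def up_through_instructions(filespec):
    """Yield lines of the file up to and including the INSTRUCTIONS line."""
    with open(filespec) as f:
        for line in f:
            yield line
            if "INSTRUCTIONS" in line:
                break  # stop reading; the rest of the file is never touched

def cleanup():
    pattern = re.compile(r"[\n\t]*<[^<]+?>")  # compiled once, reused per line
    result = []
    for line in up_through_instructions("parse.txt"):
        if "<li" in line and "</li>" in line:
            # The regex runs only on list-item lines
            stripped = pattern.sub("", line).rstrip()
            result.append(f'"{stripped}"')
    return ",\n".join(result)
```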
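To try the refactored version end to end without a real parse.txt, one option is to write a few sample lines to a temporary file. In this sketch cleanup takes the path as a parameter (a hypothetical change; the answer hard-codes "parse.txt"), and the sample lines are abbreviated:

```python
import os
import re
import tempfile

def up_through_instructions(filespec):
    with open(filespec, encoding="utf-8") as f:
        for line in f:
            yield line
            if "INSTRUCTIONS" in line:
                break

def cleanup(filespec):
    # Hypothetical variant: the path is passed in instead of hard-coded
    pattern = re.compile(r"[\n\t]*<[^<]+?>")
    result = []
    for line in up_through_instructions(filespec):
        if "<li" in line and "</li>" in line:
            stripped = pattern.sub("", line).rstrip()
            result.append(f'"{stripped}"')
    return ",\n".join(result)

sample = (
    '<li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp salt</span></li>\n'
    '<li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp cumin</span></li>\n'
    '<p style="text-align:justify"><strong>INSTRUCTIONS</strong></p>\n'
    '<li style="text-align:justify">Serve hot.</li>\n'  # after INSTRUCTIONS, never read
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
try:
    output = cleanup(tmp.name)
finally:
    os.remove(tmp.name)
print(output)
```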
OK, fine, let's say for some crazy reason break is not allowed to be part of the design space. We've a Turing machine at our disposal, so it's not that hard to code it in an alternate way. But what matters is: "Would future engineers find the alternate approach easier to maintain?"
(Spoiler: my vote is "no!")
```python
def alternate_up_through_instructions(filespec):
    with open(filespec) as f:
        done = False
        while not done:
            try:
                line = next(f)
            except StopIteration:  # next() signals EOF this way, not with EOFError
                done = True
            else:
                yield line
                done = "INSTRUCTIONS" in line
```
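I really do not recommend adopting this approach. Calling next() inside a try block on every line adds per-iteration overhead, and we pay it many, many times. Not only is a for loop the more natural, Pythonic way to approach this, it is also more efficient. The for loop advances the iterator internally and handles the end of input without an explicit next() call.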
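To check the efficiency claim on your own machine, you could run a rough timeit comparison like the one below. The input is synthetic and with_for and with_while_next are illustrative names. Timings depend on the Python version and hardware, so no numbers are quoted here:

```python
import timeit

LINES = [f"<li>item {n}</li>\n" for n in range(10_000)]  # synthetic input

def with_for(lines):
    out = []
    for line in lines:  # the for loop handles exhaustion itself
        out.append(line)
    return out

def with_while_next(lines):
    out = []
    it = iter(lines)
    done = False
    while not done:
        try:
            line = next(it)
        except StopIteration:
            done = True  # iterator exhausted
        else:
            out.append(line)
    return out

assert with_for(LINES) == with_while_next(LINES)  # same result either way
for fn in (with_for, with_while_next):
    seconds = timeit.timeit(lambda: fn(LINES), number=100)
    print(f"{fn.__name__}: {seconds:.3f}s")
```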
Notice that there are two terminators we must respect: "INSTRUCTIONS" and EOF. One we test for, the other is detected via an exception.
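On the exception side, next() on an exhausted file object raises StopIteration, not EOFError (EOFError is what input() raises at end of input). If a StopIteration escapes a generator body, Python 3.7+ turns it into RuntimeError (PEP 479). Here is a quick check, with io.StringIO standing in for the file and leaky as a made-up example:

```python
import io

f = io.StringIO("only line\n")
first = next(f)  # "only line\n"

# At end of input, next() signals exhaustion with StopIteration
caught = None
try:
    next(f)
except StopIteration:
    caught = "StopIteration"
except EOFError:
    caught = "EOFError"  # not reached: iteration never raises EOFError
print(caught)  # StopIteration

def leaky(lines):
    # Hypothetical generator that lets next()'s StopIteration escape
    it = iter(lines)
    while True:
        yield next(it)

converted = None
try:
    list(leaky(["a"]))
except RuntimeError:
    # PEP 479: a StopIteration raised inside a generator becomes RuntimeError
    converted = "RuntimeError"
print(converted)  # RuntimeError
```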
Answer:
You can use the underlying iterator Python provides to turn your for-loop into a while-loop without a break:
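```python
import re

def cleanup():
    result = []
    i = iter(html)  # html is the list returned by readlines() in the question
    line = ""
    # On a list iterator, __length_hint__() reports how many items remain
    while i.__length_hint__() and "INSTRUCTIONS" not in line:
        line = next(i)
        if "<li" in line and "</li>" in line:
            stripped = re.sub(r'[\n\t]*<[^<]+?>', '', line).rstrip()
            quoted = f'"{stripped}"'
            result.append(quoted)  # collect every cleaned list item
    return ",\n".join(result)
```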
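For context on __length_hint__(), a list iterator reports how many items it has left, which is what ends the loop above once html is used up. operator.length_hint() is the public wrapper for the same protocol. For an object that offers no hint, such as a StringIO or file object, it returns its default (0). So this loop condition suits lists like the one readlines() returns rather than arbitrary iterators:

```python
import io
import operator

it = iter(["a\n", "b\n", "c\n"])
print(it.__length_hint__())      # 3 items left
next(it)
print(it.__length_hint__())      # 2
print(operator.length_hint(it))  # 2, via the public wrapper

f = io.StringIO("a\nb\n")
print(operator.length_hint(f))   # 0: no length or hint available, so the default is returned
```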
The call to __length_hint__() adds a method call on every iteration, which might slow your loop down a little (for a list iterator it just computes how many items remain). Catching StopIteration, which next() raises once the iterator is exhausted, can be faster but complicates the code more:
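```python
import re

def cleanup():
    result = []
    i = iter(html)
    line = ""
    try:
        while "INSTRUCTIONS" not in line:
            line = next(i)  # raises StopIteration when html is exhausted
            if "<li" in line and "</li>" in line:
                # Strip tags only on the lines we keep
                stripped = re.sub(r'[\n\t]*<[^<]+?>', '', line).rstrip()
                result.append(f'"{stripped}"')
    except StopIteration:
        pass  # ran out of lines before finding INSTRUCTIONS
    return ",\n".join(result)
```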
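Here is a quick consistency check. It uses a short made-up html list in place of parse.txt and the same assumed regex. The StopIteration-based while loop should give the same result as a for/break loop. cleanup_for and cleanup_while are illustrative names that take the lines as a parameter:

```python
import re

TAG = re.compile(r"[\n\t]*<[^<]+?>")  # assumed tag-stripping pattern

html = [
    "<p><strong>INGREDIENTS</strong></p>\n",
    "<li><span>2 tsp salt</span></li>\n",
    "<li><span>2 tsp cumin</span></li>\n",
    "<p><strong>INSTRUCTIONS</strong></p>\n",
    "<li>Serve hot.</li>\n",  # after INSTRUCTIONS, must be skipped
]

def cleanup_for(lines):
    result = []
    for line in lines:
        if "<li" in line and "</li>" in line:
            result.append(f'"{TAG.sub("", line).rstrip()}"')
        elif "INSTRUCTIONS" in line:
            break
    return ",\n".join(result)

def cleanup_while(lines):
    result = []
    it = iter(lines)
    line = ""
    try:
        while "INSTRUCTIONS" not in line:
            line = next(it)
            if "<li" in line and "</li>" in line:
                result.append(f'"{TAG.sub("", line).rstrip()}"')
    except StopIteration:
        pass  # input ran out before an INSTRUCTIONS line
    return ",\n".join(result)

print(cleanup_while(html))
```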