convert for loop to while (remove break <-- this is key)-CodePudding

The break here is bothering me; after extensive research I want to ask if there is a pythonic way to convert this to a while loop:

import re

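with open('parse.txt', 'r') as file:
    html = file.readlines()

def cleanup():
    result = []
    for line in html:
        # Test each substring separately; '"<li" and "</li>" in line' only checks "</li>".
        if "<li" in line and "</li>" in line:
            # Remove leading newlines/tabs and every HTML tag.
            stripped = re.sub(r'[\n\t]*<[^<]+?>', '', line).rstrip()
            quoted = f'"{stripped}"'
            result.append(quoted)
        elif "INSTRUCTIONS" in line:
            break
    return ",\n".join(result)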

I really am trying to practice designing more efficient loops.

added parse.txt

<p style="text-align:justify"><strong><span style="background-color:#ecf0f1">INGREDIENTS</span></strong></p>

<ul>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">3 lb ground beef (80/20)</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">1 large onion, chopped</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">2-3 cloves garlic, minced</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">2 jalapeño peppers, roasted, peeled, de-seeded, chopped</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">4-5 roma tomatoes, roasted peeled, chopped</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">1 15 oz can kidney beans, strained and washed</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp salt</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp black pepper</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">2 tsp cumin</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">¼ - ½ tsp cayenne pepper</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">1 tsp garlic powder</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">1 tsp Mexican oregano</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">1 tsp paprika</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">1 tsp smoked paprika</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">3 cups chicken broth</span></li>
    <li style="text-align:justify"><span style="background-color:#ecf0f1">2 tbsp tomato paste</span></li>
</ul>

<p style="text-align:justify"><strong>INSTRUCTIONS</strong></p>

<ol>
    <li style="text-align:justify">Heat a large pot or Dutch oven over medium-high heat and brown the beef, while stirring to break it up. Cook until no longer pink. Drain out the liquid.</li>
    <li style="text-align:justify">Stir in onions and cook for about 5 minutes until they are pale and soft. Add in minced garlic and jalapeño peppers, stirring for another minute.</li>
    <li style="text-align:justify">Stir in the chopped tomatoes, all the spices, and tomato paste until well-distributed and tomato paste has broken up, then follow with the broth. Allow the pot to come to a gentle boil over medium heat, uncovered for about 20 minutes.</li>
    <li style="text-align:justify">Reduce heat to low, cover and simmer for at least 3 hours, until liquid has reduced.</li>
    <li style="text-align:justify">During the last 20-30 minutes of cook time, add in the kidney beans; uncover and allow liquid to reduce further during this time.</li>
    <li style="text-align:justify">Serve hot with jalapeño cornbread muffins, shredded cheese, avocado chunks, chopped cilantro, chopped green onion, tortilla chips.</li>
</ol>

CodePudding user response：

We could move the break into another function, if you feel it improves the clarity of the top-level function. The core notion is that we've no need to lavish attention on a few million irrelevant lines that might follow an occurrence of the "INSTRUCTIONS" terminator.

Two observations on the topic of "laziness":

Notice that we compile the regex just once.
We invoke the regex only on the subset of lines where it's needed.

import re

def up_through_instructions(filespec):
    # Yield lines lazily and stop reading once the terminator has been seen.
    with open(filespec) as f:
        for line in f:
            yield line
            if "INSTRUCTIONS" in line:
                break

def cleanup():
    # Compile once; the pattern strips leading newlines/tabs and every HTML tag.
    pattern = re.compile(r"[\n\t]*<[^<]+?>")
    result = []
    for line in up_through_instructions("parse.txt"):
        if "<li" in line and "</li>" in line:
            stripped = pattern.sub("", line).rstrip()
            result.append(f'"{stripped}"')
    return ",\n".join(result)

Ok, fine, let's say for some crazy reason break is not allowed to be part of the design space.

We've a Turing machine at our disposal. It's not that hard to code it in an alternate way. But what matters is: "Would future engineers find the alternate approach easier to maintain?"

(Spoiler: my vote is "no!")

def alternate_up_through_instructions(filespec):
    with open(filespec) as f:
        done = False
        while not done:
            try:
                line = next(f)
                yield line
                done = "INSTRUCTIONS" in line
            except StopIteration:  # next() signals end of file with StopIteration, not EOFError
                done = True

I really do not recommend adopting this approach. Entering a try block is cheap, but here we call next() by hand and juggle a flag on every iteration. A for loop is the more natural pythonic way to approach this, and it is typically faster as well, because the interpreter drives the iterator for us.
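
A rough way to check that efficiency claim is to time both styles on the same in-memory data. This is only a sketch: the function names, the synthetic `lines` list, and the repeat count are made up for illustration, and the timings will vary by machine and Python version.

```python
import timeit

# Synthetic stand-in for the file: many ordinary lines, then the terminator.
lines = [f"line {n}\n" for n in range(10_000)] + ["INSTRUCTIONS\n", "ignored\n"]

def with_for(source):
    """Collect lines up to and including the terminator using for/break."""
    seen = []
    for line in source:
        seen.append(line)
        if "INSTRUCTIONS" in line:
            break
    return seen

def with_while(source):
    """Same result using while, next() and try/except instead of break."""
    seen = []
    it = iter(source)
    done = False
    while not done:
        try:
            line = next(it)
            seen.append(line)
            done = "INSTRUCTIONS" in line
        except StopIteration:
            done = True
    return seen

# Both variants must agree before their timings are worth comparing.
assert with_for(lines) == with_while(lines)

for func in (with_for, with_while):
    seconds = timeit.timeit(lambda: func(lines), number=50)
    print(f"{func.__name__}: {seconds:.3f}s for 50 runs")
```

Whatever the numbers turn out to be on your machine, the readability argument stands on its own.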

Notice that there are two terminators we must respect: "INSTRUCTIONS" and EOF. One we test for, the other is detected via an exception.
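
If the real goal is just to keep break out of sight, the standard library's itertools.takewhile may be worth a look, since it handles both terminators for us. The following is a sketch rather than a drop-in replacement: `TAG_PATTERN`, the `lines` parameter, and the `sample` data are names invented for this example. Unlike up_through_instructions, takewhile does not yield the "INSTRUCTIONS" line itself, which makes no difference here because that line is not an `<li>` line.

```python
import re
from itertools import takewhile

# Strips leading newlines/tabs and every HTML tag.
TAG_PATTERN = re.compile(r"[\n\t]*<[^<]+?>")

def cleanup(lines):
    """Quote the text of every <li> line that comes before the terminator."""
    # takewhile stops at the first line containing the terminator,
    # or when the iterable runs out, whichever happens first.
    before_terminator = takewhile(lambda line: "INSTRUCTIONS" not in line, lines)
    result = []
    for line in before_terminator:
        if "<li" in line and "</li>" in line:
            stripped = TAG_PATTERN.sub("", line).rstrip()
            result.append(f'"{stripped}"')
    return ",\n".join(result)

sample = [
    "<ul>\n",
    "<li>2 tsp salt</li>\n",
    "<li>2 tsp cumin</li>\n",
    "</ul>\n",
    "<p>INSTRUCTIONS</p>\n",
    "<li>Reduce heat to low.</li>\n",
]
print(cleanup(sample))
```

Because `lines` can be any iterable of strings, an open file object can be passed in directly, which keeps the lazy reading from the version above.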

CodePudding user response：

You can use the underlying iterator Python provides to turn your for-loop into a while-loop without a break:

def cleanup():
    result = []
    i = iter(html)
    line = ""
    while i.__length_hint__() and "INSTRUCTIONS" not in line:
        line = next(i)
        if "<li" in line and "</li>" in line:
            stripped = re.sub(r'[\n\t]*<[^<]+?>', '', line).rstrip()
            quoted = f'"{stripped}"'
            result += [quoted]
    return ",\n".join(result)
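
The loop condition above leans on __length_hint__(), so it helps to see what that method reports. A small sketch using operator.length_hint, the public wrapper around it (the `items` list and variable names are made up for this example): a list iterator counts down as it is consumed, while a generator offers no hint at all, so the same while condition would stop immediately if html were not a list.

```python
from operator import length_hint

items = ["a\n", "b\n", "c\n"]

# A list iterator knows how many items it has left.
list_iter = iter(items)
remaining_at_start = length_hint(list_iter)
next(list_iter)
remaining_after_one = length_hint(list_iter)
print(remaining_at_start, remaining_after_one)

# A generator has no __length_hint__, so length_hint falls back to its default of 0.
gen = (line for line in items)
print(hasattr(gen, "__length_hint__"), length_hint(gen))
```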

The call to __length_hint__() adds a method call to every iteration, and it is only a hint: it works here because html is a list, but not every iterator provides it. Catching StopIteration, as the iter()-based solutions suggest, can be much faster but complicates the code more:

def cleanup():
    result = []
    i = iter(html)
    line = ""
    try:
        while "INSTRUCTIONS" not in line:
            line = next(i)
            # Test each substring separately and only run the regex on matching lines.
            if "<li" in line and "</li>" in line:
                stripped = re.sub(r'[\n\t]*<[^<]+?>', '', line).rstrip()
                quoted = f'"{stripped}"'
                result.append(quoted)
    except StopIteration:
        pass
    return ",\n".join(result)
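
One more variation, offered as a sketch and assuming Python 3.8 or later for the := operator: passing a default to next() avoids both the try block and __length_hint__(). The `lines` parameter and the `sample` data are invented for this example, and None is assumed to be a safe sentinel because real lines are always strings.

```python
import re

def cleanup(lines):
    """While-loop version with no break, no try and no length hint."""
    pattern = re.compile(r"[\n\t]*<[^<]+?>")
    result = []
    it = iter(lines)
    # next(it, None) returns None once the iterator is exhausted
    # instead of raising StopIteration.
    while (line := next(it, None)) is not None and "INSTRUCTIONS" not in line:
        if "<li" in line and "</li>" in line:
            stripped = pattern.sub("", line).rstrip()
            result.append(f'"{stripped}"')
    return ",\n".join(result)

sample = [
    "<ul>\n",
    "<li>2 tsp salt</li>\n",
    "<li>2 tsp cumin</li>\n",
    "</ul>\n",
    "<p>INSTRUCTIONS</p>\n",
    "<li>Reduce heat to low.</li>\n",
]
print(cleanup(sample))
```

Whether this reads better than the original for loop with break is a matter of taste; it mainly shows that both terminators can be folded into a single loop condition.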