Regex split the string at \n but skip the first one if it is \n\n

Time:08-04

I want to split some strings in Python at \n and use the pieces in that format, but some of the strings contain unexpected extra newlines that I want to ignore.

TO CLARIFY: Each example below is a single string.

For example this is a regular string with no unexpected newlines:

Step 1
Cut peppers into strips.
Step 2
Heat a non-stick skillet over medium-high heat. Add peppers and cook on stove top for about 5 minutes.
Step 3
Toast the wheat bread and then spread hummus, flax seeds, and spinach on top
Step 4
Lastly add the peppers. Enjoy!

but some of them are like this:

Step 1
Using a fork, mash up the tuna really well until the consistency is even.

Step 2
Mix in the avocado until smooth.

Step 3
Add salt and pepper to taste. Enjoy!

I have to say I am new to regex, so if the solution is obvious, please forgive me.

Edit: Here is my code:

    stepOrder = []
    # STEPS
    txtSteps = re.split("\n",directions.text)
    listOfLists = [[] for i in range(len(txtSteps)) if i % 2 == 0]
    for i in range(len(listOfLists)):
        listOfLists[i] = [txtSteps[i*2], txtSteps[i*2 + 1]]
    recipe["steps"] = listOfLists
    print(listOfLists)

directions.text is any one of the example strings above. I can share how it is obtained too, but I think it's irrelevant.

CodePudding user response:

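import re

# Read the whole file; the with-block closes it automatically.
with open("your_file_name") as f:
    content = f.read()

for line in content.split("\n"):
    # Skip the empty lines produced by doubled newlines.
    if re.match("^$", line):
        continue
    print(line)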
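
Applied to the question's data, the same idea (drop the empty lines, then pair up what is left) might look like the sketch below. It assumes every step is exactly one heading line followed by one instruction line; `pair_steps` and the sample string are hypothetical names introduced for illustration.

```python
import re


def pair_steps(text):
    """Split on runs of newlines and pair each heading with its instruction."""
    # "\n+" treats "\n" and "\n\n" alike, so blank lines never become items.
    lines = [line for line in re.split(r"\n+", text.strip()) if line]
    # Assumption: lines alternate heading, instruction, heading, instruction...
    return [[lines[i], lines[i + 1]] for i in range(0, len(lines) - 1, 2)]


# Hypothetical stand-in for directions.text, shaped like the second example.
sample = (
    "Step 1\nUsing a fork, mash up the tuna really well until the consistency is even.\n\n"
    "Step 2\nMix in the avocado until smooth.\n\n"
    "Step 3\nAdd salt and pepper to taste. Enjoy!"
)
steps = pair_steps(sample)
print(steps)
```

Splitting on `\n+` rather than `\n` means no separate filtering pass is strictly needed; the list comprehension's `if line` is only a guard against stray empty items.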

CodePudding user response:

You can solve this problem by matching with the following regex instead of splitting:

(?<=\d\n).*

Basically, it matches any characters on the same line (.*) that are preceded by one digit (\d) and one newline character (\n).

Your whole Python snippet then simplifies to the following (note that it yields a flat list of instruction texts rather than [heading, text] pairs):

stepOrder = []
# STEPS
recipe["steps"] = re.findall(r"(?<=\d\n).*", directions.text)
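
To see what the lookbehind returns, here is a small check against a string shaped like the second example. The sample text is an assumption standing in for `directions.text`. The pattern relies on each heading line ending in a digit, so an instruction line that itself ends in a digit would also trigger a match on the line after it.

```python
import re

# Hypothetical stand-in for directions.text, with the doubled newlines.
text = "Step 1\nMash up the tuna.\n\nStep 2\nMix in the avocado.\n\nStep 3\nAdd salt and pepper."

# (?<=\d\n) requires a digit and a newline right before the match;
# .* then takes the rest of that line (it stops at the next "\n").
instructions = re.findall(r"(?<=\d\n).*", text)
print(instructions)
```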