I want to split some strings in Python at \n and use the resulting pieces, but some of those strings contain unexpected newlines that I want to ignore.
To clarify: each example below is a single string.
For example, this is a regular string with no unexpected newlines:
Step 1
Cut peppers into strips.
Step 2
Heat a non-stick skillet over medium-high heat. Add peppers and cook on stove top for about 5 minutes.
Step 3
Toast the wheat bread and then spread hummus, flax seeds, and spinach on top
Step 4
Lastly add the peppers. Enjoy!
but some of them are like this:
Step 1
Using a fork, mash up the tuna really well until the consistency is even.
Step 2
Mix in the avocado until smooth.
Step 3
Add salt and pepper to taste. Enjoy!
I am new to regex, so please forgive me if the solution is obvious.
Edit: Here is my code:
stepOrder = []
# STEPS
txtSteps = re.split("\n",directions.text)
listOfLists = [[] for i in range(len(txtSteps)) if i % 2 == 0]
for i in range(len(listOfLists)):
    listOfLists[i] = [txtSteps[i*2], txtSteps[i*2 + 1]]
recipe["steps"] = listOfLists
print(listOfLists)
directions.text holds a string like the examples above. I can share where it comes from, but I think it's irrelevant.
CodePudding user response:
f = open("your_file_name")
content = f.read()
f.close()
for line in content.split("\n"):
    if re.match("^&",line):
        continue
    print(line)
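As a rough sketch of the same filtering idea applied directly to a string instead of a file: assuming the unexpected newlines show up as empty or whitespace-only lines, you could drop those lines first and then pair each "Step N" header with the line that follows it. The function name `pair_steps` and the sample text are made up for illustration.

```python
def pair_steps(text):
    # Keep only lines that contain something other than whitespace.
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    # Pair each header (even index) with its description (odd index).
    return [[lines[i], lines[i + 1]] for i in range(0, len(lines) - 1, 2)]


# Hypothetical input with stray blank lines between the parts.
sample = "Step 1\nMash up the tuna.\n\nStep 2\n\nMix in the avocado.\n"
steps = pair_steps(sample)
print(steps)
```

This only helps if the stray newlines produce blank lines. A description that is itself broken across two non-empty lines would still be paired incorrectly.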
CodePudding user response:
You can solve this problem by matching with the following regex:
(?<=\d\n).*
It matches any characters on a single line (.*) that are preceded by one digit (\d) and one newline character (\n).
Your whole Python snippet then simplifies to:
stepOrder = []
# STEPS
recipe["steps"] = re.findall(r"(?<=\d\n).*", directions.text)
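To see what this pattern returns, here is a small self-contained sketch on a made-up sample string (the real `directions.text` is not shown in the question, so the input here is an assumption). The second pattern, which also captures the "Step N" header, is an illustrative variant and not part of the answer above.

```python
import re

# Hypothetical stand-in for directions.text.
sample = "Step 1\nCut peppers into strips.\nStep 2\nHeat a non-stick skillet.\n"

# Lookbehind: match the rest of any line that follows "<digit>\n".
descriptions = re.findall(r"(?<=\d\n).*", sample)
print(descriptions)

# Variant that keeps the header too, returned as (header, description) tuples.
pairs = re.findall(r"(Step \d+)\n(.*)", sample)
print(pairs)
```

Note that `.*` stops at the end of the line, so only the first line after each header is captured, and any other line that ends in a digit would also trigger a match on the line after it.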