I'm trying to scrape information from a website. I put the info into a list, but when I print the list, it looks something like this:
list = ['text \n\n more text (1) \n\n even more text \n\n']
As you can see, nothing is separated. I want the list to look something like this:
list = ['text','more text (1)', 'even more text']
I tried list = [i.split('\n\n') for i in list], but that didn't work. The result was:
list = [text ',' more text (1) ',' even more text]
How can I fix this?
Thank you in advance for taking the time to read my question and help in any way you can. I appreciate it.
CodePudding user response:
Try this code maybe:
import re
list = ['text \n\n more text (1) \n\n even more text \n\n']
list[0] = list[0].replace(' \n\n ', '#').replace(' \n\n', '#')
list = re.split('#',list[0])
if list[-1] == '':
    list.pop()
print(list)
Output:
['text', 'more text (1)', 'even more text']
First we replace every instance of ' \n\n ' and ' \n\n' with '#'. This is because even though the elements are separated by ' \n\n ', the string ends with ' \n\n' and no space after it, so we need a common separator that covers that instance too.
Afterwards, we split the string on every instance of '#', and pop the final element if it is an empty string caused by a trailing ' \n\n ' or ' \n\n'.
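One caveat of the '#' placeholder is that it would also split any scraped text that already contains a '#'. As a rough alternative, the split can be done directly with a regular expression that swallows the whitespace around each blank line. The helper name split_blocks is made up for this sketch, and it assumes the separator is always two newlines:

```python
import re

def split_blocks(text):
    """Split text on '\n\n' separators, dropping the whitespace around them."""
    # strip() removes the trailing ' \n\n', so no empty last element appears
    stripped = text.strip()
    if not stripped:
        return []
    # \s* on both sides consumes the spaces next to each '\n\n'
    return re.split(r'\s*\n\n\s*', stripped)

items = ['text \n\n more text (1) \n\n even more text \n\n']
print(split_blocks(items[0]))
```

Because no placeholder character is involved, a string such as 'a # b \n\n c' keeps its '#' intact.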
I hope this helped! Please let me know if you need any further clarification or details :)
CodePudding user response:
Here is a way to do it. I first split each string of your list and then remove any leading or trailing whitespace using the strip method.
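# Example input; liste stands for your scraped list
liste = ['text \n\n more text (1) \n\n even more text \n\n']
info = []
for i in liste:
    # Drop a trailing "\n\n" so the split does not yield an empty last item
    if i[-2:] == "\n\n":
        i = i[:-2]
    untrimmed = i.split("\n\n")
    trimmed = [j.strip() for j in untrimmed]
    info.append(trimmed)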
The if statement gets rid of the empty string you would otherwise get when your input ends with "\n\n".
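Note that the loop above appends one sub-list per input string, so info ends up as a list of lists. If you would rather have a single flat list, something like the following might work. The function name split_all and the second input string are made up for illustration:

```python
def split_all(strings):
    """Split every string on '\n\n' and collect the non-empty pieces in one flat list."""
    flat = []
    for s in strings:
        for part in s.split("\n\n"):
            part = part.strip()
            # Skip the empty piece produced by a trailing "\n\n"
            if part:
                flat.append(part)
    return flat

# The second string is a hypothetical extra scrape result
liste = ['text \n\n more text (1) \n\n even more text \n\n', 'another \n\n block']
print(split_all(liste))
```

Checking each piece after stripping also makes the separate test for a trailing "\n\n" unnecessary.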
CodePudding user response:
You're almost there... If you do the following, you should be there:
the_list = ['text \n\n more text (1) \n\n even more text \n\n']
final_list = list(filter(None, [i.strip() for i in the_list[0].split('\n\n')]))
The reason why it failed in my previous answer was that we defined the_list as a list of length 1. Secondly, I put the split in the wrong location.
I've also added the filter to "squeeze" an empty result at the end in case you want to remove those.
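To make the filter step a bit more concrete, here is a small sketch of what happens before and after it. filter(None, ...) keeps only truthy items, so empty strings are dropped; the variable names are just for illustration:

```python
raw = 'text \n\n more text (1) \n\n even more text \n\n'

# After split + strip, the trailing '\n\n' leaves an empty string at the end
parts = [p.strip() for p in raw.split('\n\n')]
print(parts)

# filter(None, ...) drops every falsy item, which here means the empty string
kept = list(filter(None, parts))

# An equivalent list comprehension, if you find it easier to read
same = [p for p in parts if p]
print(kept)
```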
CodePudding user response:
list1 = ['text \n\n more text (1) \n\n even more text \n\n']
print(list1)
joined = "".join(list1)
joined = joined.replace('\n\n',',')
words = [x.strip() for x in joined.split(',')]
print(words)
while "" in words:
    words.remove("")
print(words)
CodePudding user response:
Please, try this:
lista = ['text \n\n more text (1) \n\n even more text \n\n']
aux = lista[0].split('\n\n')
list_final = [e.strip() for e in aux]
list_final.remove('')
CodePudding user response:
I tried this code and it worked for me; maybe it helps you.
lst = "text \n\n more text (1) \n\n even more text"
# strip() removes the spaces left around each piece after the split
x = [part.strip() for part in lst.split("\n\n")]
print("list =", x)