How to split items in list?

I'm trying to scrape information from a website. I put the info into a list, but when I print the list, it looks something like this:

list =  ['text  \n\n  more text (1)  \n\n  even more text  \n\n']

As you can see, nothing is separated. I want the list to look something like this:

list = ['text','more text (1)', 'even more text']

I tried doing list = [i.split('\n\n') for i in list], but that didn't work. The result was:

list = [text  ','  more text (1)  ','  even more text]

How can I fix this?

Thank you in advance for taking the time to read my question and help in any way you can. I appreciate it.

CodePudding user response：

Try this code maybe:

import re
list =  ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
list[0] = list[0].replace('  \n\n  ', '#').replace('  \n\n', '#')
list = re.split('#',list[0])

if list[len(list) - 1] == '':
  list.pop(len(list) - 1)

print(list)

Output:

['text', 'more text (1)', 'even more text']

First we replace every instance of '  \n\n  ' and '  \n\n' with '#'. The second replacement is needed because, although the elements are separated by '  \n\n  ', the string ends with '  \n\n' and no trailing spaces, so that final occurrence has to be turned into the separator as well.

Afterwards, we split the string on every '#' and pop the final element if it is an empty string, which is what the trailing '  \n\n' leaves behind.
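
As a possible alternative to the placeholder approach, here is a minimal sketch that splits directly on the blank-line separator with a regular expression, so no '#' marker is needed (which also avoids trouble if the scraped text itself contains a '#'). The names raw and parts are illustrative and do not come from the question.

```python
import re

raw = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']

# Strip the outer whitespace first so the trailing "\n\n" does not
# produce an empty final element, then split on "\n\n" together with
# any whitespace that surrounds it.
parts = re.split(r'\s*\n\n\s*', raw[0].strip())

print(parts)
```

Because the pattern absorbs the surrounding whitespace, no separate strip call is needed on the individual pieces.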

I hope this helped! Please let me know if you need any further clarification or details :)

CodePudding user response：

Here is a way to do it. I first split each string of your list and then remove any leading or trailing whitespace using the strip method.

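liste = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']  # the scraped list from the question

info = []
for i in liste:
    # Drop a trailing "\n\n" so the split does not end with an empty string.
    if i[-2:] == "\n\n":
        i = i[:-2]
    untrimmed = i.split("\n\n")
    trimmed = [j.strip() for j in untrimmed]
    # One sub-list per input string; use info.extend(trimmed) for a flat list.
    info.append(trimmed)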

The if statement gets rid of the empty string that would otherwise appear at the end when your input ends with "\n\n".
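
If the scraped list can hold several strings and a single flat list is wanted, the loop above could be adapted roughly as follows. This is only a sketch: the function name split_items and the sample input are made up for illustration.

```python
def split_items(strings, sep="\n\n"):
    """Split every string on sep, strip each piece, and return one flat list."""
    result = []
    for s in strings:
        for piece in s.split(sep):
            piece = piece.strip()
            if piece:  # skip the empty pieces left by a trailing separator
                result.append(piece)
    return result


sample = ['text  \n\n  more text (1)  \n\n  even more text  \n\n', 'a \n\n b']
print(split_items(sample))
```

Checking each piece after stripping replaces the i[-2:] test, so empty pieces are skipped wherever they occur, not only at the end.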

CodePudding user response：

You're almost there... If you do the following, you should be there:

the_list = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
final_list = list(filter(None, [i.strip() for i in the_list[0].split('\n\n')]))

My previous answer failed for two reasons: the_list is a list of length 1, so the string has to be taken from the_list[0], and I had put the split in the wrong place.

I've also added the filter to "squeeze" out the empty result at the end, in case you want to remove it.
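
To illustrate what the filter does: passing None as the first argument makes filter keep only the truthy items, and an empty string is falsy, so it is dropped. A small sketch with made-up variable names, next to an equivalent list comprehension:

```python
pieces = ['text', '', 'more text (1)', '']

# filter(None, iterable) keeps only the truthy items, so '' is discarded.
kept = list(filter(None, pieces))

# Equivalent list comprehension, which some readers find easier to follow.
kept_lc = [p for p in pieces if p]

print(kept)
print(kept_lc)
```

Note that filter(None, ...) would also drop other falsy values such as 0 or None; for a list of strings, only the empty string is affected.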

CodePudding user response：

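list1 = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
print(list1)
# Join the list into one string, then split it on the blank-line separator.
# Splitting on '\n\n' directly avoids breaking text that contains commas.
joined = "".join(list1)
words = [x.strip() for x in joined.split('\n\n')]
print(words)
# Remove the empty strings left over by the trailing separator.
while "" in words:
    words.remove("")
print(words)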

CodePudding user response：

Please, try this:

lista = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
aux = lista[0].split('\n\n')
list_final = [e.strip() for e in aux]
list_final.remove('')  # drop the empty string left by the trailing '\n\n'

CodePudding user response：

I tried this code and it worked for me; maybe it helps you.

`lst = "text \n\n more text (1) \n\n even more text"

x = [part.strip() for part in lst.split("\n\n")]

print("list=", x)`