How to split items in list?

Time:02-25

I'm trying to scrape information from a website. I put the info into a list, but when I print the list, it looks something like this:

list =  ['text  \n\n  more text (1)  \n\n  even more text  \n\n']

As you can see, nothing is separated. I want the list to look something like this:

list = ['text','more text (1)', 'even more text']

I tried list = [i.split('\n\n') for i in list], but that didn't work. The result was:

list = ['text  ','  more text (1)  ','  even more text']

How can I fix this?

Thank you in advance for taking the time to read my question and help in any way you can. I appreciate it.

CodePudding user response:

Try this code maybe:

import re

list = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
# Turn both separator variants into a single '#' marker.
list[0] = list[0].replace('  \n\n  ', '#').replace('  \n\n', '#')
list = re.split('#', list[0])

# Drop the empty string left by the trailing separator.
if list[-1] == '':
    list.pop()

print(list)

Output:

['text', 'more text (1)', 'even more text']

First we replace every instance of '  \n\n  ' and '  \n\n' with '#'. Two patterns are needed because, although the elements are separated by '  \n\n  ', the string ends with '  \n\n' and no spaces after it, so that last occurrence has to be handled separately.

Afterwards, we split the string on every '#' and pop the final element if it is an empty string left over from the trailing '  \n\n'.
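
As a rough sketch of the same idea in one step, the two replace calls and the final pop could probably be folded into a single regular expression split. This assumes the separator is always "\n\n" with optional whitespace around it; the names raw and items are only for illustration.

```python
import re

raw = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']

# Strip the whole string first so the trailing separator does not
# produce an empty final element, then split on "\n\n" together with
# any whitespace around it.
items = re.split(r'\s*\n\n\s*', raw[0].strip())

print(items)  # ['text', 'more text (1)', 'even more text']
```

This also avoids relying on '#' never appearing in the scraped text.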

I hope this helped! Please let me know if you need any further clarification or details :)

CodePudding user response:

Here is a way to do it. I first split each string of your list and then remove any leading or trailing spaces using the strip method.

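liste = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']

info = []
for i in liste:
    # Drop a trailing separator so split() does not yield an empty string.
    if i[-2:] == "\n\n":
        i = i[:-2]
    untrimmed = i.split("\n\n")
    trimmed = [j.strip() for j in untrimmed]
    info.append(trimmed)  # one list of pieces per input string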

The if statement gets rid of the empty string you would otherwise get at the end when your input ends with "\n\n".
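
Note that the loop above appends one sub-list per input string, so info ends up as a list of lists. If a single flat list is wanted, as in the question, something along these lines might work. The helper name split_items and its sep parameter are made up for this sketch.

```python
def split_items(items, sep="\n\n"):
    """Split every string in items on sep and return one flat list."""
    flat = []
    for text in items:
        for piece in text.split(sep):
            piece = piece.strip()
            # Skip the empty pieces left by leading or trailing separators.
            if piece:
                flat.append(piece)
    return flat


liste = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
print(split_items(liste))  # ['text', 'more text (1)', 'even more text']
```

Checking each piece after stripping makes the separate test for a trailing "\n\n" unnecessary.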

CodePudding user response:

You're almost there... If you do the following, you should get what you want:

the_list = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
final_list = list(filter(None, [i.strip() for i in the_list[0].split('\n\n')]))

The reason my previous answer failed was that the_list is a list of length 1, so the split has to be applied to its first element. I had also put the split in the wrong place.

I've also added the filter to "squeeze" out the empty string at the end, in case you want to remove it.
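
For anyone unsure what filter(None, ...) does here, this is a small illustration on made-up data (the parts list is not from the question). With None as the function, filter keeps only truthy items, so every empty string is dropped, not just the one at the end.

```python
parts = ['text', '', 'more text (1)', '']

# filter(None, ...) keeps only truthy items, so empty strings are dropped
# wherever they occur, not only at the end.
filtered = list(filter(None, parts))

# An equivalent list comprehension.
comprehended = [p for p in parts if p]

print(filtered)      # ['text', 'more text (1)']
print(comprehended)  # ['text', 'more text (1)']
```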

CodePudding user response:

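list1 = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
print(list1)
joined = "".join(list1)
# Note: using ',' as the separator breaks if the text itself contains commas.
joined = joined.replace('\n\n', ',')
words = [x.strip() for x in joined.split(',')]
print(words)
# Remove the empty strings left by the trailing separator.
while "" in words:
    words.remove("")
print(words)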

CodePudding user response:

Please, try this:

lista = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
aux = lista[0].split('\n\n')
list_final = [e.strip() for e in aux]
# Remove the empty string left by the trailing '\n\n'.
list_final.remove('')

CodePudding user response:

I tried this code and it worked for me; maybe it helps you.

lst = "text \n\n more text (1) \n\n even more text"

x = lst.split("\n\n")

print("list=", x)
