Multiline text splitting

So, I have a problem where I need to create a list of lists: one inner list per line, containing every word from that line whose length is greater than 4. The challenge is to solve this with a one-liner.

    text = '''My candle burns at both ends;
        It will not last the night;
    But ah, my foes, and oh, my friends—
        It gives a lovely light!'''

So far I managed this:

    res = [i for ele in text.splitlines() for i in ele.split(' ') if len(i) > 4]

but it returns a single flat list:

    ['candle', 'burns', 'ends;', 'night;', 'foes,', 'friends—', 'gives', 'lovely', 'light!']

instead of one list per line:

    [['candle', 'burns', 'ends;'], ['night;'], ['foes,', 'friends—'], ['gives', 'lovely', 'light!']]

Any ideas? :D

Answer:

In this case I would use a regular expression. If the list comprehension loops over the lines, as yours does, and calls re.findall on each line, every line's matches automatically end up in their own list.

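The pattern \w{4,} matches runs of four or more word characters: letters (upper- or lowercase), digits and the underscore. Note that {4,} means "four or more", so four-letter words such as "both" are kept; use \w{5,} if you need strictly more than four characters.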
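To make the character class concrete, here is a small check on a made-up string (the sample input is hypothetical, not from the question). It shows that underscores and digits count as word characters, while punctuation ends a match.

```python
import re

# \w covers letters, digits and underscore; {4,} requires at least four in a row
sample = "abc abcd snake_case 2024!"
matches = re.findall(r"\w{4,}", sample)
print(matches)  # ['abcd', 'snake_case', '2024']
```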

import re

text = '''My candle burns at both ends;
    It will not last the night;
But ah, my foes, and oh, my friends—
    It gives a lovely light!'''

# one re.findall call per line, so each line's matches form their own inner list
results = [re.findall(r'\w{4,}', line) for line in text.split('\n')]
print(results)

Output:

[['candle', 'burns', 'both', 'ends'], ['will', 'last', 'night'], ['foes', 'friends'], ['gives', 'lovely', 'light']]

If you want to keep the punctuation, as in your expected output, you can widen the pattern so it matches any run of non-whitespace characters instead of only word characters.
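A minimal sketch of that idea, assuming the goal is exactly the output shown in the question: \S matches any non-whitespace character, so punctuation such as ";" stays attached to its word, and {5,} enforces the question's "greater than 4" rule.

```python
import re

text = '''My candle burns at both ends;
    It will not last the night;
But ah, my foes, and oh, my friends—
    It gives a lovely light!'''

# \S{5,} matches whole whitespace-separated tokens of 5+ characters,
# punctuation included; one findall per line keeps the per-line grouping
results = [re.findall(r'\S{5,}', line) for line in text.splitlines()]
print(results)
# [['candle', 'burns', 'ends;'], ['night;'], ['foes,', 'friends—'], ['gives', 'lovely', 'light!']]
```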

Searching for "online regular expression tester" turns up interactive tools that give immediate feedback while you build your own patterns.

Answer:

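IIUC, this one-liner should work for you without importing anything. It keeps every whitespace-separated token of at least four characters (counted before stripping) and then strips the punctuation characters passed to strip() from the ends of each kept token: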

[[w.strip(';,!—') for w in line.split() if len(w) >= 4] for line in text.split('\n')]

Output:

[['candle', 'burns', 'both', 'ends'],
 ['will', 'last', 'night'],
 ['foes', 'friends'],
 ['gives', 'lovely', 'light']]
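For comparison, the asker's original attempt needs only one change: wrap the inner loop in its own brackets so each line produces its own list. This sketch keeps the question's len(w) > 4 rule and leaves punctuation attached. It calls split() with no argument, which splits on runs of whitespace and so never yields the empty strings that split(' ') produces from the leading spaces on the indented lines.

```python
text = '''My candle burns at both ends;
    It will not last the night;
But ah, my foes, and oh, my friends—
    It gives a lovely light!'''

# outer comprehension: one entry per line;
# inner comprehension: the tokens of that line longer than 4 characters
res = [[w for w in line.split() if len(w) > 4] for line in text.splitlines()]
print(res)
# [['candle', 'burns', 'ends;'], ['night;'], ['foes,', 'friends—'], ['gives', 'lovely', 'light!']]
```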