How to retain the delimiter within a list item in Python

I'm writing a program which jumbles clauses within a text using punctuation marks as delimiters for when to split the text.

At the moment my code has a large list where each item is a group of clauses.

import re
from random import shuffle
clause_split_content = []

text = ["this, is. a test?", "this: is; also. a test!"]

for i in text:
        clause_split = re.split('[,;:".?!]', i)
        clause_split.remove(clause_split[len(clause_split)-1])
        for x in range(0, len(clause_split)):
                clause_split_content.append(clause_split[x])
shuffle(clause_split_content)
print(*clause_split_content, sep='')

At the moment, the result jumbles the text without retaining the punctuation that is used as the delimiter to split it. The output would be something like this:

a test this also this is a test is

I want to retain the punctuation within the final output so it would look something like this:

a test! this, also. this: is. a test? is;

CodePudding user response：

Option 1: Shuffle the words within each list item, then combine them into a single sentence.

from random import shuffle

count = 0
sentence = ''
new_text = []
text = ["this, is. a test?", "this: is; also. a test!"]

while count < len(text):
    new_text.append(text[count].split())
    shuffle(new_text[count])
    count += 1

for i in new_text:
    for j in i:
        sentence += j + ' '

print(sentence)

Sample shuffled output:

test? this, a is. is; test! this: a also. 
test? a is. this, is; test! a this: also. 
is. test? a this, test! a this: also. is;

Option 2: Combine the words of all list items into a single list, then shuffle the words and join them into a sentence.

import random

count = 0
sentence = ''
new_text = []
text_combined = []
text = ["this, is. a test?", "this: is; also. a test!"]

while count < len(text):
    new_text.append(text[count].split())
    count += 1

for i in new_text:
    for j in i:
        text_combined.append(j)

shuffled_list = random.sample(text_combined, len(text_combined))        

for i in shuffled_list:
    sentence += i + ' '
     
print(sentence)

Sample output:

this, is; also. a this: is. a test? test! 
test! is. this: test? a this, a also. is; 
is. a a is; also. test! test? this, this:
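Both options can be written more compactly with str.split() and str.join(). The following is only a sketch: the function names shuffle_within_items and shuffle_across_items and the rng parameter are made up for illustration. Note that both shuffle individual words rather than whole clauses, so a two-word clause such as "a test?" can end up separated.

```python
import random

text = ["this, is. a test?", "this: is; also. a test!"]

def shuffle_within_items(items, rng=random):
    """Option 1: shuffle the words of each list item separately."""
    shuffled = []
    for item in items:
        words = item.split()
        rng.shuffle(words)        # shuffles this item's words in place
        shuffled.extend(words)
    return ' '.join(shuffled)

def shuffle_across_items(items, rng=random):
    """Option 2: pool the words of all items, then shuffle them together."""
    words = [word for item in items for word in item.split()]
    return ' '.join(rng.sample(words, len(words)))

print(shuffle_within_items(text))
print(shuffle_across_items(text))
```

Passing a random.Random(seed) instance as rng should make the output repeatable, which helps when comparing the two options.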

CodePudding user response：

I think you are simply using the wrong re function for your purpose. By default, split() drops the separators, but you can use another function such as findall() to select each clause together with its separator. For example, the following code produces your desired output:

import re
from random import shuffle

clause_split_content = []

text = ["this, is. a test?", "this: is; also. a test!"]

for i in text:
    words_with_seperator = re.findall(r'([^,;:".?!]*[,;:".?!])\s?', i)
    clause_split_content.extend(words_with_seperator)
    
shuffle(clause_split_content)
print(*clause_split_content, sep=' ')

Output:

this, this: is. also. a test! a test? is;

The pattern ([^,;:".?!]*[,;:".?!])\s? matches a run of characters that are not separators, followed by one separator. Only that part is inside the capturing group, so it is what findall() returns for each match. The \s? after the group consumes the space that follows each separator, so the results have no leading spaces.
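To see what the trailing \s? changes, here is a small sketch that runs the pattern with and without it on one of the input strings (the variable names are only for illustration):

```python
import re

sample = "this: is; also. a test!"

# With the trailing \s?, the space after each separator is consumed
# outside the group, so the clauses come back without leading spaces.
with_space_skip = re.findall(r'([^,;:".?!]*[,;:".?!])\s?', sample)

# Without it, that space stays at the start of the next match.
without_space_skip = re.findall(r'([^,;:".?!]*[,;:".?!])', sample)

print(with_space_skip)     # ['this:', 'is;', 'also.', 'a test!']
print(without_space_skip)  # ['this:', ' is;', ' also.', ' a test!']
```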

CodePudding user response：

Here's a way to do what you've asked:

import re
from random import shuffle
text = ["this, is. a test?", "this: is; also. a test!"]
content = [y for x in text for y in re.findall(r'([^,;:".?!]*[,;:".?!])', x)]
shuffle(content)
print(*content, sep=' ')

Output:

 is;  is.  also.  a test? this,  a test! this:

Explanation:

The regex pattern r'([^,;:".?!]*[,;:".?!])' matches 0 or more non-separator characters followed by a separator character, and findall() returns a list of all such non-overlapping matches. The space after a separator counts as a non-separator character, so it becomes the start of the next match; that is why some items in the output above have a leading space.
The list comprehension iterates over the input strings in the list text and has an inner loop that iterates over the findall() results for each input string, so that we create a single flat list of every match within every string.
shuffle and print are as in your original code.
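For comparison, re.split() can also keep the delimiters if the pattern is wrapped in a capturing group, because the captured text is then included in the result list. This is an alternative that none of the answers above uses; the helper name split_keep_delims is made up for this sketch, and it drops any trailing text that has no closing delimiter, much like the original code discards the last split item.

```python
import re
from random import shuffle

text = ["this, is. a test?", "this: is; also. a test!"]

def split_keep_delims(s):
    # Hypothetical helper: the capturing group makes re.split() return
    # the delimiters too, alternating chunk, delimiter, chunk, delimiter, ...
    parts = re.split(r'([,;:".?!])', s)
    # Re-attach each delimiter to the chunk before it and trim the spaces.
    # zip() stops at the shorter list, so a trailing chunk without a
    # delimiter (here the empty string) is left out.
    return [(chunk + delim).strip()
            for chunk, delim in zip(parts[0::2], parts[1::2])]

content = [clause for s in text for clause in split_keep_delims(s)]
shuffle(content)
print(*content, sep=' ')
```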