Remove extra space when punctuation appears in the string

Time:09-21

I have a list of tokenised sentences, for example:

text = ['Selegiline',
 '-',
 'induced',
 'postural',
 'hypotension',
 'in',
 'Parkinson',
 "'",
 's',
 'disease',
 ':',
 'a',
 'longitudinal',
 'study',
 'on',
 'the',
 'effects',
 'of',
 'drug',
 'withdrawal',
 '.']

I want to convert this list into a string, but when punctuation such as - or : appears, I want to remove the extra space, so the final output would look something like this:

Selegiline-induced postural hypotension in Parkinson's disease: a longitudinal study on the effects of drug withdrawal

I tried splitting the list into chunks of two and checking whether both items in each pair are words. If they are, I join them with a single space; otherwise, with no space:

def chunks(xs, n):
    n = max(1, n)
    return (xs[i:i + n] for i in range(0, len(xs), n))
data_first = list(chunks(text, 2))

def check(data):
  second_order = []
  for words in data:
    if all(c.isalpha() for c in words[0]) and all(c.isalpha() for c in words[1]):
      second_order.append(" ".join(words))
    else:
      second_order.append("".join(words))
  return second_order

check(data_first)

But I would have to repeat this on the result until the last word has been merged (a recursive solution). Is there a better way to do this?

CodePudding user response:

One option might be to create a dictionary that maps each punctuation pattern to its replacement string, since each punctuation mark seems to follow different rules (a colon should retain the space after itself, whereas a dash should not).

Something like:

# Map each spaced-out punctuation pattern to its replacement.
punctdict = {' - ': '-', ' : ': ': ', " ' ": "'", ' .': '.'}
sentence = ' '.join(text)
for k, v in punctdict.items():
    sentence = sentence.replace(k, v)
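
If the list of punctuation marks grows, the same idea can be written with regular expressions instead of a dictionary. This is only a sketch of that alternative, not part of the answer above: the function name join_tokens and the two character classes are assumptions, and the rules treat every apostrophe as part of a word, so quotation marks made of single quotes would need separate handling.

```python
import re

def join_tokens(tokens):
    """Join tokens with spaces, then tighten the spacing around punctuation."""
    sentence = " ".join(tokens)
    # Hyphens and apostrophes attach to both neighbours: "a - b" -> "a-b".
    sentence = re.sub(r"\s*([-'])\s*", r"\1", sentence)
    # Closing punctuation attaches to the preceding word only: "a : b" -> "a: b".
    sentence = re.sub(r"\s+([.,:;!?])", r"\1", sentence)
    return sentence

tokens = ["Parkinson", "'", "s", "disease", ":", "a", "study", "-", "based", "test", "."]
print(join_tokens(tokens))  # Parkinson's disease: a study-based test.
```

The order of the two substitutions does not matter here, because the two character classes do not overlap.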

CodePudding user response:

text = ['Selegiline',
 '-',
 'induced',
 'postural',
 'hypotension',
 'in',
 'Parkinson',
 "'",
 's',
 'disease',
 ':',
 'a',
 'longitudinal',
 'study',
 'on',
 'the',
 'effects',
 'of',
 'drug',
 'withdrawal',
 '.']
 
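def txt_join(txt):
    ans = ""
    for s in txt:
        if s == "." or s == ":":
            # No space before, one space after.
            ans = ans.strip() + s + " "
        elif s == "'" or s == "-":
            # No space on either side.
            ans = ans.strip() + s
        else:
            ans = ans + s + " "
    # Drop the trailing space left by the last token.
    return ans.strip()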

print(txt_join(text))

As I understand it, this gives the expected result. The algorithm loops through the text list and adds or removes spaces depending on the punctuation it meets. To handle other punctuation marks, add further if/elif/else conditions.
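
Rather than adding a new branch for every punctuation mark, the rules could be kept in sets. The following is a rough sketch of that variation; the names ATTACH_BOTH, ATTACH_LEFT and join_with_rules are made up for illustration, and the contents of the sets are an assumption about which marks your tokeniser produces.

```python
# Hypothetical rule sets; extend them to match your tokeniser's output.
ATTACH_BOTH = {"-", "'"}                      # no space on either side
ATTACH_LEFT = {".", ",", ":", ";", "!", "?"}  # no space before, one after

def join_with_rules(tokens):
    parts = []
    glue_next = False  # True when the next token must attach without a space
    for tok in tokens:
        if tok in ATTACH_BOTH:
            parts.append(tok)
            glue_next = True
        elif tok in ATTACH_LEFT:
            parts.append(tok)
            glue_next = False
        else:
            # Ordinary word: separate it from what came before unless glued.
            if parts and not glue_next:
                parts.append(" ")
            parts.append(tok)
            glue_next = False
    return "".join(parts)

sample = ["Selegiline", "-", "induced", "hypotension", ":", "a", "study", "."]
print(join_with_rules(sample))  # Selegiline-induced hypotension: a study.
```

Collecting the pieces in a list and joining once at the end also avoids calling strip() on the growing string in every iteration.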

CodePudding user response:

What you're looking for is a list comprehension. You can read more about it here. You could build a list comprehension and then use the string replace method to replace the unwanted space with an empty string, much as you've done with append in your solution. You may find this solution useful; it uses .strip instead of replace. I generally prefer a list comprehension over an explicit for loop on a list where one fits, as it is more concise and often somewhat faster. Also, this is my first answer, so sorry if it's a bit confusing.
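
The answer above does not include code, so here is one guess at what a comprehension-based version might look like. It pairs each token with the one before it and decides whether a space belongs between them. The names NO_SPACE_BEFORE, NO_SPACE_AFTER and join_comprehension are assumptions made for this sketch, not something from the answer.

```python
# Hypothetical rule sets for this sketch.
NO_SPACE_BEFORE = {"-", "'", ".", ",", ":", ";"}
NO_SPACE_AFTER = {"-", "'"}

def join_comprehension(tokens):
    # Pair every token with its predecessor (None for the first token).
    pairs = zip([None, *tokens[:-1]], tokens)
    pieces = [
        tok
        if prev is None or tok in NO_SPACE_BEFORE or prev in NO_SPACE_AFTER
        else " " + tok
        for prev, tok in pairs
    ]
    return "".join(pieces)

sample = ["Parkinson", "'", "s", "disease", ":", "a", "longitudinal", "study", "."]
print(join_comprehension(sample))  # Parkinson's disease: a longitudinal study.
```

This makes a single pass over the list, so no recursion or repeated chunking is needed.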
