How to create spaces between characters only for acronyms in Python

I am trying to add spaces between characters only for acronyms (words made up entirely of consecutive capital letters) in Python.

INPUT:

"The PNUD, UN, UCALP and USA and U N."

DESIRED OUTPUT:

"The P N U D, U N, U C A L P and U S A and U N."

I have this solution so far, but I am looking for something more efficient/elegant:

import re
data = "The PNUD, UN, UCALP and USA and U N."
result = re.sub(r'(?=(?!^)[^[a-z]|\s+|\W]*)', ' ', data)
result = re.sub(r'\s+(\W)', r'\g<1>', result)
print(result)

CodePudding user response：

I think the following regex is a much simpler solution to this problem:

re.sub('([A-Z])(?=[A-Z])', '\\1 ', s)

I'm just using a positive lookahead and a backreference.
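As a quick check, here is a minimal sketch that applies that pattern to the sample sentence from the question. The variable name `spaced` is just a placeholder chosen for this illustration.

```python
import re

# Sample sentence from the question.
s = "The PNUD, UN, UCALP and USA and U N."

# Match an uppercase letter only when another uppercase letter follows it.
# The lookahead (?=[A-Z]) checks the next letter without consuming it, so that
# letter can be matched in turn. \1 puts the captured letter back, plus a space.
spaced = re.sub(r"([A-Z])(?=[A-Z])", r"\1 ", s)
print(spaced)  # The P N U D, U N, U C A L P and U S A and U N.
```

The last letter of each acronym is never followed by another capital, so no trailing space is added before the comma or the period.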

CodePudding user response：

Another solution, using re.sub with a lambda function:

import re

data = "The PNUD, UN, UCALP and USA and U N."

result = re.sub(r"\b[A-Z]+\b", lambda g: " ".join(g.group(0)), data)
print(result)

Prints:

The P N U D, U N, U C A L P and U S A and U N.
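One practical difference between a whole-word pattern such as `\b[A-Z]+\b` and a lookahead pattern such as `[A-Z](?=[A-Z])` shows up on mixed-case tokens. The sketch below uses a made-up input (`XMLParser` is a hypothetical example, not from the question) to illustrate it:

```python
import re

# Hypothetical input: one pure acronym and one mixed-case identifier.
text = "USA and XMLParser"

# Whole-word version: only tokens consisting entirely of uppercase letters match.
whole_word = re.sub(r"\b[A-Z]+\b", lambda m: " ".join(m.group(0)), text)

# Lookahead version: any uppercase letter followed by another uppercase letter matches.
lookahead = re.sub(r"[A-Z](?=[A-Z])", r"\g<0> ", text)

print(whole_word)  # U S A and XMLParser
print(lookahead)   # U S A and X M L Parser
```

If the text can contain runs of capitals inside longer words, the whole-word version is probably closer to "acronyms only".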

EDIT: Small benchmark

import re
from timeit import timeit


pat1 = re.compile(r"\b[A-Z]+\b")
pat2 = re.compile(r"([A-Z])(?=[A-Z])")
pat3 = re.compile(r"[A-Z](?=[A-Z])")  # the same without capturing group

data = "The PNUD, UN, UCALP and USA and U N."


def fn1():
    return pat1.sub(lambda g: " ".join(g.group(0)), data)


def fn2():
    return pat2.sub(r"\g<1> ", data)


def fn3():
    return pat3.sub(r"\g<0> ", data)


t1 = timeit(fn1, number=10_000)
t2 = timeit(fn2, number=10_000)
t3 = timeit(fn3, number=10_000)

print(t1)
print(t2)
print(t3)

Prints:

0.05032820999622345
0.10462480317801237
0.10249458998441696
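Timings like these are only comparable if the three variants return the same string. A minimal sanity check along those lines, assuming the first pattern is `\b[A-Z]+\b` (one or more capitals between word boundaries); the list name `results` is introduced here purely for illustration:

```python
import re

data = "The PNUD, UN, UCALP and USA and U N."

pat1 = re.compile(r"\b[A-Z]+\b")
pat2 = re.compile(r"([A-Z])(?=[A-Z])")
pat3 = re.compile(r"[A-Z](?=[A-Z])")

# Run each variant once and collect the outputs.
results = [
    pat1.sub(lambda g: " ".join(g.group(0)), data),
    pat2.sub(r"\g<1> ", data),
    pat3.sub(r"\g<0> ", data),
]

# All three should agree on this input before their speeds are compared.
assert len(set(results)) == 1
print(results[0])
```

Absolute numbers will vary by machine and Python version, and a one-sentence input is too short to say much about behavior on large texts.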

CodePudding user response：

You can use a single call to re.sub and match a single uppercase char and assert another one to the right.

In the replacement, use the whole match followed by a space; \g<0> refers to the whole match.

[A-Z](?=[A-Z])

Example

result = re.sub('[A-Z](?=[A-Z])', r'\g<0> ', data)