There is an excellent discussion in "Truncate a string without ending in the middle of a word" on how to do a 'smart' string truncation in Python. The problem with the solutions proposed there is that if the width limit falls within a word, that word is dropped entirely.
How can I truncate a string in Python with a 'soft' width limit, i.e. if the limit falls in the middle of a word, that word is kept?
Example:
str = "it's always sunny in philadelphia"
trunc(str, 7)
>>> it's always...
My initial thinking is to slice the string up to the soft limit and then start checking every next character, adding it to the slice until I encounter a whitespace character. But this seems extremely inefficient.
CodePudding user response:
How about:
def trunc(ipt, length, suffix='...'):
    if " " in ipt[length-1: length]:
        # The given length puts us on a word boundary
        return ipt[:length].rstrip(' ') + suffix
    # Otherwise add the "tail" of the input, up to just before the first space it contains
    return ipt[:length] + ipt[length:].partition(" ")[0] + suffix

s = "it's always sunny in philadelphia"  # Best to avoid 'str' as a variable name, it's a builtin
for n in (1, 4, 5, 6, 7, 12, 13):
    print(f"{n}: {trunc(s, n)}")
which outputs:
1: it's...
4: it's...
5: it's...
6: it's always...
7: it's always...
12: it's always...
13: it's always sunny...
Note the behaviour of the 5 and 12 cases: this code assumes that you want to eliminate the space that would appear before the "...".
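The same idea can also be written with `str.find`, which does the "scan forward to the next space" step from the question in a single built-in call. This is only a minimal sketch: the name `soft_truncate` is hypothetical, and returning the text unchanged (with no suffix) when nothing would be cut off is an assumption about the desired behaviour, not something the code above does.

```python
def soft_truncate(text, length, suffix="..."):
    """Cut `text` at the first space at or after the soft limit `length`."""
    # Start the search one character early so that a space sitting exactly
    # at the limit counts as a word boundary.
    cut = text.find(" ", max(length - 1, 0))
    if cut == -1:
        # No space from the limit onwards: the current word runs to the end
        # of the text, so return it unchanged (assumed behaviour).
        return text
    # Drop any spaces left just before the cut, then append the suffix.
    return text[:cut].rstrip(" ") + suffix


s = "it's always sunny in philadelphia"
for n in (1, 4, 5, 6, 7, 12, 13, 30):
    print(f"{n}: {soft_truncate(s, n)}")
```

For the limits 1 to 13 listed above this prints the same results as `trunc`; for 30 it prints the whole string without a suffix, which is where the two deliberately differ.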
CodePudding user response:
Somehow I missed the answer provided in the linked post by Markus Jarderot:
import re

def smart_truncate2(text, min_length=100, suffix='...'):
    """If the `text` is more than `min_length` characters long,
    it will be cut at the next word-boundary and `suffix` will
    be appended.
    """
    pattern = r'^(.{%d,}?\S)\s.*' % (min_length-1)
    return re.sub(pattern, r'\1' + suffix, text)
It runs in:
3.49 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
@slothrop's solution runs in:
897 ns ± 3.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
which is about four times faster.
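For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard `timeit` module. The input string and the limit of 7 are assumptions, since it isn't stated above what exactly was timed, and the lambda wrapper adds a small constant overhead to both measurements, so the absolute numbers will not match the ones quoted.

```python
import re
import timeit


def trunc(ipt, length, suffix="..."):
    if " " in ipt[length - 1:length]:
        return ipt[:length].rstrip(" ") + suffix
    return ipt[:length] + ipt[length:].partition(" ")[0] + suffix


def smart_truncate2(text, min_length=100, suffix="..."):
    pattern = r"^(.{%d,}?\S)\s.*" % (min_length - 1)
    return re.sub(pattern, r"\1" + suffix, text)


s = "it's always sunny in philadelphia"
n = 7          # assumed soft limit
runs = 10_000  # calls per measurement

for func in (trunc, smart_truncate2):
    # timeit returns the total time in seconds for `runs` calls
    total = timeit.timeit(lambda: func(s, n), number=runs)
    print(f"{func.__name__}: {total / runs * 1e9:.0f} ns per call")
```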