How to truncate a string in python including a truncated word in the result?

Time:07-06

There is an excellent discussion, "Truncate a string without ending in the middle of a word", on how to do 'smart' string truncation in Python. The problem with the solutions proposed there is that if the width limit falls within a word, that word is dropped entirely.
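For contrast, the standard library's `textwrap.shorten` shows the 'hard' behaviour described above. This is only an illustration and not necessarily one of the solutions from the linked thread: it collapses whitespace and then drops whole words from the end until the text, placeholder included, fits within `width`.

```python
import textwrap

s = "it's always sunny in philadelphia"

# width counts the placeholder, so "it's always..." (14 chars) does not fit in 7
# and the partially covered word "always" is dropped entirely
print(textwrap.shorten(s, width=7, placeholder="..."))   # it's...
print(textwrap.shorten(s, width=14, placeholder="..."))  # it's always...
```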

How can I truncate a string in Python with a 'soft' width limit, i.e. if the limit falls in the middle of a word, that word is kept?

Example:

str = "it's always sunny in philadelphia"
trunc(str, 7)
>>> it's always...

My initial idea is to slice the string up to the soft limit and then check each following character, appending it to the slice until I encounter a whitespace character. But this seems very inefficient.
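The scan described above does not have to be a Python-level loop: a compiled regex's `search` method accepts a start position, so looking for the next whitespace character happens inside the `re` engine. A minimal sketch, assuming any whitespace character counts as a word boundary; the name `trunc_scan` is hypothetical. Note that it follows the slicing idea literally, so a limit that lands just after a space still pulls in the following word.

```python
import re

# Assumption: any whitespace character (space, tab, newline) ends a word
_WHITESPACE = re.compile(r"\s")

def trunc_scan(text: str, limit: int, suffix: str = "...") -> str:
    """Keep the first `limit` characters plus the rest of the word they cut into."""
    if len(text) <= limit:
        return text
    # Pattern.search(string, pos) starts scanning at index `limit`
    match = _WHITESPACE.search(text, limit)
    if match is None:
        # The limit falls inside the last word: there is nothing to cut
        return text
    return text[:match.start()].rstrip() + suffix

s = "it's always sunny in philadelphia"
print(trunc_scan(s, 7))   # it's always...
print(trunc_scan(s, 30))  # it's always sunny in philadelphia
```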

Answer:

How about:

def trunc(ipt, length, suffix='...'):
  if " " in ipt[length-1: length]:
    # The character just before the limit is a space: we are on a word boundary
    return ipt[:length].rstrip(' ') + suffix

  # Otherwise add the "tail" of the input, up to just before the first space it contains
  return ipt[:length] + ipt[length:].partition(" ")[0] + suffix

s = "it's always sunny in philadelphia"  # Best to avoid 'str' as a variable name, it's a builtin
for n in (1, 4, 5, 6, 7, 12, 13):
  print(f"{n}: {trunc(s, n)}")

which outputs:

1: it's...
4: it's...
5: it's...
6: it's always...
7: it's always...
12: it's always...
13: it's always sunny...

Note the behaviour of the 5 and 12 cases: this code assumes that you want to eliminate the space that would appear before the "...".
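Because the suffix is appended unconditionally, this version also adds "..." when nothing was cut: trunc(s, 30) returns the whole sentence followed by "...". A possible tweak (an addition, not part of the original answer) only appends the suffix when non-space text was actually dropped:

```python
def trunc(ipt: str, length: int, suffix: str = "...") -> str:
    if ipt[length - 1:length] == " ":
        # The limit lands just after a space: cut there
        result = ipt[:length].rstrip(" ")
    else:
        # Extend the slice to the end of the word the limit falls in
        result = ipt[:length] + ipt[length:].partition(" ")[0]
    # Only append the suffix if non-space text was actually cut off
    if len(result) < len(ipt.rstrip(" ")):
        return result + suffix
    return result

s = "it's always sunny in philadelphia"
for n in (5, 7, 13, 30):
    print(f"{n}: {trunc(s, n)}")
```

For n = 5, 7 and 13 this gives the same output as before; for n = 30 the limit falls in the last word, so the sentence comes back unchanged with no "...".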

Answer:

Somehow I missed the answer by Markus Jarderot in the linked post:

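import re

def smart_truncate2(text, min_length=100, suffix='...'):
    """If the `text` is more than `min_length` characters long,
    it will be cut at the next word boundary and `suffix` will
    be appended.
    """
    # Lazily match a prefix of at least min_length characters that ends in a
    # non-space and is followed by whitespace; keep only that prefix
    pattern = r'^(.{%d,}?\S)\s.*' % (min_length-1)
    return re.sub(pattern, r'\1' + suffix, text)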
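One caveat with the regex above: `.` does not match a newline by default, so for multi-line input the trailing `.*` stops at the first line break and the remaining lines survive the substitution. Passing `flags=re.DOTALL` is one way around that. A sketch, where `smart_truncate_dotall` is a hypothetical name:

```python
import re

def smart_truncate_dotall(text, min_length=100, suffix='...'):
    # Same pattern as smart_truncate2, but with re.DOTALL so that `.` also
    # matches newlines and `.*` discards everything after the cut
    pattern = r'^(.{%d,}?\S)\s.*' % (min_length - 1)
    return re.sub(pattern, r'\1' + suffix, text, flags=re.DOTALL)

text = "one two\nthree four"
# Without DOTALL, `.*` stops at "\n" and the second line is kept
print(repr(re.sub(r'^(.{1,}?\S)\s.*', r'\1...', text)))  # 'one...\nthree four'
print(repr(smart_truncate_dotall(text, 2)))               # 'one...'
```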

smart_truncate2 runs in

3.49 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

while @slothrop's solution runs in

897 ns ± 3.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

which is roughly four times faster.
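The post does not say which input or limit was timed; the output format matches IPython's `%timeit`. A plain-Python way to rerun the comparison with the standard `timeit` module, assuming the sample sentence from the question and a limit of 7, might look like this. Absolute numbers will depend on the machine and Python version.

```python
import re
import timeit

def trunc(ipt, length, suffix='...'):
    if " " in ipt[length - 1:length]:
        return ipt[:length].rstrip(' ') + suffix
    return ipt[:length] + ipt[length:].partition(" ")[0] + suffix

def smart_truncate2(text, min_length=100, suffix='...'):
    pattern = r'^(.{%d,}?\S)\s.*' % (min_length - 1)
    return re.sub(pattern, r'\1' + suffix, text)

s = "it's always sunny in philadelphia"
n = 100_000  # calls per measurement (an arbitrary choice)
for name, fn in (("trunc", trunc), ("smart_truncate2", smart_truncate2)):
    # The lambda runs inside this iteration, so it always uses the current fn
    total = timeit.timeit(lambda: fn(s, 7), number=n)
    print(f"{name}: {total / n * 1e9:.0f} ns per call")
```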
