Translate a multi-command bash pipeline to Python

This is what I have in my bash script:

example=$(echo $var | cut -c 40- | sed "/[a-zA-Z0-9]$/!d")

I am trying to translate it to Python.

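In Python, I believe echo $var | cut -c 40- would become print(var[39:]), since cut counts characters from 1 and 40- means "from the 40th character to the end".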

And I know that [a-zA-Z0-9] means any character in the range a-z, A-Z or 0-9. So does sed "/[a-zA-Z0-9]$/!d" delete every line that starts with [a-zA-Z0-9]?

CodePudding user response：

No. The d command in sed deletes any line which matches the address expression anywhere on the line. However, the regular expression in this case is anchored to the end of the line by the $ anchor, and the ! after the address negates it, so d applies to the lines which do not match. The net effect is that it deletes every line whose last character is not matched by the character class, and keeps only the lines which end in a character in the ranges a-z, A-Z or 0-9.
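To make the effect of the negated address concrete, here is a minimal Python sketch of just the sed step. The function name sed_keep_alnum_ending and the sample text are made up for illustration; they are not part of the original script.

```python
import re

def sed_keep_alnum_ending(text):
    """Rough equivalent of sed '/[a-zA-Z0-9]$/!d'.

    Keeps only the lines whose last character is an ASCII letter or digit;
    every other line (including empty ones) is dropped.
    """
    pattern = re.compile(r'[a-zA-Z0-9]$')
    return [line for line in text.splitlines() if pattern.search(line)]

sample = "keep me\ndrop me.\nalso kept 42\n\ntrailing space \n"
print(sed_keep_alnum_ending(sample))  # ['keep me', 'also kept 42']
```

Note that the empty line and the line ending in a space are deleted too, because neither ends in a letter or digit.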

Idiomatically, it would be better to fold the cut into the sed script; you generally want to minimize the number of subprocesses you create. In Python, the whole pipeline can be written as a single expression:

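import re

# cut -c 40- counts characters from 1, so it keeps index 39 onward.
# Lines shorter than 40 characters become empty and are then deleted by sed.
example = "\n".join(line[39:] for line in var.splitlines()
    if len(line) >= 40 and re.search(r'[a-zA-Z0-9]$', line))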

You could avoid the regular expression by switching to line.endswith(...) but the ... will be quite a bit more verbose than this simple regex.
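For comparison, one way the endswith variant might look is sketched below. It relies on str.endswith accepting a tuple of suffixes; the names ALNUM_SUFFIXES and ends_alnum are hypothetical helpers introduced here.

```python
import string

# Every ASCII letter and digit as a tuple of one-character suffixes.
# str.endswith accepts a tuple and returns True if any element matches.
ALNUM_SUFFIXES = tuple(string.ascii_letters + string.digits)

def ends_alnum(line):
    """True if the last character of line is in a-z, A-Z or 0-9."""
    return line.endswith(ALNUM_SUFFIXES)

print(ends_alnum("abc9"))  # True
print(ends_alnum("abc."))  # False
print(ends_alnum(""))      # False
```

Something like line[-1:].isalnum() is shorter, but it is not equivalent: str.isalnum() also accepts non-ASCII letters and digits, which the [a-zA-Z0-9] class does not.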

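The proper way to print a variable includes quoting it: echo "$var". I am guessing the omission of the quotes is a mistake rather than intentional. In particular, if var contains multiple lines of text, the absence of quotes causes it all to be printed on a single line, because the shell splits the unquoted value on whitespace and echo joins the resulting words with single spaces. There are other issues, too; for example, any wildcard characters in the unquoted value are subject to filename expansion. See also "When to wrap quotes around a shell variable".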
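If you really do need to reproduce the unquoted behavior, the following is a rough approximation, assuming the default IFS. The function name unquoted_echo is invented for this sketch, and it deliberately does not model wildcard expansion or echo's option handling.

```python
def unquoted_echo(var):
    """Approximate what `echo $var` prints (minus the final newline).

    The shell splits the unquoted value on runs of whitespace and echo
    joins the words with single spaces. Glob expansion is NOT modelled.
    """
    return " ".join(var.split())

var = "first line\nsecond   line\n"
print(repr(unquoted_echo(var)))  # 'first line second line'
```

With the quotes in place, the value would reach cut and sed with its line breaks and internal spacing intact, which is what the splitlines() version above assumes.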

If the variable genuinely only contains one line of text, of course, you can avoid the splitlines() and the loop, and simply examine the variable directly.
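A minimal sketch of that single-line case, assuming var holds exactly one line with no newline in it. The function name translate_single_line is hypothetical. Because cut -c counts from 1, the slice starts at index 39.

```python
import re

def translate_single_line(var):
    """Sketch of the pipeline for a one-line value.

    Returns the characters from the 40th onward if that tail ends in an
    ASCII letter or digit, otherwise an empty string (the line is deleted).
    """
    tail = var[39:]  # cut -c 40- : 1-based position 40 is index 39
    return tail if re.search(r'[a-zA-Z0-9]$', tail) else ""

print(repr(translate_single_line("x" * 39 + "hello")))   # 'hello'
print(repr(translate_single_line("x" * 39 + "hello!")))  # ''
print(repr(translate_single_line("too short")))          # ''
```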

The command substitution will trim any trailing newlines from the result, so the lack of a final newline in example is not a bug here. Still, if you want to print it to a text file, you'll want to add the final missing newline.
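As an illustration of adding that newline back, here is a small sketch. The helper name write_result is made up, io.StringIO stands in for an open text file, and skipping empty results is a choice made for this example rather than something the shell pipeline dictates.

```python
import io

def write_result(example, stream):
    """Write example to a text stream, restoring the final newline.

    Nothing is written for an empty result, so no stray blank line appears.
    """
    if example:
        stream.write(example + "\n")

buffer = io.StringIO()  # stands in for open("some_file.txt", "w")
write_result("first\nsecond", buffer)
print(repr(buffer.getvalue()))  # 'first\nsecond\n'
```

When writing to a real file, print(example, file=f) has the same effect for non-empty values, since print appends a newline by default.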