I have written a Python function for some post processing in my text recognition algorithm. It works fine, but it seems to be kind of lengthy and has many if
-else
-conditions. I am looking for a way to simplify my code:
def postProcessing(new_s): #new_s is a list
import string
new_s=removeFrontLetters(new_s) #This function has to be called first
if all(ch in string.digits for ch in new_s[:4]): #checking if first 4 elements are digits.
if all(ch in string.ascii_letters for ch in new_s[4:5]): #checking if the [4] and [5] are letters
if len(new_s)>=7:
if new_s[6]=='C':
new_s=new_s[:7] #if length>=7 and the [6] =='C' the reversed of the first 7 elements has to be returned.
new_s=list(reversed(new_s))
return(new_s)
else:
new_s=new_s[:6] #if length>=7 but the [6] =!'C' the reversed of the first 6 elements has to be returned.
new_s=list(reversed(new_s))
return(new_s)
else:
new_s=list(reversed(new_s)) #if length<7, the reversed of the given list has to be returned.
return(new_s)
else:
print('not valid') #if the [4] and [5] are not letters, it is not valid
else:
print('not valid') #if the [:4] are not digits, it is not valid
This seems very beginner-level and lengthy. I am a beginner, but I am trying to improve my function. Do you have suggestions?
CodePudding user response:
You can invert your if statements and use early returns to reduce the indentation of your code.
def postProcessing(new_s): # new_s is a list
import string
new_s = removeFrontLetters(new_s) # This function has to be called first
if not all(ch in string.digits for ch in new_s[:4]): # checking if first 4 elements are digits.
raise ValueError("First four elements must be digits")
if not all(ch in string.ascii_letters for ch in new_s[4:5]): # checking if the [4] and [5] are letters
raise ValueError("First elements 4 and 5 must be digits")
if len(new_s) <= 7:
new_s = list(reversed(new_s)) # if length<7, the reversed of the given list has to be returned.
return (new_s)
if new_s[6] == 'C':
new_s = new_s[
:7] # if length>=7 and the [6] =='C' the reversed of the first 7 elements has to be returned.
new_s = list(reversed(new_s))
return (new_s)
new_s = new_s[:6] # if length>=7 but the [6] =!'C' the reversed of the first 6 elements has to be returned.
new_s = list(reversed(new_s))
return (new_s)
CodePudding user response:
It's quite neat you ask 'the world' for advise. Did you know there's a dedicated stackexchange site for this? https://codereview.stackexchange.com/.
Unless you insist writing Python code for this, it seems that you need a regular expression here.
So some tips:
- use regex for pattern matching
- use variables to express what an expression means
- use exceptions instead of an 'invalid value' string
- separate the 'parsing' from the processing to keep functions small and focused
- use doctest to document and test small functions
def post_process(new_s):
"""
reverse a string (with a twist)
what follows is a doctest. You can run it with
$ python -m doctest my_python_file.py
>>> post_process('1234abCdef')
'Cba4321'
>>> post_process('1234abdef')
'ba4321'
"""
cmds = {
'C': cmd_c,
'': cmd_none
}
command, content = command_and_content(new_s)
process = cmds[command]
return process(content)
def cmd_c(content):
return 'C' "".join(reversed(content))
def cmd_none(content):
return "".join(reversed(content))
The command_and_content
function replaces the parsing logic:
def command_and_content(new_s):
# get help on https://regex101.com/ to find the
# regular expression for four digits and two letters
digits_then_ascii = re.compile(r"(\d{4}[a-z]{2})(C?)(.*)")
if match := digits_then_ascii.match(new_s):
content = match.group(1)
command = match.group(2)
return command, content
# pylint will ask you to not use an else clause after a return
# Also, Python advises exceptions for notifying erroneous input
raise ValueError(new_s)
CodePudding user response:
From the context you provided, I assume that all this processing can happen in-place (i.e. without the need to allocate additional memory). The benefit of lists is that they are mutable, so you can actually do all your operations in-place.
This adheres to the style conventions (PEP 8) and uses correct type annotations (PEP 484):
from string import digits, ascii_letters
def remove_front_letters(new_s: list[str]) -> None:
...
raise NotImplementedError
def post_processing(new_s: list[str]) -> None:
remove_front_letters(new_s)
if any(ch not in digits for ch in new_s[:4]):
raise ValueError("The first 4 characters must be digits")
if any(ch not in ascii_letters for ch in new_s[4:6]):
raise ValueError("The 5th and 6th characters must be letters")
if len(new_s) >= 7:
if new_s[6] == 'C':
del new_s[7:]
else:
del new_s[6:]
new_s.reverse()
If you do want a new list, you can just call this function with a .copy()
of your input list.
References: list methods; del
statement
PS: If you use Python version 3.8 or lower, instead of list[str]
you'll need to use typing.List[str]
.
Also someone mentioned the possibility of replacing the iteration via all()
(or any()
) with a "".join(...).isdigit()
for example. While this is certainly also correct and technically less code, I am not sure it is necessarily more readable. More importantly it creates a new string in the process, which I don't think is necessary.
By the way, you could even reduce that conditional deletion of list elements to a one liner like this:
...
if len(new_s) >= 7:
del new_s[7 if new_s[6] == 'C' else 6:]
new_s.reverse()
But I would argue that this is worse because it is less readable. Personal preference I guess.