Python regex pattern to find whether a line of code ends with a space or tab character

Sorry for asking such a basic question, but I really tried to look for the answer before coming here. I have a script that searches inside .py files and reads their code line by line. The goal of the script is to find whether a line ends with a space or a tab, as in the example below:

i = 5 
z = 25

Basically, after the i variable we should have a \s and after the z variable a \t (I hope the code formatting will not erase them).

def custom_checks(file, rule):
    """
    @param file: file: file in-which you search for a specific character
    @param rule: the specific character you search for
    @return: dict obj with the form { line number : character }
    """
    rule=re.escape(rule)
    logging.info(f"     File {os.path.abspath(file)} checked for {repr(rule)} inside it ")
    result_dict = {}

    file = fileinput.input([file])
    for idx, line in enumerate(file):
        if re.search(rule, line):
            result_dict[idx + 1] = str(rule)

    file.close()
    if not len(result_dict):
        logging.info("Zero non-compliance found based on the rule:2 consecutive empty rows")
    else:
        logging.warning(f'Found the next errors:{result_dict}')

After that, when I check the logging output, I see this: checked for '\ s\\s\$' inside it. I don't know why the backslashes are doubled. Also, I read all the regexes from a config.json, which is this one:

{
  "ends with tab":" \\t$",
  "ends with space":" s\\s$"

}

Could someone please help me with this? I know I could do it in other ways, such as reversing the line with [::-1], taking the first character and checking whether it is \s, etc., but I really want to do it with regex. Thanks!

CodePudding user response：

Try:

rules = {
  'ends with tab': re.compile(r'\t$'),
  'ends with space': re.compile(r' $'),
}

Note: iterating over the file leaves the newline ('\n') at the end of each line, but $ in a regex matches at the very end of the string and also just before a newline that ends the string. Thus, if using regex, you don't need to explicitly strip newlines.

if rule.search(line):
    ...
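
Putting the two pieces together, a minimal sketch of the whole check could look like the following. The function name find_violations and the sample lines are made up for illustration; it accepts any iterable of lines, so an open file object should work as well.

```python
import re

# Same compiled rules as above
rules = {
    'ends with tab': re.compile(r'\t$'),
    'ends with space': re.compile(r' $'),
}


def find_violations(lines, rules):
    """Return {line number: [names of matching rules]} (hypothetical helper)."""
    result = {}
    for lineno, line in enumerate(lines, start=1):
        # `$` also matches just before a trailing '\n', so no stripping is needed
        hits = [name for name, rule in rules.items() if rule.search(line)]
        if hits:
            result[lineno] = hits
    return result


sample = ["i = 5 \n", "z = 25\t\n", "ok = True\n", "\n"]
print(find_violations(sample, rules))
# {1: ['ends with space'], 2: ['ends with tab']}
```

Collecting a list of rule names per line, rather than a single value, keeps the result usable if more rules are added later and several of them match the same line.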

Personally, however, I would use line.rstrip() != line.rstrip('\n') to flag trailing whitespace of any kind in one shot.
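
As a rough illustration of that one-liner (the helper name has_trailing_whitespace is made up here), the comparison is true only when something other than the newline gets stripped:

```python
def has_trailing_whitespace(line):
    # rstrip() removes all trailing whitespace; rstrip('\n') removes only newlines.
    # If the two results differ, there was whitespace before the newline.
    return line.rstrip() != line.rstrip('\n')


for line in ["i = 5 \n", "z = 25\t\n", "ok = True\n", "\n", "   \n"]:
    print(repr(line), has_trailing_whitespace(line))
```

Note that a whitespace-only line counts as a hit too, and so would a stray '\r' if the file were read without newline translation.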

If you want to directly check for specific characters at the end of the line, you then need to strip any newline, and you need to check if the line isn't empty. For example:

char = '\t'
s = line.rstrip('\n')

if s and s[-1] == char:
    ...
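
A possible variation on the same idea, sketched with a made-up helper name trailing_char: str.endswith accepts a tuple of candidates and is False for an empty string, so the emptiness check can be folded into the call.

```python
def trailing_char(line, chars=(' ', '\t')):
    """Return the offending last character, or None (hypothetical helper)."""
    s = line.rstrip('\n')
    # endswith() takes a tuple of suffixes and returns False on an empty string
    if s.endswith(chars):
        return s[-1]
    return None


print(repr(trailing_char("z = 25\t\n")))  # '\t'
print(repr(trailing_char("i = 5 \n")))    # ' '
print(repr(trailing_char("\n")))          # None
```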

Addendum 1: read rules from JSON config

import json
import re

# here from a string, but could be in a file, of course
json_config = """
{
    "ends with tab": "\\t$",
    "ends with space": " $"
}
"""

rules = {k: re.compile(v) for k, v in json.loads(json_config).items()}
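
Regarding the doubled backslashes in the question's log output, two things are probably at play (this is a reading of the posted code, not something verified against the original files): repr() displays every literal backslash as \\, and re.escape() adds backslashes of its own, which turns the pattern into a search for the literal text. A small sketch, with the JSON value written as it would appear in a config file:

```python
import json
import re

# Raw string so that the JSON text is exactly: "\\t$"
pattern = json.loads(r'"\\t$"')
print(len(pattern), repr(pattern))  # 3 '\\t$'  (one backslash, shown doubled by repr)

line = "z = 25\t\n"
print(bool(re.search(pattern, line)))  # True: the regex \t matches the tab

escaped = re.escape(pattern)
print(repr(escaped))
print(bool(re.search(escaped, line)))  # False: now searches for the literal text \t$
```

If that reading is right, dropping the re.escape() call and passing the pattern from the config straight to re.search() or re.compile() should be enough.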

Addendum 2: comments

The following shows how to comment out a rule, as well as a rule to detect comments in the file to process. Since JSON doesn't support comments, we can consider yaml instead:

yaml_config = """
ends with space: ' $'
ends with tab: \\t$
is comment: ^\\s*#
# ignore: 'foo'
"""

import yaml

rules = {k: re.compile(v) for k, v in yaml.safe_load(yaml_config).items()}

Note: 'is comment' is easy. A hypothetical 'has comment' is much harder to define -- why? I'll leave that as an exercise for the reader ;-)
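
One way to see the difficulty: a # can sit inside a string literal, where it is not a comment at all, so a line-based regex cannot tell the two apart. A possible approach, sketched here with the standard-library tokenize module (the helper name comment_lines is made up, and this is only one way to do it), is to let Python's own tokenizer decide:

```python
import io
import tokenize


def comment_lines(source):
    """Return the line numbers holding a real comment token (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return sorted({
        tok.start[0]
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.COMMENT
    })


source = (
    'x = 1  # a real comment\n'
    'url = "http://example.com/#anchor"\n'
    'text = """\n'
    '# not a comment, just text\n'
    '"""\n'
)
print(comment_lines(source))  # [1]
```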

Note 2: in a file, the yaml config would be without double backslash, e.g.:

cat > config.yml << EOF
ends with space: ' $'
ends with tab: \t$
is comment: ^\s*#
# ignore: 'foo'
EOF

Additional thought

You may want to give autopep8 a try.

Example:

cat > foo.py << EOF
# this is a comment   

text = """
# xyz  
bar  
"""
def foo(): 
    # to be continued  
    pass 

def bar():
  pass     

 
  
EOF

Note: to reveal the extra spaces:

cat foo.py | perl -pe 's/$/|/'
# this is a comment   |
|
text = """|
# xyz  |
bar  |
"""|
def foo(): |
    # to be continued  |
    pass |
|
def bar():|
  pass     |
|
 |
  |

There are several PEP8 issues with the above (extra spaces at end of lines, only 1 line between the functions, etc.). Autopep8 fixes them all (but correctly leaves the text variable unchanged):

autopep8 foo.py | perl -pe 's/$/|/'
# this is a comment|
|
text = """|
# xyz  |
bar  |
"""|
|
|
def foo():|
    # to be continued|
    pass|
|
|
def bar():|
    pass|