Sort a file with a specific line pattern in Python

Time:06-20

Given a file with the following content:

enum class Fruits(id: String) {
   BANANA(id = "banana"),
   LEMON(id = "lemon"),
   DRAGON_FRUIT(id = "dragonFruit"),
   APPLE(id = "apple"); }

I want to sort this file given the pattern "id = ", and then replace these lines with the new sorted lines.

I wrote a piece of Python code that sorts the whole file, but I'm struggling to write a regex that finds the pattern so I can sort by it.

My Python script:

import re

fruitsFile = '/home/genericpath/Fruits.txt'

def sortFruitIds():

    # This is an attempt to find the pattern, but it raises an AttributeError:
    # 'NoneType' object has no attribute 'group'

    with open(fruitsFile, "r+") as f:
        lines = sorted(f, key=lambda line: str(re.search(r"(?<=id = )\s+", line)))
        for line in lines:
            f.write(line)

When I try to find the pattern with the regex, it raises an AttributeError: 'NoneType' object has no attribute 'group'

Any help is appreciated.

CodePudding user response:

Looks like your main issue is that your regex expects whitespace characters (\s+) right after "id = ", but what you want to match is a run of non-whitespace characters (\S+). With that in mind this should work:

import re

fruitsFile = 'Fruits.txt'

def sortFruitIds():

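    with open(fruitsFile, "r+") as f:
        lines = f.readlines()
        # "|$" guarantees a match (an empty string) on lines without an id.
        lines_sorted = sorted(lines, key=lambda line: re.search(r"(?<=id = \")\S+|$", line).group())
        # Rewind and truncate so the sorted lines replace the old content
        # instead of being appended after it.
        # Note: every line is sorted as a whole, so the "; }" that closes
        # the enum moves together with the entry it is attached to.
        f.seek(0)
        f.writelines(lines_sorted)
        f.truncate()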

I also added |$ to the regex to return an empty string if there is no match, and added group() to grab the match.
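To see the two changes in isolation, here is a small sketch. The helper name sort_key and the two sample lines are made up for illustration; the patterns are the ones discussed above.

```python
import re

# Same pattern as in the answer: the text after 'id = "', or an empty match at end of line.
PATTERN = re.compile(r'(?<=id = ")\S+|$')

def sort_key(line):
    # Hypothetical helper: search() always matches here, because the "|$"
    # alternative succeeds at the end of any line that has no id.
    return PATTERN.search(line).group()

header = 'enum class Fruits(id: String) {'
entry = '   LEMON(id = "lemon"),'

# The pattern from the question wants whitespace right after "id = ";
# the next character is a quote, so there is no match and search() gives None.
print(re.search(r'(?<=id = )\s+', entry))

# \S+ grabs everything up to the next whitespace, including the trailing '"),'.
print(repr(sort_key(entry)))

# No 'id = "' on the header line, so the key falls back to an empty string.
print(repr(sort_key(header)))
```

Calling .group() on the None from the first search is what produces the AttributeError in the question.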

CodePudding user response:

We can approach this by using re.findall to collect all entries in the enum, sorting them alphabetically by the id string value, and then joining them back together into the final enum code. Note that below I also extract the first line of the enum for use later in the output.

import re

inp = '''enum class Fruits(id: String) {
   BANANA(id = "banana"),
   LEMON(id = "lemon"),
   DRAGON_FRUIT(id = "dragonFruit"),
   APPLE(id = "apple"); }'''
# First line of the enum, up to and including the opening brace.
header = re.search(r'enum.*?\{', inp).group()
# Every entry of the form NAME(id = "...").
items = re.findall(r'\w+\(id\s*=\s*".*?"\)', inp)
# Sort the entries by the quoted id value.
items.sort(key=lambda m: re.search(r'"(.*?)"', m).group(1))
output = header + '\n    ' + ',\n    '.join(items) + '; }'
print(output)

This prints:

enum class Fruits(id: String) {
    APPLE(id = "apple"),
    BANANA(id = "banana"),
    DRAGON_FRUIT(id = "dragonFruit"),
    LEMON(id = "lemon"); }
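Since the question asks for the file itself to be rewritten, the same steps can be wrapped in a function that reads and overwrites a file. This is only a sketch: the names sort_enum_text and sort_enum_file are made up, the file is assumed to have exactly the layout shown in the question, and a temporary file stands in for the real Fruits.txt.

```python
import re
import tempfile
from pathlib import Path

def sort_enum_text(text):
    # Hypothetical helper: the same three steps as above, wrapped in a function.
    header = re.search(r'enum.*?\{', text).group()
    items = re.findall(r'\w+\(id\s*=\s*".*?"\)', text)
    items.sort(key=lambda m: re.search(r'"(.*?)"', m).group(1))
    return header + '\n    ' + ',\n    '.join(items) + '; }'

def sort_enum_file(path):
    # Read the whole file, sort its entries, and overwrite it with the result.
    path = Path(path)
    path.write_text(sort_enum_text(path.read_text()))

sample = '''enum class Fruits(id: String) {
   BANANA(id = "banana"),
   LEMON(id = "lemon"),
   DRAGON_FRUIT(id = "dragonFruit"),
   APPLE(id = "apple"); }'''

# Demo on a throwaway copy so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "Fruits.txt"
    demo_path.write_text(sample)
    sort_enum_file(demo_path)
    result = demo_path.read_text()

print(result)
```

The sort is a plain string comparison, so it is case-sensitive; passing a key such as the lowercased id would be one way to change that.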