Regular expressions in Python to process multiple strings

Time:10-05

I have a big translation file. The entries that need to be translated are labeled "msgid", and the desired text is enclosed between "msgid" and "msgstr". The problem is that I don't know how to parse the text when the desired content spans multiple lines.

I need to parse it so that all spaces and indentation in the text are preserved. Then I want to put the entries into a dictionary for output to a table.

I tried to do it manually, but it didn't work. Now I think I need regular expressions, but I don't know how to do it. Help, please.

#: superset-frontend/src/explore/components/controls/DndColumnSelectControl/Option.tsx:68
#: superset-frontend/src/explore/components/controls/OptionControls/index.tsx:323
msgid ""
"\n"
"                This filter was inherited from the dashboard's context.\n"
"                It won't be saved when saving the chart.\n"
"              "
msgstr ""
"\n"
"                Фильтр был наследован из контекста дашборда.\n"
"                Это не будет сохранено при сохранении графика.\n"
"              "

#: superset/tasks/schedules.py:184
#, python-format
msgid ""
"\n"
"            <b><a href=\"%(url)s\">Explore in Superset</a></b><p></p>\n"
"            <img src=\"cid:%(msgid)s\">\n"
"            "
msgstr ""
"\n"
"            <b><a href=“%(url)s”>Исследовать в Superset</a></b><p></p>\n"
"            <img src=“cid:%(msgid)s”>\n"
"            "

#: superset/reports/notifications/email.py:60

Here is my Python code:

import io

def Parse(file: io.TextIOWrapper):
    # Concatenate every stripped line of the file into one string
    text = ""
    for line in file:
        if line.startswith("msgid"):
            text += f" {line.strip()}"
        elif line.startswith("msgstr"):
            text += f" {line.strip()}"
        else:
            text += line.strip()
    # Split the joined text on whitespace
    words = text.split()

    sourcetxt = {}

    for index, word in enumerate(words):
        if word == "msgid":
            ...  # the attempt stops here

CodePudding user response:

Consider:

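pairs = []

# `file` is the open translation file, as in the question
for line in file:
    line = line.strip()
    # Skip blank lines and "#" comment lines
    if not line or line.startswith('#'):
        continue
    if line.startswith('msgid'):
        # Start a new [msgid_lines, msgstr_lines] pair
        pairs.append([[], None])
    elif line.startswith('msgstr'):
        pairs[-1][1] = []
    elif pairs[-1][1] is None:
        # No msgstr seen yet, so this line belongs to msgid
        pairs[-1][0].append(line)
    else:
        pairs[-1][1].append(line)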

This creates a list of [msgid, msgstr] pairs, where each part is a list of lines. In this form, it's easy to convert it to whatever is desired.
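As one possible follow-up, the collected lines can be turned into the dictionary the question asks for. The sketch below is a self-contained variant of the loop above, not the answer's own code. The function name `parse_po` and the sample text are made up for illustration. It assumes every text line is a single double-quoted fragment and that Python's string-literal rules (via `ast.literal_eval`) are close enough to the file's escaping. Plural entries such as `msgid_plural` or `msgstr[0]` are not handled.

```python
import ast

def parse_po(lines):
    """Return {msgid: msgstr} from an iterable of .po-style lines (sketch)."""
    pairs = []     # list of [msgid_fragments, msgstr_fragments]
    target = None  # the list that currently receives decoded fragments
    for raw in lines:
        line = raw.strip()  # strips only outside the quotes
        if not line or line.startswith('#'):
            continue
        if line.startswith('msgid '):
            pairs.append([[], []])
            target = pairs[-1][0]
            line = line[len('msgid'):].strip()
        elif line.startswith('msgstr ') and pairs:
            target = pairs[-1][1]
            line = line[len('msgstr'):].strip()
        if target is not None and line.startswith('"'):
            # Decode the quoted fragment: removes the outer quotes and
            # resolves escapes such as \n and \" while keeping inner spaces
            target.append(ast.literal_eval(line))
    return {''.join(src): ''.join(dst) for src, dst in pairs}

# Hypothetical sample in the same shape as the file in the question
sample = '''#: some/file.py:1
msgid ""
"\\n"
"    Hello\\n"
msgstr ""
"\\n"
"    Привет\\n"
'''
table = parse_po(sample.splitlines())
for source, translation in table.items():
    print(repr(source), '->', repr(translation))
```

Because only whitespace outside the quotes is stripped, the indentation inside each fragment survives, and the fragments of one entry are joined without any separator.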
