Editing text between two strings from different lines

Time:04-07

I want to edit attribute values in an XML file and replace every "\n" with "(newline)\n". This is necessary because I need to keep the structure of the XML file: after parsing, the file is standardized and the whitespace is gone. So I want to first add a string like "(newline)" before the whitespace. Then, once the XML file has been parsed and standardized, I will edit the whole file and replace the string "(newline)" back with a single "\n".

The picture below shows what I want to do: (image: This is how it should work)

I tried to use a regex to get the text between "value=" and "/>", but I only get the value when it is on a single line, and I don't know how to edit it afterwards.

import re
with open("file", "r") as f:
    contents = f.readlines()
    for line in contents:
        result = re.search('value=(.*?) />', line)
        print(result)

Here is my file:

<Module bs="Mainfile_1">
<object id="1000" name="namex" number="1">
    <item name="item0" value="100"/>
    <item name="item00" value="100
    
    100"/>
</object>
<object id="1001" name="namey" number="2">
    <item name="item1" value="100"/>
    <item name="item00" value="100"/>
</object>
<object id="1234" name="name1" number="3">
    <item name="item1" value="FAIL"/>
    <item name="item2" value="233"/>
    <item name="item3" value="233
    234
    246"/>
    <item name="item4" value="FAIL"/>
</object>
<object id="1238" name="name2" number="4">
    <item name="item8" value="FAIL"/>
    <item name="item9" value="233
    234
    
    245
    246
    267"/>
</object>
<object id="2345" name="name32" number="5">
    <item name="item1" value="111"/>
    <item name="item2" value="FAIL" />
</object>
<object id="2347" name="name4" number="6">
    <item name="item1" value="FAIL"/>
    <item name="item2" value="FAIL"/>
    <item name="item3" value="233"/>
    <item name="item4" value="FAIL"/>
</object>
</Module>

CodePudding user response:

You could use re.sub with a callback. The callback can then modify the matched string as you like; here it replaces every \n with (newline)\n.

In your code you are iterating over the contents of the file line by line. This is why your regex can only match over one line. Use f.read() to get the full contents in one string.
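
To illustrate the difference, here is a minimal sketch that uses an inline sample string in place of the file (the names sample, per_line and whole are only illustrative). Splitting the text into lines imitates readlines(), and no single line contains a complete value="..." attribute, so only the search over the whole string finds the multi-line value:

```python
import re

# One multi-line item taken from the question's file.
sample = '<item name="item3" value="233\n    234\n    246"/>\n'

# [^"]* matches any character except a quote, including newlines.
pattern = re.compile(r'value="([^"]*)"')

# Imitates iterating over f.readlines(): each line is searched on its own.
per_line = [pattern.search(line) for line in sample.splitlines(keepends=True)]

# Imitates f.read(): the whole text is searched at once.
whole = pattern.search(sample)

print(per_line)        # every entry is None
print(whole.group(1))  # the full three-line value
```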

Putting it all together:

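import re

def newlinerepl(matchobj):
    # Mark every newline inside the matched value="..." attribute.
    return matchobj.group(0).replace("\n", "(newline)\n")

with open("file", "r") as f:
    # Read the whole file as one string so a match can span several lines.
    contents = f.read()

# [^"]* also matches newlines, so multi-line values are matched in full.
result = re.sub(r'value="([^"]*)"', newlinerepl, contents)
print(result)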
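
The question also mentions turning "(newline)" back into "\n" after parsing. A possible sketch of that round trip follows, assuming the file is parsed with xml.etree.ElementTree (the question does not say which parser is used). The helper names mark_newlines and restore_newlines are hypothetical. An XML parser normalizes a newline inside an attribute value to a space, so the restore step replaces the marker together with the one whitespace character that follows it:

```python
import re
import xml.etree.ElementTree as ET

MARKER = "(newline)"

def mark_newlines(xml_text):
    # Same idea as the answer above: tag each newline inside value="...".
    return re.sub(
        r'value="([^"]*)"',
        lambda m: m.group(0).replace("\n", MARKER + "\n"),
        xml_text,
    )

def restore_newlines(value):
    # After parsing, the newline that followed the marker has normally
    # become a space; replace marker + that one character with "\n".
    return re.sub(re.escape(MARKER) + r"[ \n]", "\n", value)

original = '<item name="item3" value="233\n    234\n    246"/>'
element = ET.fromstring(mark_newlines(original))  # assumed parser
restored = restore_newlines(element.get("value"))
print(repr(restored))
```

This only restores newlines inside value attributes. Tabs or carriage returns inside a value would still be turned into spaces by the parser and are not covered by this sketch.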