Replace multiple strings in a large XML file

Time:07-15

I have the XML file below and want to remove several different strings from it, using sed, another command-line tool, or Python.

The file is 4 GB, so performance matters.

The strings to remove (replace with nothing) are:

xmlns:leif="http://www.leif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.leif.org/data/schema/leidata/2016"

lei:

leif:

xmlns:lei="http://www.leif.org/data/schema/leidata/2016"

Can I do this in a single sed command?

This is what the XML file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<lei:LEIData xmlns:leif="http://www.leif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.leif.org/data/schema/leidata/2016">
<lei:LEIHeader>
<lei:ContentDate>2022-07-10T09:00:01Z</lei:ContentDate>
<lei:Originator>234234234234</lei:Originator>
<lei:FileContent>leif_FULL_PUBLISHED</lei:FileContent>
<lei:RecordCount>2166947</lei:RecordCount>
<lei:Extension>
<leif:Sources>
  <leif:Source>
    <leif:ContentDate>2022-07-09T11:01:36Z</leif:ContentDate>
    <leif:RecordCount>412</leif:RecordCount>
  </leif:Source>
  <leif:Source>
    <leif:ContentDate>2022-07-09T16:00:02Z</leif:ContentDate>
    <leif:RecordCount>3084</leif:RecordCount>
  </leif:Source>
</leif:Sources>
</lei:Extension>
</lei:LEIHeader>
<lei:LEIRecords>
<lei:LEIRecord xmlns:lei="http://www.leif.org/data/schema/leidata/2016">
  <lei:LEI>029200013A5N6ZD0F605</lei:LEI>
  <lei:Entity>
    <lei:LegalName xml:lang="en">AFRINVEST SECURITIES LIMITED</lei:LegalName>
    <lei:LegalAddress xml:lang="en">
      <lei:FirstAddressLine>27 GERRARD ROAD</lei:FirstAddressLine>
    </lei:LegalAddress>
    <lei:HeadquartersAddress xml:lang="en">
      <lei:FirstAddressLine>27 GERRARD ROAD</lei:FirstAddressLine>
    </lei:HeadquartersAddress>
    <lei:RegistrationAuthority>
      <lei:RegistrationAuthorityID>RA000469</lei:RegistrationAuthorityID>
    </lei:RegistrationAuthority>
    <lei:LegalJurisdiction>NG</lei:LegalJurisdiction>
    <lei:EntityCategory>GENERAL</lei:EntityCategory>
    <lei:LegalForm>
      <lei:EntityLegalFormCode>9999</lei:EntityLegalFormCode>
      <lei:OtherLegalForm>LIMITED</lei:OtherLegalForm>
    </lei:LegalForm>
    <lei:EntityStatus>ACTIVE</lei:EntityStatus>
    <lei:EntityCreationDate>2014-11-06T00:00:00Z</lei:EntityCreationDate>
  </lei:Entity>
  <lei:Registration>
    <lei:InitialRegistrationDate>2014-11-06T00:00:00Z</lei:InitialRegistrationDate>
    <lei:ValidationAuthority>
      <lei:ValidationAuthorityID>RA000469</lei:ValidationAuthorityID>
    </lei:ValidationAuthority>
  </lei:Registration>
</lei:LEIRecord>
</lei:LEIRecords>
</lei:LEIData>

CodePudding user response:

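This is easy to do with rquery. The command below strips the lei: and leif: prefixes; as the output shows, the xmlns declarations are left in place.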

[ rquery]$ head myfile.txt | ./rq -q "select replace(replace(@raw,'leif:',''),'lei:','')"
<?xml version="1.0" encoding="UTF-8"?>
<LEIData xmlns:leif="http://www.leif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.leif.org/data/schema/leidata/2016">
<LEIHeader>
<ContentDate>2022-07-10T09:00:01Z</ContentDate>
<Originator>234234234234</Originator>
<FileContent>leif_FULL_PUBLISHED</FileContent>
<RecordCount>2166947</RecordCount>
<Extension>
<Sources>
  <Source>

rquery can be downloaded from https://github.com/fuyuncat/rquery

CodePudding user response:

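Using sed, in a single command: the first substitution strips the lei: and leif: prefixes, and the second removes the xmlns declarations.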

$ sed -E 's/lei:|leif://g;s/ xmlns:lei=.*2016"| xmlns:leif=.*2016"//' input_file
<?xml version="1.0" encoding="UTF-8"?>
<LEIData>
<LEIHeader>
<ContentDate>2022-07-10T09:00:01Z</ContentDate>
<Originator>234234234234</Originator>
<FileContent>leif_FULL_PUBLISHED</FileContent>
<RecordCount>2166947</RecordCount>
<Extension>
<Sources>
  <Source>
    <ContentDate>2022-07-09T11:01:36Z</ContentDate>
    <RecordCount>412</RecordCount>
  </Source>
  <Source>
    <ContentDate>2022-07-09T16:00:02Z</ContentDate>
    <RecordCount>3084</RecordCount>
  </Source>
</Sources>
</Extension>
</LEIHeader>
<LEIRecords>
<LEIRecord>
  <LEI>029200013A5N6ZD0F605</LEI>
  <Entity>
    <LegalName xml:lang="en">AFRINVEST SECURITIES LIMITED</LegalName>
    <LegalAddress xml:lang="en">
      <FirstAddressLine>27 GERRARD ROAD</FirstAddressLine>
    </LegalAddress>
    <HeadquartersAddress xml:lang="en">
      <FirstAddressLine>27 GERRARD ROAD</FirstAddressLine>
    </HeadquartersAddress>
    <RegistrationAuthority>
      <RegistrationAuthorityID>RA000469</RegistrationAuthorityID>
    </RegistrationAuthority>
    <LegalJurisdiction>NG</LegalJurisdiction>
    <EntityCategory>GENERAL</EntityCategory>
    <LegalForm>
      <EntityLegalFormCode>9999</EntityLegalFormCode>
      <OtherLegalForm>LIMITED</OtherLegalForm>
    </LegalForm>
    <EntityStatus>ACTIVE</EntityStatus>
    <EntityCreationDate>2014-11-06T00:00:00Z</EntityCreationDate>
  </Entity>
  <Registration>
    <InitialRegistrationDate>2014-11-06T00:00:00Z</InitialRegistrationDate>
    <ValidationAuthority>
      <ValidationAuthorityID>RA000469</ValidationAuthorityID>
    </ValidationAuthority>
  </Registration>
</LEIRecord>
</LEIRecords>
</LEIData>
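
Since the question also mentions Python, here is a minimal sketch of the same idea as a line-by-line stream, so the 4 GB file is never loaded into memory at once. The function names strip_namespaces and convert are made up for illustration, and the patterns are my own assumptions rather than a translation of the sed expressions: the prefix is only removed directly after < or </, so text content that happens to contain lei: is left alone, and any xmlns:lei or xmlns:leif declaration is dropped regardless of its URI. It assumes each declaration sits on a single line, as in the sample. I have not benchmarked it against sed.

```python
import re

# Matches an xmlns:lei="..." or xmlns:leif="..." declaration, including
# the single space in front of it (assumption: any URI value is dropped).
NS_DECL = re.compile(r' xmlns:leif?="[^"]*"')

# Matches the lei: / leif: prefix only at the start of a tag name,
# i.e. right after "<" or "</", and keeps that opening part.
TAG_PREFIX = re.compile(r'(</?)leif?:')


def strip_namespaces(line: str) -> str:
    """Remove the namespace declarations and tag prefixes from one line."""
    line = NS_DECL.sub('', line)
    return TAG_PREFIX.sub(r'\1', line)


def convert(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path one line at a time."""
    # newline='' keeps the original line endings untouched.
    with open(src_path, encoding='utf-8', newline='') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        for line in src:
            dst.write(strip_namespaces(line))
```

It would be called as convert("input_file", "output_file"), where both file names are placeholders.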