I have the XML file below, in which I want to replace several different strings using sed, another command, or Python code.
The file is 4 GB, so performance is also a factor.
For example, I want to replace each of the following with an empty string:
xmlns:leif="http://www.leif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.leif.org/data/schema/leidata/2016"
lei:
leif:
xmlns:lei="http://www.leif.org/data/schema/leidata/2016"
Can I do this with a single sed command?
Here is what the XML file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<lei:LEIData xmlns:leif="http://www.leif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.leif.org/data/schema/leidata/2016">
<lei:LEIHeader>
<lei:ContentDate>2022-07-10T09:00:01Z</lei:ContentDate>
<lei:Originator>234234234234</lei:Originator>
<lei:FileContent>leif_FULL_PUBLISHED</lei:FileContent>
<lei:RecordCount>2166947</lei:RecordCount>
<lei:Extension>
<leif:Sources>
<leif:Source>
<leif:ContentDate>2022-07-09T11:01:36Z</leif:ContentDate>
<leif:RecordCount>412</leif:RecordCount>
</leif:Source>
<leif:Source>
<leif:ContentDate>2022-07-09T16:00:02Z</leif:ContentDate>
<leif:RecordCount>3084</leif:RecordCount>
</leif:Source>
</leif:Sources>
</lei:Extension>
</lei:LEIHeader>
<lei:LEIRecords>
<lei:LEIRecord xmlns:lei="http://www.leif.org/data/schema/leidata/2016">
<lei:LEI>029200013A5N6ZD0F605</lei:LEI>
<lei:Entity>
<lei:LegalName xml:lang="en">AFRINVEST SECURITIES LIMITED</lei:LegalName>
<lei:LegalAddress xml:lang="en">
<lei:FirstAddressLine>27 GERRARD ROAD</lei:FirstAddressLine>
</lei:LegalAddress>
<lei:HeadquartersAddress xml:lang="en">
<lei:FirstAddressLine>27 GERRARD ROAD</lei:FirstAddressLine>
</lei:HeadquartersAddress>
<lei:RegistrationAuthority>
<lei:RegistrationAuthorityID>RA000469</lei:RegistrationAuthorityID>
</lei:RegistrationAuthority>
<lei:LegalJurisdiction>NG</lei:LegalJurisdiction>
<lei:EntityCategory>GENERAL</lei:EntityCategory>
<lei:LegalForm>
<lei:EntityLegalFormCode>9999</lei:EntityLegalFormCode>
<lei:OtherLegalForm>LIMITED</lei:OtherLegalForm>
</lei:LegalForm>
<lei:EntityStatus>ACTIVE</lei:EntityStatus>
<lei:EntityCreationDate>2014-11-06T00:00:00Z</lei:EntityCreationDate>
</lei:Entity>
<lei:Registration>
<lei:InitialRegistrationDate>2014-11-06T00:00:00Z</lei:InitialRegistrationDate>
<lei:ValidationAuthority>
<lei:ValidationAuthorityID>RA000469</lei:ValidationAuthorityID>
</lei:ValidationAuthority>
</lei:Registration>
</lei:LEIRecord>
</lei:LEIRecords>
</lei:LEIData>
Answer:
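rquery handles this easily. Note that this query only strips the lei: and leif: prefixes; the xmlns declarations stay in place, as the second output line shows (head just limits the preview to the first ten lines).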
$ head myfile.txt | ./rq -q "select replace(replace(@raw,'leif:',''),'lei:','')"
<?xml version="1.0" encoding="UTF-8"?>
<LEIData xmlns:leif="http://www.leif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.leif.org/data/schema/leidata/2016">
<LEIHeader>
<ContentDate>2022-07-10T09:00:01Z</ContentDate>
<Originator>234234234234</Originator>
<FileContent>leif_FULL_PUBLISHED</FileContent>
<RecordCount>2166947</RecordCount>
<Extension>
<Sources>
<Source>
rquery can be downloaded from https://github.com/fuyuncat/rquery
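If you would rather stay in Python, as the question suggested, the same plain-substring idea needs no extra tool. This is a minimal sketch, assuming the namespace declarations always appear exactly as in the sample (the URIs are copied from the question) and that no tag is split across lines. Unlike the rquery query, it also deletes the two xmlns declarations. The names strip_line and strip_file, and the script name in the usage line, are hypothetical. Working on bytes avoids decoding 4 GB of UTF-8, and it is safe here because every token is pure ASCII, and ASCII bytes never occur inside a multi-byte UTF-8 sequence.

```python
import sys

# Byte strings to delete, copied from the question. Each declaration keeps
# its leading space so that no stray double space is left in the tag.
REMOVALS = (
    b' xmlns:leif="http://www.leif.org/concatenated-file/header-extension/2.0"',
    b' xmlns:lei="http://www.leif.org/data/schema/leidata/2016"',
    b"leif:",
    b"lei:",
)


def strip_line(line: bytes) -> bytes:
    """Apply every removal to one line (plain substring deletes, no regex)."""
    for token in REMOVALS:
        line = line.replace(token, b"")
    return line


def strip_file(src: str, dst: str, bufsize: int = 1 << 20) -> None:
    """Stream src to dst one line at a time, so memory use depends on the
    longest line rather than on the file size."""
    with open(src, "rb", buffering=bufsize) as fin, \
            open(dst, "wb", buffering=bufsize) as fout:
        for line in fin:
            fout.write(strip_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python strip_lei.py input.xml output.xml
    strip_file(sys.argv[1], sys.argv[2])
```

Like the rquery and sed approaches, this is a text substitution, so it would also remove lei: or leif: from element text if such a string ever appeared there.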
Answer:
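Using sed, both substitutions fit in one command. sed reads its input one line at a time, so memory use does not grow with the file size: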
$ sed -E 's/lei:|leif://g;s/ xmlns:lei=.*2016"| xmlns:leif=.*2016"//' input_file
<?xml version="1.0" encoding="UTF-8"?>
<LEIData>
<LEIHeader>
<ContentDate>2022-07-10T09:00:01Z</ContentDate>
<Originator>234234234234</Originator>
<FileContent>leif_FULL_PUBLISHED</FileContent>
<RecordCount>2166947</RecordCount>
<Extension>
<Sources>
<Source>
<ContentDate>2022-07-09T11:01:36Z</ContentDate>
<RecordCount>412</RecordCount>
</Source>
<Source>
<ContentDate>2022-07-09T16:00:02Z</ContentDate>
<RecordCount>3084</RecordCount>
</Source>
</Sources>
</Extension>
</LEIHeader>
<LEIRecords>
<LEIRecord>
<LEI>029200013A5N6ZD0F605</LEI>
<Entity>
<LegalName xml:lang="en">AFRINVEST SECURITIES LIMITED</LegalName>
<LegalAddress xml:lang="en">
<FirstAddressLine>27 GERRARD ROAD</FirstAddressLine>
</LegalAddress>
<HeadquartersAddress xml:lang="en">
<FirstAddressLine>27 GERRARD ROAD</FirstAddressLine>
</HeadquartersAddress>
<RegistrationAuthority>
<RegistrationAuthorityID>RA000469</RegistrationAuthorityID>
</RegistrationAuthority>
<LegalJurisdiction>NG</LegalJurisdiction>
<EntityCategory>GENERAL</EntityCategory>
<LegalForm>
<EntityLegalFormCode>9999</EntityLegalFormCode>
<OtherLegalForm>LIMITED</OtherLegalForm>
</LegalForm>
<EntityStatus>ACTIVE</EntityStatus>
<EntityCreationDate>2014-11-06T00:00:00Z</EntityCreationDate>
</Entity>
<Registration>
<InitialRegistrationDate>2014-11-06T00:00:00Z</InitialRegistrationDate>
<ValidationAuthority>
<ValidationAuthorityID>RA000469</ValidationAuthorityID>
</ValidationAuthority>
</Registration>
</LEIRecord>
</LEIRecords>
</LEIData>
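Both the sed command and the rquery query are plain text substitutions, so they would also delete lei: or leif: if it ever appeared inside element text or an attribute value. If that matters, a streaming SAX pass can rename only the tags. Below is a minimal sketch using the standard library's xml.sax with a subclass of xml.sax.saxutils.XMLGenerator, assuming the prefixes are exactly lei and leif and that XML comments need not be preserved (a plain ContentHandler never receives them). PrefixStripper, strip_namespaces and strip_namespaces_bytes are hypothetical names. Memory stays flat because the document is streamed, but expect it to be noticeably slower than sed, since every element passes through a Python callback.

```python
import io
import sys
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Assumed from the question: prefixes to strip and declarations to drop.
PREFIXES = ("lei:", "leif:")
DROP_ATTRS = {"xmlns:lei", "xmlns:leif"}


def local(name: str) -> str:
    """Strip a lei:/leif: prefix; leave other names such as xml:lang alone."""
    for prefix in PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


class PrefixStripper(XMLGenerator):
    """Re-serialise the document with renamed tags and without the
    xmlns:lei / xmlns:leif declarations. Text is passed through as-is."""

    def startElement(self, name, attrs):
        kept = {k: v for k, v in attrs.items() if k not in DROP_ATTRS}
        super().startElement(local(name), kept)

    def endElement(self, name):
        super().endElement(local(name))


def strip_namespaces(src, dst) -> None:
    """Stream src (a path or binary file) into dst (a binary file).
    The default SAX parser has namespace processing off, so handlers
    see qualified names like 'lei:LEIData'."""
    xml.sax.parse(src, PrefixStripper(dst, encoding="utf-8"))


def strip_namespaces_bytes(data: bytes) -> bytes:
    """Convenience wrapper for small inputs, e.g. quick checks."""
    out = io.BytesIO()
    strip_namespaces(io.BytesIO(data), out)
    return out.getvalue()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python strip_ns.py input.xml output.xml
    with open(sys.argv[2], "wb") as fout:
        strip_namespaces(sys.argv[1], fout)
```

XMLGenerator writes its own XML declaration (with encoding="utf-8"), so the first line of the output differs slightly from the input's.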