Is there a way to remove all words that are before code=" and after "> in my file so I'm left with clearsky_night or cloudy, or sun etc?
I have tried grep -o -P '(?<=>).*(?=>)' but get an error message stating unknown option to 's'.
I also tried grep -o -P '(?<=code=").*(?=" )' but that didn't work either. This is what's in my file:
<symbol id="Sun" number="1" code="clearsky_night"></symbol>
<symbol id="Sun" number="1" code="clearsky_night"></symbol>
<symbol id="Sun" number="1" code="clearsky_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="LightCloud" number="2" code="fair_night"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="Sun" number="1" code="clearsky_night"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="LightCloud" number="2" code="fair_night"></symbol>
<symbol id="LightCloud" number="2" code="fair_night"></symbol>
<symbol id="LightCloud" number="2" code="fair_night"></symbol>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
<symbol id="LightCloud" number="2" code="fair_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
CodePudding user response:
Assuming you have valid XML, per @cyrus's comment above, you could use an XSLT transform via xsltproc:
src.xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:strip-space elements="*"/>
  <xsl:template match="symbol">
    <xsl:for-each select="@code">
      <xsl:value-of select="concat(., '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Use xsltproc to transform your XML:
xsltproc src.xslt src.xml
Output:
clearsky_night
clearsky_night
clearsky_night
cloudy
partlycloudy_night
cloudy
partlycloudy_night
partlycloudy_night
partlycloudy_night
partlycloudy_night
cloudy
partlycloudy_night
cloudy
partlycloudy_night
cloudy
cloudy
partlycloudy_night
cloudy
partlycloudy_night
cloudy
cloudy
cloudy
partlycloudy_night
partlycloudy_night
fair_night
partlycloudy_night
clearsky_night
partlycloudy_night
fair_night
fair_night
fair_night
partlycloudy_night
fair_night
cloudy
CodePudding user response:
How about this:
grep -o -P '(?<=code=").+?(?=")' input_file.xml
I checked the lookaround syntax for the usage of (?<=...) and (?=...).
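To see why this lookaround pattern matches where the second attempt from the question does not, here is a minimal sketch. It uses Python's re module purely for illustration (an assumption on my side, it is not grep itself, though it treats these particular lookarounds the same way on this input), and it reads the pattern as (?<=code=").+?(?=").

```python
import re

# One line taken from the file in the question.
line = '<symbol id="Sun" number="1" code="clearsky_night"></symbol>'

# Lookbehind: the match must start right after code=".
# Lookahead: the non-greedy .+? stops just before the next ".
working = re.findall(r'(?<=code=").+?(?=")', line)

# The attempt from the question needs a quote followed by a space after the
# value, but in this file the value is followed by "> so nothing matches.
failing = re.findall(r'(?<=code=").*(?=" )', line)

print(working)  # ['clearsky_night']
print(failing)  # []
```

So the trailing space inside (?=" ) is what made the original attempt return nothing.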
Or use perl, my friend:
$ perl -pe 's:^.+code="(.+?)".+$:\1:' <input_file.xml
Explanation:
perl -pe: run perl over each input line and print the result (-p), using the code given in the next string argument (-e).
s:...:...:: substitution, with : as the delimiter.
"(.+?)": the text inside the quotes, captured non-greedily (?).
^.+code=": everything from the beginning of the line up to code=".
".+$: everything from the closing " to the end of the line.
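The substitution described above can be sketched like this, reading the pattern as ^.+code="(.+?)".+$. Again this is Python's re module standing in for perl, only as an illustration, and keep_code is a made-up helper name. One thing it shows: a line that does not match is passed through unchanged, which is also what perl -p does.

```python
import re

# Same pattern as the perl one-liner; group 1 captures the code value.
PATTERN = re.compile(r'^.+code="(.+?)".+$')

def keep_code(line):
    """Replace the whole line with the captured code value if the pattern matches."""
    return PATTERN.sub(r'\1', line)

matched = keep_code('<symbol id="Cloud" number="4" code="cloudy"></symbol>')
unmatched = keep_code('<!-- no code attribute here -->')

print(matched)    # cloudy
print(unmatched)  # the line comes back unchanged
```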
Of course, this is a quick and dirty solution. An XML parser would be better.
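For what the parser route might look like, here is a rough sketch with Python's standard xml.etree.ElementTree. The <symbols> root element and the extract_codes name are my own assumptions: the lines in the question have no single root element, and a well-formed XML document needs one, so the real file may have to be wrapped (or may already have a root that was not shown).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <symbols> wrapper is an assumption, the
# <symbol> lines are copied from the question.
SAMPLE = """<symbols>
<symbol id="Sun" number="1" code="clearsky_night"></symbol>
<symbol id="Cloud" number="4" code="cloudy"></symbol>
<symbol id="LightCloud" number="2" code="fair_night"></symbol>
</symbols>"""

def extract_codes(xml_text):
    """Return the code attribute of every <symbol> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        el.get("code")
        for el in root.iter("symbol")
        if el.get("code") is not None  # skip symbols without a code attribute
    ]

for code in extract_codes(SAMPLE):
    print(code)
```

Unlike the regex versions, this does not care about attribute order or extra whitespace inside the tags.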
(sorry for my broken English)