Is there a way to use sed or grep to remove all unnecessary text and show only one word?

Time:11-26

Is there a way to remove everything before code=" and after "> on each line of my file, so that I'm left with only clearsky_night, cloudy, sun, etc.?

I have tried grep -o -P '(?<=>).*(?=>)' but get an error message stating unknown option to 's'.

I also tried grep -o -P '(?<=code=").*(?=" )' but that didn't work either. This is what's in my file:

    <symbol id="Sun" number="1" code="clearsky_night"></symbol>
    <symbol id="Sun" number="1" code="clearsky_night"></symbol>
    <symbol id="Sun" number="1" code="clearsky_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="LightCloud" number="2" code="fair_night"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="Sun" number="1" code="clearsky_night"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="LightCloud" number="2" code="fair_night"></symbol>
    <symbol id="LightCloud" number="2" code="fair_night"></symbol>
    <symbol id="LightCloud" number="2" code="fair_night"></symbol>
    <symbol id="PartlyCloud" number="3" code="partlycloudy_night"></symbol>
    <symbol id="LightCloud" number="2" code="fair_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>

CodePudding user response:

Assuming you have valid XML, per @cyrus's comment above, you could use an XSLT transform via xsltproc:

src.xslt

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:strip-space elements="*"/>

  <xsl:template match="symbol">
    <xsl:for-each select="@code">
      <xsl:value-of select="concat(., '&#xA;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Use xsltproc to transform your xml:

xsltproc src.xslt src.xml

Output:

clearsky_night
clearsky_night
clearsky_night
cloudy
partlycloudy_night
cloudy
partlycloudy_night
partlycloudy_night
partlycloudy_night
partlycloudy_night
cloudy
partlycloudy_night
cloudy
partlycloudy_night
cloudy
cloudy
partlycloudy_night
cloudy
partlycloudy_night
cloudy
cloudy
cloudy
partlycloudy_night
partlycloudy_night
fair_night
partlycloudy_night
clearsky_night
partlycloudy_night
fair_night
fair_night
fair_night
partlycloudy_night
fair_night
cloudy
cloudy
cloudy

CodePudding user response:

How about this:

grep -o -P '(?<=code=").+?(?=")' input_file.xml

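This relies on lookaround: (?<=...) is a lookbehind and (?=...) is a lookahead. Both assert the surrounding context without including it in the match, so -o prints only the value between the quotes.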
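To see the lookaround in action, here is a small sketch on two lines taken from the question. The temporary file and the variable names sample_file and codes are only for illustration, and -P assumes a GNU grep built with PCRE support.

```shell
# Hypothetical sample file holding two lines from the question.
sample_file=$(mktemp)
cat > "$sample_file" <<'EOF'
    <symbol id="Sun" number="1" code="clearsky_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
EOF

# -o prints only the matched text; -P enables Perl-compatible regexes.
# (?<=code=") requires code=" immediately before the match, without consuming it.
# .+?         matches as few characters as possible.
# (?=")       requires a double quote immediately after the match.
codes=$(grep -o -P '(?<=code=").+?(?=")' "$sample_file")
printf '%s\n' "$codes"

rm -f "$sample_file"
```

This should print clearsky_night and cloudy, one per line. If your grep has no -P option (for example on some BSD or macOS systems), this approach will not work as written.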

Or, use perl my friend:

$ perl -pe 's:^.+code="(.+?)".+$:\1:' <input_file.xml

Explanation:

  • perl -pe: run the code in the following string argument on every input line and print the result.
  • s:...:...:: substitution, using : as the delimiter.
  • "(.+?)": the text inside the quotes, captured non-greedily (?).
  • ^.+code=": everything from the beginning of the line up to and including code=".
  • ".+$: everything from the closing " to the end of the line.
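Since the question also mentions sed, the same substitution idea can be sketched there too. This is only an illustration on made-up sample input (the <forecast> line and the names sample_file and codes are my own additions). It uses [^"]* instead of a non-greedy match, because basic sed regexes have no non-greedy quantifier.

```shell
# Hypothetical sample file: two lines from the question plus one line
# without a code attribute, to show that such lines are skipped.
sample_file=$(mktemp)
cat > "$sample_file" <<'EOF'
<forecast>
    <symbol id="Sun" number="1" code="clearsky_night"></symbol>
    <symbol id="Cloud" number="4" code="cloudy"></symbol>
EOF

# -n suppresses the default output; the trailing p prints a line only
# when the substitution actually happened on it.
# .*code="     consumes everything up to and including code="
# \([^"]*\)    captures everything up to the next double quote
# ".*          consumes the closing quote and the rest of the line
codes=$(sed -n 's/.*code="\([^"]*\)".*/\1/p' "$sample_file")
printf '%s\n' "$codes"

rm -f "$sample_file"
```

This should print clearsky_night and cloudy. Like the perl one-liner, it assumes one symbol element per line.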

Of course, this is a quick and dirty solution. An XML parser would be better.

(sorry for my broken English)
