Extracting data from API using grep

Time:11-17

I'm trying to write a bash scraper. I've managed to fetch the data, but I'm struggling to pull out specific values, e.g. today's temperature, with grep, because the date and the temperature are not on the same line. I would like the result to be written to a file.

I've tried:

grep -E -o '[2022]-[11]-[15]' | grep "celsius" | grep -E -o '[0-9]{1,2}.[0-9]{1,2}' > file.txt

API result:

<product>
<time datatype="forecast" from="2022-11-14T18:00:00Z" to="2022-11-14T18:00:00Z">
<location altitude="4" latitude="60.3913" longitude="5.3221">
<temperature id="TTT" unit="celsius" value="8.2"/>
<windDirection id="dd" deg="146.5" name="SE"/>
<windSpeed id="ff" mps="0.5" beaufort="1" name="Flau vind"/>
<windGust id="ff_gust" mps="1.2"/> 
<humidity unit="percent" value="82.5"/>
<pressure id="pr" unit="hPa" value="1014.5"/>
<cloudiness id="NN" percent="45.1"/> 
<fog id="FOG" percent="0.0"/>
<lowClouds id="LOW" percent="4.5"/>
<mediumClouds id="MEDIUM" percent="0.0"/>
<highClouds id="HIGH" percent="39.9"/>
<dewpointTemperature id="TD" unit="celsius" value="5.0"/>
</location>
</time>
<time datatype="forecast" from="2022-11-14T17:00:00Z" to="2022-11-14T18:00:00Z">
<location altitude="4" latitude="60.3913" longitude="5.3221">
<precipitation unit="mm" value="0.0" minvalue="0.0" maxvalue="0.0"/>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"/> 
</location>
</time>
<time datatype="forecast" from="2022-11-14T19:00:00Z" to="2022-11-14T19:00:00Z">
<location altitude="4" latitude="60.3913" longitude="5.3221"> 
<temperature id="TTT" unit="celsius" value="8.7"/>
<windDirection id="dd" deg="112.5" name="SE"/>
<windSpeed id="ff" mps="0.4" beaufort="1" name="Flau vind"/>
<windGust id="ff_gust" mps="0.8"/>
<humidity unit="percent" value="75.6"/>
<pressure id="pr" unit="hPa" value="1013.8"/>
<cloudiness id="NN" percent="57.5"/>
<fog id="FOG" percent="0.0"/>
<lowClouds id="LOW" percent="1.1"/>
<mediumClouds id="MEDIUM" percent="0.4"/>
<highClouds id="HIGH" percent="55.4"/>
<dewpointTemperature id="TD" unit="celsius" value="4.4"/>
</location>
</time>

The output written to the file should be:
8.2

CodePudding user response:

grep -A3 '2022-11-14' -m1 inputfile.txt | \
  grep -P -o "<temperature.*celsius.*\"\K\-?[0-9]{1,2}\.[0-9]{1,2}"
8.2
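  • -A3 prints the 3 lines after the matching line
  • -m1 stops after the first matching line
  • -P uses Perl-compatible regular expressions (needed for \K)
  • -o prints only the matched text
  • \K drops everything matched so far from the output
  • \-? allows an optional minus sign for negative temperatures
  • [0-9]{1,2}\.[0-9]{1,2} matches the temperature value in celsius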
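
To get the value into a file for an arbitrary date, the same two-step grep can be wrapped in a small function. This is only a sketch: extract_temp and sample.xml are names made up for the example, the sample is a trimmed copy of the API result from the question, and it assumes GNU grep built with -P support.

```shell
#!/usr/bin/env bash
# Sketch only: extract_temp and sample.xml are illustrative names.
# Assumes GNU grep with PCRE support (-P).

extract_temp() {
  local day=$1 file=$2
  # First <time> line whose from= starts with the date, plus the 3 lines after it
  grep -m1 -A3 "from=\"$day" "$file" |
    # Keep only the number inside value="..." on the celsius temperature line
    grep -m1 -oP '<temperature[^>]*unit="celsius"[^>]*value="\K-?[0-9]+(\.[0-9]+)?'
}

# Trimmed sample in the same shape as the API result
cat > sample.xml <<'EOF'
<time datatype="forecast" from="2022-11-14T18:00:00Z" to="2022-11-14T18:00:00Z">
<location altitude="4" latitude="60.3913" longitude="5.3221">
<temperature id="TTT" unit="celsius" value="8.2"/>
<dewpointTemperature id="TD" unit="celsius" value="5.0"/>
</location>
</time>
EOF

# Write the result to file.txt, as asked in the question
extract_temp "2022-11-14" sample.xml > file.txt
cat file.txt
```

For today's date, something like extract_temp "$(date -u +%F)" inputfile.txt > file.txt should work, since the timestamps in the response are in UTC.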

You can also use xq:

$ date="2022-11-14"
$ xq -r '.product.time[0] | select (."@from" | contains("'$date'")) // null | '\
'.location|.temperature|(if ."@unit" == "celsius" then ."@value" else "error" end)' \
< input.html
8.2

Or as @AndyLester said, using xpath.

$ date="2022-11-14"
$ xmllint --xpath '//time[starts-with(@from,"'$date'")][1]'\
'//temperature[@unit="celsius"]/@value' input.txt  |\
grep -Po '[-]?\d+\.\d+'
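
If neither grep -P nor an XML tool is available, plain awk can also handle the fact that the date and the temperature sit on different lines: remember whether the current <time> block belongs to the wanted date, and print the temperature only while inside such a block. This is a rough sketch, not a real XML parser. It assumes one element per line as in the question, and temps_for_day and sample.xml are names made up for the example (the 2022-11-13 block is invented test data).

```shell
#!/usr/bin/env bash
# Sketch only: temps_for_day and sample.xml are illustrative names.
# Assumes one XML element per line, as in the API result from the question.

temps_for_day() {
  local day=$1 file=$2
  awk -v day="$day" '
    # On every opening <time> tag, note whether its from= starts with the date
    /<time / { in_day = (index($0, "from=\"" day) > 0) }
    # Inside a matching block, print the value of the celsius temperature
    in_day && /<temperature / && /unit="celsius"/ {
      if (match($0, /value="[^"]*"/)) {
        # Skip the 7 characters of value=" and drop the closing quote
        print substr($0, RSTART + 7, RLENGTH - 8)
      }
    }
  ' "$file"
}

# Trimmed sample; the first block uses an invented date to show the filtering
cat > sample.xml <<'EOF'
<time datatype="forecast" from="2022-11-13T18:00:00Z" to="2022-11-13T18:00:00Z">
<temperature id="TTT" unit="celsius" value="7.0"/>
</time>
<time datatype="forecast" from="2022-11-14T18:00:00Z" to="2022-11-14T18:00:00Z">
<temperature id="TTT" unit="celsius" value="8.2"/>
<dewpointTemperature id="TD" unit="celsius" value="5.0"/>
</time>
<time datatype="forecast" from="2022-11-14T19:00:00Z" to="2022-11-14T19:00:00Z">
<temperature id="TTT" unit="celsius" value="8.7"/>
</time>
EOF

# All temperatures for the day, one per line
temps_for_day "2022-11-14" sample.xml
# Only the first one, written to a file
temps_for_day "2022-11-14" sample.xml | head -n 1 > file.txt
```

Unlike the grep -A3 version, this does not depend on the temperature being within three lines of the <time> tag, and it returns every hourly value for the day rather than just the first.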