I have a directory with loads of XML files that are not formatted in the exact same way but the ones I am after contain the 3 same required fields. I need to know the SNPID of each library as they must be unique. For this I need to list each and their ID value.
Firstly I need to only get results from files containing the value="Kontakt" in the "PoweredBy" field. Then for each I need the Name, SNPID and RegKey to print into a 3 column CSV. eg. Mystica | 547 | Best Service - Mystica
The XML files are like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ProductHints spec="1.0.16">
<Product version="2">
<UPID>70026fd5-8f6f-429e-b891-12c2f94bc566</UPID>
<Name>Mystica</Name>
<Type>Content</Type>
<NKSEnabled minVersion="1.1.0.0">true</NKSEnabled>
<Relevance maxVersion="1.0.9.0" minVersion="1.0.0.0">
<Application minVersion="5.0.0.0" nativeContent="true">kontakt</Application>
</Relevance>
<Relevance minVersion="1.1.0.0">
<Application minVersion="5.8.0.0" nativeContent="true">kontakt</Application>
<Application minVersion="2.6.5.0">maschine</Application>
<Application minVersion="1.8.2.0">kkontrol</Application>
</Relevance>
<PoweredBy>Kontakt</PoweredBy>
<Visibility maxVersion="1.0.9.0" minVersion="1.0.0.0" target="Standalone">1</Visibility>
<Company>Best Service</Company>
<AuthSystem>RAS2</AuthSystem>
<SNPID>547</SNPID>
<RegKey>Best Service - Mystica</RegKey>
<Icon>bestservice</Icon>
<ProductSpecific>
<HU>496B8CF4F8B1402C4A6650214DF2514C</HU>
<JDX>C9A2B6D9549FD159D8A3CFF054AAE934C2AC849EC74827847288DF07577A8F22</JDX>
<Visibility type="Number">3</Visibility>
</ProductSpecific>
</Product>
</ProductHints>
I've tried
Computer:~ user$ cd /Library/Application\ Support/Native\ Instruments/Service\ Center
Computer:Service Center
while read -r Name SNPID RegKey
do
echo " "
echo "Name: ${Name}"
echo "SNPID: ${SNPID}"
echo "RegKey: ${RegKey}"
awk -F '[<>]' '{if (FNR%2==1) {printf "%s: ",$3} else {print $3}}'
done
but the results are inconsistent (I know bash isn't great for parsing XML but the requirements are pretty basic).
find -name "*.xml" | xargs cat | tr -d "\n" | sed 's/<\/Name>/\n/g' | sed 's/<\/SNPID>/\n/g' | sed 's/<\/RegKey>/: /g' | sed 's/<[^>]*>//g' | egrep "Name:|SNPID:|RegKey:" | sed 's/Name: /---\nName: /g'
also results in error: find: illegal option -- n
Any BASH wizards out there that can help me please? (If possible I would prefer to use terminal without installing other things) Thanks
CodePudding user response:
Using sed
$ sed -n '/Name\|SNPID\|RegKey/{s/[^>]*>\([^<]*\).*/\1/;H;d};G;s/\n/ | /g;/^ |/{s/^ | \ | //p}' input_file
Mystica | 547 | Best Service - Mystica
CodePudding user response:
This might work for you (GNU sed):
sed -sE '/^\s*<(Name|SNPID|PoweredBy|RegKey)>(.*)<\/\1>\s*$/{s//\1:\2/;H}
$!d;g;/PoweredBy:Kontakt/!d
s/Name:([^\n]*)(.*)/\2\n\1/
s/SNPID:([^\n]*)(.*)/\2|\1/
s/RegKey:([^\n]*)(.*)/\2|\1/
s/.*\n//' file ...
Use the options -s
to treat each input file separately and -E
to simplify regexps.
Using alternation, extract the required fields into the hold buffer.
At the end of the file if the PoweredBy
field does not contain Kontakt
no further processing is required.
Otherwise, format the fields in the hold buffer into the required format and print the result.
CodePudding user response:
1. Locating required files using find
and grep
grep -l "PoweredBy>Kontakt" $(find . -name "*.xml")
Validate you get the correct file list.
2. Extracting data, and formating data from required files using awk
This awk
script, process only the required files one by one.
awk -F"[><]" '
$2 == "Name" {ret = $3 " | "}
$2 == "SNPID" {ret = ret $3 " | "}
$2 == "RegKey" {print ret $3}
' $(grep -l "PoweredBy>Kontakt" $(find . -name "*.xml"))
An alternative using awk
without grep
command.
awk -F"[><]" '
$2 == "PoweredBy" && $3 != "Kontakt" {nextfile}
$2 == "Name" {ret = $3 " | "}
$2 == "SNPID" {ret = ret $3 " | "}
$2 == "RegKey" {print ret $3}
' $(find . -name "*.xml")
CodePudding user response:
Highly recommeded: Don't try to parse xml with anything other than a proper xml parser. For that you can use either xmlstarlet
xml sel -T -t -m "//ProductHints//Product" -v //SNPID/. --nl -v //Name/. --nl -v //RegKey/. --nl your_file.xml
or, even simpler, xidel:
xidel your_file.xml -e '//ProductHints//Product//(Name,SNPID,RegKey)'
Output (based on the sample xml in your question) should be:
547
Mystica
Best Service - Mystica