I am trying to join lines. My data is in the form below:
CODE> AA
LINE1>ABCD
LINE2>AJDS
LINE3>AJDO
SYMBOL1>Q1
SYMBOL2>Q2
SYMBOL3>Q3
SYMBOL4>Q4
CODE> BB
LINE1>HFIN
LINE2>HPAD
LINE3>HIDF
LINE4>HINA
SYMBOL1>SA
SYMBOL2>SS
CODE> CC
and I am trying to get something like the output below. Within each block, CODE is constant and appears only once, whereas LINE and SYMBOL can appear more than once:
CODE | LINE | SYMBOL |
---|---|---|
AA | ABCD | Q1 |
AA | AJDS | Q2 |
AA | AJDO | Q3 |
AA | | Q4 |
BB | HFIN | SA |
BB | HPAD | SS |
BB | HIDF | |
BB | HINA | |
This is what I tried:
grep -e CODE -e LINE -e SYMBOL test.txt | awk 'NR%4{printf "%s",$1,$2,$3;}4'
but it only gives me this:
CODE>CODE> AA
LINE1>ABCDLINE1>ABCD
LINE2>AJDSLINE2>AJDS
LINE3>AJDO
SYMBOL1>Q1SYMBOL1>Q1
SYMBOL2>Q2SYMBOL2>Q2
SYMBOL3>Q3SYMBOL3>Q3
SYMBOL4>Q4
CODE>CODE> BB
LINE1>HFINLINE1>HFIN
I am not an expert, to be honest. Thanks, Sandy
CodePudding user response:
Here is a Ruby one-liner that produces CSV from that input:
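ruby -r csv -e '
codes={}
# Capture each CODE value and the block of lines that follows it.
$<.read.scan(/^CODE>[[:space:]]*(.+)\R([\s\S]*?)(?=^CODE|\z)/){|code, st|
  # "LINE1>ABCD" becomes ["LINE","ABCD"]; group the values by field name.
  codes[code]=st.split(/\R/).map{|s| s.split(/\d+>/)}.
    group_by{|sl| sl.shift}.transform_values(&:flatten)
}
headers=codes.map{|k,v| v.keys}.flatten.uniq
tbl=CSV.parse((["CODES"]+headers).join(","), headers: true)
codes.each{|k,v|
  # One row per position in the longest field list; a CODE with no fields yields no rows.
  (0...v.values.map(&:length).max.to_i).each{|i|
    tmp=[k]
    headers.each{|hk| tmp<<v.fetch(hk, [])[i] }
    tbl<<tmp
  }
}
puts tbl
' file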
Prints:
CODES,LINE,SYMBOL
AA,ABCD,Q1
AA,AJDS,Q2
AA,AJDO,Q3
AA,,Q4
BB,HFIN,SA
BB,HPAD,SS
BB,HIDF,
BB,HINA,
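For readers who prefer a script over a one-liner, here is a minimal sketch of the same pivot as a plain Ruby method. The name `pivot` and the sample data are hypothetical and not part of the answer above. One assumption differs from the one-liner: values are paired by their numeric suffix (LINE2 goes with SYMBOL2) rather than by order of appearance. For the input in the question, both approaches give the same rows.

```ruby
# Hypothetical helper: turn CODE/LINEn/SYMBOLn records into
# rows of [code, line, symbol]. Missing values are nil.
def pivot(text)
  groups = {}  # code => { "LINE" => { 1 => "ABCD" }, "SYMBOL" => { ... } }
  code = nil
  text.each_line do |raw|
    key, value = raw.chomp.split(/>\s*/, 2)
    next if key.nil? || value.nil?       # skip blank or malformed lines
    if key == "CODE"
      code = value.strip
      groups[code] = Hash.new { |h, k| h[k] = {} }
    elsif code && (m = key.match(/\A(LINE|SYMBOL)(\d+)\z/))
      # m[1] is the field name, m[2] the numeric suffix used for pairing
      groups[code][m[1]][m[2].to_i] = value.strip
    end
  end

  groups.flat_map do |c, fields|
    # Union of all suffixes seen for this code, in ascending order
    indexes = (fields["LINE"].keys | fields["SYMBOL"].keys).sort
    indexes.map { |i| [c, fields["LINE"][i], fields["SYMBOL"][i]] }
  end
end

sample = <<~DATA
  CODE> AA
  LINE1>ABCD
  SYMBOL1>Q1
  SYMBOL2>Q2
DATA

pivot(sample).each { |row| puts row.join(",") }
# prints:
# AA,ABCD,Q1
# AA,,Q2
```

A CODE with no LINE or SYMBOL entries (such as the trailing `CODE> CC` in the question) produces no rows in this sketch.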
CodePudding user response:
I would use an awk that implements true multidimensional arrays, for example GNU awk:
awk -F '> *' -v OFS=',' '
    $1 == "CODE" {
        code = $2
        next
    }
    # Split e.g. "LINE12" into key "LINE" and idx "12"
    match($1,/[0-9]+/) {
        key = substr($1,1,RSTART-1)
        idx = substr($1,RSTART,RLENGTH)
        val = $2
        records[code][key][idx] = val
        indexes[code][idx]
    }
    END {
        # gawk only: traverse array indices in ascending numeric order
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (code in records)
            for (idx in indexes[code])
                print code, records[code]["LINE"][idx], records[code]["SYMBOL"][idx]
    }
' input.txt
The output is unquoted CSV; redirected to output.csv, it looks like this:
$ cat output.csv
AA,ABCD,Q1
AA,AJDS,Q2
AA,AJDO,Q3
AA,,Q4
BB,HFIN,SA
BB,HPAD,SS
BB,HIDF,
BB,HINA,