Join line and print in column

Time:01-10

I am trying to join lines. My data is in the form below:

CODE> AA
LINE1>ABCD
LINE2>AJDS
LINE3>AJDO
SYMBOL1>Q1
SYMBOL2>Q2
SYMBOL3>Q3
SYMBOL4>Q4
CODE> BB
LINE1>HFIN
LINE2>HPAD
LINE3>HIDF
LINE4>HINA
SYMBOL1>SA
SYMBOL2>SS
CODE> CC

and I am trying to get output like the one below, where the CODE is constant for a group and appears only once in the input, while LINE and SYMBOL can appear more than once:

CODE LINE SYMBOL
AA ABCD Q1
AA AJDS Q2
AA AJDO Q3
AA Q4
BB HFIN SA
BB HPAD SS
BB HIDF
BB HINA

My attempt so far:

grep -e CODE -e LINE -e SYMBOL test.txt | awk 'NR%4{printf "%s",$1,$2,$3;}4'

which produces:
CODE>CODE> AA     
LINE1>ABCDLINE1>ABCD     
LINE2>AJDSLINE2>AJDS      
LINE3>AJDO      
SYMBOL1>Q1SYMBOL1>Q1     
SYMBOL2>Q2SYMBOL2>Q2     
SYMBOL3>Q3SYMBOL3>Q3      
SYMBOL4>Q4      
CODE>CODE> BB     
LINE1>HFINLINE1>HFIN      

I am not an expert, tbh. Thanks, Sandy
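Why the `NR%4` attempt falls apart: it glues lines together in fixed groups of four, but the CODE blocks in this sample have different lengths, so a fixed line count can never line up with the group boundaries. A small Ruby sketch that counts the data lines under each CODE line (the helper name `blocks` is hypothetical, chosen for illustration):

```ruby
# The sample input from the question.
SAMPLE = <<~TEXT
  CODE> AA
  LINE1>ABCD
  LINE2>AJDS
  LINE3>AJDO
  SYMBOL1>Q1
  SYMBOL2>Q2
  SYMBOL3>Q3
  SYMBOL4>Q4
  CODE> BB
  LINE1>HFIN
  LINE2>HPAD
  LINE3>HIDF
  LINE4>HINA
  SYMBOL1>SA
  SYMBOL2>SS
  CODE> CC
TEXT

# Split the input into blocks, each one starting at a "CODE>" line.
def blocks(text)
  text.lines(chomp: true).slice_before { |line| line.start_with?("CODE>") }.to_a
end

# Report how many data lines follow each CODE line.
blocks(SAMPLE).each do |block|
  code = block.first.split(">", 2).last.strip
  puts "#{code}: #{block.size - 1} data lines"
end
```

It reports 7 data lines for AA, 6 for BB and 0 for CC, which is why any approach has to group by the CODE lines rather than by a line count.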

Answer:

Here is a Ruby script that produces CSV from that input:

ruby -r csv -e '
codes={}
$<.read.scan(/^CODE>[[:space:]]*(.+)\R([\s\S]*?)(?=^CODE|\z)/){|code, st|
    # split "LINE1>ABCD" into ["LINE", "ABCD"] and collect the values per key
    codes[code]=st.split(/\R/).map{|s| s.split(/\d+>/)}.
        group_by{|sl| sl.shift}.transform_values(&:flatten)
}

headers=codes.map{|k,v| v.keys}.flatten.uniq

tbl=CSV.parse((["CODES"]+headers).join(","), headers: true)
codes.each{|k,v|
    # one row per position, up to the longest list for this code
    rows=v.values.map(&:length).max || 0
    (0...rows).each{|i|
        tmp=[k]
        headers.each{|hk| tmp<<v.fetch(hk, [])[i] }
        tbl<<tmp
    }
}
puts tbl
' file

Prints:

CODES,LINE,SYMBOL
AA,ABCD,Q1
AA,AJDS,Q2
AA,AJDO,Q3
AA,,Q4
BB,HFIN,SA
BB,HPAD,SS
BB,HIDF,
BB,HINA,
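The same position-based pairing can also be written without the multi-line regex, reading the input one line at a time. A minimal sketch, assuming every data line has the form `NAME<digits>>value` and that the only columns are LINE and SYMBOL; the method names `parse_groups` and `render_csv` are hypothetical:

```ruby
require "csv"

SAMPLE = <<~TEXT
  CODE> AA
  LINE1>ABCD
  LINE2>AJDS
  LINE3>AJDO
  SYMBOL1>Q1
  SYMBOL2>Q2
  SYMBOL3>Q3
  SYMBOL4>Q4
  CODE> BB
  LINE1>HFIN
  LINE2>HPAD
  LINE3>HIDF
  LINE4>HINA
  SYMBOL1>SA
  SYMBOL2>SS
  CODE> CC
TEXT

# Build {code => {"LINE" => [...], "SYMBOL" => [...]}}, keeping values in the
# order they appear; the digits in "LINE1" etc. are dropped.
def parse_groups(text)
  groups = {}
  current = nil
  text.each_line(chomp: true) do |line|
    key, value = line.split(">", 2)
    next if value.nil?                  # skip lines without a ">" separator
    if key == "CODE"
      current = groups[value.strip] = Hash.new { |h, k| h[k] = [] }
    elsif current
      current[key.sub(/\d+\z/, "")] << value.strip
    end
  end
  groups
end

# One CSV row per position; the shorter list is padded with empty fields.
def render_csv(groups, columns = %w[LINE SYMBOL])
  CSV.generate do |csv|
    csv << ["CODE", *columns]
    groups.each do |code, lists|
      rows = lists.values.map(&:size).max || 0
      rows.times { |i| csv << [code, *columns.map { |c| lists[c][i] }] }
    end
  end
end

puts render_csv(parse_groups(SAMPLE))
```

Like the script above, it pairs the n-th LINE with the n-th SYMBOL in order of appearance, and a CODE with no data lines (CC here) produces no rows.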

Answer:

I would use an awk that supports true multidimensional arrays (arrays of arrays), for example GNU awk:

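awk -F '> *' -v OFS=',' '
    # a CODE line starts a new group
    $1 == "CODE" {
        code = $2
        next
    }
    # split a key such as "LINE12" into the name "LINE" and the index "12"
    match($1,/[0-9]+/) {
        key = substr($1,1,RSTART-1)
        idx = substr($1,RSTART,RLENGTH)
        val = $2
        records[code][key][idx] = val
        indexes[code][idx]          # remember every index seen for this code
    }
    END {
        # loop over array indices in ascending numeric order (gawk only)
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (code in records)
            for (idx in indexes[code])
                print code, records[code]["LINE"][idx], records[code]["SYMBOL"][idx]
    }
' input.txt > output.csv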

The output will be an unquoted CSV:

$ cat output.csv
AA,ABCD,Q1
AA,AJDS,Q2
AA,AJDO,Q3
AA,,Q4
BB,HFIN,SA
BB,HPAD,SS
BB,HIDF,
BB,HINA,
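The two answers pair values differently: the Ruby script matches the n-th LINE with the n-th SYMBOL in order of appearance, while the awk script matches them by their numeric suffix (LINE3 with SYMBOL3). For the question's data the results are identical, but they diverge when a number is skipped. A Ruby sketch of the suffix-based approach, run on a hypothetical input where LINE2 and SYMBOL3 are missing; the method names `parse_by_index` and `rows_by_index` are illustrative only:

```ruby
# Build {code => {index => {"LINE" => value, "SYMBOL" => value}}},
# so that "LINE3" and "SYMBOL3" land in the same row.
def parse_by_index(text)
  groups = {}
  code = nil
  text.each_line(chomp: true) do |line|
    key, value = line.split(">", 2)
    next if value.nil?                  # skip lines without a ">" separator
    if key == "CODE"
      code = value.strip
      groups[code] ||= {}
    elsif code && (m = key.match(/\A(\D+)(\d+)\z/))
      name, idx = m[1], m[2].to_i
      (groups[code][idx] ||= {})[name] = value.strip
    end
  end
  groups
end

# One row per index, in ascending numeric order; missing cells are nil.
def rows_by_index(groups, columns = %w[LINE SYMBOL])
  groups.flat_map do |code, by_idx|
    by_idx.keys.sort.map { |i| [code, *columns.map { |c| by_idx[i][c] }] }
  end
end

# Hypothetical input with a gap in the numbering.
demo = "CODE> XX\nLINE1>A\nLINE3>C\nSYMBOL1>P\nSYMBOL2>Q\n"
rows_by_index(parse_by_index(demo)).each { |row| puts row.map(&:to_s).join(",") }
```

It prints `XX,A,P`, `XX,,Q` and `XX,C,`, whereas position-based pairing would give `XX,A,P` and `XX,C,Q`. Which behaviour is right depends on whether the numbers in the input are meaningful.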