Processing a multiset to a set of lists

Time:11-14

In a plain text file,

bag1: apple
bag3: pear
bag2: potato
bag2: orange
bag1: banana
bag2: banana
onion

needs to be converted to

bag1: [apple, banana]
bag2: [banana, orange, potato]
bag3: [pear]
non-categorized: onion

Of course, sorting is the first step, after which a Python script could go over the lines one by one. But is there a shell script alternative?

CodePudding user response:

Are standard tools OK? If so then you can use sort and awk:

sort file.txt |
awk '
    {
        if (idx = index($0,": ")) {
            key = substr($0,1,idx)
            val = substr($0,idx+2)
        } else {
            key = "non-categorized:"
            val = $0
        }
        arr[key] = ((key in arr) ? arr[key] ", " : "") val
    }
    END {
        for (key in arr)
            print key, "[", arr[key], "]"
    }
'

Prints:

non-categorized: [ onion ]
bag1: [ apple, banana ]
bag2: [ banana, orange, potato ]
bag3: [ pear ]

Remark: the values in brackets are guaranteed to be sorted because of the sort command, but the order of the categories in the final output is not, since awk's for (key in arr) loop visits the keys in an unspecified order.

CodePudding user response:

Here is another awk:

awk -F ":[[:space:]]+" '
{   key=NF>1 ? $1 : "non-categorized"
    value=NF>1 ? $2 : $1
    set[key]=set[key] ? set[key] ", " value : value
}
END {
    for (k in set) {
        print k ": [" set[k] "]"
    }
}
' file | sort 

Prints:

bag1: [apple, banana]
bag2: [potato, orange, banana]
bag3: [pear]
non-categorized: [onion]