A plain text file containing
bag1: apple
bag3: pear
bag2: potato
bag2: orange
bag1: banana
bag2: banana
onion
needs to be converted to
bag1: [apple, banana]
bag2: [banana, orange, potato]
bag3: [pear]
non-categorized: onion
Of course, sort is the first step, and then Python could go over the lines one by one. But is there a shell script alternative?
CodePudding user response:
Are standard tools OK? If so, you can use sort and awk:
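sort file.txt |
awk '
    {
        # split each line at the first ": "; lines without it are uncategorized
        if (idx = index($0, ": ")) {
            key = substr($0, 1, idx)      # category name, including the colon
            val = substr($0, idx + 2)     # everything after ": "
        } else {
            key = "non-categorized:"
            val = $0
        }
        # append the value to the comma-separated list of its category
        arr[key] = ((key in arr) ? arr[key] ", " : "") val
    }
    END {
        for (key in arr)
            print key, "[", arr[key], "]"
    }
'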
Output:
non-categorized: [ onion ]
bag1: [ apple, banana ]
bag2: [ banana, orange, potato ]
bag3: [ pear ]
Remark: while the values in brackets are guaranteed to be sorted because of the sort command, the order of the categories in the final output is not, since awk's for (key in arr) loop visits the keys in an unspecified order.
CodePudding user response:
Here is another awk solution:
awk -F ":[[:space:]]+" '
    {
        key = (NF > 1) ? $1 : "non-categorized"
        value = (NF > 1) ? $2 : $1
        set[key] = (key in set) ? set[key] ", " value : value
    }
    END {
        for (k in set) {
            print k ": [" set[k] "]"
        }
    }
' file | sort
Prints:
bag1: [apple, banana]
bag2: [potato, orange, banana]
bag3: [pear]
non-categorized: [onion]