uniq -c in one column

Suppose we have a text file like the following:

Input:

a1 D1
b1 D1
c1 D1
a1 D2
a1 D3
c1 D3

I want to count how many times each element in the first column appears while still keeping the information from the second column in some form. Two possible output formats are shown below, but any coherent alternative is also acceptable:

Possible output 1:

3 a1 D1,D2,D3
1 b1 D1
2 c1 D1,D3

Possible output 2:

3 a1 D1
1 b1 D1
2 c1 D1
3 a1 D2
3 a1 D3
2 c1 D3

How can I do this? I suspect a combination such as sort -k 1 input | uniq -c <keep col2>, or perhaps awk, would work, but I was not able to write anything that does. Any approach is welcome.

CodePudding user response：

Using any awk:

$ awk '
    {
        # append $2 to the comma-separated list kept for this key
        vals[$1] = ($1 in vals ? vals[$1] "," : "") $2
        # count occurrences of this key
        cnts[$1]++
    }
    END {
        for (key in vals) {
            print cnts[key], key, vals[key]
        }
    }
' file
3 a1 D1,D2,D3
1 b1 D1
2 c1 D1,D3

CodePudding user response：

I would use GNU AWK for this task in the following way. Let the content of file.txt be

a1 D1
b1 D1
c1 D1
a1 D2
a1 D3
c1 D3

then

awk 'FNR==NR{arr[$1]+=1;next}{print arr[$1],$0}' file.txt file.txt

gives output

3 a1 D1
1 b1 D1
2 c1 D1
3 a1 D2
3 a1 D3
2 c1 D3

Explanation: this is a two-pass solution (note that file.txt is given twice). During the first pass, where FNR==NR holds, the number of occurrences of each first-column value is counted and stored in the array arr. The second pass prints the count from that array, followed by the whole line.

(tested in GNU Awk 5.0.1)
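
The two-pass approach needs an input that can be read twice, which rules out a pipe. A possible single-pass sketch, assuming the whole input fits in memory, buffers every line and prints the counts in the END block. The array names cnt, key and line are illustrative choices:

```shell
# Same sample input, read once from a pipe instead of twice from a file.
out=$(printf '%s\n' 'a1 D1' 'b1 D1' 'c1 D1' 'a1 D2' 'a1 D3' 'c1 D3' | awk '
    {
        cnt[$1]++        # occurrences of the first-column value
        key[NR] = $1     # first-column value of this line
        line[NR] = $0    # the line itself, kept for later printing
    }
    END {
        # Replay the buffered lines in input order, each prefixed by its count.
        for (i = 1; i <= NR; i++)
            print cnt[key[i]], line[i]
    }
')
printf '%s\n' "$out"
```

The trade-off is memory: every input line is held until the end, whereas the two-pass version stores only one counter per distinct key.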