Given the following output, produced by df -P | awk '!/udev|boot|tmpfs|none/ && NR>1 {printf ("%-10s\t%-10s\t%-10s\n", $1, $2, $6)}' | grep -wv /:

/dev/sda2       576075280       /hdd
/dev/sda1       1344681704      /home
/dev/vda2       2468687261      /media/user/backup
/dev/vda1       823581356       /media/user/movie
/dev/sdb2       676075280       /media/user/db2
/dev/sdb1       1691481049      /media/user/db1

I want to select the row with the largest storage from each partition. The desired output would be:

/dev/sda1       1344681704      /home
/dev/vda2       2468687261      /media/user/backup
/dev/sdb1       1691481049      /media/user/db1

CodePudding user response：

Solution

cat input.txt |
  awk '{print substr($1, 1, match($1, "[[:digit:]]") - 1), $0}' |
  sort -k1,1 -k3,3nr |
  awk 'id!=$1{ print; id = $1}' | cut -d ' ' -f2-

Input

λ cat input.txt 
/dev/sda2       576075280       /hdd
/dev/sda1       1344681704      /home
/dev/vda2       2468687261      /media/user/backup
/dev/vda1       823581356       /media/user/movie
/dev/sdb2       676075280       /media/user/db2
/dev/sdb1       1691481049      /media/user/db1

Output

/dev/sda1       1344681704      /home
/dev/sdb1       1691481049      /media/user/db1
/dev/vda2       2468687261      /media/user/backup

Explanation

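Here we use a technique called the Schwartzian transform (decorate, sort, undecorate): prepend a sort key to each line, sort on that key, then strip the key off again.

Your question is ambiguous because it does not say when two partitions count as belonging to the same group. Here I group by the device name up to its first digit, using the command awk '{print substr($1, 1, match($1, "[[:digit:]]") - 1), $0}', but you can change it to suit your needs.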

λ cat input.txt | awk '{print substr($1, 1, match($1, "[[:digit:]]") - 1), $0}'
/dev/sda /dev/sda2       576075280       /hdd
/dev/sda /dev/sda1       1344681704      /home
/dev/vda /dev/vda2       2468687261      /media/user/backup
/dev/vda /dev/vda1       823581356       /media/user/movie
/dev/sdb /dev/sdb2       676075280       /media/user/db2
/dev/sdb /dev/sdb1       1691481049      /media/user/db1

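After adding an extra field as the partition identifier, the problem is easy to solve with a combination of sort, awk and cut: sort groups the lines by identifier with the largest size first, awk keeps only the first line of each group, and cut removes the identifier again.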
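As an alternative to the decorate, sort, undecorate pipeline, the same grouping can be sketched as a single awk pass that remembers the largest row seen so far for each identifier. This is only a sketch: largest_per_disk is a hypothetical helper name, the sample data is copied from input.txt above, and the identifier is again assumed to be the device name up to its first digit.

```shell
# Hypothetical helper: print the largest row per disk in one awk pass.
largest_per_disk() {
  awk '
    {
      # Same grouping key as above: /dev/sda2 -> /dev/sda
      disk = substr($1, 1, match($1, /[0-9]/) - 1)
    }
    # Keep the row if the disk is new or this size beats the stored one.
    !(disk in max) || $2 + 0 > max[disk] {
      max[disk] = $2 + 0
      row[disk] = $0
    }
    END { for (disk in row) print row[disk] }
  ' "$@"
}

# Sample data copied from input.txt.
sample='/dev/sda2       576075280       /hdd
/dev/sda1       1344681704      /home
/dev/vda2       2468687261      /media/user/backup
/dev/vda1       823581356       /media/user/movie
/dev/sdb2       676075280       /media/user/db2
/dev/sdb1       1691481049      /media/user/db1'

# awk does not define the iteration order of "for (x in array)",
# so the result is piped through sort to make the order stable.
result=$(printf '%s\n' "$sample" | largest_per_disk | sort)
printf '%s\n' "$result"
```

The pipeline version needs no arrays and is easy to inspect one stage at a time, while this version reads the input once and keeps only one row per disk in memory.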

CodePudding user response：

On Linux you can just use lsblk instead of df to find the biggest partition of each disk:

lsblk -nPpbo KNAME,SIZE,PKNAME,MOUNTPOINT |
awk -F'="|" ?' -v OFS='\t' '
    {
        kname = $2       # device name, e.g. /dev/sda1
        size = $4        # size of the device, in bytes
        pkname = $6      # parent device name, e.g. /dev/sda
        mountpoint = $8  # where the device is mounted, absolute path
    }

    pkname !~ "^/" { next }
    mountpoint !~ "^/" { next }
    mountpoint == "/" { next } # not sure why you want to exclude /

    size > sizes[pkname] {
        knames[pkname] = kname
        sizes[pkname] = size
        mountpoints[pkname] = mountpoint
    }

    END {
        for (pkname in knames)
            print knames[pkname], sizes[pkname], mountpoints[pkname]
    }
'

Remark: the size is shown in bytes rather than in 512- or 1024-byte blocks, and potentially problematic characters in the fields (mostly in the mount point) are escaped in two-digit hexadecimal notation, \xHH. IMHO both are good things, because you can then read and unescape the resulting TSV accurately with bash.
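Reading such a line back in bash could look roughly like the sketch below. The sample line is made up for illustration (a mount point containing a space, which lsblk would escape as \x20), and the sketch assumes that \xHH sequences are the only backslash escapes present in the field.

```shell
#!/usr/bin/env bash
# Made-up sample line in the shape the awk script prints:
# kname<TAB>size<TAB>mountpoint, with a space escaped as \x20.
line=$'/dev/sdb1\t1691481049\t/media/user/my\\x20disk'

# Split on tabs; -r keeps the backslashes of the escape sequences intact.
IFS=$'\t' read -r kname size mountpoint <<< "$line"

# %b expands backslash escapes such as \xHH; -v stores the result
# in the variable instead of printing it.
printf -v mountpoint '%b' "$mountpoint"

printf '%s is %s bytes, mounted at "%s"\n' "$kname" "$size" "$mountpoint"
```

Using printf -v rather than a command substitution keeps any trailing newline in the mount point, which $(...) would strip.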

Here are the relevant options from the lsblk manual and help:

-b, --bytes
       Print the SIZE column in bytes rather than in a human-readable format.
-n, --noheadings
       Do not print a header line.
-P, --pairs
       Produce output in the form of key="value" pairs.
       All potentially unsafe characters are hex-escaped (\x<code>).
-p, --paths
       Print full device paths.
-o, --output list
       Specify which output columns to print. [...]

       KNAME internal kernel device name
       MOUNTPOINT where the device is mounted
       SIZE size of the device
       PKNAME internal parent kernel device name