Extracting the user with the most amount of files in a dir

Time:09-16

I am currently working on a script that should read a directory name from standard input and output the user who owns the most files in that directory.

I've written this so far:

#!/bin/bash 
while read DIRNAME
do
        ls -l $DIRNAME | awk 'NR>1 {print $4}' | uniq -c      
done

This is the output I get when I enter /etc, for instance:

 26 root
  1 dip
  8 root
  1 lp
 35 root
  2 shadow
 81 root
  1 dip
 27 root
  2 shadow
 42 root

Now obviously root is winning in this case, but I don't want to output all of this. I want to sum the number of files per user and output only the user with the highest total.

Expected output for entering /etc:

root

Is there a simple way to filter the output I get now, so that only the user with the highest sum is kept?

CodePudding user response:

ls -l /etc | awk 'BEGIN{FS=OFS=" "}{a[$4]+=1}END{ for (i in a) print a[i],i}' | sort -g -r | head -n 1 | cut -d' ' -f2

This snippet returns the group with the highest number of files in the /etc directory.

What it does:

  1. ls -l /etc lists all the files in /etc in long form.
  2. awk 'BEGIN{FS=OFS=" "}{a[$4]+=1}END{ for (i in a) print a[i],i}' counts the occurrences of each distinct value in the 4th column and prints the count followed by the value.
  3. sort -g -r sorts the output numerically in descending order.
  4. head -n 1 takes the first line.
  5. cut -d' ' -f2 takes the second column, using a space as the delimiter.

Note: In your question, you say that you want the user with the highest number of files, but your code refers to the 4th column, which is the group. My code follows yours and groups on the 4th column. If you wish to group by user rather than by group, change {a[$4]+=1} to {a[$3]+=1}.
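As a rough sketch of how the pipeline above could be wrapped for reuse, counting by user (column 3) as the note suggests: the function names count_top and top_user_in_dir are hypothetical and not part of the original answer, and the sketch still parses ls output, so it inherits the usual caveats about unusual file names.

```shell
#!/bin/bash
# count_top (hypothetical helper): read one name per line on stdin
# and print the name that occurs most often.
count_top() {
    awk 'NF { count[$1] += 1 }
         END { for (name in count) print count[name], name }' |
        sort -k1,1nr |      # highest count first
        head -n 1 |         # keep only the winner
        cut -d' ' -f2       # drop the count, keep the name
}

# top_user_in_dir (hypothetical helper): print the user owning the
# most entries listed by "ls -l" for the directory given as $1.
top_user_in_dir() {
    # NR>1 skips the "total N" header line; $3 is the owning user.
    ls -l -- "$1" | awk 'NR>1 { print $3 }' | count_top
}
```

To match the question's interface, the function can be driven from standard input with a loop such as: while read -r dir; do top_user_in_dir "$dir"; done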

CodePudding user response:

Without unreliably parsing the output of ls:

read -r dirname

# List user owner of files in dirname
stat -c '%U' "$dirname"/* |

# Sort the list of users by name
sort |

# Count occurrences of user
uniq -c |

# Sort by number of occurrences, highest first
# (first column, numeric, reverse order)
sort -k1nr |

# Get first line only
head -n1 |

# Keep only starting at character 9 to get user name and discard counts
cut -c9-
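A possible variant of the same idea, sketched under the assumption that GNU find is available (-mindepth, -maxdepth and -printf are not supported by every find). The function name top_owner is hypothetical. Unlike a shell glob, find also counts hidden entries, and awk is used instead of a fixed character offset to strip the count.

```shell
#!/bin/bash
# top_owner (hypothetical helper): print the user who owns the most
# entries directly inside the directory given as $1.
# Assumes GNU find.
top_owner() {
    find "$1" -mindepth 1 -maxdepth 1 -printf '%u\n' |  # one owner per entry
        sort |                 # group identical names together
        uniq -c |              # count each name
        sort -k1,1nr |         # highest count first
        head -n 1 |            # keep only the winner
        awk '{ print $2 }'     # drop the count, keep the name
}
```

It could then be called as: read -r dirname; top_owner "$dirname"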

CodePudding user response:

I have an awk script that reads standard input (or files named on the command line) and sums the counts for each unique name.

summer:

awk '
    { sum[ $2 ] += $1 }
END { 
  for ( v in sum ) {
    print v, sum[v]
  }
}
' "$@"

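Let's say we are using your example of /etc. summer expects "count name" lines, which is what the uniq -c stage of your script prints:

ls -l /etc | awk 'NR>1 {print $4}' | uniq -c | summer

yields (a line with an empty name and a sum of 0 appears when the input contains a blank line):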

 0
dip 2
shadow 4
root 219
lp 1

I like to keep utilities general so I can reuse them for other purposes. Now you can just use sort and head to get the maximum result output by summer:

ls -l /etc | awk 'NR>1 {print $4}' | uniq -c | summer | sort -r -k2,2 -n | head -n 1 | cut -f1 -d' '

Yields:

root
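If only the winner is needed, the summing and the maximum can also be done in a single awk pass, without sort, head and cut. This is just a sketch, not the script from the answer above. The name max_name is hypothetical, and it assumes "count name" input lines like those printed by uniq -c.

```shell
#!/bin/bash
# max_name (hypothetical helper): read "count name" lines on stdin,
# sum the counts per name, and print the name with the largest total.
max_name() {
    awk '
        NF >= 2 { sum[$2] += $1 }    # skip blank or malformed lines
        END {
            best = ""; max = -1
            for (name in sum)
                if (sum[name] > max) { max = sum[name]; best = name }
            if (best != "") print best
        }
    '
}
```

Fed the counts from the question's /etc example, this prints root, whose counts sum to 219.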