I have thousands of files on my Unix server and I want to count the records of the files on an hourly basis, in the format below. What is the easiest way to do it?
Date Hour records
2022-07-08 00 5565
2022-07-08 01 77878
2022-07-08 02 545
.
.
2022-07-08 23 656
2022-07-09 00 787
2022-07-09 01 54547
CodePudding user response:
You can try rquery; it can sort and group the records.
[oracle@oem-web-ss ~]$ ls -lrt /var/local/logs/ --time-style=" %Y/%m/%d:%H:%M:%S" | rq -q "parse /(\S+)[ ]{1,}(\S+)[ ]{1,}(\S+)[ ]{1,}(\S+)[ ]{1,}(\S+)[ ]{1,}(?P<datetime>\S+)[ ]{1,}(\S+)/ | select truncdate(datetime,3600), count(1) | group truncdate(datetime,3600) | sort truncdate(datetime,3600)"
2022/07/08:05:00:00 17
2022/07/09:04:00:00 2
2022/07/09:05:00:00 18
2022/07/10:03:00:00 1
2022/07/10:04:00:00 1
2022/07/10:05:00:00 18
2022/07/10:22:00:00 1
2022/07/11:04:00:00 2
2022/07/11:05:00:00 20
...
You can download rquery from here: https://github.com/fuyuncat/rquery/releases
CodePudding user response:
Counting (recursively) all files in the current directory, by hour
find is the command to use for locating filesystem entries by almost any criterion. The command below prints one date, truncated to the hour of last modification, for each file found.
find . -type f -printf '%TY-%Tm-%Td %TH\n' | sort | uniq -c
Output could look like:
851 2022-07-13 00
849 2022-07-13 01
855 2022-07-13 02
858 2022-07-13 03
...
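If only a given span of days matters, the same pipeline can be narrowed with the -newermt test. This is a minimal sketch that assumes GNU find; the function name count_by_hour_between and its three arguments are made up for illustration.

```shell
# Hypothetical helper: count files per modification hour, keeping only files
# modified after START and not after END (both parsed by GNU find's -newermt).
count_by_hour_between() {
    dir=$1
    start=$2
    end=$3
    find "$dir" -type f -newermt "$start" ! -newermt "$end" \
        -printf '%TY-%Tm-%Td %TH\n' |
        LC_ALL=C sort |
        uniq -c
}

# Example call (dates are interpreted in the local time zone):
#   count_by_hour_between . '2022-07-08' '2022-07-10'
```

A bare date such as '2022-07-08' is read as midnight at the start of that day, so the example call would cover 8 and 9 July.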
Some cosmetic cleanup, using sed:
find . -type f -printf '%TY-%Tm-%Td %TH\n' |
sort |
uniq -c |
sed 's/^\( *[0-9]\+\) \([0-9-]\+\) \([0-9]\+\)/ \2 \3 \1/;
1i\ Date Hour Count'
Will produce:
Date Hour Count
...
2022-07-13 00 851
2022-07-13 01 849
2022-07-13 02 855
2022-07-13 03 858
...
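The same column reordering can also be done with awk, which avoids the regular expression entirely. A minimal sketch, assuming GNU find (for -printf); the function name count_files_by_hour is only an illustration.

```shell
# Hypothetical helper: print "Date Hour Count" rows for all regular files
# under the given directory (default: current directory).
count_files_by_hour() {
    find "${1:-.}" -type f -printf '%TY-%Tm-%Td %TH\n' |
        LC_ALL=C sort |
        uniq -c |
        # uniq -c prints "count date hour"; move the count to the end.
        awk 'BEGIN { print "Date Hour Count" }
             { print $2, $3, $1 }'
}

# Example call:
#   count_files_by_hour /var/local/logs
```

Because awk splits on runs of whitespace, the leading padding that uniq -c adds needs no special handling.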
Using ls instead of find?
ls -ARlrt --time-style=" |%Y-%m-%d:%H|" |
grep -a ^-|
cut -d \| -f 2 |
sort |
uniq -c
Will produce nearly the same result:
...
851 2022-07-13:00
849 2022-07-13:01
855 2022-07-13:02
858 2022-07-13:03
...
But ls prints filenames, which can contain special characters (a newline, for example) that make grep produce wrong output. This approach is not recommended!
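The problem can be reproduced with a single awkwardly named file. The sketch below assumes GNU find and GNU ls; the directory and file name are invented for the demonstration.

```shell
# Hypothetical demo: one file whose name contains a newline followed by '-'.
demo_dir=$(mktemp -d)
touch "$demo_dir/$(printf 'odd\n-name')"

# find never prints the name here, so it emits exactly one line per file.
find_lines=$(find "$demo_dir" -type f -printf '%TY-%Tm-%Td %TH\n' | wc -l)

# ls prints the name; when it is written as-is, the second half of the name
# becomes an extra line starting with '-' and passes the grep filter.
ls_lines=$(ls -ARl "$demo_dir" | grep -a -c '^-')

echo "find counted $find_lines line(s), ls + grep counted $ls_lines line(s)"
rm -rf "$demo_dir"
```

When ls writes to a pipe it normally emits the name unquoted, so the second count usually comes out higher than the real number of files, while the find count stays at one.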