For example, I want to list the paths of all zero-byte files in a specific directory such as hdfs://<DIRECTORY>.
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r-- 3 USER supergroup 71667 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03768.pb.zstd
-rw-r--r-- 3 USER supergroup 94330 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03769.pb.zstd
-rw-r--r-- 3 USER supergroup 14756 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03770.pb.zstd
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd
// output
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd
I want to use hdfs dfs -ls or hdfs dfs -du together with awk, but I am not familiar with awk.
How can I implement this?
Thanks in advance.
UPDATED:
One more question: how can I recursively list the paths of zero-byte files under a specific directory, hdfs://<DIRECTORY>?
// <DIRECTORY>/<TIME1>
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03767.pb.zstd
-rw-r--r-- 3 USER supergroup 71667 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03768.pb.zstd
-rw-r--r-- 3 USER supergroup 94330 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03769.pb.zstd
// <DIRECTORY>/<TIME2>
-rw-r--r-- 3 USER supergroup 14756 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME2>/part-03770.pb.zstd
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME2>/part-03771.pb.zstd
// output
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03767.pb.zstd
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME2>/part-03771.pb.zstd
CodePudding user response:
If the output of hdfs dfs -ls is reliable:
$ hdfs dfs -ls hdfs://<DIRECTORY> | awk '$5 == 0'
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd
This is one of the simplest awk commands ;)
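For the recursive case in the update, the same size test can probably be reused on a recursive listing. The sketch below assumes the listing keeps the column layout shown in the question (permissions in field 1, size in field 5). It also checks that field 1 starts with "-", because directory entries report a size of 0 and a header line such as "Found N items" has no fifth field, which awk treats as equal to 0. The function name zero_byte_files is made up for illustration, and the sample lines are fake input, not real cluster output.

```shell
#!/bin/sh
# Hypothetical helper: reads an `-ls` style listing on stdin and keeps only
# regular files (field 1 starts with "-") whose size column (field 5) is 0.
zero_byte_files() {
    awk '$1 ~ /^-/ && $5 == 0'
}

# Demo on fake listing lines modelled on the question; prints only the
# part-03767 line.
printf '%s\n' \
    'Found 3 items' \
    'drwxr-xr-x - USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>' \
    '-rw-r--r-- 3 USER supergroup 0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03767.pb.zstd' \
    '-rw-r--r-- 3 USER supergroup 71667 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03768.pb.zstd' \
    | zero_byte_files

# Assumed usage against a real cluster (needs a Hadoop client, not run here):
#   hdfs dfs -ls -R "hdfs://<DIRECTORY>" | zero_byte_files
```

To print only the path instead of the whole line, the awk program could be changed to '$1 ~ /^-/ && $5 == 0 {print $NF}', assuming the paths contain no spaces.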