HDFS: how to list zero-size files in a specific directory path

Time:12-27

For example, I want to output the paths of all zero-size files in a specific directory such as hdfs://<DIRECTORY>.

-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r--   3 USER supergroup      71667 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03768.pb.zstd
-rw-r--r--   3 USER supergroup      94330 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03769.pb.zstd
-rw-r--r--   3 USER supergroup      14756 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03770.pb.zstd
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd
// output
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd

I want to use hdfs dfs -ls or hdfs dfs -du together with awk, but I am not familiar with awk.
How can I implement this?
Thanks in advance.

UPDATED: one more question: how can I recursively output the paths of the zero-size files under a specific directory hdfs://<DIRECTORY>?

// <DIRECTORY>/<TIME1>
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03767.pb.zstd
-rw-r--r--   3 USER supergroup      71667 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03768.pb.zstd
-rw-r--r--   3 USER supergroup      94330 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03769.pb.zstd

// <DIRECTORY>/<TIME2>
-rw-r--r--   3 USER supergroup      14756 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME2>/part-03770.pb.zstd
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME2>/part-03771.pb.zstd

// output
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME1>/part-03767.pb.zstd
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/<TIME2>/part-03771.pb.zstd

CodePudding user response:

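If the output of hdfs dfs -ls is reliable, keep the regular-file lines (those starting with "-") whose fifth column, the size, is 0:

$ hdfs dfs -ls 'hdfs://<DIRECTORY>' | awk '/^-/ && $5 == 0'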
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd

This is about as simple as an awk command gets ;)
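
For the recursive case in the update, a minimal sketch, assuming the listing keeps the same eight-column layout shown in the question: hdfs dfs -ls accepts -R to descend into subdirectories, and the same size test can be reused. Directory entries also report a size of 0, so the filter checks that the permission string starts with "-". The function name filter_zero_size is only a hypothetical helper for illustration, and printing the last field as the path assumes paths contain no spaces.

```shell
# Hypothetical helper: read an "hdfs dfs -ls" style listing on stdin and keep
# only regular files (permission string starts with "-") whose size column is 0.
# The "^-" test also drops the "Found N items" header and directory entries.
filter_zero_size() {
  awk '$1 ~ /^-/ && $5 == 0'
}

# Same filter, but print only the path (last field; assumes no spaces in paths).
zero_size_paths() {
  filter_zero_size | awk '{ print $NF }'
}

# Usage (requires a Hadoop client, so shown as comments here):
#   hdfs dfs -ls    'hdfs://<DIRECTORY>' | filter_zero_size   # direct children only
#   hdfs dfs -ls -R 'hdfs://<DIRECTORY>' | filter_zero_size   # recursive, full lines
#   hdfs dfs -ls -R 'hdfs://<DIRECTORY>' | zero_size_paths    # recursive, paths only
```

With -R, the <TIME1> and <TIME2> directories appear in the listing too, which is why the "^-" check matters there.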
