How do I capture information about files that are on HDFS

Time:10-28

I would like to capture certain information about each file in HDFS, such as its name, creation date, modification date, and last access time. I thought about using Python's os module, but I'm not sure whether that is possible or how to do it. Another option would be to use the hdfs module itself, but information about it online is scarce, which made things even harder.

Does anyone have an idea how I could do this?

CodePudding user response:

HDFS is not a normal local filesystem that your operating system can read directly, so the os module cannot do anything with files stored in HDFS.

You could try snakebite, which is a pure Python client for HDFS. Its documentation includes an example of how to list files in HDFS.
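As a rough sketch of what that could look like for the fields asked about: the code below assumes a Python 3 compatible build of snakebite (the original package targeted Python 2), and a NameNode reachable at localhost:9000, which is only a placeholder. The helper names summarize_entry and list_hdfs are made up for this illustration. As far as I know, each entry returned by snakebite's ls is a dict with keys such as path, modification_time and access_time, with times in milliseconds since the Unix epoch. Note that HDFS file status does not seem to carry a separate creation date, so only the modification and access times are extracted here.

```python
from datetime import datetime, timezone


def summarize_entry(entry):
    """Pick the path, modification time, and access time out of one listing entry."""
    def to_utc(ms):
        # HDFS reports timestamps in milliseconds since the Unix epoch.
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

    return {
        "path": entry["path"],
        "modified": to_utc(entry["modification_time"]),
        "accessed": to_utc(entry["access_time"]),
    }


def list_hdfs(path, host="localhost", port=9000):
    """List entries under `path`; host and port are placeholder NameNode settings."""
    # Imported here so summarize_entry can be used without snakebite installed.
    from snakebite.client import Client

    client = Client(host, port)
    # Client.ls takes a list of paths and yields one dict per entry.
    return [summarize_entry(e) for e in client.ls([path])]
```

Calling list_hdfs("/") against a running cluster should then give one small dict per file or directory, which you can print or write out as needed.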
