I have a directory "SmallFiles" that contains 8 files. I archived them using "hadoop archive -archiveName myArch.har -p /Files/SmallFiles /Files" and then deleted the original files. How can I extract the files again?
When I download the archive, I get these 3 files: "index", "masterindex", and "part-0".
CodePudding user response:
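You need to access the archived files through the har:// URI scheme rather than downloading the .har directory. The index, masterindex, and part-0 files you see are the archive's internal index and data files, so copying them to local disk does not unpack anything.
For example, files archived with: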
hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo
would be listed with:
hdfs dfs -ls -R har:///user/zoo/foo.har/
I think the docs are straightforward here: https://hadoop.apache.org/docs/current/hadoop-archives/HadoopArchives.html
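Applied to the archive from the question, extraction might look like the sketch below. It assumes the archive was written to /Files/myArch.har on the default filesystem and that you want the files back in /Files/SmallFiles; both paths are inferred from the command in the question, so adjust them as needed. The `run` helper is only an illustrative wrapper (not part of Hadoop) that prints each command and executes it when the `hdfs` client is on the PATH.

```shell
#!/bin/sh
# Paths inferred from the question; adjust them for your cluster.
ARCHIVE="/Files/myArch.har"      # assumed location: -archiveName myArch.har under /Files
RESTORE_DIR="/Files/SmallFiles"  # assumed target: the original directory

# har:// URI for an archive on the default filesystem (har:///Files/myArch.har)
HAR_URI="har://${ARCHIVE}"

# Hypothetical helper: print the command, and run it only if hdfs is installed.
run() {
  echo "+ $*"
  if command -v hdfs >/dev/null 2>&1; then "$@"; fi
}

# 1. Check what the archive contains.
run hdfs dfs -ls -R "${HAR_URI}/"

# 2. Recreate the target directory and copy the files out of the archive.
#    The glob is quoted so that hdfs, not the local shell, expands it.
run hdfs dfs -mkdir -p "${RESTORE_DIR}"
run hdfs dfs -cp "${HAR_URI}/*" "${RESTORE_DIR}/"
```

The copy with `hdfs dfs -cp` runs sequentially, which should be fine for 8 small files. For large archives, the linked docs describe using DistCp with the same har:// source to unarchive in parallel.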