I'm trying to read the contents of a few files, use grep to find the lines matching my search query, and then write the results to a file in a folder in another directory. I get the error "No such file or directory". I have already created the folder structure and the text file.
hadoop fs -cat /Final_Dataset/c*.txt | grep 2015-01-* > /energydata/2015/01/01.txt
ERROR:
-bash: /energydata/2015/01/01.txt: No such file or directory
CodePudding user response:
> /energydata/2015/01/01.txt means that the output is being redirected to a local file: hadoop fs -cat sends its output to your local machine, and at that point you are no longer operating within Hadoop. grep simply acts on a stream of data; it doesn't care (or know) where that data came from.
You need to make sure that /energydata/2015/01/ exists locally before you run the command. You can create it with mkdir -p /energydata/2015/01/.
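A minimal local illustration of that fix, using a temporary directory in place of the real /energydata path (the sample file name and contents below are made up for the demo):

```shell
# Scratch area standing in for the local filesystem paths from the question.
tmp=$(mktemp -d)

# Simulate the data coming off the cluster.
printf '2015-01-01,42\n2016-03-05,99\n2015-01-02,17\n' > "$tmp/input.txt"

# Without this mkdir -p, the redirect below fails with
# "No such file or directory", exactly as in the question.
mkdir -p "$tmp/energydata/2015/01"

# grep filters the stream; the shell writes the result into the
# now-existing local directory.
grep '2015-01-' "$tmp/input.txt" > "$tmp/energydata/2015/01/01.txt"

cat "$tmp/energydata/2015/01/01.txt"
```

The redirect is performed by the local shell, so the directory has to exist on the local machine regardless of where the data originated.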
If you're looking to pull certain records from a file on HDFS and then write a new file back to HDFS, I'd suggest not cat-ing the file at all and instead keeping the processing entirely on the cluster, using something like Spark or Hive to transform the data efficiently. Failing that, filter to a local file first and then upload it with hadoop fs -put <local_path> /energydata/2015/01/01.txt (note that hadoop dfs is deprecated in favour of hadoop fs).
CodePudding user response:
The following CLI command worked (the grep pattern is quoted so the shell can't glob-expand it before grep runs):
hadoop fs -cat /FinalDataset/c*.txt | grep '2015-01-' | hadoop fs -put - /energydata/2015/01/output.txt
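The hadoop fs -put - form reads from standard input, so the filtered data never needs a local directory at all. A purely local sketch of the same pipeline shape, with plain cat and tee standing in for the hadoop fs commands since no cluster is assumed here (file names and contents are illustrative):

```shell
# Scratch directory with a sample "c*.txt" input, mimicking the HDFS files.
tmp2=$(mktemp -d)
printf '2015-01-03 reading A\nother line\n2015-01-04 reading B\n' > "$tmp2/c1.txt"

# cat ... | grep ... | <writer reading stdin> mirrors
# hadoop fs -cat ... | grep ... | hadoop fs -put - <dest>
# The quoted pattern reaches grep intact instead of being treated
# as a filename glob by the shell.
cat "$tmp2"/c*.txt | grep '2015-01-' | tee "$tmp2/output.txt" > /dev/null

cat "$tmp2/output.txt"
```

The key difference from the failing command in the question is that the final stage consumes stdin and writes the destination itself, rather than relying on a shell redirect to a path that may not exist locally.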