Best practice to read data from EMR to physical server

Time:12-22

I am using PySpark to read data from EMR. If the EMR cluster is fully occupied (the cluster manager shows all of its memory in use by an ETL job), can I still run a script on my physical server that brings data from the EMR cluster to that server? What does best practice suggest?

Will reading the data from EMR to the physical server take the same amount of time? How does EMR handle a read request that arrives while the cluster is fully occupied?

What kinds of processes are executed on EMR (S3 bucket) when data is accessed or read from the physical server through an S3 utility?

Can I pull data to the physical server when the EMR cluster is fully occupied? If not, why?

Thanks and Regards

CodePudding user response:

The best practice is to expand your cluster and do things consistently. I strongly suggest that you do not develop out-of-band or one-off processes that move data outside the normal flow of work.
