I have a big parquet file with some data. Let's say it holds information about animals, with columns like:
id, name, breed, traits
and I can query it in Spark the standard way with SQL. Example:
spark.sql("SELECT * FROM animals WHERE id IN (10, 11)").collect()
and I get a result.
But what I want to do is copy the found records into a new parquet file with the same structure. Is that even possible? I tried to find some information on the web, but I didn't find anything useful, so Stack as always is my last hope :)
Maybe someone has some hints or resources, docs about that kind of operation on parquets?
CodePudding user response:
You can store the result in a DataFrame and then save that data as a parquet file -
df = spark.sql("SELECT * FROM animals WHERE id IN (10, 11)")
df.write.parquet("filename.parquet")
To know more about reading and writing parquet files - click here