I have a big parquet file with some data. Let's say it holds information about animals, with columns like:
id, name, breed, traits
and I can query it in Spark the standard way with SQL. Example:
spark.sql("SELECT * FROM animals WHERE id IN (10, 11)").collect()
and I get a result.
But what I want to do is copy the found records into a new parquet file with the same structure. Is that even possible? I tried to find some information on the web, but I didn't find anything useful, so Stack as always is my last hope :)
Maybe someone has some hints or resources, docs about that kind of operation on parquets?
CodePudding user response:
You can store the result in a DataFrame and then save that data as a parquet file -
df = spark.sql("SELECT * FROM animals WHERE id IN (10, 11)")
df.write.parquet("filename.parquet")
To know more about reading and writing parquet files - click here