Home > Software engineering >  Presto & MongoDB - Schema creation and updates
Presto & MongoDB - Schema creation and updates

Time:02-15

I have the Presto set-up done locally and am able to query data from MongoDB collections. At the start, I created presto_schema collection into MongoDB to let Presto understand the Collection details in order to query and I had one Collection entry added into my presto_schema. However, I noticed later that any new Collection into MongoDB, which was not added into presto_schema is still accessible from Presto and upon the first query it is observed that the new collection details are automatically amended into the presto_schema collection with the relevant new collection schema details.

But for the collections with nested schema, it is missing to automatically add all the nested fields and it only adds what it identifies from the initial query.

For example, consider below is my Collection (new_collection), which it got created newly with content as below:

{
    "_id" : "13ec5e2a-ef04-4d05-b971-ef8e65638f83",
    "name" : "npt",
    "client" : "npt_client",
    "attributes" : {
        "level" : 697,
        "country" : "SC",
        "doy" : 2022
    }
}

And say if my first query from Presto is as below:

presto:mydb> select count(*) from new_collection where attributes.level > 200;

The presto_schema gets automatically added with a new entry for this new collection, however, it adds all the non-nested fields info and the nested fields too that are available from the initial query, but fails to add the other nested fields. So any queries on the other nested fields, Presto does not recognize them. I could go ahead and amend the presto_schema with all the missing nested fields, but wondering if there is any other automated way. So that, we don't need to keep amending it manually on any new field addition into the collection (considering a scenario where we have a complete dynamic fields, which would be added into the Collection's nested object).

CodePudding user response:

I would recommend upgrading to Trino (formerly PrestoSQL) because the MongoDB connector (version >= 360) supports mapping fields to JSON type. This type mapping is unavailable in prestodb.

https://trino.io/download.html

  • Related