Python ElasticSearch: Mapper Parsing Exceptions for join field

Time:01-25

I'm using Elasticsearch 8.3.2 to store data consisting of metabolites and several "studies" for each metabolite, with each study in turn containing concentration values. I use the Python Elasticsearch client to communicate with the backend, which works fine. To associate metabolites with studies, I was considering using a join field as described here.

I have defined this index mapping:

INDEXMAPPING_MET = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "entry_type": {"type": "text"},
            "pc_relation": {
                "type": "join",
                "relations": {
                    "metabolite": "study"
                }
            },
            "concentration": {
                "type": "nested",
            }
        }
    }
}

pc_relation is the join field here, with each metabolite being the parent document of its study documents. I can create metabolite entries (the parent documents) just fine using the Python client, for example:

self.client.index(index="metabolitesv2", id=metabolite, body=json.dumps({
    #[... some other fields here]
    "pc_relation": {
        "name": "metabolite",
    },
}))

However, once I try adding child documents, I get a mapper_parsing_exception. Notably, I only get this exception when I include the pc_relation field; all other fields work fine, and I can create the documents if I omit the join field. Here is an example of a study document I am trying to create (on the same index):

self.client.index(index="metabolitesv2", id=study, body=json.dumps({
    #[... some other fields here]
    "pc_relation": {
        "name": "study",
        "parent": metabolite_id
    },
}))

At first I thought there might be a typing issue, but casting everything to a string does not change the outcome. I would really appreciate any help pinpointing the error, as I am not sure what the issue is. From what I can tell from the official ES documentation and other Python ES projects, I am not doing anything differently.

Tried: creating an index with a join field, creating a parent document, then creating a child document with a join relation to the parent.
Expectation: the documents are created and can be queried using has_child or has_parent queries.
Result: a mapper_parsing_exception when trying to create the child document.

Answer:

TL;DR

You need to provide a routing value at indexing time for the child document.

The routing value is mandatory because parent and child documents must be indexed on the same shard.

By default the routing value of a document is its _id, so in practice you need to provide the _id of the parent document when indexing the child.
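To see why a shared routing value matters, here is a toy simulation of shard selection. It is only a sketch: Elasticsearch hashes the routing value with Murmur3 and applies its own partitioning logic, whereas this example substitutes `zlib.crc32` purely for illustration, and `shard_for` is a hypothetical helper, not a client API.

```python
import zlib

NUM_SHARDS = 4  # matches number_of_shards in the reproduction index


def shard_for(doc_id, routing=None, num_shards=NUM_SHARDS):
    """Pick a shard from the routing value, falling back to the document _id.

    Assumption: crc32 stands in for Elasticsearch's real Murmur3-based hash.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


parent_shard = shard_for("1")                # parent is routed by its own _id
child_unrouted = shard_for("2")              # child routed by its own _id
child_routed = shard_for("2", routing="1")   # child routed by the parent's _id

# Only the explicitly routed child is guaranteed to share the parent's shard.
print(parent_shard, child_unrouted, child_routed)
```

Whatever hash is used, routing the child by the parent's _id is the only choice that makes the two shard numbers equal by construction.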

Solution

# routing must be the _id of the parent document, i.e. the same value as "parent" below
self.client.index(index="metabolitesv2", id=study, routing=metabolite_id, body=json.dumps({
    #[... some other fields here]
    "pc_relation": {
        "name": "study",
        "parent": metabolite_id
    },
}))
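One way to avoid forgetting the routing value is to build the child request in a single place. The following is a minimal sketch, not part of the client library: `child_index_kwargs` is a hypothetical helper, the ids `study-1` and `metabolite-1` are made-up placeholders, and the index name and join field are taken from the question. It passes the document through `document=`, a keyword argument accepted by `index()` in the 8.x Python client; the client call itself is commented out because it needs a running cluster.

```python
def child_index_kwargs(study_id, parent_id, fields=None,
                       index="metabolitesv2", join_field="pc_relation"):
    """Build keyword arguments for indexing a child ("study") document.

    Hypothetical helper: routing is always set to the parent's _id so the
    child is stored on the same shard as its parent.
    """
    document = dict(fields or {})
    document[join_field] = {"name": "study", "parent": str(parent_id)}
    return {
        "index": index,
        "id": str(study_id),
        "routing": str(parent_id),
        "document": document,
    }


kwargs = child_index_kwargs("study-1", "metabolite-1", {"entry_type": "study"})
print(kwargs["routing"])  # metabolite-1

# With a connected client (assumed to exist as `client`):
# client.index(**kwargs)
```

Deriving both `routing` and `parent` from the same argument means the two values cannot drift apart.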

To reproduce

PUT 75224800
{
  "settings": {
    "number_of_shards": 4
  }, 
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "pc_relation": {
        "type": "join",
        "relations": {
          "metabolite": "study"
        }
      }
    }
  }
}

PUT 75224800/_doc/1
{
  "id": "1",
  "pc_relation": "metabolite"
}

# No routing value is given, so this request fails
PUT 75224800/_doc/2
{
  "id": 2,
  "pc_relation":{
    "name": "study",
    "parent": "1"
  }
}

PUT 75224800/_doc/3
{
  "id": "3",
  "pc_relation": "metabolite"
}

PUT 75224800/_doc/4?routing=3
{
  "id": "4",
  "pc_relation":{
    "name": "study",
    "parent": "3"
  }
}
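With the child indexed on its parent's shard, the has_child query mentioned in the question should be able to find the parent. A minimal sketch of such a query body follows; `has_child_query` is a hypothetical helper, `match_all` is only a placeholder inner query, and the commented-out `search` call assumes a connected client.

```python
def has_child_query(child_type="study", inner_query=None):
    """Build a query matching parent documents that have a matching child."""
    return {
        "has_child": {
            "type": child_type,
            "query": inner_query or {"match_all": {}},
        }
    }


query = has_child_query()
print(query)

# With a connected client (assumed to exist as `client`):
# client.search(index="75224800", query=query)
```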