Solr. When indexing custom json, many fields of the same name are stored in one field

Time:07-14


I am trying to create an index from a json file using Solr 8.11.
Here is the content of my json file:
{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": "55236",
    "cards": [
        {
            "title": "hood",
            "title_index": "hood",
            "text": "<div class=m-l-15>definition</div> ",
            "text_index": "definition"
        },
        {
            "title": "'s Gravenhage",
            "title_index": "'s Gravenhage",
            "text": "<div class=m-l-15>definition</div> ",
            "text_index": "definition"
        },
        {
            "title": "'tween",
            "title_index": "'tween",
            "text": "<div class=m-l-15>definition</div> ",
            "text_index": "definition"
        }
    ]
}

I expect to receive the following:

{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": 55236,
    "title": "hood",
    "text": "<div class=m-l-15>definition</div> "
},
{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": 55236,
    "title": "'s Gravenhage",
    "text": "<div class=m-l-15>definition</div> "
},
{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": 55236,
    "title": "'tween",
    "text": "<div class=m-l-15>definition</div> "
}

But I get this:

{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": 55236,
    "title": [
        "'hood",
        "'s Gravenhage",
        "'tween"
    ],
    "text": [
        "<div class=m-l-15>definition</div> ",
        "<div class=m-l-15>definition</div> ",
        "<div class=m-l-15>definition</div> "
    ]
}

That is, the title field from all documents is stored in one multi-valued title field.
Here is the schema:

  <field name="id" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="dict" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="index_language" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="contents_language" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="lang" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="type" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="words_count" type="tint"/>
  <field name="text" type="text_general"/>
  <field name="title" type="text_general"/>
  <field name="text_index" type="text_general" indexed="true" stored="false"/>
  <field name="title_index" type="text_general" indexed="true" stored="false"/>

This is the request:

path=/update/json/docs params={?split=/cards
&commitWithin=1000
&f=dict:/dict
&f=index_language:/index_language
&f=contents_language:/contents_language
&f=lang:/lang
&f=type:/type
&f=words_count:/words_count
&f=title:/cards/title
&f=title_index:/cards/title_index
&f=text:/cards/text
&f=text_index:/cards/text_index
 -H 'Content-type:application/json'
&overwrite=true
&wt=json}

According to the documentation, I should get what I expect.
Please tell me what I am doing wrong.

CodePudding user response:

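You have an extra ? in front of the split parameter, so it has no effect: Solr receives a parameter named ?split rather than split. Without a valid split, the whole file is indexed as a single document, which is why the values from every card end up in multi-valued title and text fields. Remove the extra ? and it should work.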
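
To make the problem visible, here is a minimal Python sketch (not part of the original request). It parses a shortened copy of the logged query string to show which parameter names actually arrive, then builds a URL with the split parameter named correctly. The host, the core name "dictionary", and the helper build_update_url are assumptions for illustration only; substitute your own values.

```python
from urllib.parse import parse_qs, urlencode

# Shortened copy of the query string from the question, with the stray "?" before "split".
logged_query = "?split=/cards&commitWithin=1000&f=title:/cards/title&f=text:/cards/text"

# The first parameter is parsed under the name "?split", so no "split" parameter exists.
logged_params = parse_qs(logged_query)
print(sorted(logged_params))


def build_update_url(base_url, split_path, field_mappings, commit_within_ms=1000):
    """Build a /update/json/docs URL with a correctly named split parameter.

    Hypothetical helper, not part of Solr or any client library.
    """
    pairs = [("split", split_path), ("commitWithin", str(commit_within_ms))]
    # One repeated "f" parameter per target_field:/json/path mapping.
    pairs += [("f", f"{field}:{path}") for field, path in field_mappings]
    # Keep "/" and ":" unescaped so the result reads like the request in the question.
    return f"{base_url}/update/json/docs?{urlencode(pairs, safe='/:')}"


# Assumed host and core name; replace with your own.
url = build_update_url(
    "http://localhost:8983/solr/dictionary",
    "/cards",
    [("dict", "/dict"), ("title", "/cards/title"), ("text", "/cards/text")],
)
print(url)
```

Building the query string from a list of pairs, rather than concatenating it by hand, makes it harder for a stray ? to slip into a parameter name.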
