Let my-index-0 be an ES index with an alias of my-index.
It has the following mapping:
{
"my-index-0": {
"aliases": {
"my-index": {}
},
"mappings": {
"doc": {
"properties": {
"foo": {
"properties": {
"fizz": {
"type": "keyword"
},
"baz": {
"type": "keyword"
}
}
}
}
}
}
}
}
Let's say I want to remove the baz field from foo. I'm using the following steps:
- Create a new index my-index-1 with the updated mapping (foo.baz removed) using
PUT /my-index-1
{
"mappings": {
"doc": {
"properties": {
"foo": {
"properties": {
"fizz": {
"type": "keyword"
}
}
}
}
}
}
}
- Reindex data from my-index-0 to my-index-1 using
POST /_reindex
{
"source": {
"index": "my-index-0"
},
"dest": {
"index": "my-index-1"
}
}
- Move the my-index alias to the my-index-1 index using
POST /_aliases
{
"actions": [
{"remove": {"index": "my-index-0", "alias": "my-index"}},
{"add": {"index": "my-index-1", "alias": "my-index"}}
]
}
Expected result
Data in the new index does not have the foo.baz property.
Actual result
When my-index-1 is created, its mapping does not contain the foo.baz field. However, after reindexing, my-index-1's mapping changes back to the old index's mapping.
Note: _source can be used for simple field removal
To remove a top-level field, for example bar from the mapping below
{
"mappings": {
"foo": {
"type": "text"
},
"bar": {
"type": "text"
}
}
}
it is sufficient to pass the _source parameter without the bar field in the request to the reindex API:
{
"source": {
"index": "my-index-0",
"_source": ["foo"]
},
"dest": {
"index": "my-index-1"
}
}
How to achieve the same with a nested structure?
CodePudding user response:
When you use reindex, ES copies every document's _source from the source index to the destination index, so foo.baz is dynamically added back to the destination mapping. If you want to prevent the index mapping from being modified, add this line to your mapping:
"dynamic" : "strict"
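For illustration, here is one place that line could go, written as a Python dict for the body of PUT /my-index-1. This is only a sketch: it assumes the setting sits at the doc type level next to "properties", and the variable name create_index_body is made up for the example.

```python
import json

# Hypothetical request body for PUT /my-index-1, reusing the question's mapping.
# Assumption: "dynamic": "strict" is placed on the "doc" type, next to "properties".
create_index_body = {
    "mappings": {
        "doc": {
            "dynamic": "strict",
            "properties": {
                "foo": {
                    "properties": {
                        "fizz": {"type": "keyword"},
                    }
                }
            },
        }
    }
}

# Print the JSON that would be sent as the request body.
print(json.dumps(create_index_body, indent=2))
```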
Now if you reindex the data, you will get a "strict_dynamic_mapping_exception" error because "mapping set to strict, dynamic introduction of [baz] within [foo] is not allowed". So you need to remove this field in your reindex request, like this:
POST _reindex
{
"source": {
"index": "my-index-0"
},
"dest": {
"index": "my-index-1"
},
"script": {
"source": "if (ctx._source.foo != null) { ctx._source.foo.remove(\"baz\") }"
}
}
Note: adding "dynamic" : "strict" is optional; it only prevents the index mapping from being modified. Editing the reindex request alone will work for you.
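Inside a reindex script, _source is a nested map, so removing a nested field means walking to the parent object first and removing the leaf key there. A minimal Python sketch of that effect on a single document, assuming foo is stored as a nested object (the helper name remove_path is hypothetical and not part of any ES API):

```python
from typing import Any


def remove_path(source: dict[str, Any], path: str) -> None:
    """Remove the field at a dotted path from a nested document, in place."""
    *parents, leaf = path.split(".")
    node: Any = source
    for key in parents:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return  # a parent object is missing, so there is nothing to remove
    if isinstance(node, dict):
        node.pop(leaf, None)


doc = {"foo": {"fizz": "a", "baz": "b"}}
doc.pop("foo.baz", None)     # no effect: there is no top-level key "foo.baz"
remove_path(doc, "foo.baz")  # walks into "foo" and drops "baz"
print(doc)                   # {'foo': {'fizz': 'a'}}
```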
CodePudding user response:
I think I've found the generic solution I was looking for.
In the _source
attribute, one can specify explicitly every nested field, therefore, the _source
value for the scenario in the example should be ["foo.fiz"]
- note the lack of "foo.bar"
which shouldn't be copied.
{
"source": {
"index": "my-index-0",
"_source": ["foo.fizz"]
},
"dest": {
"index": "my-index-1"
}
}
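To make the effect of a dotted include list concrete, here is a rough Python simulation of what it does to one document. It is only an illustration under simplifying assumptions: it handles exact dotted paths (no wildcards or excludes), and filter_source is a made-up helper, not an ES API.

```python
from typing import Any


def filter_source(doc: dict[str, Any], includes: list[str]) -> dict[str, Any]:
    """Keep only the values reachable via the given dotted paths."""
    result: dict[str, Any] = {}
    for path in includes:
        keys = path.split(".")
        node: Any = doc
        # Walk down the document; skip paths the document does not contain.
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                break
            node = node[key]
        else:
            # Rebuild the nesting for this path in the result.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = node
    return result


doc = {"foo": {"fizz": "a", "baz": "b"}}
print(filter_source(doc, ["foo.fizz"]))  # {'foo': {'fizz': 'a'}}
```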
Essentially, the problem of generating the "_source" attribute for the generic case can be reduced to finding the intersection of the sets of all property paths in the old and new mappings.
Python solution
The function below recursively iterates through the properties and yields all property paths.
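from typing import Any, Iterator


def get_property_path(properties: dict[str, Any], name: str = "") -> Iterator[str]:
    for property_name, property_value in properties.items():
        # Build the dotted path, e.g. "foo" + "fizz" -> "foo.fizz"
        new_name = f"{name}.{property_name}" if name else property_name
        if nested_properties := property_value.get("properties"):
            # Object field: descend into its nested properties
            yield from get_property_path(nested_properties, new_name)
        else:
            # Leaf field: emit its full path
            yield new_name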
For example:
>>> properties = {
        "a": {
            "properties": {
                "b": {
                    "properties": {
                        "c": {"type": "text"},
                    },
                },
            },
        },
        "e": {
            "properties": {
                "f": {"type": "text"},
            },
        },
    }
>>> list(get_property_path(properties))
['a.b.c', 'e.f']
It can then be used to calculate the set of fields that should be copied (fields that are in both the old and new mappings):
_source = list(
    set(get_property_path(old_mapping["properties"]))
    & set(get_property_path(new_mapping["properties"]))
)
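Putting the pieces together for the mappings from the question, a self-contained sketch could look like the following. The names old_properties, new_properties, source_fields and reindex_body are made up for the example, and sorting is added only to make the output deterministic.

```python
from typing import Any, Iterator


def get_property_path(properties: dict[str, Any], name: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf field in a mapping's properties."""
    for property_name, property_value in properties.items():
        new_name = f"{name}.{property_name}" if name else property_name
        if nested_properties := property_value.get("properties"):
            yield from get_property_path(nested_properties, new_name)
        else:
            yield new_name


# "properties" of the doc type in my-index-0 (old) and my-index-1 (new).
old_properties = {
    "foo": {"properties": {"fizz": {"type": "keyword"}, "baz": {"type": "keyword"}}}
}
new_properties = {"foo": {"properties": {"fizz": {"type": "keyword"}}}}

# Fields present in both mappings are the ones that should be copied.
source_fields = sorted(
    set(get_property_path(old_properties)) & set(get_property_path(new_properties))
)

# Body for POST /_reindex, limited to the shared fields.
reindex_body = {
    "source": {"index": "my-index-0", "_source": source_fields},
    "dest": {"index": "my-index-1"},
}
print(reindex_body["source"]["_source"])  # ['foo.fizz']
```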
I won't accept my answer tho, as there might be a simpler solution that is based on the ES API.