Create an compound index in MongoDB on an ID field and array-CodePudding

I have a MongoDB and I want to insert documents that are unique for a combination of the ID field and an array.

For example:

test = mongo.db.test

test.create_index([('name', 1), ('values', 1)], unique=True)

docs = [
    {"name": "John", "values": ["apple", "orange"]},
    {"name": "John", "values": ["apple", "orange", "banana"]},
    {"name": "Mary", "values": ["apple", "orange"]}
]

for x in docs:
    try:
        result = test.insert_one(x)
    except:
        pass

for x in test.find({}):
    print(x)

Should all be inserted, but when running this code the second document doesn't get inserted:

{'_id': ObjectId('6399f848c6574f8aa1c14174'), 'name': 'John', 'values': ['apple', 'orange']}
{'_id': ObjectId('6399f848c6574f8aa1c14176'), 'name': 'Mary', 'values': ['apple', 'orange']}

All of the docs and help I've found on this say that this should work, but it doesn't. Any suggestions?

CodePudding user response：

The solution ("workaround"?) here is get the database to index the entire array as a single key. This can be accomplished by double nesting the array which would mean going from "values": ["apple", "orange"] to "values": [["apple", "orange"]]. More completely, docs should be:

docs = [
    {"name": "John", "values": [["apple", "orange"]]},
    {"name": "John", "values": [["apple", "orange", "banana"]]},
    {"name": "Mary", "values": [["apple", "orange"]]}
]

This playground example shows the duplicate key exception being generated with the original data. The alternative playground example here shows the modified data getting inserted as expected.

Why?

The error message from the first demonstration provides a hint as to what is going on here:

[E11000 duplicate key error collection: ... index: values_unique dup key: { name: "John", values: "apple" }]

Of particular interest the object for the duplicate key contains just "apple" as the value for the values field. The documentation about unique indexes has the following to say (emphasis added):

The unique constraint applies to separate documents in the collection. That is, the unique index prevents separate documents from having the same value for the indexed key.

Because the constraint applies to separate documents, for a unique multikey index, a document may have array elements that result in repeating index key values as long as the index key values for that document do not duplicate those of another document. In this case, the repeated index entry is inserted into the index only once.

Combined, we can determine that the first document in the question will generate 2 keys in the index, one for each value of the array. Then when the database attempts to insert the second document, it will attempt to generate 3 additional keys in the index. Two of those entries conflict with the two entries from the first document leading to the duplicate key exception.

The solution above behaves differently because the database is now indexing the entire array from the document as a single key in the index. So as opposed to the 7 total keys that the original documents would attempt to generate, the modified documents will result in just 3 total keys (one for each document). This is typically not good for querying efficiency, but it helps you achieve the requested uniqueness constraint.