Pandas internals - "Index labels must be unique"

Time:05-16

The Pandas Internals documentation (v1.2.4) states:

In pandas there are a few objects implemented which can serve as valid containers for the axis labels:

  • Index: the generic “ordered set” object, an ndarray of object dtype assuming nothing about its contents. The labels must be hashable (and likely immutable) and unique. Populates a dict of label to location in Cython to do O(1) lookups.

Clearly, DataFrame indexes do not need to be unique:

import pandas as pd
df = pd.DataFrame([10, 20, 30], index=['a','b','b'])  # a list, not a set: pandas rejects sets as unordered
df.index
# Index(['a', 'b', 'b'], dtype='object')

Why does the documentation quoted above state that labels in an index must be unique?

CodePudding user response:

In pandas there are a few objects implemented which can serve as valid containers for the axis labels

The keyword here is "valid". Pandas allows you to create non-unique indexes, but some operations will then raise errors or behave differently. Just because pandas allows something doesn't make it "valid".
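As a rough illustration of what "allowed but not valid" can look like in practice, here is a minimal sketch. The column name "value" and the choice of reindex() as the failing operation are my own picks for the example, not something the documentation singles out:

```python
import pandas as pd

# Hypothetical frame with a duplicated label "b" in the index.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "b"])

# Construction succeeds, but the index reports that it is not unique.
is_unique = df.index.is_unique

# A lookup on the duplicated label returns every matching row
# instead of a single row.
rows_b = df.loc["b"]

# reindex() needs an unambiguous label-to-position mapping,
# so it raises ValueError on a duplicated axis.
try:
    df.reindex(["a", "b"])
    reindex_failed = False
except ValueError:
    reindex_failed = True
```

So nothing stops you at creation time; the trouble only shows up later, when an operation needs each label to map to exactly one row.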

The set_index() method has a verify_integrity keyword argument that makes it raise an error when the new index would contain duplicates, i.e. when the index wouldn't be "valid".
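A small sketch of that check, assuming a frame with hypothetical columns "key" and "value" where "key" contains a repeated label:

```python
import pandas as pd

# Hypothetical data: the "key" column repeats the label "b".
df = pd.DataFrame({"key": ["a", "b", "b"], "value": [10, 20, 30]})

# Default behaviour (verify_integrity=False): duplicates are accepted silently.
indexed = df.set_index("key")

# With verify_integrity=True, the duplicated keys raise ValueError.
try:
    df.set_index("key", verify_integrity=True)
    rejected = False
except ValueError:
    rejected = True
```

The check is off by default, which is why duplicate labels usually slip in unnoticed.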
