The Pandas Internals documentation (v1.2.4) states:
In pandas there are a few objects implemented which can serve as valid containers for the axis labels:
- Index: the generic “ordered set” object, an ndarray of object dtype assuming nothing about its contents. The labels must be hashable (and likely immutable) and unique. Populates a dict of label to location in Cython to do O(1) lookups.
Yet DataFrame indexes clearly do not need to be unique:
import pandas as pd
df = pd.DataFrame([10, 20, 30], index=['a','b','b'])
df.index
# Index(['a', 'b', 'b'], dtype='object')
Why does the documentation quoted above state that labels in an index must be unique?
CodePudding user response:
In pandas there are a few objects implemented which can serve as valid containers for the axis labels
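The key word here is "valid". Pandas allows you to create a non-unique index, but some operations will then raise errors or behave differently; for example, reindex() raises a ValueError when the index contains duplicate labels. Just because pandas allows something doesn't make it "valid".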
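As a rough illustration of the point above, the sketch below rebuilds the frame from the question (using a list for the values, which is an assumption about the intended data) and shows one operation that still works on a duplicated index and one that fails. The variable names rows_b and reindex_failed are made up for this example.

```python
import pandas as pd

# Same labels as the frame in the question, with the values given as a list.
df = pd.DataFrame([10, 20, 30], index=['a', 'b', 'b'])

# The index reports that its labels are not unique.
print(df.index.is_unique)

# Label lookup still works, but a duplicated label selects every matching row.
rows_b = df.loc['b']
print(len(rows_b))

# Reindexing needs an unambiguous label-to-row mapping, so it raises.
try:
    df.reindex(['a', 'b'])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print(exc)
```

The exact error message differs between pandas versions, so the example only relies on the exception type.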
The set_index() function has a keyword argument, verify_integrity (False by default), that makes the function raise an error when the resulting index contains duplicates, i.e. when it wouldn't be "valid".
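A minimal sketch of that behaviour, assuming a small made-up frame whose 'key' column contains a duplicate (the column names and the variables loose and raised are hypothetical):

```python
import pandas as pd

# Hypothetical data: the 'key' column contains the label 'b' twice.
df = pd.DataFrame({'key': ['a', 'b', 'b'], 'value': [10, 20, 30]})

# Default (verify_integrity=False): duplicates are accepted silently.
loose = df.set_index('key')
print(loose.index.is_unique)

# verify_integrity=True: duplicates in the new index raise a ValueError.
try:
    df.set_index('key', verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)
```

For an index that already exists, checking df.index.is_unique gives the same information without rebuilding the index.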