Pivot - Index contains duplicate entries, cannot reshape - which line in the dataframe is causing th

Time:07-26

I have an issue with a dataframe I am trying to pivot. The error message says that it contains duplicate entries. However, I have checked the file and found no duplicates (checked with df.duplicated, in Excel, and manually). As I am running out of ideas: is there a way to find out which line in the dataframe is causing the error? Unfortunately, the Python error message is not very clear...

The code itself is working with another dataframe so I assume my code should be fine...

CodePudding user response:

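Suppose you want to pivot a dataframe with column 'A' as the index and column 'B' as the columns. pivot requires every (A, B) pair to be unique, whereas df.duplicated() compares whole rows by default, so it can report no duplicates even when a pair repeats. You can find the offending rows with value_counts and duplicated: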

>>> df.pivot(index='A', columns='B', values='C')
...
ValueError: Index contains duplicate entries, cannot reshape

>>> df.value_counts(['A', 'B'])
A      B
Paul   v    2  # <- duplicate: every count must be 1 for pivot to work
Alex   v    1
Louis  v    1
dtype: int64

>>> df[df[['A', 'B']].duplicated(keep=False)]
      A  B  C
1  Paul  v  2
2  Paul  v  3

Input dataframe:

df = pd.DataFrame({'A': ['Louis', 'Paul', 'Paul', 'Alex'],
                   'B': ['v', 'v', 'v', 'v'],
                   'C': [1, 2, 3, 4]})
print(df)

# Output
       A  B  C
0  Louis  v  1
1   Paul  v  2  # <- dup (Paul, v)
2   Paul  v  3  # <- dup (Paul, v)
3   Alex  v  4
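
The check above can be wrapped in a small helper so it can be reused on any dataframe. This is only a sketch: find_pivot_conflicts is a hypothetical name (not part of pandas), and it assumes the pivot uses a single index column and a single columns column.

```python
import pandas as pd


def find_pivot_conflicts(df, index, columns):
    """Hypothetical helper: return every row whose (index, columns)
    pair occurs more than once, i.e. the rows that make pivot fail."""
    keys = [index, columns]
    # keep=False marks all members of a duplicated group, not only the repeats
    mask = df.duplicated(subset=keys, keep=False)
    return df[mask].sort_values(keys)


df = pd.DataFrame({'A': ['Louis', 'Paul', 'Paul', 'Alex'],
                   'B': ['v', 'v', 'v', 'v'],
                   'C': [1, 2, 3, 4]})

# Whole-row comparison: no row is identical to another, so nothing is flagged
print(df.duplicated().any())

# Comparison restricted to the pivot keys: the two (Paul, v) rows show up
conflicts = find_pivot_conflicts(df, 'A', 'B')
print(conflicts)
```

The row labels of the returned frame are the original index labels, so they point straight at the lines that need to be inspected.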

Now if you want to keep, for example, the first value encountered, use pivot_table:

>>> df.pivot_table(values='C', index='A', columns='B', aggfunc='first')
B      v
A       
Alex   4
Louis  1
Paul   2

You can also use different predefined functions like 'last', 'max', 'min' or a custom lambda function.
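
As a rough illustration of those alternatives, the sketch below applies 'last', 'max' and a custom lambda to the same example dataframe. The lambda (largest minus smallest value per group) is just an arbitrary example, not something the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Louis', 'Paul', 'Paul', 'Alex'],
                   'B': ['v', 'v', 'v', 'v'],
                   'C': [1, 2, 3, 4]})

# Keep the last value seen for each (A, B) pair
last = df.pivot_table(values='C', index='A', columns='B', aggfunc='last')

# Keep the largest value for each (A, B) pair
largest = df.pivot_table(values='C', index='A', columns='B', aggfunc='max')

# Custom aggregation: spread between the largest and smallest value
spread = df.pivot_table(values='C', index='A', columns='B',
                        aggfunc=lambda s: s.max() - s.min())

print(last)
print(largest)
print(spread)
```

Whichever function is used, it silently collapses the duplicated rows into one cell, so it is worth inspecting the duplicates first to decide whether that is acceptable.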

CodePudding user response:

A      B  C
54545  3  8
54545  2  16
54545  1  64

The idea is to pivot this data so that column A becomes the index, the values of B become the columns, and C supplies the cell values.

df = df_2.pivot(index='A', columns="B", values='C').reset_index()

I hope that makes clear what I am trying to do.
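
As a quick sanity check, the three sample rows can be run through the same pivot call. This is only a sketch: the name df_2 comes from the post, but building it from just these three rows, and the uppercase column names A, B and C, are assumptions.

```python
import pandas as pd

# Assumption: df_2 holds only the three sample rows from the post
df_2 = pd.DataFrame({'A': [54545, 54545, 54545],
                     'B': [3, 2, 1],
                     'C': [8, 16, 64]})

# No (A, B) pair repeats in this sample, so pivot has nothing to complain about
has_conflicts = df_2.duplicated(subset=['A', 'B']).any()
print(has_conflicts)

df = df_2.pivot(index='A', columns='B', values='C').reset_index()
print(df)
```

Since this sample pivots without error, the repeated (A, B) pair has to be somewhere else in the full data; running the duplicated check with subset=['A', 'B'] and keep=False on the complete dataframe should locate it.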
