Home > Mobile >  ValueError: cannot reindex from a duplicate axis in explode
ValueError: cannot reindex from a duplicate axis in explode

Time:07-13

I am trying to expand the dataframe from one row to multiple rows based on two columns.

Following is the snapshot of the actual data.

initial_row_index        Date        Product_ID               No_of_items
1                        2021-07-11  ['A13N', 'A4BE','5GH$']  [3,5,1]
2                        2021-07-12  ['A13N', 'X9HE','7H3T']  [7,2,4]
3                        2021-07-13  ['A4BE', 'X9HE']         [8,4]

I want the transferred output to be as follows

initial_row_index        Date        Product_ID       No_of_items
1                        2021-07-11  A13N             3
1                        2021-07-11  A4BE             5
1                        2021-07-11  5GH$             1
2                        2021-07-12  A13N             7
2                        2021-07-12  X9HE             2
2                        2021-07-12  7H3T             4
3                        2021-07-13  A4BE             8
3                        2021-07-13  X9HE             4

To get this result I tried the following

df.apply(pd.Series.explode)

I am getting the following error.

ValueError: cannot reindex from a duplicate axis

I have already verified that I don't have any duplicate index in the dataframe. The length of the lists in both the column for each row have same no of elements. After looking into the error found few similar questions where the answers suggested to explode the df column by column as follows.

df.explode("Product_ID").reset_index().drop("index",1).explode("No_of_items").reset_index().drop("index",1) 

But the above solution is giving me 9 rows instead of 3 rows after exploding the 1st row. How can I solve the value error or achieve the desired out put using any other method.

CodePudding user response:

Try this:

df.explode(['Product_ID', 'No_of_items'])
>   initial_row_index   Date    Product_ID  No_of_items
0   1   2021-07-11  A13N    3
0   1   2021-07-11  A4BE    5
0   1   2021-07-11  5GH$    1
1   2   2021-07-12  A13N    7
1   2   2021-07-12  X9HE    2
1   2   2021-07-12  7H3T    4
2   3   2021-07-13  A4BE    8
2   3   2021-07-13  X9HE    4

CodePudding user response:

Depending on your pandas version, you have to set your index first and then reset it again:

df.set_index(["initial_row_index", "date"]).apply(pd.Series.explode).reset_index()

results into

    initial_row_index   date        Product_ID  No_of_items
0   1                   2021-07-11  A13N        3
1   1                   2021-07-11  A4BE        5
2   1                   2021-07-11  5GH$        1
3   2                   2021-07-12  A13N        7
4   2                   2021-07-12  X9HE        2
5   2                   2021-07-12  7H3T        4
6   3                   2021-07-13  A4BE        8
7   3                   2021-07-13  X9HE        4
  • Related