I am trying to expand the dataframe from one row to multiple rows based on two columns.
Following is the snapshot of the actual data.
initial_row_index Date Product_ID No_of_items
1 2021-07-11 ['A13N', 'A4BE','5GH$'] [3,5,1]
2 2021-07-12 ['A13N', 'X9HE','7H3T'] [7,2,4]
3 2021-07-13 ['A4BE', 'X9HE'] [8,4]
I want the transferred output to be as follows
initial_row_index Date Product_ID No_of_items
1 2021-07-11 A13N 3
1 2021-07-11 A4BE 5
1 2021-07-11 5GH$ 1
2 2021-07-12 A13N 7
2 2021-07-12 X9HE 2
2 2021-07-12 7H3T 4
3 2021-07-13 A4BE 8
3 2021-07-13 X9HE 4
To get this result I tried the following
df.apply(pd.Series.explode)
I am getting the following error.
ValueError: cannot reindex from a duplicate axis
I have already verified that I don't have any duplicate index in the dataframe. The length of the lists in both the column for each row have same no of elements. After looking into the error found few similar questions where the answers suggested to explode the df column by column as follows.
df.explode("Product_ID").reset_index().drop("index",1).explode("No_of_items").reset_index().drop("index",1)
But the above solution is giving me 9 rows instead of 3 rows after exploding the 1st row. How can I solve the value error or achieve the desired out put using any other method.
CodePudding user response:
Try this:
df.explode(['Product_ID', 'No_of_items'])
> initial_row_index Date Product_ID No_of_items
0 1 2021-07-11 A13N 3
0 1 2021-07-11 A4BE 5
0 1 2021-07-11 5GH$ 1
1 2 2021-07-12 A13N 7
1 2 2021-07-12 X9HE 2
1 2 2021-07-12 7H3T 4
2 3 2021-07-13 A4BE 8
2 3 2021-07-13 X9HE 4
CodePudding user response:
Depending on your pandas
version, you have to set your index first and then reset it again:
df.set_index(["initial_row_index", "date"]).apply(pd.Series.explode).reset_index()
results into
initial_row_index date Product_ID No_of_items
0 1 2021-07-11 A13N 3
1 1 2021-07-11 A4BE 5
2 1 2021-07-11 5GH$ 1
3 2 2021-07-12 A13N 7
4 2 2021-07-12 X9HE 2
5 2 2021-07-12 7H3T 4
6 3 2021-07-13 A4BE 8
7 3 2021-07-13 X9HE 4