How to create rows from pandas dataframe column value

Time:09-25

I want to turn the comma-separated values in two DataFrame columns (Race and TGR1) into separate rows. I have more columns like these besides Race and TGR1, but within each row they all hold the same number of values. I can't think of a good way to achieve this.

Any help would be greatly appreciated.

   Track          Date        Race                                 TGR1
0  Addington      24/09/2021  R1,R2,R3,R4,R5,R6,R7,R8,R9,R0,R1,R2  5,8,2,5,6,1,6,3,1,2,1,2
1  Mount Gambier  26/09/2021  R1,R2,R3,R4,R5,R6,R7,R8,R9,R0        8,1,4,8,8,1,2,1,2,2

Expected output

Track           Date                  Race             TGR1
Addington       24/09/2021                R1                 5
Addington       24/09/2021                R2                 8
Addington       24/09/2021                R3                 2
Addington       24/09/2021                R4                 5
Addington       24/09/2021                R5                 6
Addington       24/09/2021                R6                 1
Addington       24/09/2021                R7                 6
Addington       24/09/2021                R8                 3
Addington       24/09/2021                R9                 1
Addington       24/09/2021                R0                 2
Addington       24/09/2021                R1                 1
Addington       24/09/2021                R2                 2

Mount Gambier   26/09/2021                R1                 8
Mount Gambier   26/09/2021                R2                 1
Mount Gambier   26/09/2021                R3                 4
Mount Gambier   26/09/2021                R4                 8
Mount Gambier   26/09/2021                R5                 8
Mount Gambier   26/09/2021                R6                 1
Mount Gambier   26/09/2021                R7                 2
Mount Gambier   26/09/2021                R8                 1
Mount Gambier   26/09/2021                R9                 2
Mount Gambier   26/09/2021                R0                 2

CodePudding user response:

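You can apply pd.Series.explode to each column. First split the comma-separated strings into lists with str.split, then use set_index to set aside the columns that should not be exploded, explode the remaining columns, and bring the index back as regular columns with reset_index. This relies on Race and TGR1 holding the same number of values within each row, as they do in the question.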

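import pandas as pd

# df is the DataFrame shown in the question
(df.assign(Race=df['Race'].str.split(','),   # 'R1,R2,...' -> ['R1', 'R2', ...]
           TGR1=df['TGR1'].str.split(','))   # '5,8,...'   -> ['5', '8', ...]
   .set_index(['Track', 'Date'])             # keep these columns out of the explode
   .apply(pd.Series.explode)                 # one row per list element, per column
   .reset_index()                            # restore Track and Date as columns
)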

output:

            Track        Date Race TGR1
0       Addington  24/09/2021   R1    5
1       Addington  24/09/2021   R2    8
2       Addington  24/09/2021   R3    2
3       Addington  24/09/2021   R4    5
4       Addington  24/09/2021   R5    6
5       Addington  24/09/2021   R6    1
6       Addington  24/09/2021   R7    6
7       Addington  24/09/2021   R8    3
8       Addington  24/09/2021   R9    1
9       Addington  24/09/2021   R0    2
10      Addington  24/09/2021   R1    1
11      Addington  24/09/2021   R2    2
12  Mount Gambier  26/09/2021   R1    8
13  Mount Gambier  26/09/2021   R2    1
14  Mount Gambier  26/09/2021   R3    4
15  Mount Gambier  26/09/2021   R4    8
16  Mount Gambier  26/09/2021   R5    8
17  Mount Gambier  26/09/2021   R6    1
18  Mount Gambier  26/09/2021   R7    2
19  Mount Gambier  26/09/2021   R8    1
20  Mount Gambier  26/09/2021   R9    2
21  Mount Gambier  26/09/2021   R0    2
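
If you are on pandas 1.3 or later, DataFrame.explode also accepts a list of columns, so the set_index / reset_index round trip can be skipped. The following is a minimal sketch of that variant, not the approach from the answer above. It rebuilds the sample data from the question so that it runs on its own; the names list_cols, split and out are only illustrative, and the final cast of TGR1 to int is an assumption about what you want to do with the values.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Track": ["Addington", "Mount Gambier"],
    "Date": ["24/09/2021", "26/09/2021"],
    "Race": ["R1,R2,R3,R4,R5,R6,R7,R8,R9,R0,R1,R2",
             "R1,R2,R3,R4,R5,R6,R7,R8,R9,R0"],
    "TGR1": ["5,8,2,5,6,1,6,3,1,2,1,2",
             "8,1,4,8,8,1,2,1,2,2"],
})

# Columns holding comma-separated values (add any further ones here)
list_cols = ["Race", "TGR1"]

# Turn each comma-separated string into a list
split = df.assign(**{col: df[col].str.split(",") for col in list_cols})

# Explode all list columns together; within each row the lists must
# have the same length, otherwise pandas raises a ValueError
out = split.explode(list_cols, ignore_index=True)

# Optional (assumption): the exploded values are strings, so cast if needed
out["TGR1"] = out["TGR1"].astype(int)

print(out)
```

Any additional comma-separated columns can be appended to list_cols, provided they hold the same number of values per row as Race and TGR1.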