Exploding a data frame row by row and storing the exploded values in a new dataframe

Time:12-03

I have the following code.

I want to go through the 'outlierdataframe' dataframe row by row and explode the values in the 'x' and 'y' columns.

For each exploded row, I then want to store this exploded row as its own dataframe, with columns 'newID', 'x' and 'y'.

However, why does the following code print everything in a single column, rather than the exploded 'x' values in one column and the exploded 'y' values in another?

I would be so grateful for a helping hand!

individualframe = outlierdataframe.iloc[0]
individualoutliers = individualframe.explode(list('xy'))
newframe = pd.DataFrame(individualoutliers)
print(newframe)

The first row of the outlier dataframe, selected with iloc:

outlierdataframe.iloc[0]

index                                                      24
subID                                         Prolific_610020
level                                                       1
complete                                                False
duration                                            20.015686
map_view                                            12.299759
distance                                           203.426697
x           [55, 55, 55, 60, 60, 60, 65, 70, 70, 75, 80, 8...
y           [60, 60, 60, 60, 65, 65, 70, 70, 75, 75, 80, 8...
r           [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1...
batch                                                       1
newID                                                  610020
Name: 24, dtype: object

newframe = pd.DataFrame(individualoutliers)
print(newframe)

                24
0                 24
1    Prolific_610020
2                  1
3              False
4          20.015686
..               ...
121               55
122               55
123               55
124                1
125           610020
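A likely cause: iloc[0] returns the row as a Series, and Series.explode takes no column argument, so list('xy') is not treated as column names and every value in the row gets flattened into one column. Selecting with iloc[[0]] (a list of positions) keeps a one-row DataFrame, and DataFrame.explode accepts a list of columns from pandas 1.3 onward. A minimal sketch, using a small hypothetical stand-in for outlierdataframe:

```python
import pandas as pd

# Hypothetical stand-in for outlierdataframe: one row with list-valued x and y
outlierdataframe = pd.DataFrame({
    "subID": ["Prolific_610020"],
    "x": [[55, 55, 60]],
    "y": [[60, 65, 65]],
    "newID": [610020],
})

# iloc[[0]] keeps a one-row DataFrame; iloc[0] would return a Series
individualframe = outlierdataframe.iloc[[0]]

# DataFrame.explode with a list of columns (pandas >= 1.3) expands x and y together
newframe = (
    individualframe[["newID", "x", "y"]]
    .explode(["x", "y"], ignore_index=True)
)
print(newframe)
```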

Answer:

You can use pandas.DataFrame.apply with pandas.Series.explode to explode each selected column (for example, the list columns x and y).

Try this:

import pandas as pd

# df is the outlier dataframe; apply explodes each selected column on its own
out = (
    df
    .loc[:, ["newID", "x", "y"]]
    .apply(lambda col: col.explode())
)

print(out)

# Output:

    newID    x    y
0  610020  100   60
0  610020   55   60
0  610020   55   60
0  610020   60   60
0  610020   60   65
0  610020   60   65
0  610020   65   70
0  610020   70   70
0  610020   70   75
0  610020   75   75
0  610020   80   80
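With pandas 1.3 or later, a single DataFrame.explode call with a list of columns does the same job without apply, and explodes x and y together row by row. A sketch with a hypothetical two-row frame; pandas raises a ValueError if a row's x and y lists have different lengths:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the outlier dataframe
df = pd.DataFrame({
    "newID": [610020, 610021],
    "x": [[55, 60], [70, 75, 80]],
    "y": [[60, 65], [70, 75, 80]],
})

# Explode x and y in lockstep; the original row index is repeated per element
out = df[["newID", "x", "y"]].explode(["x", "y"])
print(out)
```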

If you need a separate DataFrame for each newID, assigned to a variable named with the pattern df_<newID>, use this:

# Create one global variable per newID, e.g. df_610020
for k, g in out.groupby("newID"):
    globals()['df_' + str(k)] = g

print(df_610020, type(df_610020))

    newID    x    y
0  610020  100   60
0  610020   55   60
0  610020   55   60
0  610020   60   60
0  610020   60   65
0  610020   60   65
0  610020   65   70
0  610020   70   70
0  610020   70   75
0  610020   75   75
0  610020   80   80 <class 'pandas.core.frame.DataFrame'>
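Writing into globals() works, but the resulting variables are hard to loop over later. A common alternative is a dictionary keyed by newID; a minimal sketch with a small hypothetical out frame:

```python
import pandas as pd

# Hypothetical exploded frame like the one produced above
out = pd.DataFrame({
    "newID": [610020, 610020, 610021],
    "x": [55, 60, 70],
    "y": [60, 65, 70],
})

# Map each newID to its own DataFrame instead of creating global variables
frames = {k: g.reset_index(drop=True) for k, g in out.groupby("newID")}
print(frames[610020])
```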

Answer:

The following solution works:

import pandas as pd

individualframe = outlierdataframe.iloc[0]  # first row, as a Series
# Series.explode has no column argument; ignore_index=True gives both
# results the same 0..n index so they line up when concatenated
individualoutliers1 = individualframe[['x']].explode(ignore_index=True)
individualoutliers2 = individualframe[['y']].explode(ignore_index=True)
newIDs = individualframe['newID']  # label lookup instead of positional [0]
# keys= names the columns; renaming afterwards fails because both inputs
# share the row label as their name, so both columns ended up as 'y'
newframe = pd.concat([individualoutliers1, individualoutliers2], axis=1, keys=['x', 'y'])
newframe['newID'] = newIDs
print(newframe)

Output (truncated):
      x    y   newID
0    55   60  610020
1    55   60  610020
2    55   60  610020
3    60   60  610020
4    60   65  610020
5    60   65  610020
6    65   70  610020
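Because the x and y entries of the row are already lists, the same frame can also be built directly, without explode or concat. A sketch with a hypothetical row standing in for outlierdataframe.iloc[0]; the DataFrame constructor raises a ValueError if the x and y lists differ in length:

```python
import pandas as pd

# Hypothetical first row; in the question this would be outlierdataframe.iloc[0]
row = pd.Series({"x": [55, 55, 60], "y": [60, 65, 65], "newID": 610020})

# The list entries become columns; the scalar newID is broadcast to every row
newframe = pd.DataFrame({"x": row["x"], "y": row["y"], "newID": row["newID"]})
print(newframe)
```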