I have the following code.
I want to go through the 'outlierdataframe' dataframe row by row and explode the values in the 'x' and 'y' columns.
I then want to store each exploded row as its own dataframe, with columns 'newID', 'x' and 'y'.
However, the following code prints everything in one column instead of putting the exploded 'x' values in one column and the exploded 'y' values in another.
I would be so grateful for a helping hand!
individualframe = outlierdataframe.iloc[0]
individualoutliers = individualframe.explode(list('xy'))
newframe = pd.DataFrame(individualoutliers)
print(newframe)
First row of the outlier dataframe, selected by index:
outlierdataframe.iloc[0]
index 24
subID Prolific_610020
level 1
complete False
duration 20.015686
map_view 12.299759
distance 203.426697
x [55, 55, 55, 60, 60, 60, 65, 70, 70, 75, 80, 8...
y [60, 60, 60, 60, 65, 65, 70, 70, 75, 75, 80, 8...
r [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1...
batch 1
newID 610020
Name: 24, dtype: object
newframe = pd.DataFrame(individualoutliers)
print(newframe)
24
0 24
1 Prolific_610020
2 1
3 False
4 20.015686
.. ...
121 55
122 55
123 55
124 1
125 610020
CodePudding user response:
You can use pandas.DataFrame.apply with pandas.Series.explode to explode your selected (list) columns (e.g., x and y).
Try this:
out = (
    df
    .loc[:, ["newID", "x", "y"]]
    .apply(lambda x: pd.Series(x).explode())
)
# Output :
print(out)
newID x y
0 610020 100 60
0 610020 55 60
0 610020 55 60
0 610020 60 60
0 610020 60 65
0 610020 60 65
0 610020 65 70
0 610020 70 70
0 610020 70 75
0 610020 75 75
0 610020 80 80
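As for why the original attempt ends up in a single column: outlierdataframe.iloc[0] returns a Series, and Series.explode has no column argument. A minimal sketch of an alternative follows, assuming pandas 1.3 or later (where DataFrame.explode accepts a list of columns) and assuming x and y hold lists of equal length in each row. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for outlierdataframe; only the relevant columns are shown.
outlierdataframe = pd.DataFrame({
    "newID": [610020, 610021],
    "x": [[55, 55, 60], [10, 20]],
    "y": [[60, 60, 65], [30, 40]],
})

# iloc[[0]] (double brackets) keeps a one-row DataFrame instead of a Series.
first_row = outlierdataframe.iloc[[0]]

# Explode both list columns together; newID is repeated for every element.
newframe = first_row[["newID", "x", "y"]].explode(["x", "y"], ignore_index=True)
print(newframe)
```

Dropping the .iloc[[0]] step and calling explode on the whole frame would expand every row in one go.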
If you need a separate dataframe for each group, named after a pattern such as df_newID, use this:
for k, g in out.groupby("newID"):
    globals()['df_' + str(k)] = g
print(df_610020, type(df_610020))
newID x y
0 610020 100 60
0 610020 55 60
0 610020 55 60
0 610020 60 60
0 610020 60 65
0 610020 60 65
0 610020 65 70
0 610020 70 70
0 610020 70 75
0 610020 75 75
0 610020 80 80 <class 'pandas.core.frame.DataFrame'>
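Writing into globals() works, but the generated variable names are hard to keep track of later. One possible alternative is to keep the per-ID frames in a dictionary. The name frames and the sample data below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical exploded frame, shaped like `out` above.
out = pd.DataFrame({
    "newID": [610020, 610020, 610021],
    "x": [55, 60, 10],
    "y": [60, 65, 30],
})

# One DataFrame per newID, keyed by the ID as a plain int.
frames = {int(k): g.reset_index(drop=True) for k, g in out.groupby("newID")}

print(frames[610020])
```

Each frame is then available as frames[some_id], and looping over frames.items() visits all of them.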
CodePudding user response:
The following solution works:
individualframe = outlierdataframe.iloc[0]
# individualframe is a Series, so explode() takes no column name;
# ignore_index=True renumbers the exploded values 0..n-1
individualoutliers1 = individualframe[['x']].explode(ignore_index=True)
individualoutliers2 = individualframe[['y']].explode(ignore_index=True)
newIDs = individualframe['newID']
individualoutliers1 = pd.DataFrame(individualoutliers1)
individualoutliers2 = pd.DataFrame(individualoutliers2)
data = [individualoutliers1, individualoutliers2]
newframe = pd.concat(data, axis=1)
# Both columns carry the same label (the row's name), so name them by position
newframe.columns = ['x', 'y']
newframe['newID'] = newIDs
print(newframe)
x y newID
0 55 60 610020
1 55 60 610020
2 55 60 610020
3 60 60 610020
4 60 65 610020
5 60 65 610020
6 65 70 610020
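The same result can be sketched more compactly by building the frame straight from the row's lists, assuming x and y have the same length. The sample row below is made up for illustration.

```python
import pandas as pd

# Hypothetical row standing in for outlierdataframe.iloc[0].
individualframe = pd.Series(
    {"newID": 610020, "x": [55, 55, 60], "y": [60, 60, 65]}, name=24
)

# The lists become columns; the scalar newID is broadcast to every row.
newframe = pd.DataFrame({
    "newID": individualframe["newID"],
    "x": individualframe["x"],
    "y": individualframe["y"],
})
print(newframe)
```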