Home > Blockchain >  Extract dataframe rows and store them in new dataframes based on repeated column value
Extract dataframe rows and store them in new dataframes based on repeated column value

Time:11-04

Let's say I have this dataframe :

Action |   X1 |   X2 | X3
--------------------------
A0     | 0.1  |  0.4 | 0.7
A0     | 0.2  |  0.5 | 0.8
A0     | 0.3  |  0.6 | 0.9
A1     | 1.0  |  1.2 | 1.3
A1     | 1.1  |  1.5 | 1.6
A0     | 0.1  |  0.4 | 0.7
A0     | 0.2  |  0.5 | 0.8
A2     | 0.3  |  0.6 | 0.9
A2     | 0.1  |  0.4 | 0.7
A2     | 0.2  |  0.5 | 0.8
A2     | 0.3  |  0.6 | 0.9

I want to store rows with same Action column value on a seperate dataframe, even when columns values are repeated (meaning not store rows with same column value in the same dataframe). I want to store them in a sepearte one each time and then append them all to a list for example. So as a final result I would have a list of multiple dataframes.

In this example, I want to store rows from 0 -> 2, where Action column is equal to A0 in a new dataframe. Then store rows from 3 -> 4, where Action column is equal to A1 in a another new dataframe. Then again store rows from 5 -> 6, where Action column is equal to A0 in a another new dataframe seperately from the first dataframe. Then store all dataframes in a list.

Here's what I have tried :

grouped = df.groupby(df.Action)
A0 = grouped.get_group("A0")

But this stores all rows with same column value together.

In this example, I would have 4 seperate dataframes like this :

D1:

Action |   X1 |   X2 | X3
--------------------------
A0     | 0.1  |  0.4 | 0.7
A0     | 0.2  |  0.5 | 0.8
A0     | 0.3  |  0.6 | 0.9

D2:

Action |   X1 |   X2 | X3
--------------------------
A1     | 1.0  |  1.2 | 1.3
A1     | 1.1  |  1.5 | 1.6

D3:

Action |   X1 |   X2 | X3
--------------------------
A0     | 0.1  |  0.4 | 0.7
A0     | 0.2  |  0.5 | 0.8

D4:

Action |   X1 |   X2 | X3
--------------------------
A2     | 0.3  |  0.6 | 0.9
A2     | 0.1  |  0.4 | 0.7
A2     | 0.2  |  0.5 | 0.8
A2     | 0.3  |  0.6 | 0.9

And then store all 4 dataframes (D1, D2, D3 & D4) in a list for example.

CodePudding user response:

It would have been a one-line python code if you grouped all Actions with the same values (Made them into 3 DataFrames of A0, A1 and A2 instead of 4)

But in your case, this will solve your problem:

#create some data with Names column
data = pd.DataFrame({'Action': ['A0', 'A0', 'A0', 'A1', 'A1', 'A0', 'A0', 'A2', 'A2', 'A2', 'A2'], 'X1' : np.random.rand(11), 'X2' : np.random.rand(11), 'X3' : np.random.rand(11)})

dfs = []
temp_rows = []
current_action = data.iloc[0]['Action']
for i, row in data.iterrows():

  if current_action != row['Action']:
    dfs.append(pd.DataFrame(temp_rows))
    current_action = row['Action']  
    temp_rows = []
  
  temp_rows.append(row)

dfs.append(pd.DataFrame(temp_rows)) # Don't forget this one!

dfs is your list of DataFrames.

  • Related