Let's say I have this dataframe :
Action | X1 | X2 | X3
--------------------------
A0 | 0.1 | 0.4 | 0.7
A0 | 0.2 | 0.5 | 0.8
A0 | 0.3 | 0.6 | 0.9
A1 | 1.0 | 1.2 | 1.3
A1 | 1.1 | 1.5 | 1.6
A0 | 0.1 | 0.4 | 0.7
A0 | 0.2 | 0.5 | 0.8
A2 | 0.3 | 0.6 | 0.9
A2 | 0.1 | 0.4 | 0.7
A2 | 0.2 | 0.5 | 0.8
A2 | 0.3 | 0.6 | 0.9
I want to store rows with same Action column value on a seperate dataframe, even when columns values are repeated (meaning not store rows with same column value in the same dataframe). I want to store them in a sepearte one each time and then append them all to a list for example. So as a final result I would have a list of multiple dataframes.
In this example, I want to store rows from 0 -> 2, where Action column is equal to A0 in a new dataframe. Then store rows from 3 -> 4, where Action column is equal to A1 in a another new dataframe. Then again store rows from 5 -> 6, where Action column is equal to A0 in a another new dataframe seperately from the first dataframe. Then store all dataframes in a list.
Here's what I have tried :
grouped = df.groupby(df.Action)
A0 = grouped.get_group("A0")
But this stores all rows with same column value together.
In this example, I would have 4 seperate dataframes like this :
D1:
Action | X1 | X2 | X3
--------------------------
A0 | 0.1 | 0.4 | 0.7
A0 | 0.2 | 0.5 | 0.8
A0 | 0.3 | 0.6 | 0.9
D2:
Action | X1 | X2 | X3
--------------------------
A1 | 1.0 | 1.2 | 1.3
A1 | 1.1 | 1.5 | 1.6
D3:
Action | X1 | X2 | X3
--------------------------
A0 | 0.1 | 0.4 | 0.7
A0 | 0.2 | 0.5 | 0.8
D4:
Action | X1 | X2 | X3
--------------------------
A2 | 0.3 | 0.6 | 0.9
A2 | 0.1 | 0.4 | 0.7
A2 | 0.2 | 0.5 | 0.8
A2 | 0.3 | 0.6 | 0.9
And then store all 4 dataframes (D1, D2, D3 & D4) in a list for example.
CodePudding user response:
It would have been a one-line python code if you grouped all Actions with the same values (Made them into 3 DataFrames of A0, A1 and A2 instead of 4)
But in your case, this will solve your problem:
#create some data with Names column
data = pd.DataFrame({'Action': ['A0', 'A0', 'A0', 'A1', 'A1', 'A0', 'A0', 'A2', 'A2', 'A2', 'A2'], 'X1' : np.random.rand(11), 'X2' : np.random.rand(11), 'X3' : np.random.rand(11)})
dfs = []
temp_rows = []
current_action = data.iloc[0]['Action']
for i, row in data.iterrows():
if current_action != row['Action']:
dfs.append(pd.DataFrame(temp_rows))
current_action = row['Action']
temp_rows = []
temp_rows.append(row)
dfs.append(pd.DataFrame(temp_rows)) # Don't forget this one!
dfs is your list of DataFrames.