I want to split a dataframe into 3 new dataframes according to a priority column. My dataframe is as follows:
City Priority
0 New York 3
1 Paris 1
2 Boston 7
3 La Habana 6
4 Bilbao 10
5 Roma 2
6 Barcelona 1
7 Bruselas 8
8 Tokyo 7
9 Caracas 11
There are 3 types of priorities:
- Priority 7 to 9
- Priority 1 to 6
- Priority from 10 to 11
The idea is to divide this dataframe in 3 with the following structure and that in turn would be ordered by the value of its priority:
- Dataframe with 3 rows of priority from 7 to 9
- Dataframe with 5 rows of priority from 1 to 6
- Dataframe with 2 rows of the priority from 10 to 11.
The result would be as follows:
Dataframe 1:
City Priority
0 Boston 7
1 Tokyo 7
2 Bruselas 8
Dataframe 2:
City Priority
0 Paris 1
1 Barcelona 1
2 Roma 2
3 New York 3
4 La Habana 6
Dataframe 3:
City Priority
0 Bilbao 10
1 Caracas 11
I think it is important to note that if there were no rows of priority 7 to 9, the priority numbers that would be chosen for that dataframe of 3 would be 10, if not 11, if not 1, if not 2, etc. The same with the rest of the dataframes and priorities: 1, 2, 3, 4, etc for the second one and 10, 11, 1, 2, 3, etc for the third one.
Also, if there were 4 values such as 7, 7, 7, 8, only rows 7, 7, 7 would appear in the 3-row Dataframe and the row with value 8 would be in Dataframe 2.
Likewise, I think it is also important to say that in that iteration, when the first dataframe of 3 rows is generated, they should be "extracted" from the original dataframe so that they are not taken into account when generating the other dataframes. I hope I have explained myself well and that someone can help me. Best regards and thanks!
CodePudding user response:
IIUC this should work as expected:
(1) you create a column bin_Priority
which applies each row to the right bin, the labels of the bins are the priority in which order to look for them.
(2) sort_values
on bin_Priority
, then in each bin on Priority
.
(3) split
your df
into 3 df's
, the 1st with 3 rows, the 2nd with 2 rows and the 3rd with 5 rows.
If values of Priority groups are missing it chooses the right values because they are ordered right.
Please let me know if that is what you are searching for.
df = pd.DataFrame({
'City': ['New York','Paris','Boston','La Habana','Bilbao','Roma','Barcelona','Bruselas','Tokyo','Caracas'],
'Priority': [3, 1, 7, 6, 10, 2, 1, 8, 7, 11]
})
#(1)
df['bin_Priority'] = pd.cut(df['Priority'], bins=[0,6,9,11], labels=[3, 1, 2]).to_numpy()
#(2)
ordered_priority_df = df.sort_values(by=['bin_Priority', 'Priority'])
#(3)
out = np.split(ordered_priority_df, [3,5])
print(df, ordered_priority_df, *out, sep='\n\n')
#df
City Priority bin_Priority
0 New York 3 3
1 Paris 1 3
2 Boston 7 1
3 La Habana 6 3
4 Bilbao 10 2
5 Roma 2 3
6 Barcelona 1 3
7 Bruselas 8 1
8 Tokyo 7 1
9 Caracas 11 2
#ordered_priority_df
City Priority bin_Priority
2 Boston 7 1
8 Tokyo 7 1
7 Bruselas 8 1
4 Bilbao 10 2
9 Caracas 11 2
1 Paris 1 3
6 Barcelona 1 3
5 Roma 2 3
0 New York 3 3
3 La Habana 6 3
# out[0]
City Priority bin_Priority
2 Boston 7 1
8 Tokyo 7 1
7 Bruselas 8 1
# out[1]
City Priority bin_Priority
4 Bilbao 10 2
9 Caracas 11 2
# out[2]
City Priority bin_Priority
1 Paris 1 3
6 Barcelona 1 3
5 Roma 2 3
0 New York 3 3
3 La Habana 6 3
Here is an example where I changed the value of Paris from 1
to 7
. value 8
(which should be in the 1st df
) ends in the 2nd df
and same with value 11
(from 2nd to 3rd).
df = pd.DataFrame({
'City': ['New York','Paris','Boston','La Habana','Bilbao','Roma','Barcelona','Bruselas','Tokyo','Caracas'],
'Priority': [3, 7, 7, 6, 10, 2, 1, 8, 7, 11]
})
df['bin_Priority'] = pd.cut(df['Priority'], bins=[0,6,9,11], labels=[3, 1, 2]).to_numpy()
ordered_priority_df = df.sort_values(by=['bin_Priority', 'Priority'])
out = np.split(ordered_priority_df, [3,5])
print(df, *out, sep='\n\n')
City Priority bin_Priority
0 New York 3 3
1 Paris 7 1
2 Boston 7 1
3 La Habana 6 3
4 Bilbao 10 2
5 Roma 2 3
6 Barcelona 1 3
7 Bruselas 8 1
8 Tokyo 7 1
9 Caracas 11 2
City Priority bin_Priority
1 Paris 7 1
2 Boston 7 1
8 Tokyo 7 1
City Priority bin_Priority
7 Bruselas 8 1
4 Bilbao 10 2
City Priority bin_Priority
9 Caracas 11 2
6 Barcelona 1 3
5 Roma 2 3
0 New York 3 3
3 La Habana 6 3