My initial dataframe looks as follows:
name | id |
---|---|
test | 1 |
test | 2 |
test | 2 |
test | 3 |
test | 4 |
test | 5 |
test | 6 |
test | 7 |
test | 8 |
test | 9 |
test | 9 |
test | 10 |
test | 11 |
test | 12 |
test | 13 |
Now I want to create three subdataframes (or a list of dataframes) which contain the rows with the following ids:
df[0]: 1, 4, 7, 10, 13
df[1]: 2, 5, 8, 11
df[2]: 3, 6, 9, 12
So df[1], for example, should look like this:
name | id |
---|---|
test | 2 |
test | 2 |
test | 5 |
test | 8 |
test | 11 |
I tried a loop with "append", but I read that this is quite slow when there are many rows. I am not sure how to do this cleanly and efficiently with pandas.
CodePudding user response:
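Assuming df is your initial dataframe:
dfs = []
# one list of wanted ids per sub-dataframe
a = [[1, 4, 7, 10, 13], [2, 5, 8, 11], [3, 6, 9, 12]]
for i in a:
    # keep only the rows whose id is in the current list
    dfs.append(df.loc[df['id'].isin(i)])
print(dfs[0])
print(dfs[1])
print(dfs[2])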
output:
name id
0 test 1
4 test 4
7 test 7
11 test 10
14 test 13
name id
1 test 2
2 test 2
5 test 5
8 test 8
12 test 11
name id
3 test 3
6 test 6
9 test 9
10 test 9
13 test 12
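For anyone who wants to run this end to end, here is a minimal sketch of the same isin idea written as a list comprehension. The way df is rebuilt from the question's table and the name id_sets are my own assumptions, not part of the answer above.

```python
import pandas as pd

# Sample frame rebuilt from the question (ids 2 and 9 appear twice).
ids = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 12, 13]
df = pd.DataFrame({"name": "test", "id": ids})

# Hypothetical name: one list of wanted ids per sub-dataframe.
id_sets = [[1, 4, 7, 10, 13], [2, 5, 8, 11], [3, 6, 9, 12]]

# One boolean mask per id list; nothing is appended row by row.
dfs = [df[df["id"].isin(wanted)] for wanted in id_sets]

print(dfs[1])
```

Each sub-dataframe keeps the original row labels, so add .reset_index(drop=True) if you need a fresh 0..n index.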
CodePudding user response:
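Given the pattern, it seems that you want to group the rows by the result of df.id % 3. In that case, here is a vectorized approach using DataFrame.groupby. With sort=False the groups come out in order of first appearance of each remainder (1, 2, 0), which matches df[0], df[1], df[2] from the question: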
groups = [g for k, g in df.groupby(df.id % 3, sort=False)]
Input
>>> df
name id
0 test 1
1 test 2
2 test 2
3 test 3
4 test 4
5 test 5
6 test 6
7 test 7
8 test 8
9 test 9
10 test 9
11 test 10
12 test 11
13 test 12
14 test 13
Output
>>> groups[0]
name id
0 test 1
4 test 4
7 test 7
11 test 10
14 test 13
>>> groups[1]
name id
1 test 2
2 test 2
5 test 5
8 test 8
12 test 11
>>> groups[2]
name id
3 test 3
6 test 6
9 test 9
10 test 9
13 test 12
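The list order above depends on which remainder shows up first in the data. If you would rather not rely on that, one possible variation (my assumption, not something the question asked for) is to shift the id by one before taking the remainder, so the group key is directly the 0, 1, 2 index from the question, and to keep the groups in a dict. The name groups_by_key is made up for this sketch.

```python
import pandas as pd

# Sample frame rebuilt from the question (ids 2 and 9 appear twice).
ids = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 12, 13]
df = pd.DataFrame({"name": "test", "id": ids})

# Shift by one so that id 1 -> key 0, id 2 -> key 1, id 3 -> key 2.
keys = (df["id"] - 1) % 3

# Map each key to its sub-dataframe; lookup no longer depends on row order.
groups_by_key = {k: g for k, g in df.groupby(keys)}

print(groups_by_key[1])
```

With this layout groups_by_key[0] holds ids 1, 4, 7, ..., regardless of how the rows are ordered in df.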