I have a column with IDs and I divide the column into groups using pd.cut like this:
ID | Group |
---|---|
3645390 | 1 |
3678122 | 1 |
3615370 | 2 |
3371122 | 2 |
3645590 | 2 |
3778682 | 3 |
3125140 | 3 |
3578772 | 3 |
After that, how do I loop over the 'Group' column, pick out the IDs for each group, and assign them to an array?
array_1 = [3645390,3678122]
array_2 = [3615370,3371122,3645590]
...and so on.
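For reference, a minimal, hypothetical sketch that reproduces the sample frame above (the bin edges are made up purely to get this grouping; the real pd.cut call depends on your data):
import pandas as pd

# Hypothetical setup: bin the row index into three groups labelled 1, 2, 3
df = pd.DataFrame({'ID': [3645390, 3678122, 3615370, 3371122,
                          3645590, 3778682, 3125140, 3578772]})
df['Group'] = pd.cut(df.index, bins=[-1, 1, 4, 7], labels=[1, 2, 3])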
CodePudding user response:
It is generally bad practice to generate variables dynamically; the ideal is to use a container (a dictionary works well here).
You could use groupby and transform the output into a dictionary of lists:
out = df.groupby('Group')['ID'].apply(list).to_dict()
Then access your lists by group key:
>>> out
{1: [3645390, 3678122],
2: [3615370, 3371122, 3645590],
3: [3778682, 3125140, 3578772]}
>>> out[1] ## group #1
[3645390, 3678122]
If you really want array_x-style keys:
(df.assign(Group='array_' + df['Group'].astype(str))
   .groupby('Group')['ID'].apply(list).to_dict()
)
Output:
{'array_1': [3645390, 3678122],
'array_2': [3615370, 3371122, 3645590],
'array_3': [3778682, 3125140, 3578772]}
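Assigning that result to out as before, access works the same way, just with string keys:
>>> out['array_1']
[3645390, 3678122]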
CodePudding user response:
Rather than creating a new variable for every array, you can much more easily store them in a dictionary and access them using array[1], array[2] and so on. This is what an implementation could look like:
array = {}
for group in df['Group'].drop_duplicates():
    # select the IDs belonging to this group and store them as a list
    array[group] = df.loc[df['Group'] == group, 'ID'].tolist()
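With the sample data above, access then looks like:
>>> array[1]
[3645390, 3678122]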
CodePudding user response:
Or:
>>> {k: list(v) for k, v in df.groupby('Group')['ID']}
{1: [3645390, 3678122], 2: [3615370, 3371122, 3645590], 3: [3778682, 3125140, 3578772]}
But if you are fine with Series as values, just use:
>>> dict(tuple(df.groupby('Group')['ID']))
{1: 0 3645390
1 3678122
Name: ID, dtype: int64, 2: 2 3615370
3 3371122
4 3645590
Name: ID, dtype: int64, 3: 5 3778682
6 3125140
7 3578772
Name: ID, dtype: int64}
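And since the question mentions arrays: if you actually want NumPy arrays rather than lists or Series, one option (a sketch using the same df and groupby as above) is Series.to_numpy() inside the comprehension:
>>> {k: v.to_numpy() for k, v in df.groupby('Group')['ID']}
{1: array([3645390, 3678122]), 2: array([3615370, 3371122, 3645590]), 3: array([3778682, 3125140, 3578772])}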