I have a column with IDs and I divide the column into groups using pd.cut like this:
ID | Group |
---|---|
3645390 | 1 |
3678122 | 1 |
3615370 | 2 |
3371122 | 2 |
3645590 | 2 |
3778682 | 3 |
3125140 | 3 |
3578772 | 3 |
After that, how do I loop over the 'Group' column, pick out the IDs for each group, and assign them to an array?
array_1 = [3645390,3678122]
array_2 = [3615370,3371122,3645590]
...and so on.
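For reference, a minimal, hypothetical sketch that reproduces the sample frame above (the bin edges are made up purely to get this grouping; the real pd.cut call depends on your data):
import pandas as pd

# Hypothetical setup: bin the row index into three groups labelled 1, 2, 3
df = pd.DataFrame({'ID': [3645390, 3678122, 3615370, 3371122,
                          3645590, 3778682, 3125140, 3578772]})
df['Group'] = pd.cut(df.index, bins=[-1, 1, 4, 7], labels=[1, 2, 3])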
CodePudding user response:
It is generally bad practice to generate variables dynamically; the ideal is to use a container (a dictionary works well here).
You could use groupby and transform the output into a dictionary of lists:
out = df.groupby('Group')['ID'].apply(list).to_dict()
Then access your lists by group key:
>>> out
{1: [3645390, 3678122],
2: [3615370, 3371122, 3645590],
3: [3778682, 3125140, 3578772]}
>>> out[1] ## group #1
[3645390, 3678122]
If you really want array_x-style keys:
(df.assign(Group='array_' + df['Group'].astype(str))
   .groupby('Group')['ID'].apply(list).to_dict()
)
Output:
{'array_1': [3645390, 3678122],
'array_2': [3615370, 3371122, 3645590],
'array_3': [3778682, 3125140, 3578772]}
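Assigning that result to out as before, access works the same way, just with string keys:
>>> out['array_1']
[3645390, 3678122]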
CodePudding user response:
Rather than creating a new variable for every array, you can much more easily store them in a dictionary and access them using array[1], array[2] and so on. This is what an implementation could look like:
array = {}
for group in df['Group'].drop_duplicates():
    # select the IDs belonging to this group and store them as a list
    array[group] = df.loc[df['Group'] == group, 'ID'].tolist()
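With the sample data above, access then looks like:
>>> array[1]
[3645390, 3678122]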
CodePudding user response:
Or:
>>> {k: list(v) for k, v in df.groupby('Group')['ID']}
{1: [3645390, 3678122], 2: [3615370, 3371122, 3645590], 3: [3778682, 3125140, 3578772]}
But if you are fine with Series as values, just use:
>>> dict(tuple(df.groupby('Group')['ID']))
{1: 0 3645390
1 3678122
Name: ID, dtype: int64, 2: 2 3615370
3 3371122
4 3645590
Name: ID, dtype: int64, 3: 5 3778682
6 3125140
7 3578772
Name: ID, dtype: int64}
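And since the question mentions arrays: if you actually want NumPy arrays rather than lists or Series, one option (a sketch using the same df and groupby as above) is Series.to_numpy() inside the comprehension:
>>> {k: v.to_numpy() for k, v in df.groupby('Group')['ID']}
{1: array([3645390, 3678122]), 2: array([3615370, 3371122, 3645590]), 3: array([3778682, 3125140, 3578772])}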