How to get the list of values of different classes for a feature?

I have a DataFrame. For each class, I want to get the list of all feature values that belong to it.

    import pandas as pd

    df = pd.DataFrame([(3, 1),
                       (4, 3),
                       (6, 2),
                       (7, 2),
                       (2, 3),
                       (4, 2),
                       (4, 1),
                       (1, 3),
                       (6, 3),
                       (8, 1)],
                      columns=['Feature', 'Class'])

In the above example, I have three classes, namely 1, 2, and 3. I would like to get the list of Feature values for each class. The output could look like this:

Class 1: [3, 4, 8]
Class 2: [6, 7, 4]
Class 3: [4, 2, 1, 6]

CodePudding user response：

You can group by Class and collect each group's Feature values into a list:

classes = df.groupby('Class')['Feature'].apply(list)

Output:

>>> classes
Class
1       [3, 4, 8]
2       [6, 7, 4]
3    [4, 2, 1, 6]
Name: Feature, dtype: object
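
To print the result in the "Class 1: [...]" layout from the question, one option is to iterate over the grouped Series. This is a minimal sketch using the same data as the question; the names `lines` and `by_class` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([(3, 1), (4, 3), (6, 2), (7, 2), (2, 3),
                   (4, 2), (4, 1), (1, 3), (6, 3), (8, 1)],
                  columns=['Feature', 'Class'])

# One list of Feature values per class, in original row order
classes = df.groupby('Class')['Feature'].apply(list)

# Series.items() yields (class label, list of values) pairs
lines = [f"Class {label}: {values}" for label, values in classes.items()]
print("\n".join(lines))

# A plain dict can be handier for later lookups, e.g. by_class[2]
by_class = classes.to_dict()
```

The dict form is convenient if you need to look up one class at a time rather than print them all.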

If you only want the unique values per class, use unique() instead, which returns a NumPy array for each class:

unique = df.groupby('Class')['Feature'].unique()
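
In the question's data no class contains a repeated value, so unique() and apply(list) happen to give the same values. The sketch below adds one hypothetical duplicate row, (4, 1), purely to make the difference visible; the name `unique_lists` is also just illustrative:

```python
import pandas as pd

# Same data as the question, plus one hypothetical duplicate row (4, 1)
# appended only to show what unique() removes
df = pd.DataFrame([(3, 1), (4, 3), (6, 2), (7, 2), (2, 3), (4, 2),
                   (4, 1), (1, 3), (6, 3), (8, 1), (4, 1)],
                  columns=['Feature', 'Class'])

# All values per class, duplicates kept
all_values = df.groupby('Class')['Feature'].apply(list)

# unique() returns one NumPy array per class, in order of first appearance
unique = df.groupby('Class')['Feature'].unique()

# Convert each array to a plain Python list
unique_lists = {label: arr.tolist() for label, arr in unique.items()}

print(all_values[1])    # class 1 with the duplicate 4 still present
print(unique_lists[1])  # class 1 with the duplicate removed
```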

CodePudding user response：

As pointed out in another answer, you can use df.groupby() together with apply(list) on the grouped column to achieve this:

import pandas as pd

df = pd.DataFrame([(3, 1),
                   (4, 3),
                   (6, 2),
                   (7, 2),
                   (2, 3),
                   (4, 2),
                   (4, 1),
                   (1, 3),
                   (6, 3),
                   (8, 1)],
                  columns=['Feature', 'Class'])

print(df.groupby('Class')['Feature'].apply(list))

Output:

Class
1       [3, 4, 8]
2       [6, 7, 4]
3    [4, 2, 1, 6]
Name: Feature, dtype: object

But if you would rather handle the classes one at a time, a more intuitive way is to filter the rows with a boolean mask:

print(df.loc[df['Class'] == 1])

Output:

   Feature  Class
0        3      1
6        4      1
9        8      1

Or select only the "Feature" column of the matching rows:

print(df.loc[df['Class'] == 1, 'Feature'])

Output:

0    3
6    4
9    8
Name: Feature, dtype: int64
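
To actually loop through every class rather than hard-coding `1`, one possible approach is to iterate over the GroupBy object, which yields each class label together with its rows. This is a sketch using the same data as above; `values_by_class` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame([(3, 1), (4, 3), (6, 2), (7, 2), (2, 3),
                   (4, 2), (4, 1), (1, 3), (6, 3), (8, 1)],
                  columns=['Feature', 'Class'])

values_by_class = {}

# Iterating over a GroupBy yields (class label, sub-DataFrame) pairs,
# sorted by class label by default
for label, group in df.groupby('Class'):
    # tolist() turns the Feature column of this class into a Python list
    values_by_class[label] = group['Feature'].tolist()
    print(f"Class {label}: {values_by_class[label]}")
```

This avoids building a separate boolean mask for each class by hand.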