Count the number of unique column elements in Python (example)

Imagine that this data frame is a small sample of a bigger data frame with 11 pianists, each of whom produces one of the emotions Angry, Happy, Relaxed, or Sad in a listener. For every pianist, I want to count how often each emotion occurs, so that I can plot the counts later and look for patterns in the data.

I am struggling to get this done. I somehow managed it to a certain degree, but the code is very bad, and it would get very long if I had to repeat it for all 11 pianists.

Could somebody please help me automate this with more efficient, better code?

My Work:


import pandas as pd

d = {
    'pianist_id': 
        [1, 1, 1, 2, 2, 2, 3, 3, 4, 4], 
    'class':
        ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry', 'Relaxed', 'Happy', 'Happy', 'Happy']
}

df = pd.DataFrame(d)

count = 0

# Count the rows that belong to pianist 1
for i in range(df.shape[0]):
    if df['pianist_id'][i] == 1:
        count += 1

# This slice assumes pianist 1's rows come first and are contiguous
df_split_1 = df.iloc[:count]

print(df_split_1['class'].value_counts())

pianist_1 = df_split_1['class'].value_counts().to_dict()

dict_pianist_1 = {}

dict_pianist_1['1'] = pianist_1

I want to have something like this for each of the 11 pianists:

{
    '1': {
        'Sad': 67,
        'Happy': 66,
        'Angry': 54,
        'Relaxed': 50
    },
    '2': {
        'Angry': ...,
        ...
    },
    ...
}

Thanks for the help!
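The slicing code in the question only finds pianist 1's rows when they come first and are contiguous. A minimal sketch of the same idea generalized to every pianist uses a boolean mask per id instead of counting rows, and produces the string-keyed shape requested above. The sample frame is rebuilt so the block runs on its own; the name counts_per_pianist is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'pianist_id': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'class': ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry',
              'Relaxed', 'Happy', 'Happy', 'Happy'],
})

counts_per_pianist = {}
for pid in df['pianist_id'].unique():
    # Boolean mask selects this pianist's rows wherever they appear,
    # so the frame does not need to be sorted by pianist_id
    rows = df[df['pianist_id'] == pid]
    # str(pid) mirrors the string keys ('1', '2', ...) in the desired output
    counts_per_pianist[str(pid)] = rows['class'].value_counts().to_dict()

print(counts_per_pianist)
```

This still scans the frame once per pianist, which is fine for 11 pianists; the groupby-based answers do the grouping in a single pass.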

Answer:

You can group by the pianist_id column, then use value_counts to count each value of the class column within each group. Finally, use to_dict to convert the result to a dict:

d = df.groupby('pianist_id').apply(lambda group: group['class'].value_counts().to_dict()).to_dict()

print(d)

{1: {'Sad': 2, 'Angry': 1}, 2: {'Angry': 3}, 3: {'Relaxed': 1, 'Happy': 1}, 4: {'Happy': 2}}
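A variant of the same idea that avoids groupby.apply: iterating over df.groupby('pianist_id')['class'] yields (pianist id, Series of that pianist's emotions) pairs, and a dict comprehension collects the value counts. This is a sketch on the same sample data; the name counts is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'pianist_id': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'class': ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry',
              'Relaxed', 'Happy', 'Happy', 'Happy'],
})

# Iterating a SeriesGroupBy yields (group key, Series of that group's values)
counts = {
    pid: emotions.value_counts().to_dict()
    for pid, emotions in df.groupby('pianist_id')['class']
}
print(counts)
```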

Answer:

You can compute the size of each (pianist_id, class) pair:

df.groupby(['pianist_id', 'class']).size()

This gives the following output:

pianist_id  class
1           Angry      1
            Sad        2
2           Angry      3
3           Happy      1
            Relaxed    1
4           Happy      2
dtype: int64

To get the format you need, unstack the class level of the index, using fill_value=0 to fill in the missing combinations at the same time, and then convert the resulting DataFrame to a dict:

df.groupby(['pianist_id', 'class']).size().unstack(fill_value=0).to_dict(orient='index')

This produces the following output:

{1: {'Angry': 1, 'Happy': 0, 'Relaxed': 0, 'Sad': 2}, 2: {'Angry': 3, 'Happy': 0, 'Relaxed': 0, 'Sad': 0}, 3: {'Angry': 0, 'Happy': 1, 'Relaxed': 1, 'Sad': 0}, 4: {'Angry': 0, 'Happy': 2, 'Relaxed': 0, 'Sad': 0}}
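pd.crosstab builds the same zero-filled pianist-by-emotion table in one call, swapped in here as an alternative to groupby/size/unstack. A sketch on the same sample data; the names table and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'pianist_id': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'class': ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry',
              'Relaxed', 'Happy', 'Happy', 'Happy'],
})

# Rows: pianist_id, columns: class; combinations that never occur are counted as 0
table = pd.crosstab(df['pianist_id'], df['class'])
result = table.to_dict(orient='index')
print(result)
```

Because table is an ordinary DataFrame with one row per pianist and one column per emotion, it can also be passed straight to table.plot.bar() for the plot mentioned in the question (this needs matplotlib installed).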

Answer:

Since the end result specified in the question is a Python dict of dicts, you may prefer a Python-centric rather than a pandas-centric approach. Here are several alternatives in which pandas is used only to take the original DataFrame as input, call its apply method, and access its 'pianist_id' and 'class' columns:

result = {id : {} for id in df['pianist_id'].unique()}

def updateEmotionCount(id, emotion):
    result[id].update({emotion : result[id].get(emotion, 0) + 1})

df.apply(lambda x: updateEmotionCount(x['pianist_id'], x['class']), axis = 1)
print(result)

... or, in two lines using just lambda:

result = {id : {} for id in df['pianist_id'].unique()}
df.apply(lambda x: result[x['pianist_id']].update({x['class'] : result[x['pianist_id']].get(x['class'], 0) + 1}), axis = 1)

... or, using more lines but benefitting from the convenience of defaultdict:

import collections
result = {id : collections.defaultdict(int) for id in df['pianist_id'].unique()}
def updateEmotionCount(id, emotion):
    result[id][emotion] += 1
df.apply(lambda x: updateEmotionCount(x['pianist_id'], x['class']), axis = 1)
result = {id : dict(result[id]) for id in result}

... or (finally) using the walrus operator := to eliminate the separate function and just use lambda (there is an argument that this approach is somewhat cryptic ... but the same could be said of pandas-centric solutions):

Using regular dict datatype:

result = {id : {} for id in df['pianist_id'].unique()}
df.apply(lambda x: (id := x['pianist_id'], emotion := x['class'], result[id].update({emotion : result[id].get(emotion, 0) + 1})), axis = 1)

Using defaultdict:

import collections
result = {id : collections.defaultdict(int) for id in df['pianist_id'].unique()}
df.apply(lambda x: (id := x['pianist_id'], emotion := x['class'], result[id].update({emotion : result[id][emotion] + 1})), axis = 1)
result = {id : dict(result[id]) for id in result}
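If calling apply is not a requirement, collections.Counter can do the tallying in a plain loop. This is a sketch of that swapped-in approach, assuming the same two column names; zip walks the pianist_id and class columns row by row.

```python
import collections

import pandas as pd

df = pd.DataFrame({
    'pianist_id': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'class': ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry',
              'Relaxed', 'Happy', 'Happy', 'Happy'],
})

# Each new pianist id gets its own empty Counter on first access
result = collections.defaultdict(collections.Counter)
for pid, emotion in zip(df['pianist_id'], df['class']):
    # Counter returns 0 for missing keys, so += 1 works on the first occurrence
    result[pid][emotion] += 1

# Convert to plain dicts to match the requested dict-of-dicts shape
result = {pid: dict(counter) for pid, counter in result.items()}
print(result)
```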