Imagine that this data frame is a small sample of a bigger data frame with 11 pianists, each conveying one of four emotions (Angry, Happy, Relaxed, or Sad) to a listener. For every pianist, I want to count how often each emotion occurs, so that I can plot the counts later and look for a pattern in the data.
I am struggling to get this done. I managed it to a certain degree, but the code is bad and would become very long if I had to repeat it for all 11 pianists.
Could somebody please help me automate this with more efficient, cleaner code?
My Work:
import pandas as pd

d = {
    'pianist_id':
    [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'class':
    ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry', 'Relaxed', 'Happy', 'Happy', 'Happy']
}
df = pd.DataFrame(d)
# Count the rows of pianist 1 (this relies on those rows coming first)
count = 0
for i in range(df.shape[0]):
    if df['pianist_id'][i] == 1:
        count += 1
df_split_1 = df.iloc[:count]
print(df_split_1['class'].value_counts())
pianist_1 = df_split_1['class'].value_counts().to_dict()
dict_pianist_1 = {}
dict_pianist_1['1'] = pianist_1
I want to have something like this for each of the 11 pianists.
{
'1': {
'Sad': 67,
'Happy': 66,
'Angry': 54,
'Relaxed': 50
},
'2':{
'Angry',,,,,''
},
,,,,,,
}
Thanks for the help!
CodePudding user response:
You can group by the pianist_id column and then use value_counts to count each value of the class column within every group. Finally, use to_dict to convert the result to a dict.
d = df.groupby('pianist_id').apply(lambda group: group['class'].value_counts().to_dict()).to_dict()
print(d)
{1: {'Sad': 2, 'Angry': 1}, 2: {'Angry': 3}, 3: {'Relaxed': 1, 'Happy': 1}, 4: {'Happy': 2}}
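The desired output in the question uses string keys ('1', '2', ...), whereas the result above is keyed by the integer ids. A minimal sketch of one way to get string keys, assuming the sample data from the question; the name counts is only illustrative, and the dict comprehension over the groups is a swapped-in alternative to apply, not the method of the answer above:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'pianist_id': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'class': ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry',
              'Relaxed', 'Happy', 'Happy', 'Happy'],
})

# Iterating over a groupby yields (key, sub-frame) pairs; str() turns
# each pianist id into the string key used in the question's example.
counts = {
    str(pianist_id): group['class'].value_counts().to_dict()
    for pianist_id, group in df.groupby('pianist_id')
}
print(counts)
```

Emotions that a pianist never produced are simply absent from that pianist's inner dict.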
CodePudding user response:
You can compute the size of each (pianist_id, class) pair:
df.groupby(['pianist_id', 'class']).size()
Which gives the following output:
pianist_id  class
1           Angry      1
            Sad        2
2           Angry      3
3           Happy      1
            Relaxed    1
4           Happy      2
dtype: int64
To get the format you need, unstack the index, which also lets you fill the missing values at the same time, and then convert the resulting DataFrame to a dict:
df.groupby(['pianist_id', 'class']).size().unstack(fill_value=0).to_dict(orient='index')
Producing the output:
{1: {'Angry': 1, 'Happy': 0, 'Relaxed': 0, 'Sad': 2}, 2: {'Angry': 3, 'Happy': 0, 'Relaxed': 0, 'Sad': 0}, 3: {'Angry': 0, 'Happy': 1, 'Relaxed': 1, 'Sad': 0}, 4: {'Angry': 0, 'Happy': 2, 'Relaxed': 0, 'Sad': 0}}
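For reference, here is a self-contained sketch of the two steps above, assuming the sample data from the question. The names table and result are only illustrative. The intermediate table has one row per pianist and one column per emotion, which may also be a convenient shape for plotting later.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'pianist_id': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'class': ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry',
              'Relaxed', 'Happy', 'Happy', 'Happy'],
})

# Step 1: count the rows for every (pianist_id, class) pair.
sizes = df.groupby(['pianist_id', 'class']).size()

# Step 2: move 'class' from the index into the columns; pairs that
# never occur become 0 instead of NaN.
table = sizes.unstack(fill_value=0)
print(table)

# Step 3: one inner dict per row (pianist), keyed by emotion.
result = table.to_dict(orient='index')
print(result)
```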
CodePudding user response:
Since the end result specified in the question is a Python dict of dicts, you may prefer a Python-centric approach over a pandas-centric one. Here's an answer that gives several alternatives in which pandas usage is limited to taking the original dataframe as input, calling its apply method, and accessing its 'pianist_id' and 'class' columns:
result = {id : {} for id in df['pianist_id'].unique()}
def updateEmotionCount(id, emotion):
    # Increment the count for this emotion, starting from 0 if unseen
    result[id].update({emotion : result[id].get(emotion, 0) + 1})
df.apply(lambda x: updateEmotionCount(x['pianist_id'], x['class']), axis = 1)
print(result)
... or, in two lines using just lambda:
result = {id : {} for id in df['pianist_id'].unique()}
df.apply(lambda x: result[x['pianist_id']].update({x['class'] : result[x['pianist_id']].get(x['class'], 0) + 1}), axis = 1)
... or, using more lines but benefitting from the convenience of defaultdict:
import collections
result = {id : collections.defaultdict(int) for id in df['pianist_id'].unique()}
def updateEmotionCount(id, emotion):
    # defaultdict(int) supplies 0 for emotions not seen yet
    result[id][emotion] += 1
df.apply(lambda x: updateEmotionCount(x['pianist_id'], x['class']), axis = 1)
# Convert the defaultdicts back to plain dicts
result = {id : dict(result[id]) for id in result}
... or (finally) using the walrus operator := to eliminate the separate function and just use lambda (there is an argument that this approach is somewhat cryptic ... but the same could be said of pandas-centric solutions):
Using the regular dict datatype:
result = {id : {} for id in df['pianist_id'].unique()}
df.apply(lambda x: (id := x['pianist_id'], emotion := x['class'], result[id].update({emotion : result[id].get(emotion, 0) + 1})), axis = 1)
Using defaultdict:
import collections
result = {id : collections.defaultdict(int) for id in df['pianist_id'].unique()}
df.apply(lambda x: (id := x['pianist_id'], emotion := x['class'], result[id].update({emotion : result[id][emotion] + 1})), axis = 1)
result = {id : dict(result[id]) for id in result}
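As a further Python-centric variation (not one of the alternatives listed above), the same counting can be sketched with a plain loop and collections.Counter instead of apply. This assumes the sample data from the question; the names counters and result are only illustrative.

```python
from collections import Counter, defaultdict

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'pianist_id': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'class': ['Angry', 'Sad', 'Sad', 'Angry', 'Angry', 'Angry',
              'Relaxed', 'Happy', 'Happy', 'Happy'],
})

# One Counter per pianist, created on first access.
counters = defaultdict(Counter)

# tolist() yields plain Python ints and strings, so the resulting
# dict keys are ordinary Python objects rather than NumPy scalars.
for pianist_id, emotion in zip(df['pianist_id'].tolist(), df['class'].tolist()):
    counters[pianist_id][emotion] += 1

# Convert to a plain dict of dicts, as requested in the question.
result = {pianist_id: dict(counter) for pianist_id, counter in counters.items()}
print(result)
```

This walks the two columns once and never calls apply, at the cost of doing the counting in Python rather than in pandas.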