How do I create a nested dictionary pairing each dataframe column's categories with their corresponding counts?

I have a dataset consisting of categorical data, like the one shown below:

How do I create a nested dictionary from this dataframe so that each outer key is a column name, and each inner "key": "value" pair is "category": "number of times that category occurs"?

Answer:

You can use collections.Counter to count the occurrences of each category. Given an iterable (such as a DataFrame column), it returns a dict-like object (Counter is a subclass of dict) that maps each category to its count, just like your inner dict.

To get this for every column, iterate over the columns:

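```python
from collections import Counter

# df is your existing DataFrame
all_counts = {}
for column in df.columns:
    # Counter maps each category in the column to the number of times it occurs
    all_counts[column] = Counter(df[column])
```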
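A self-contained version of the loop above, as a sketch. The sample DataFrame is made up for illustration (its column names are borrowed from the other answer's sample, not from the question's data). It also wraps each Counter in dict(), which is optional since Counter already subclasses dict, but gives plain nested dicts if that is what you need downstream.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; replace with your own DataFrame.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Male"],
    "heart_disease": [0, 1, 1],
})

all_counts = {}
for column in df.columns:
    # Iterating a column yields its values; Counter tallies each distinct value.
    # dict() converts the Counter into an ordinary dict.
    all_counts[column] = dict(Counter(df[column]))

print(all_counts)
```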

Answer:

Try the following, which calls value_counts() on each column and converts the result to a dict with to_dict():

```python
import pandas as pd

# sample data
data = {'gender': ['Male', 'Female', 'Male'],
        'heart_disease': [0, 1, 1]}

df = pd.DataFrame(data)

a_dict = {}
for x in df.columns:
    a_dict[x] = df[x].value_counts().to_dict()

print(a_dict)
```

Output:

```
{'gender': {'Male': 2, 'Female': 1}, 'heart_disease': {1: 2, 0: 1}}
```
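The same value_counts approach can also be written as a dict comprehension. The sketch below additionally passes `dropna=False`, because by default `value_counts` leaves missing values out of the counts; with `dropna=False` they get their own entry. The sample data, including the missing `None` entry, is made up for illustration.

```python
import pandas as pd

# Hypothetical sample with one missing value in "gender".
df = pd.DataFrame({
    "gender": ["Male", "Female", "Male", None],
    "heart_disease": [0, 1, 1, 0],
})

# One inner dict per column, mapping category -> count.
# dropna=False keeps the missing value as its own category.
counts = {col: df[col].value_counts(dropna=False).to_dict() for col in df.columns}

print(counts)
```

Without `dropna=False`, the inner dict for "gender" would contain only "Male" and "Female", and its counts would add up to 3 rather than the number of rows.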