Home > Enterprise >  convert pandas column with dictionary of dictionaries to separate columns
convert pandas column with dictionary of dictionaries to separate columns

Time:06-19

I have pandas df with one column having dictionary of dictionaries: this is what i have:

| ca| cb| 
|:---- |:------:
| 1  | cat1:{paws:1 , hair:2} ,dog1:{paws:3 , hair:5}  
| 2  | cat2:{paws:1 , hair:2} ,dog2:{paws:3 , hair:5}
| 3  | cat3:{paws:1 , hair:2} ,dog3:{paws:3 , hair:5}
| 4  | cat4:{paws:1 , hair:2} ,dog4:{paws:3 , hair:5}

What i want is:

| ca| animal| paws|  hair|
|:----:| -----:| -----:| -----:|
| 1  | cat1  | 1 | 2 
| 1  | dog1  | 3 | 5 
| 2  | cat2  | 1 | 2 
| 2  | dog2 | 3 | 5 

What should be fastest way to do this?

CodePudding user response:

I found a solution below:

I reproduced a demo version of your dataframe with the following dictionary for the following outcome:

data = {
    "ca": [1, 2],
    "cb": [{"cat1": {"paws": 1, "hair": 2}, "dog1":{"paws":3 , "hair":5}}, 
            {"cat2":{"paws":1 , "hair":2} , "dog2":{"paws":3 , "hair":5}}]
}

df = pandas.DataFrame(data)

df
ca  cb
1   {'cat1': {'paws': 1, 'hair': 2}, 'dog1': {'paw...
2   {'cat2': {'paws': 1, 'hair': 2}, 'dog2': {'paw...

Moving on, I had to eliminate the first step of the dict, ie, extracting the cats and the dogs alike.

first_level = pandas.concat([df.drop(['cb'], axis=1), df['cb'].apply(pandas.Series)], axis=1)
first_level

    ca  cat1    dog1    cat2    dog2
0   1   {'paws': 1, 'hair': 2}  {'paws': 3, 'hair': 5}  NaN NaN
1   2   NaN NaN {'paws': 1, 'hair': 2}  {'paws': 3, 'hair': 5}

The key takeaway here is that you need to apply the melt function to convert the columns to values and set them to respective rows.

first_level.melt(id_vars=["ca"]).dropna()
first_level

    ca  variable    value
0   1   cat1    {'paws': 1, 'hair': 2}
2   1   dog1    {'paws': 3, 'hair': 5}
5   2   cat2    {'paws': 1, 'hair': 2}
7   2   dog2    {'paws': 3, 'hair': 5}

And then the rest is quite simple, with the same apply function, I can convert this dictionary to columns too and the issue is resolved:

second_level = pandas.concat([first_level.drop(['value'], axis=1), first_level['value'].apply(pandas.Series)], axis=1)
second_level

    ca  variable    paws    hair
0   1   cat1    1   2
2   1   dog1    3   5
5   2   cat2    1   2
7   2   dog2    3   5


  • Related