Generating a dictionary from two columns of a pandas DataFrame and putting it into a new DataFrame

Time:07-18

This is my df:

import pandas as pd


df = pd.DataFrame(
    {
        'uri': ['a', 'a', 'a', 'b','b', 'c', 'c', 'c'],
        'predicate': ['same', 'wiki' ,'wiki', 'same', 'same', 'same', 'same', 'wiki'],
        'object': ['x', 'y' ,'s', 'h', 'k', 'o', 'm', 'n'],
    }
)

I want to generate a new dataframe like the one below:

uri    result
a      {'same':['x'], 'wiki':['y', 's']}
b      {'same':['h', 'k']}
c      {'same':['o', 'm'], 'wiki':['n']}

I tried this code but I don't know how to generate a dataframe out of it.

df.groupby(['uri', 'predicate'])['object'].apply(list).to_dict()

CodePudding user response:

You can avoid a double groupby by building a list of tuples from the MultiIndex Series:

s = df.groupby(['uri', 'predicate'])['object'].apply(list)

d = pd.DataFrame([(level, s.xs(level).to_dict()) for level in s.index.levels[0]],
                 columns=['uri','result'])
print(d)
  uri                               result
0   a  {'same': ['x'], 'wiki': ['y', 's']}
1   b                 {'same': ['h', 'k']}
2   c  {'same': ['o', 'm'], 'wiki': ['n']}
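
To see why this works, it can help to look at the intermediate pieces. The sketch below rebuilds the question's sample data and inspects `s` and `s.xs("a")`; the variable names `inner` and `a_dict` are only illustrative and not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "uri": ["a", "a", "a", "b", "b", "c", "c", "c"],
        "predicate": ["same", "wiki", "wiki", "same", "same", "same", "same", "wiki"],
        "object": ["x", "y", "s", "h", "k", "o", "m", "n"],
    }
)

# One list of objects per (uri, predicate) pair; the index is a MultiIndex.
s = df.groupby(["uri", "predicate"])["object"].apply(list)

# xs("a") keeps only the rows whose first index level is "a" and drops that
# level, leaving a Series indexed by predicate.
inner = s.xs("a")

# to_dict() then maps each predicate to its list of objects.
a_dict = inner.to_dict()
print(a_dict)
```

Note that `s.index.levels[0]` lists every value the first level was built with. Straight after a groupby that matches the uris present, but if `s` were filtered first, unused values could still appear there; `s.index.get_level_values(0).unique()` would be a safer choice in that case.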
  

CodePudding user response:

One approach using a double groupby (the result is a Series of dicts indexed by uri, not a DataFrame):

res = df.groupby(["uri"]).apply(lambda x: x.groupby("predicate")["object"].apply(list).to_dict())
print(res)

Output

uri
a    {'same': ['x'], 'wiki': ['y', 's']}
b                   {'same': ['h', 'k']}
c    {'same': ['o', 'm'], 'wiki': ['n']}
dtype: object
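
Since the question asks for a DataFrame with `uri` and `result` columns, the Series can be converted with `reset_index(name="result")`. This is a minimal sketch on the question's sample data; selecting `[["predicate", "object"]]` before `apply` is my own addition, meant to keep the grouping column out of each group, and `out` is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "uri": ["a", "a", "a", "b", "b", "c", "c", "c"],
        "predicate": ["same", "wiki", "wiki", "same", "same", "same", "same", "wiki"],
        "object": ["x", "y", "s", "h", "k", "o", "m", "n"],
    }
)

# Outer groupby on uri, inner groupby on predicate, as in the answer above.
res = df.groupby("uri")[["predicate", "object"]].apply(
    lambda g: g.groupby("predicate")["object"].apply(list).to_dict()
)

# Turn the uri index into a column and name the values column "result".
out = res.reset_index(name="result")
print(out)
```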