This is my df:
import pandas as pd
df = pd.DataFrame(
    {
        'uri': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'],
        'predicate': ['same', 'wiki', 'wiki', 'same', 'same', 'same', 'same', 'wiki'],
        'object': ['x', 'y', 's', 'h', 'k', 'o', 'm', 'n'],
    }
)
I want to generate a new dataframe like the one below:
uri result
a {'same':['x'], 'wiki':['y', 's']}
b {'same':['h', 'k']}
c {'same':['o', 'm'], 'wiki':['n']}
I tried this code, but I don't know how to build a DataFrame from its output:
df.groupby(['uri', 'predicate'])['object'].apply(list).to_dict()
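The attempt above returns a flat dict keyed by (uri, predicate) tuples, so it is already close. A minimal sketch of folding those tuple keys into one inner dict per uri and then building the DataFrame from (uri, dict) pairs; the names flat, nested and out are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "uri": ["a", "a", "a", "b", "b", "c", "c", "c"],
        "predicate": ["same", "wiki", "wiki", "same", "same", "same", "same", "wiki"],
        "object": ["x", "y", "s", "h", "k", "o", "m", "n"],
    }
)

# The code from the question: keys are (uri, predicate) tuples
flat = df.groupby(["uri", "predicate"])["object"].apply(list).to_dict()

# Fold the tuple keys into {uri: {predicate: objects}}
nested = {}
for (uri, predicate), objects in flat.items():
    nested.setdefault(uri, {})[predicate] = objects

# One row per uri; order follows the sorted groupby keys
out = pd.DataFrame(list(nested.items()), columns=["uri", "result"])
print(out)
```

This keeps a single groupby and does the nesting in plain Python, which is easy to follow for small frames.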
Answer:
You can avoid a double groupby by creating a list of tuples from the MultiIndex Series:
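# Lists of objects per (uri, predicate) pair, as a MultiIndex Series
s = df.groupby(['uri', 'predicate'])['object'].apply(list)
# One (uri, {predicate: objects}) tuple per value of the outer index level
d = pd.DataFrame([(level, s.xs(level).to_dict()) for level in s.index.levels[0]],
                 columns=['uri', 'result'])
print(d)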
uri result
0 a {'same': ['x'], 'wiki': ['y', 's']}
1 b {'same': ['h', 'k']}
2 c {'same': ['o', 'm'], 'wiki': ['n']}
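To see what each tuple is built from: s.xs(key) selects one uri and drops that level, leaving a Series indexed by predicate. The sketch below also iterates over s.index.get_level_values("uri").unique() instead of s.index.levels[0], on the assumption that s might be filtered first; MultiIndex levels can keep values that no longer appear in the index after filtering, which would make xs fail for those keys.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "uri": ["a", "a", "a", "b", "b", "c", "c", "c"],
        "predicate": ["same", "wiki", "wiki", "same", "same", "same", "same", "wiki"],
        "object": ["x", "y", "s", "h", "k", "o", "m", "n"],
    }
)

s = df.groupby(["uri", "predicate"])["object"].apply(list)

# xs("a") keeps the rows where uri == "a" and drops the uri level
inner = s.xs("a")
print(inner.to_dict())  # {'same': ['x'], 'wiki': ['y', 's']}

# Only the outer keys actually present in the index
uris = s.index.get_level_values("uri").unique()
d = pd.DataFrame([(u, s.xs(u).to_dict()) for u in uris], columns=["uri", "result"])
print(d)
```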
Answer:
One approach uses a double groupby:
# Outer groupby per uri; the inner groupby builds the {predicate: objects} dict
res = df.groupby(["uri"]).apply(
    lambda x: x.groupby("predicate")["object"].apply(list).to_dict()
)
print(res)
Output:
uri
a {'same': ['x'], 'wiki': ['y', 's']}
b {'same': ['h', 'k']}
c {'same': ['o', 'm'], 'wiki': ['n']}
dtype: object
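Note that res is a Series of dicts indexed by uri, not the two-column DataFrame asked for. A small sketch of converting it with reset_index(name="result"); it also selects only the predicate and object columns before apply, because some recent pandas versions warn when the applied function receives the grouping column:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "uri": ["a", "a", "a", "b", "b", "c", "c", "c"],
        "predicate": ["same", "wiki", "wiki", "same", "same", "same", "same", "wiki"],
        "object": ["x", "y", "s", "h", "k", "o", "m", "n"],
    }
)

# Each group g holds only the predicate and object columns
res = df.groupby("uri")[["predicate", "object"]].apply(
    lambda g: g.groupby("predicate")["object"].apply(list).to_dict()
)

# Move the uri index into a column and name the dict column "result"
out = res.reset_index(name="result")
print(out)
```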