I wish to convert a DataFrame into a dictionary where columns are the keys and (column-wise) rows are its values. I also need to use grouping when doing so.
team id name salary
0 Alpha 10 Jack 1000
1 Alpha 15 John 2000
2 Alpha 20 John 2000
3 Bravo 50 Thomas 5000
4 Bravo 55 Robert 6000
5 Bravo 60 Albert 7000
Expected output:
# expected output WITH duplicates
ex = {'Alpha': {'id': [10, 15, 20], 'name': ['Jack', 'John', 'John'], 'salary': [1000, 2000, 2000]},
'Bravo': {'id': [50, 55, 60], 'name': ['Thomas', 'Robert', 'Albert'], 'salary': [5000, 6000, 7000]}
}
# expected output WITHOUT duplicates ('name', 'salary')
ex = {'Alpha': {'id': [10, 15, 20], 'name': ['Jack', 'John'], 'salary': [1000, 2000]},
'Bravo': {'id': [50, 55, 60], 'name': ['Thomas', 'Robert', 'Albert'], 'salary': [5000, 6000, 7000]}
}
Can it be done somehow using df.to_dict()
?
Code for example:
import pandas as pd
d = {'team': ['Alpha', 'Alpha', 'Alpha', 'Bravo', 'Bravo', 'Bravo'],
'id': [10, 15, 20, 50, 55, 60],
'name': ['Jack', 'John', 'John', 'Thomas', 'Robert', 'Albert'],
'salary': [1000, 2000, 2000, 5000, 6000, 7000]}
df = pd.DataFrame(data=d)
CodePudding user response:
A Groupby
then to_dict
should do the trick:
out = df.groupby('team').agg(list).to_dict('index')
print(out)
Output:
{'Alpha': {'id': [10, 15, 20],
'name': ['Jack', 'John', 'John'],
'salary': [1000, 2000, 2000]},
'Bravo': {'id': [50, 55, 60],
'name': ['Thomas', 'Robert', 'Albert'],
'salary': [5000, 6000, 7000]}}
For unique lists:
out = df.groupby('team').agg(lambda x: x.unique().tolist()).to_dict('index')
print(out)
# Output:
{'Alpha': {'id': [10, 15, 20],
'name': ['Jack', 'John'],
'salary': [1000, 2000]},
'Bravo': {'id': [50, 55, 60],
'name': ['Thomas', 'Robert', 'Albert'],
'salary': [5000, 6000, 7000]}}
CodePudding user response:
expected output WITH duplicates
df.groupby('team').agg(list).T.to_dict()
output:
{'Alpha': {'id': [10, 15, 20],
'name': ['Jack', 'John', 'John'],
'salary': [1000, 2000, 2000]},
'Bravo': {'id': [50, 55, 60],
'name': ['Thomas', 'Robert', 'Albert'],
'salary': [5000, 6000, 7000]}}
expected output WITHOUT duplicates
df.groupby('team').agg(lambda x: list(set(x))).T.to_dict()
output:
{'Alpha': {'id': [10, 20, 15],
'name': ['Jack', 'John'],
'salary': [1000, 2000]},
'Bravo': {'id': [50, 60, 55],
'name': ['Thomas', 'Albert', 'Robert'],
'salary': [5000, 7000, 6000]}}