INPUT- This is an excel sheet
| DatasetName | Fields |
| ----------- | ------ |
| Table_1 | Column1 |
| Table_1 | Column2 |
| Table_1 | Column3 |
| Table_1 | Column4 |
| Table_2 | Column1 |
| Table_2 | Column2 |
| Table_2 | Column3 |
| Table_2 | Column4 |
| Table_3 | Column1 |
| Table_3 | Column2 |
| Table_3 | Column3 |
| Table_3 | Column4 |
OUTPUT-
{'Table_1' : ['Column1','Column2','Column3','Column4'] ,
'Table_2' : ['Column1','Column2','Column3','Column4'] ,
'Table_3' : ['Column1','Column2','Column3','Column4'] }
CodePudding user response:
Try:
import pandas as pd
df = pd.read_csv('my_file.csv')
# Get the unique dataset names
names = df['DatasetName'].unique()
result = {}
# Loop over the names, get the corresponding values, store them as a list
for name in names:
x = df.loc[df['DatasetName'] == name]
vals = x['Fields'].tolist()
result[name] = vals
print(result)
CodePudding user response:
Leverage on the powerful Pandas features on data grouping and aggregation:
You can do it easily with just one line with .groupby()
.agg()
to_dict()
, as follows:
import pandas as pd
# read the Excel into dataframe `df`
df = pd.read_excel('my_excel.xlsx')
# generate dict
dict_out = df.groupby('DatasetName')['Fields'].agg(list).to_dict()
Result:
print(dict_out)
{'Table_1 ': ['Column1', 'Column2', 'Column3', 'Column4'],
'Table_2': ['Column1', 'Column2', 'Column3', 'Column4'],
'Table_3': ['Column1', 'Column2', 'Column3', 'Column4']}