Converting Dataframe into dictionary of lists

Time:08-25

I need to convert a DataFrame into a dictionary of lists. An example DataFrame is below:

[Image: example DataFrame with "sentence" and "label" columns]

I want to convert the above DataFrame into a dictionary of lists like this:

dict = {"abc": ["sentence 1", "sentence 2"], "def": ["sentence 3"], "ghi": ["sentence 4", "sentence 5"]}

Here is my solution:

dict = {}
# First pass: create an empty list for every distinct label.
for idx, row in test_df.iterrows():
    if row["label"] not in dict:
        dict[row["label"]] = []

# Second pass: for each label, collect the sentences of the matching rows.
for key in dict:
    for idx, row in test_df.iterrows():
        if key == row["label"]:
            dict[key].append(row["sentence"])

print(dict)

My solution works, but it is a lot of code and there should be an easier way. Any suggestions?

CodePudding user response:

import pandas as pd

data = pd.DataFrame([
    {"sentence": "sentence1", "label":"abc"},
    {"sentence": "sentence2", "label":"abc"},
    {"sentence": "sentence3", "label":"def"},
    {"sentence": "sentence4", "label":"ghi"},
    {"sentence": "sentence5", "label":"ghi"},
])
data
    sentence label
0  sentence1   abc
1  sentence2   abc
2  sentence3   def
3  sentence4   ghi
4  sentence5   ghi
# Group the sentences by label, collect each group into a list, then convert to a dict.
data.groupby("label")["sentence"].apply(list).to_dict()
{'abc': ['sentence1', 'sentence2'],
 'def': ['sentence3'],
 'ghi': ['sentence4', 'sentence5']}
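
By default, groupby sorts the group keys, so the dictionary keys come out in sorted label order. If the order in which labels first appear matters, passing sort=False should keep it. Below is a minimal sketch of the same approach wrapped in a helper; the function name and its parameter names are hypothetical, chosen only for this illustration, and the rows are shuffled so the ordering is visible.

```python
import pandas as pd


def frame_to_dict_of_lists(df, key_col="label", value_col="sentence"):
    """Group value_col by key_col and return {key: [values, ...]}.

    The function and parameter names are hypothetical, chosen for this sketch.
    """
    # sort=False keeps groups in order of first appearance instead of sorting keys.
    grouped = df.groupby(key_col, sort=False)[value_col].apply(list)
    return grouped.to_dict()


data = pd.DataFrame(
    {
        "sentence": ["sentence1", "sentence2", "sentence3", "sentence4", "sentence5"],
        "label": ["ghi", "abc", "ghi", "def", "abc"],
    }
)

result = frame_to_dict_of_lists(data)
print(result)
```

With this input the keys should appear as ghi, abc, def; dropping sort=False would give abc, def, ghi instead.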

CodePudding user response:

You can use groupby, something like this:

import pandas as pd

df = pd.DataFrame(
    {
        'sentence': ['sentence1', 'sentence2', 'sentence3', 'sentence4', 'sentence5'],
        'label': ['abc', 'abc', 'def', 'ghi', 'ghi']
    }
)

# Collect the sentences of each label into a list (a Series indexed by label).
grouped = df.groupby('label')['sentence'].apply(list)

print(grouped.to_dict())

Output:

{'abc': ['sentence1', 'sentence2'], 'def': ['sentence3'], 'ghi': ['sentence4', 'sentence5']}

CodePudding user response:

import pandas as pd

df = pd.DataFrame({'sentence':['10','20','30','40','50'], 'label' : ['abc', 'abc', 'def', 'ghi', 'ghi']})
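# For each distinct label, select the matching rows and collect their sentences.
d = {key: df.loc[df.label == key, 'sentence'].tolist() for key in df.label.unique()}

This uses a dict comprehension: it builds one list per distinct label by filtering the rows that carry that label.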
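
The comprehension above filters the whole frame once per label, which is the same work the two loops in the question do. If that becomes slow with many labels, a single pass with collections.defaultdict is one alternative. The sketch below is plain Python and takes any iterable of (label, sentence) pairs; the helper name group_pairs is hypothetical.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Collect (label, sentence) pairs into {label: [sentence, ...]} in one pass."""
    grouped = defaultdict(list)
    for label, sentence in pairs:
        # A missing label gets an empty list automatically on first access.
        grouped[label].append(sentence)
    # Convert back to a plain dict so later lookups of unknown keys raise KeyError.
    return dict(grouped)


labels = ["abc", "abc", "def", "ghi", "ghi"]
sentences = ["sentence1", "sentence2", "sentence3", "sentence4", "sentence5"]

result = group_pairs(zip(labels, sentences))
print(result)
```

With the DataFrame from the question, the pairs could presumably come from zip(test_df["label"], test_df["sentence"]), which also avoids the overhead of iterrows.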
