I need to convert a dataframe into a dictionary of lists. An example dataframe is below:
I want to convert the above dataframe into a dictionary of lists like this:
dict = {"abc":[sentence 1, sentence 2], "def":[sentence 3], "ghi":[sentence 4, sentence 5]}
Here is my solution :
dict = {}
for idx, row in test_df.iterrows():
    if not row["label"] in dict:
        dict[row["label"]] = []
    else:
        continue
for key in dict:
    dict[key] = list()
    for idx, row in test_df.iterrows():
        if key == row["label"]:
            dict[key].append(row["sentence"])
        else:
            continue
print(dict)
My solution works, but it looks like a lot of code and there should be an easier way. Any suggestions?
CodePudding user response:
import pandas as pd

data = pd.DataFrame([
    {"sentence": "sentence1", "label": "abc"},
    {"sentence": "sentence2", "label": "abc"},
    {"sentence": "sentence3", "label": "def"},
    {"sentence": "sentence4", "label": "ghi"},
    {"sentence": "sentence5", "label": "ghi"},
])
data
    sentence label
0  sentence1   abc
1  sentence2   abc
2  sentence3   def
3  sentence4   ghi
4  sentence5   ghi
data.groupby("label")["sentence"].apply(list).to_dict()
{'abc': ['sentence1', 'sentence2'],
 'def': ['sentence3'],
 'ghi': ['sentence4', 'sentence5']}
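One thing to be aware of: groupby sorts the group keys by default, so the dictionary comes out in sorted label order. If the order in which labels first appear matters, passing sort=False should keep it. A minimal sketch, assuming the same two columns as above but with the rows shuffled (the shuffled data is made up for illustration):

```python
import pandas as pd

# Hypothetical data: same columns as above, rows deliberately out of order.
data = pd.DataFrame({
    "sentence": ["sentence4", "sentence1", "sentence5", "sentence2", "sentence3"],
    "label": ["ghi", "abc", "ghi", "abc", "def"],
})

# sort=False keeps the groups in order of first appearance
# instead of sorting the label keys alphabetically.
result = data.groupby("label", sort=False)["sentence"].apply(list).to_dict()
print(result)
```

Within each list the sentences stay in their original row order either way.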
CodePudding user response:
You can use groupby, something like this:
import pandas as pd
df = pd.DataFrame(
    {
        'sentence': ['sentence1', 'sentence2', 'sentence3', 'sentence4', 'sentence5'],
        'label': ['abc', 'abc', 'def', 'ghi', 'ghi']
    }
)
df = df.groupby('label')['sentence'].apply(list)
print({k: v for k, v in df.items()})
Output:
{'abc': ['sentence1', 'sentence2'], 'def': ['sentence3'], 'ghi': ['sentence4', 'sentence5']}
CodePudding user response:
import pandas as pd
df = pd.DataFrame({'sentence':['10','20','30','40','50'], 'label' : ['abc', 'abc', 'def', 'ghi', 'ghi']})
d = {key: list(df.where(df.label == key).sentence.dropna().values) for key in set(df.label)}
This uses a dict comprehension over the unique labels.
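If you would rather keep an explicit loop, a single pass with collections.defaultdict avoids both iterrows and the nested loop from the question. This is only a sketch: group_into_lists is a hypothetical helper name, and the plain lists below stand in for the two columns (with a dataframe you would pass df["label"] and df["sentence"] instead).

```python
from collections import defaultdict


def group_into_lists(labels, sentences):
    """Collect each sentence under its label in one pass (hypothetical helper)."""
    grouped = defaultdict(list)
    # zip pairs each label with the sentence on the same row.
    for label, sentence in zip(labels, sentences):
        grouped[label].append(sentence)
    # Convert back to a plain dict so missing keys raise KeyError as usual.
    return dict(grouped)


# Plain lists standing in for df["label"] and df["sentence"].
labels = ["abc", "abc", "def", "ghi", "ghi"]
sentences = ["sentence1", "sentence2", "sentence3", "sentence4", "sentence5"]

result = group_into_lists(labels, sentences)
print(result)
```

Labels come out in order of first appearance, since dicts preserve insertion order.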