I have a very large dataframe containing a column called 'time_columns'. Each cell of the column contains a dictionary of dictionaries, for example:
time_columns |
---|
{'Yesterday': {'text': 'Yesterday', 'type': 'DATE', 'value': '2022-04-15'}} |
{'Yesterday': {'text': 'Yesterday', 'type': 'DATE', 'value': '2022-04-16'}, 'Thursday': {'text': 'Thursday', 'type': 'DATE', 'value': '2022-04-14'}} |
How can I efficiently get a table containing the frequency count of the unique keys of the main dictionary like below? (In a table because I want to save the result to a CSV.)
text | count |
---|---|
Yesterday | 2 |
Thursday | 1 |
CodePudding user response:
Try:
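df = (
    df["time_columns"]
    .explode()  # each dict is expanded into its keys, one per row
    .value_counts()
    .rename_axis("text")  # name the index directly; its default name differs between pandas versions
    .reset_index(name="count")
)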
print(df)
Prints:
text count
0 Yesterday 2
1 Thursday 1
CodePudding user response:
Given the input data, could you try this?
tmp = pd.concat([pd.DataFrame.from_dict(v, orient='index') for v in df['time_columns']])
tmp['text'].value_counts()
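To get from that Series to the two-column table the question asks for, something along these lines might work. This is only a sketch built on the sample rows from the question. Note that it counts the inner 'text' values, which is assumed to match the outer keys as it does in the sample. The names `counts` and `csv_text` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question; the real dataframe is assumed to look like this.
df = pd.DataFrame({
    "time_columns": [
        {"Yesterday": {"text": "Yesterday", "type": "DATE", "value": "2022-04-15"}},
        {
            "Yesterday": {"text": "Yesterday", "type": "DATE", "value": "2022-04-16"},
            "Thursday": {"text": "Thursday", "type": "DATE", "value": "2022-04-14"},
        },
    ]
})

# One small frame per cell (rows = outer keys), stacked into a single frame.
tmp = pd.concat(
    [pd.DataFrame.from_dict(v, orient="index") for v in df["time_columns"]]
)

# Count the 'text' values and shape the result as a text/count table.
counts = tmp["text"].value_counts().rename_axis("text").reset_index(name="count")

# to_csv() without a path returns the CSV as a string; pass a file name to write to disk.
csv_text = counts.to_csv(index=False)
print(counts)
```

For a very large dataframe, building one small DataFrame per row may be slow, so it is worth timing this against the other answers.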
CodePudding user response:
The easy way would be to just iterate through the column and save the results to a new dictionary, something like:
res = {}
for d in df['time_columns']:
    for key in d:
        if key not in res:
            res[key] = 1
        else:
            res[key] += 1
If you know the keys in advance, you can initialize the dictionary with those keys set to zero and replace the if statement inside the loop with a plain increment.
keys = ['Yesterday', 'Thursday', 'etc.']
res = {key: 0 for key in keys}
for d in df['time_columns']:
    for key in d:
        res[key] += 1
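The same counting idea can probably be written more compactly with `collections.Counter`, which also removes the need to know the keys up front. A minimal sketch, assuming the cells look like the sample in the question; `cells` stands in for `df['time_columns']`, and the CSV is written to an in-memory buffer just for illustration:

```python
import csv
import io
from collections import Counter

# Sample cells from the question; in practice iterate over df['time_columns'] instead.
cells = [
    {"Yesterday": {"text": "Yesterday", "type": "DATE", "value": "2022-04-15"}},
    {
        "Yesterday": {"text": "Yesterday", "type": "DATE", "value": "2022-04-16"},
        "Thursday": {"text": "Thursday", "type": "DATE", "value": "2022-04-14"},
    },
]

# Iterating a dict yields its keys, so this counts every outer key once per cell.
counts = Counter(key for cell in cells for key in cell)

# Write a text/count table, most frequent key first.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["text", "count"])
writer.writerows(counts.most_common())
csv_text = buffer.getvalue()
print(csv_text)
```

To save to disk, open a real file with `open(path, "w", newline="")` in place of the `StringIO` buffer.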