If I have the following Python list of dictionaries:
[{'website':'google.com', 'hits': 100, 'source': 'mobile'},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'tablet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'},
]
where the value of key 'hits' stays the same (it never changes and will always be 100),
how can I combine the values of key 'source' into a list, while keeping each website as a separate dictionary inside a list?
Basically, I want to get this output:
[{'website':'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'}
]
Answer:
Using pandas.DataFrame:
import pandas as pd
data = [
{'website':'google.com', 'hits': 100, 'source': 'mobile'},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'tablet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'},
]
df = pd.DataFrame(data)
This creates a DataFrame like this:
> df.head()
website hits source
0 google.com 100 mobile
1 facebook.com 100 internet
2 google.com 100 internet
3 google.com 100 tablet
4 youtube.com 100 mobile
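Then you can group by the website column and save the result in your desired format:
new_data = []
# each iteration yields (website, sub-DataFrame holding that website's rows)
for website, group in df.groupby('website'):
    new_data.append({
        'website': website,
        'hits': 100,  # constant, per the question
        'source': group['source'].tolist()
    })
print(new_data)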
# [
# {'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
# {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
# {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']}
# ]
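Note that this gives a list for every website, even one with a single source, whereas the question shows a plain string in that case. If you need exactly that shape, a small post-processing step can unwrap one-element lists. A minimal sketch, assuming `new_data` has the structure printed above (the helper name `unwrap_single_sources` is just illustrative):

```python
def unwrap_single_sources(records):
    """Return copies of the records where a one-element 'source' list becomes a plain string."""
    result = []
    for rec in records:
        rec = dict(rec)  # shallow copy so the input list of dicts is not modified
        if isinstance(rec['source'], list) and len(rec['source']) == 1:
            rec['source'] = rec['source'][0]
        result.append(rec)
    return result


new_data = [
    {'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
    {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
    {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']},
]
print(unwrap_single_sources(new_data))
# [{'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
#  {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
#  {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'}]
```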
Answer:
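Create a new list, and check whether an entry for that website already exists before adding it.
new = []
# `initial` is the list of dictionaries from the question
for each in initial:
    # enumerate gives (index, dict) pairs; keep the ones for the same website
    found = list(filter(lambda a: a[1]['website'] == each['website'], enumerate(new)))
    # if new does not have this website yet, store a copy so `initial` is not mutated
    if not found:
        new.append(dict(each))
    else:
        i = found[0][0]  # index of the existing entry
        if isinstance(new[i]['source'], list):
            new[i]['source'].append(each['source'])
        else:
            new[i]['source'] = [new[i]['source'], each['source']]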
I wrote this on my phone, so there might be some errors, but you get the idea.
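Searching `new` for every input row means the work grows with the number of rows times the number of distinct websites. A variant that keeps a dictionary keyed by website finds the existing entry in average constant time, and because Python dicts preserve insertion order (guaranteed since 3.7), websites stay in first-seen order. A minimal sketch (the function name `combine_sources` is hypothetical) that produces the exact shape from the question, a string for a single source and a list otherwise:

```python
def combine_sources(rows):
    """Merge rows with the same 'website', collecting their 'source' values.

    A website seen once keeps 'source' as a string; one seen several
    times gets a list of sources, matching the output in the question.
    """
    by_site = {}  # website -> merged dict; insertion order is preserved
    for row in rows:
        site = row['website']
        if site not in by_site:
            by_site[site] = dict(row)  # copy so the input rows are not mutated
        else:
            merged = by_site[site]
            if not isinstance(merged['source'], list):
                merged['source'] = [merged['source']]  # promote string to list
            merged['source'].append(row['source'])
    return list(by_site.values())


initial = [
    {'website': 'google.com', 'hits': 100, 'source': 'mobile'},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]
print(combine_sources(initial))
# [{'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
#  {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
#  {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'}]
```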
Answer:
import pandas as pd
data = [{'website':'google.com', 'hits': 100, 'source': 'mobile'},
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'internet'},
{'website':'google.com', 'hits': 100, 'source': 'tablet'},
{'website':'youtube.com', 'hits': 100, 'source': 'mobile'}]
df = pd.DataFrame(data)  # convert data to a pandas DataFrame
print(df)
website hits source
0 google.com 100 mobile
1 facebook.com 100 internet
2 google.com 100 internet
3 google.com 100 tablet
4 youtube.com 100 mobile
output = df.groupby(['website', 'hits'])['source'].apply(list).reset_index().to_dict(orient='records')
print(output)
[{'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
{'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
{'website': 'youtube.com', 'hits': 100, 'source': ['mobile']}]
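`groupby` sorts the group keys by default, which is why facebook.com comes first in the output above. If you want the websites in the order they first appear in `data`, as in the question, passing `sort=False` to `groupby` should keep first-appearance order; with this data, google.com would come first. A sketch reusing the same data:

```python
import pandas as pd

data = [{'website': 'google.com', 'hits': 100, 'source': 'mobile'},
        {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
        {'website': 'google.com', 'hits': 100, 'source': 'internet'},
        {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
        {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'}]
df = pd.DataFrame(data)

# sort=False keeps groups in order of first appearance instead of sorting the keys
output = (
    df.groupby(['website', 'hits'], sort=False)['source']
    .apply(list)                 # collect each group's sources into a list
    .reset_index()               # turn the group keys back into columns
    .to_dict(orient='records')   # one dict per row
)
print(output)
```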