Merging a Python list of dictionaries with repeated dictionary keys

If I have the following Python list of dictionaries:

[{'website': 'google.com', 'hits': 100, 'source': 'mobile'},
 {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
 {'website': 'google.com', 'hits': 100, 'source': 'internet'},
 {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
 {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'}
]

where the value for the key 'hits' stays the same (it never changes and will always be 100),

how can I combine the values of the key 'source' into a list, while keeping each website as a separate dictionary inside the list?

Basically, I want to get this output:

[{'website':'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']}, 
{'website':'facebook.com', 'hits': 100, 'source': 'internet'},
 {'website':'youtube.com', 'hits': 100, 'source': 'mobile'}
]

CodePudding user response：

Using pandas.DataFrame:

import pandas as pd

data = [
    {'website':'google.com', 'hits': 100, 'source': 'mobile'}, 
    {'website':'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website':'google.com', 'hits': 100, 'source': 'internet'},
    {'website':'google.com', 'hits': 100, 'source': 'tablet'},
    {'website':'youtube.com', 'hits': 100, 'source': 'mobile'},
]
df = pd.DataFrame(data)

This creates a DataFrame like this:

> df.head()
        website  hits    source
0    google.com   100    mobile
1  facebook.com   100  internet
2    google.com   100  internet
3    google.com   100    tablet
4   youtube.com   100    mobile

Then you can group by the website column and save the result in your desired format:

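new_data = []
# iterating over a groupby yields (group key, sub-DataFrame) pairs
for website, group in df.groupby('website'):
    new_data.append({
        'website': website,
        'hits': 100,  # hits is always 100, as stated in the question
        'source': list(group['source'])
    })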
print(new_data)
# [
#     {'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
#     {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
#     {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']}
# ]
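
For readers without pandas, the grouping step above can be sketched with the standard library alone. This is only an illustration and not part of the answer itself: group_sources is a hypothetical helper name, and sorting the keys is an assumption made to mimic the alphabetical order of the groupby output shown above.

```python
from collections import defaultdict

def group_sources(records):
    """Collect every 'source' seen for each 'website' (hypothetical helper)."""
    sources = defaultdict(list)
    for record in records:
        sources[record['website']].append(record['source'])
    # Sorting the keys is an assumption, chosen to match the order shown above.
    return [
        {'website': website, 'hits': 100, 'source': sources[website]}
        for website in sorted(sources)
    ]

data = [
    {'website': 'google.com', 'hits': 100, 'source': 'mobile'},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]
new_data = group_sources(data)
print(new_data)
```

As in the pandas version, 'hits' is hard-coded to 100 because the question says it never changes.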

CodePudding user response：

Create a new list, and check whether a dictionary for the website already exists in it before adding one.

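new = []
for each in initial:  # initial is the original list of dictionaries
    # index of the entry in new with the same website, or None if there is none
    i = next((idx for idx, d in enumerate(new) if d['website'] == each['website']), None)
    if i is None:
        # new does not have it already; store a copy so initial is not modified
        new.append(dict(each))
    else:
        try:
            # source is already a list: just add to it
            new[i]['source'].append(each['source'])
        except AttributeError:
            # source is still a single string: turn it into a list
            new[i]['source'] = [new[i]['source'], each['source']]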

I wrote this on my phone, so there might be some errors, but you get the idea.
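
The scan over new on every iteration can be avoided by keeping a dictionary keyed by website. Here is a minimal sketch of that variant, assuming the input looks like the list in the question; merge_sources and merged are illustrative names, not something from the answer above. It leaves a single source as a plain string, as in the desired output.

```python
def merge_sources(records):
    """Merge records that share a 'website' (illustrative helper)."""
    merged = {}  # website -> merged record; dicts keep insertion order (Python 3.7+)
    for record in records:
        entry = merged.get(record['website'])
        if entry is None:
            # First time this website is seen: copy so the input is not mutated.
            merged[record['website']] = dict(record)
        elif isinstance(entry['source'], list):
            entry['source'].append(record['source'])
        else:
            # Second occurrence: turn the single string into a list.
            entry['source'] = [entry['source'], record['source']]
    return list(merged.values())

initial = [
    {'website': 'google.com', 'hits': 100, 'source': 'mobile'},
    {'website': 'facebook.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'internet'},
    {'website': 'google.com', 'hits': 100, 'source': 'tablet'},
    {'website': 'youtube.com', 'hits': 100, 'source': 'mobile'},
]
print(merge_sources(initial))
```

Websites come out in the order they were first seen, and each lookup is a dictionary access rather than a pass over the whole result list.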

CodePudding user response：

import pandas as pd

data = [{'website':'google.com', 'hits': 100, 'source': 'mobile'}, 
        {'website':'facebook.com', 'hits': 100, 'source': 'internet'},
        {'website':'google.com', 'hits': 100, 'source': 'internet'},
        {'website':'google.com', 'hits': 100, 'source': 'tablet'},
        {'website':'youtube.com', 'hits': 100, 'source': 'mobile'}]

df = pd.DataFrame(data)     # convert data to pandas dataframe

print(df)

        website  hits    source
0    google.com   100    mobile
1  facebook.com   100  internet
2    google.com   100  internet
3    google.com   100    tablet
4   youtube.com   100    mobile

output = df.groupby(['website', 'hits'])['source'].apply(list).reset_index().to_dict(orient='records')

print(output)
[{'website': 'facebook.com', 'hits': 100, 'source': ['internet']}, 
 {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
 {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']}]
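
Note that the pandas answers wrap every source in a list, whereas the question shows a plain string when a website has only one source. If that distinction matters, a small post-processing step could unwrap the single-element lists. A sketch, with unwrap_single_sources as a made-up helper name; it takes records shaped like the printed output above:

```python
def unwrap_single_sources(records):
    """Replace one-element 'source' lists with the bare value (made-up helper)."""
    result = []
    for record in records:
        record = dict(record)  # shallow copy so the input records stay untouched
        sources = record['source']
        if isinstance(sources, list) and len(sources) == 1:
            record['source'] = sources[0]
        result.append(record)
    return result

# Literal copy of the groupby output printed above.
output = [
    {'website': 'facebook.com', 'hits': 100, 'source': ['internet']},
    {'website': 'google.com', 'hits': 100, 'source': ['mobile', 'internet', 'tablet']},
    {'website': 'youtube.com', 'hits': 100, 'source': ['mobile']},
]
print(unwrap_single_sources(output))
```

The websites still come out in sorted order here. Passing sort=False to groupby should keep them in first-seen order instead, if the order from the question is needed.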