I have a dict like this:
seriesdict={'series':[],'id':[]}
where series is the series name and id is the unique id associated with the book in calibre. This dict is appended for each series to a list:
sorted_data=[]
As this list will be used to request data from the internet, I'd like to reduce the number of requests I have to make, both to save time and to reduce traffic on the site. I'd like to check each series only once and move on to the next one.
I have already sorted the list by series, but I am struggling with how to check whether a series is already in the list and, if so, how to add the following ids to the entry that was first added for that series.
This is what I've tried so far:
for entry in seriesdict:
    if entry['series'] not in sortedseriesdict['series']:
        sortedseriesdict['series']=entry['series']
        sortedseriesdict['ids']=entry['id']
        sorted_data.append(sortedseriesdict.copy())
    elif entry['series'] in sortedseriesdict['series']:
        sortedseriesdict['ids']=entry['id']
        sorted_data.append(sortedseriesdict.copy())
This iteration question seems similar, but I am unsure if it could help in my case, as the ids being added have to keep all old data as well.
This is a part of the list:
[{'index': 237, 'series': '5 Centimeters per Second', 'id': '13050'},
{'index': 303, 'series': '86 EIGHTY-SIX', 'id': '9809'},
{'index': 304, 'series': '86 EIGHTY-SIX', 'id': '13540'},
{'index': 305, 'series': '86 EIGHTY-SIX', 'id': '9289'},
{'index': 306, 'series': '86 EIGHTY-SIX', 'id': '13323'},
{'index': 307, 'series': '86 EIGHTY-SIX', 'id': '10783'},
{'index': 309, 'series': '86 EIGHTY-SIX', 'id': '12084'},
{'index': 310, 'series': '86 EIGHTY-SIX', 'id': '10943'},
{'index': 311, 'series': '86 EIGHTY-SIX', 'id': '9202'},
{'index': 2329, 'series': 'A Certain Magical Index', 'id': '12843'}]
I would like to create the seriesdict so that sorted_data looks like this:
[{'series': '5 Centimeters per Second', 'ids': '13050'},
{'series': '86 EIGHTY-SIX', 'ids': '9809, 13540, 9289, 13323, 10783, 12084, 10943, 9202'},
{'series': 'A Certain Magical Index', 'ids': '12843'},
...
]
How can I do that, if it is possible?
Any answer is appreciated.
CodePudding user response:
Since you are grouping records by series, I would suggest using the pandas library. It saves a lot of hassle and tinkering around, so I will propose a solution with pandas. First we take your seriesdict data and convert it to a pandas.DataFrame object.
import pandas as pd
series_dict = [
{"index": 237, "series": "5 Centimeters per Second", "id": "13050"},
{"index": 303, "series": "86 EIGHTY-SIX", "id": "9809"},
{"index": 304, "series": "86 EIGHTY-SIX", "id": "13540"},
{"index": 305, "series": "86 EIGHTY-SIX", "id": "9289"},
{"index": 306, "series": "86 EIGHTY-SIX", "id": "13323"},
{"index": 307, "series": "86 EIGHTY-SIX", "id": "10783"},
{"index": 309, "series": "86 EIGHTY-SIX", "id": "12084"},
{"index": 310, "series": "86 EIGHTY-SIX", "id": "10943"},
{"index": 311, "series": "86 EIGHTY-SIX", "id": "9202"},
{"index": 2329, "series": "A Certain Magical Index", "id": "12843"},
]
df = pd.DataFrame(series_dict)
Now df contains all the data we need, and we can start reshaping it as you wish. To do that, we group the data by series, take the id column of the result, and apply a function that joins the column values with ", ". Resetting the index then turns series back into a regular column, which gives the result dataframe a proper structure.
df = df.groupby("series")["id"].apply(", ".join).reset_index()
Now if we print the result with:
print(df)
we get
                     series                                                 id
0  5 Centimeters per Second                                              13050
1             86 EIGHTY-SIX  9809, 13540, 9289, 13323, 10783, 12084, 10943,...
2   A Certain Magical Index                                              12843
If you really want to have the data in the structure you proposed,
my_data = df.to_dict(orient="records")
would return
[{'series': '5 Centimeters per Second', 'id': '13050'}, {'series': '86 EIGHTY-SIX', 'id': '9809, 13540, 9289, 13323, 10783, 12084, 10943, 9202'}, {'series': 'A Certain Magical Index', 'id': '12843'}]
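If you would rather not depend on pandas, the same grouping can be sketched with a plain dict. This is only a minimal sketch, not a definitive implementation: the helper name group_ids_by_series is made up for illustration, the sample list is a shortened copy of the data from the question, and the output key is named 'ids' to match the structure asked for. It relies on dicts preserving insertion order (Python 3.7+), so each series appears in the order it was first seen and the input does not even have to be sorted.

```python
def group_ids_by_series(entries):
    """Collect the ids of each series into one comma-separated string."""
    grouped = {}  # series name -> list of ids, in first-seen order
    for entry in entries:
        # Create the list the first time a series is seen, then append to it.
        grouped.setdefault(entry["series"], []).append(entry["id"])
    return [
        {"series": series, "ids": ", ".join(ids)}
        for series, ids in grouped.items()
    ]


# Shortened sample of the list from the question (illustrative only).
entries = [
    {"index": 237, "series": "5 Centimeters per Second", "id": "13050"},
    {"index": 303, "series": "86 EIGHTY-SIX", "id": "9809"},
    {"index": 304, "series": "86 EIGHTY-SIX", "id": "13540"},
    {"index": 2329, "series": "A Certain Magical Index", "id": "12843"},
]

sorted_data = group_ids_by_series(entries)
print(sorted_data)
```

When you later loop over sorted_data to make one request per series, entry["ids"].split(", ") gives you the individual ids back as a list.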