I'm trying to retrieve wikipedia pages' characters count, for articles in different languages. I'm using a dictionary with as key the name of the page and as value a dictionary with the language as key and the count as value.
The code is:
pages = ["L'arte della gioia", "Il nome della rosa"]
langs = ["it", "en"]
dicty = {}
dicto ={}
numz = 0
for x in langs:
wikipedia.set_lang(x)
for y in pages:
pagelang = wikipedia.page(y)
splittedpage = pagelang.content
dicto[y] = dicty
for char in splittedpage:
numz =1
dicty[x] = numz
If I print dicto, I get
{"L'arte della gioia": {'it': 72226, 'en': 111647}, 'Il nome della rosa': {'it': 72226, 'en': 111647}}
The count should be different for the two pages.
CodePudding user response:
Please try this code. I didn't run it because I don't have the wikipedia module.
Notes:
- Since your expected result is
dict[page,dict[lan,cnt]]
, I think first iterate pages is more natural, then iterate languages. Maybe for performance reason you want first iterate languages, please comment. - Characters count of
text
can simply belen(text)
, why iterate and sum again? - Variable names. You will soon be lost in
x
y
like variables.
pages = ["L'arte della gioia", "Il nome della rosa"]
langs = ["it", "en"]
dicto = {}
for page in pages:
lang_cnt_dict = {}
for lang in langs:
wikipedia.set_lang(lang)
page_lang = wikipedia.page(page)
chars_cnt = len(pagelang.content)
lang_cnt_dict[lan] = chars_cnt
dicto[page] = lang_cnt_dict
print(dicto)
update
If you want iterate langs first
pages = ["L'arte della gioia", "Il nome della rosa"]
langs = ["it", "en"]
dicto = {}
for lang in langs:
wikipedia.set_lang(lang)
for page in pages:
page_lang = wikipedia.page(page)
chars_cnt = len(pagelang.content)
if page in dicto:
dicto[page][lang] = chars_cnt
else:
dicto[page] = {lang: chars_cnt}
print(dicto)