Any comments or solutions are welcome. I could not create a dictionary variable in a single line.
import requests
from bs4 import BeautifulSoup
response = requests.get('https://toiguru.jp/toeic-vocabulary-list')
soup = BeautifulSoup(response.content, "html.parser")
# Strip the <td> tags, then split each cell on <br/> into [English, Japanese]
words = [str(el).replace("<td>", "") for el in soup.find_all("td")]
words = [str(el).replace("</td>", "") for el in words]
words = [str(el).split("<br/>") for el in words]
# The line below raises "IndexError: list index out of range"
words = {str(el[0]): str(el[1]) for el in words}
# From here, I have no idea how to create a dictionary variable like this:
# {ENword: translation for ENword}
# e.g. {'survey': '調査'}, {'interview': '面接'}
words = [str(el).split("<br/>") for el in words]
The code above outputs values like these:
[['survey', '調査'], ['interview', '面接'], ['exhibition', '展示'], ['conference', '会議'], ['available', '利用できる'], ['annual', '年1回の'], ['equipment', '備品/器具'], ['department', '部署'], ['refund', '払い戻す'], ['receipt', '領収書'], ['schedule', '予定, 計画'], ・・・and more・・・]
I want to convert these values into a dictionary like this:
{ENword: translation for ENword}
e.g. {'survey': '調査'}, {'interview': '面接'}
I want to create this dictionary variable with bs4.
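A minimal offline reproduction can help pin down where the `IndexError` comes from. This is a sketch, not the page's real data: `sample` is a hypothetical list modeled on the split output above, and `first_bad_row` is a hypothetical helper. Any row with fewer than two items makes `el[1]` fail.

```python
# Hypothetical sample rows modeled on the question's split output:
# most cells become [English, Japanese], but some do not.
sample = [
    ['survey', '調査'],
    ['interview', '面接'],
    ['neither', 'どちらも…でない', ''],  # a trailing <br/> would leave an extra ''
    [''],                                # an empty cell splits into a single item
]


def first_bad_row(rows):
    """Return the index of the first row without a second item, or None."""
    for i, row in enumerate(rows):
        if len(row) < 2:
            return i
    return None


try:
    result = {el[0]: el[1] for el in sample}
except IndexError as exc:
    # el[1] does not exist for [''], so the comprehension stops there
    print("IndexError:", exc)                       # IndexError: list index out of range
    print("first bad row:", first_bad_row(sample))  # first bad row: 3
```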
Answer:
There seems to be at least one element in words that does not have exactly two items. Try the code below, which skips those elements:
words = {el[0]:el[1] for el in words if len(el)==2}
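Note that `len(el)==2` also skips rows that carry a trailing empty string, such as ['neither', 'どちらも…でない', '']. The sketch below uses hypothetical sample rows to compare that filter with a looser `len(el) >= 2` check, which keeps the first two items of longer rows. Which one is right depends on whether those extra items ever hold real data.

```python
# Hypothetical rows: one normal, one with a trailing '', one empty
sample = [
    ['survey', '調査'],
    ['neither', 'どちらも…でない', ''],
    [''],
]

# The answer's filter: keep only rows with exactly two items
exact = {el[0]: el[1] for el in sample if len(el) == 2}

# Looser filter: keep any row with at least two items, using the first two
loose = {el[0]: el[1] for el in sample if len(el) >= 2}

print(exact)  # {'survey': '調査'}
print(loose)  # {'survey': '調査', 'neither': 'どちらも…でない'}
```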
To find the invalid elements, which are formatted differently, you can use:
not_good=[[f"index={counter}", f"value={el2}"] for counter,el2 in enumerate(words) if len(el2)!=2]
print(not_good)
#output [['index=474', "value=['neither', 'どちらも…でない', '']"], ['index=475', "value=['']"], ['index=481', "value=['enclose', '同封する', '']"], ['index=701', "value=['']"]]
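For a quick overview instead of a full listing, a sketch like the following counts rows by length with `collections.Counter` and groups the indices of irregular rows. The `sample` rows are hypothetical stand-ins for `words`, shaped like the irregular values reported above.

```python
from collections import Counter

# Hypothetical stand-in for `words`
sample = [
    ['survey', '調査'],
    ['interview', '面接'],
    ['neither', 'どちらも…でない', ''],
    [''],
    ['enclose', '同封する', ''],
]

# How many rows have each length? Anything other than 2 needs a closer look.
length_counts = Counter(len(row) for row in sample)
print(length_counts)

# Group the indices of irregular rows by their length
irregular = {}
for i, row in enumerate(sample):
    if len(row) != 2:
        irregular.setdefault(len(row), []).append(i)
print(irregular)  # {3: [2, 4], 1: [3]}
```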
Answer:
Ignore the [''] elements:
words = {el[0]: el[1] for el in words if el != ['']}
# {'survey': '調査', 'interview': '面接', ..., 'neither': 'どちらも…でない', ..., 'enclose': '同封する', ...}
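`el != ['']` excludes only rows that are exactly ['']. If the page contained some other malformed cell, for example a lone word with no translation (a hypothetical case not seen in the output above), the comprehension would raise `IndexError` again. A more defensive sketch removes empty strings from each row first and keeps only rows that then have exactly two items. `rows_to_dict` is a hypothetical helper name, and stripping surrounding whitespace is an extra assumption.

```python
# Hypothetical rows, including the irregular shapes seen on the page
sample = [
    ['survey', '調査'],
    ['neither', 'どちらも…でない', ''],
    [''],
    ['enclose', '同封する', ''],
]


def rows_to_dict(rows):
    """Build {English: Japanese} after removing empty strings from each row.

    Rows that still do not have exactly two items are skipped.
    """
    result = {}
    for row in rows:
        # Assumption: surrounding whitespace is noise, so strip it too
        cleaned = [s.strip() for s in row if s.strip()]
        if len(cleaned) == 2:
            english, japanese = cleaned
            result[english] = japanese
    return result


print(rows_to_dict(sample))
# {'survey': '調査', 'neither': 'どちらも…でない', 'enclose': '同封する'}
```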
Or, to get a list of dicts:
words = [{el[0]: el[1]} for el in words if el != ['']]
# [{'survey': '調査'}, {'interview': '面接'}, ..., {'neither': 'どちらも…でない'}, ..., {'enclose': '同封する'}, ...]
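The two forms behave differently when you look a word up: with a list of dicts you have to scan each dict in turn, while a single dict supports direct key lookup. The sketch below shows both, using a hypothetical `lookup` helper, plus one way to merge a list of one-entry dicts back into a single dict.

```python
list_of_dicts = [{'survey': '調査'}, {'interview': '面接'}]


def lookup(entries, word):
    """Scan a list of dicts for `word`; return its translation or None."""
    for d in entries:
        if word in d:
            return d[word]
    return None


# Merge the one-entry dicts into a single dict for direct key lookup
merged = {k: v for d in list_of_dicts for k, v in d.items()}

print(lookup(list_of_dicts, 'interview'))  # 面接
print(merged['interview'])                 # 面接
```

One design note: in a single dict, if the same English word appeared in more than one row, the later row would overwrite the earlier one, whereas the list of dicts would keep both entries.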