My code is:
import json
with open(r'C:\Users\kevin\Dropbox\webcrawler\webcrawler\webcrawler\jobscout24.json') as f:
    data = json.load(f)
values = ','.join([str(i) for i in data])
my_list = values.split(",")
my_list = values.split("'")
links = [my_list]
for i in my_list:
    a = 'https://www.jobscout24.ch' + i
    print (a)
My output: https://i.stack.imgur.com/dB4tS.png
The output should contain only the links with the numbers (e.g. https://www.jobscout24.ch/de/job/senior-systems-engineer-citrix/7139368/).
It shouldn't contain the links without the numbers (e.g. https://www.jobscout24.ch).
I already tried to extract only every second element with:
print (a[::2])
But this didn't work as expected.
Content of the JSON file:
[ {"datajoburltext": ["/de/job/system-engineer-global-infrastructure/7184973/", "/de/job/systems-engineer/7141131/", "/de/job/systems-engineer-digital-workplace-consultant/7138024/", "/de/job/systems-engineer-vmware-consultant/7138051/", "/de/job/systems-engineer-voice-enterprise-mobility/7188224/", "/de/job/systems-engineer-100/7020537/", "/de/job/systems-engineer-diabetes-care/7214185/", "/de/job/level-end-user-systems-engineer/7156570/", "/de/job/systems-engineer-workplace/7141116/", "/de/job/systems-engineer/7194974/", "/de/job/systems-engineer/7194972/", "/de/job/senior-systems-engineer-citrix/7139368/", "/de/job/information-systems-engineer/7154423/", "/de/job/windows-systems-engineer/7179959/", "/de/job/systems-engineer-ecm/7169846/", "/de/job/systems-engineer-operations/7165522/", "/de/job/praktikant-systems-engineer-jet/7192042/", "/de/job/systems-engineer/7219016/", "/de/job/systems-engineer-client-und-mobile-management/7197563/", "/de/job/solution-architect-vmware-consultant/7141122/", "/de/job/systems-engineer-diabetes-care/7214185/", "/de/job/senior-systems-engineer-citrix/7139368/"]} ]
I hope you can help me.
CodePudding user response:
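You should work with the parsed JSON object directly, not with its string representation. Converting the data to a string and splitting on quotes also yields the fragments between the paths (brackets, commas, spaces), which is why lines without a job path show up in your output.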
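import json
with open(r'C:\Users\kevin\Dropbox\webcrawler\webcrawler\webcrawler\jobscout24.json') as f:
    data = json.load(f)
# data is a list of dicts; concatenate every "datajoburltext" list into one flat list
url_parts = sum((p["datajoburltext"] for p in data), [])
# prefix each path with the site root and print one link per line
print(*['https://www.jobscout24.ch' + part for part in url_parts], sep='\n')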
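As a variation on the code above, here is a minimal sketch that wraps the same idea in a function and flattens the lists with itertools.chain.from_iterable instead of sum(..., []). It assumes the file always has the shape shown in the question (a list of dicts, each with a "datajoburltext" list). The names build_links, BASE_URL and the shortened inline sample are made up for illustration and are not part of the original code.

```python
import json
from itertools import chain

# Shortened inline sample with the same shape as jobscout24.json (illustration only).
sample = '''
[ {"datajoburltext": ["/de/job/systems-engineer/7141131/",
                      "/de/job/senior-systems-engineer-citrix/7139368/"]} ]
'''

BASE_URL = 'https://www.jobscout24.ch'  # hypothetical constant name


def build_links(data, base_url=BASE_URL):
    """Return the full URL for every path stored under "datajoburltext"."""
    # Each top-level entry is a dict; chain its path lists into one flat sequence.
    paths = chain.from_iterable(entry["datajoburltext"] for entry in data)
    return [base_url + path for path in paths]


links = build_links(json.loads(sample))
print(*links, sep='\n')
```

To use it with the real file, replace json.loads(sample) with the json.load(f) call from the code above. For a file of this size sum(..., []) is fine too; chain.from_iterable just avoids building a new intermediate list at every step.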