how to get the base string and page no string in for loop?

Time:02-23

Currently I am putting the full URL in urlList. I want only the string after page_no in urlList, and the rest of the program should work as it is.

https://bidplus.gem.gov.in/bidlists?bidlists&page_no=**AMCR24yMNFkfoXF3wKPmGMy_wV8TJPAlxm6oWiTHGOI**

urlList = ["https://bidplus.gem.gov.in/bidlists?bidlists&page_no=AMCR24yMNFkfoXF3wKPmGMy_wV8TJPAlxm6oWiTHGOI",
           "https://bidplus.gem.gov.in/bidlists?bidlists&page_no=Hgw0LYpSZdLXow1Wq84uKar1nxXbFhClXQDuAAiPDxU",
           "https://bidplus.gem.gov.in/bidlists?bidlists&page_no=rO5Erb90Q_P1S0fL5O6FEShlv20RBXmkHFusZogvUoo",
           "https://bidplus.gem.gov.in/bidlists?bidlists&page_no=jiE0kS8e-ghmlmjDMPUJm1OBCRotqJ6n7srXZN99LZc",
           "https://bidplus.gem.gov.in/bidlists?bidlists&page_no=MY89EG2RtzpSMlT1wjE61Cv31nAyetQ49kmXfw2AfMo",
]
for url in urlList:
    print('Hold on creating URL to fetch data...')
    url = 'https://bidplus.gem.gov.in/bidlists?bidlists&page_no=' + str(page_no)
    print('URL created: ' + url)
    scraped_data = requests.get(url, verify=False)
    soup_data = bs(scraped_data.text, 'lxml')
    extracted_data = soup_data.find('div', {'id': 'pagi_content'})

CodePudding user response:

Use this line after your urlList variable:

urlList = [x.split('=')[-1] for x in urlList]
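A minimal sketch of what that comprehension produces, using the first two URLs from the question:

```python
urlList = [
    "https://bidplus.gem.gov.in/bidlists?bidlists&page_no=AMCR24yMNFkfoXF3wKPmGMy_wV8TJPAlxm6oWiTHGOI",
    "https://bidplus.gem.gov.in/bidlists?bidlists&page_no=Hgw0LYpSZdLXow1Wq84uKar1nxXbFhClXQDuAAiPDxU",
]

# Keep only the token after the last '=' in each URL
urlList = [x.split('=')[-1] for x in urlList]
print(urlList)
# ['AMCR24yMNFkfoXF3wKPmGMy_wV8TJPAlxm6oWiTHGOI',
#  'Hgw0LYpSZdLXow1Wq84uKar1nxXbFhClXQDuAAiPDxU']
```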

CodePudding user response:

You can split the URLs on = and take the last part:

for url in urls:
     print(url.split("=")[-1])

outputs:
AMCR24yMNFkfoXF3wKPmGMy_wV8TJPAlxm6oWiTHGOI
Hgw0LYpSZdLXow1Wq84uKar1nxXbFhClXQDuAAiPDxU
rO5Erb90Q_P1S0fL5O6FEShlv20RBXmkHFusZogvUoo
jiE0kS8e-ghmlmjDMPUJm1OBCRotqJ6n7srXZN99LZc
MY89EG2RtzpSMlT1wjE61Cv31nAyetQ49kmXfw2AfMo

If you want the page numbers in their own list:

pagenumbers = [i.split("=")[-1] for i in urls]
>>> pagenumbers
['AMCR24yMNFkfoXF3wKPmGMy_wV8TJPAlxm6oWiTHGOI', 'Hgw0LYpSZdLXow1Wq84uKar1nxXbFhClXQDuAAiPDxU', 'rO5Erb90Q_P1S0fL5O6FEShlv20RBXmkHFusZogvUoo', 'jiE0kS8e-ghmlmjDMPUJm1OBCRotqJ6n7srXZN99LZc', 'MY89EG2RtzpSMlT1wjE61Cv31nAyetQ49kmXfw2AfMo']

Note that for the request itself there is no need to split or rebuild the URLs. In your for loop you can use url directly, since you are already iterating over the full URLs:

for url in urlList:
    print('Hold on fetching data...')
    scraped_data = requests.get(url, verify=False)
    soup_data = bs(scraped_data.text, 'lxml')
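As an aside, splitting on = only works while the token happens to follow the last = in the URL. A more robust sketch uses the standard library's urllib.parse to read page_no out of the query string by name:

```python
from urllib.parse import urlparse, parse_qs

url = "https://bidplus.gem.gov.in/bidlists?bidlists&page_no=AMCR24yMNFkfoXF3wKPmGMy_wV8TJPAlxm6oWiTHGOI"

# Parse the query string and look up the page_no parameter explicitly
query = parse_qs(urlparse(url).query)
page_no = query['page_no'][0]
print(page_no)  # AMCR24yMNFkfoXF3wKPmGMy_wV8TJPAlxm6oWiTHGOI
```

This keeps working even if the URL later gains extra parameters containing = signs.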