Error encountered after executing a print list statement in Python

I am new to Python and I am facing a weird issue with my code. Below is the part of my code where the problem occurs. The code fetches a CSV file and stores it in an array.

import numpy
import requests

objNSEResponse = requests.get("https://www.nseindia.com" , headers = { "Referer":"https://www.nseindia.com" , "User-Agent":"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"})
objNSECookieJar = objNSEResponse.cookies

objNSEResponse = requests.get("https://www.nseindia.com/api/reportGSM?csv=true" , headers = {"Referer":"https://www.nseindia.com" , "User-Agent":"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"} , cookies = objNSECookieJar)

arrNSEGSMList = map(lambda arrNSEGSMRecord : arrNSEGSMRecord.replace("\"","").split(","), objNSEResponse.content.decode("utf-8-sig").replace("\n\"","\"").split("\n"))

print(list(arrNSEGSMList))

# delete unnecessary values
arrNSEGSMList = numpy.array(list(arrNSEGSMList))
arrNSEGSMList = numpy.delete(arrNSEGSMList, [0], axis=0)        # delete header row
arrNSEGSMList = numpy.delete(arrNSEGSMList, [2,3,4], axis=1)    # delete all columns except stock symbol and serial number

Whenever I include the statement print(list(arrNSEGSMList)), I get "IndexError: index 0 is out of bounds for axis 0 with size 0" at the line arrNSEGSMList = numpy.delete(arrNSEGSMList, [0], axis=0). The print statement somehow causes a zero-length NumPy array to be created. If I remove the print statement, everything runs without any problem and I get the data I need in the array arrNSEGSMList. I cannot figure out why this happens.

Answer:

The use of map in the statement

arrNSEGSMList = map(lambda arrNSEGSMRecord : arrNSEGSMRecord.replace("\"","").split(","), objNSEResponse.content.decode("utf-8-sig").replace("\n\"","\"").split("\n"))

makes arrNSEGSMList an iterator, which can be traversed only once. The first call, list(arrNSEGSMList) inside print(...), consumes the iterator. When list(arrNSEGSMList) is evaluated a second time (inside numpy.array(...)), all the items have already been consumed, so you get an empty list. numpy.array([]) is a zero-length array, which is why numpy.delete(..., [0], axis=0) then fails with the IndexError.

Here's a similar example:

In [18]: m = map(len, ['abc', 'def', 'gh', 'ij'])  # Create a map iterator

In [19]: m
Out[19]: <map at 0x11efb9730>

In [20]: type(m)
Out[20]: map

In [21]: list(m)  # Create a list by iterating through the iterator.
Out[21]: [3, 3, 2, 2]

In [22]: list(m)  # The iterator has already been consumed.
Out[22]: []
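The same effect reproduces the exact NumPy error from the question. Below is a minimal offline sketch; the three CSV lines are made up and stand in for the NSE response, so no network access is needed:

```python
import numpy

# Hypothetical sample lines standing in for the downloaded CSV.
lines = ['"Sr No","Symbol"', '"1","ABC"', '"2","XYZ"']

rows = map(lambda line: line.replace('"', '').split(','), lines)
print(list(rows))               # first pass: consumes the map iterator
arr = numpy.array(list(rows))   # second pass: list(rows) is [] here
print(arr.shape)                # a zero-length array

error_message = None
try:
    numpy.delete(arr, [0], axis=0)  # there is no row 0 to delete
except IndexError as exc:
    error_message = str(exc)
print("IndexError:", error_message)
```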

A fix is to write

arrNSEGSMList = list(map(lambda arrNSEGSMRecord : arrNSEGSMRecord.replace("\"","").split(","), objNSEResponse.content.decode("utf-8-sig").replace("\n\"","\"").split("\n")))

so that the map is converted to a list just once. Even better, replace map with a list comprehension (as @AKX suggested in a comment):

arrNSEGSMList = [record.replace("\"","").split(",") for record in objNSEResponse.content.decode("utf-8-sig").replace("\n\"","\"").split("\n")]

From then on, use arrNSEGSMList directly whenever you need the list, e.g.

print(arrNSEGSMList)
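Because the result is now a real list, printing it no longer empties it, and the question's NumPy trimming steps work afterwards. Here is a sketch on a made-up sample (the column names and values are hypothetical, not the real NSE report; the replace("\n\"", "\"") step is left out because the sample has no line breaks inside quoted fields):

```python
import numpy

# Hypothetical CSV text with a header and two data rows.
text = ('"Sr No","Symbol","Col A","Col B","Col C"\n'
        '"1","ABC","x","y","z"\n'
        '"2","XYZ","x","y","z"')

# The list comprehension builds a list that can be traversed many times.
arrNSEGSMList = [record.replace('"', '').split(',') for record in text.split('\n')]
print(arrNSEGSMList)  # printing does not consume the data

arr = numpy.array(arrNSEGSMList)
arr = numpy.delete(arr, [0], axis=0)        # drop the header row
arr = numpy.delete(arr, [2, 3, 4], axis=1)  # keep serial number and symbol
print(arr)
```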

Answer:

Beyond the actual issue caused by map being an iterator (explained in the other answer):

First, since you're parsing CSV data, you should probably use the built-in csv module instead of stripping quotes and splitting lines by hand.
Second, if you use a Requests Session to make your requests, you don't need to manage cookies yourself, and it's easier to set default headers.
Third, there's no need to use NumPy arrays to slice the data; regular list comprehensions and slices will do.

Here's a reformulation of your code using these:

import requests
import csv

headers = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Referer": "https://www.nseindia.com",
}
with requests.Session() as sess:
    sess.headers.update(headers)
    sess.get("https://www.nseindia.com/").raise_for_status()
    resp = sess.get("https://www.nseindia.com/api/reportGSM?csv=true")
    resp.raise_for_status()
    # The encoding of the response is actually utf-8-sig,
    # so we force the attribute here so `resp.iter_lines()`
    # works as it should.
    resp.encoding = "utf-8-sig"
    content = list(csv.reader(resp.iter_lines(decode_unicode=True)))

# `[1:]` slices the header line off; column 0 is the serial number
# and column 1 is the stock symbol.
serial_and_symbol = [(r[0], r[1]) for r in content[1:]]
for serial, symbol in serial_and_symbol:
    print(serial, symbol)

This prints out

1 ADROITINFO
2 ALPSINDUS
3 ANTGRAPHIC
4 ASIL

etc.
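To see why the csv module is safer than stripping quotes and splitting on commas, consider a quoted field that itself contains a comma. The offline sketch below uses a hypothetical sample (the company names and the "Name" column are made up) and io.StringIO in place of the HTTP response. The [1:] slice plus the tuple selection does the same job as the two numpy.delete calls in the question:

```python
import csv
import io

# Hypothetical CSV text; the quoted company name contains a comma.
text = '"Sr No","Symbol","Name"\n"1","ABC","Abc Industries, Ltd"\n"2","XYZ","Xyz Corp"\n'

# Manual approach from the question: the comma inside quotes splits the field.
manual = [line.replace('"', '').split(',') for line in text.splitlines()]
print(manual[1])

# csv.reader respects the quoting, so each row keeps exactly three fields.
rows = list(csv.reader(io.StringIO(text)))
print(rows[1])

# Drop the header and keep the first two columns, with no NumPy needed.
serial_and_symbol = [(r[0], r[1]) for r in rows[1:]]
for serial, symbol in serial_and_symbol:
    print(serial, symbol)
```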