I am new to Python and I am facing a weird issue with my code. Below is the part of my code where the problem occurs. The code fetches a CSV file and stores its contents in an array.
import requests
import numpy
objNSEResponse = requests.get("https://www.nseindia.com" , headers = { "Referer":"https://www.nseindia.com" , "User-Agent":"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"})
objNSECookieJar = objNSEResponse.cookies
objNSEResponse = requests.get("https://www.nseindia.com/api/reportGSM?csv=true" , headers = {"Referer":"https://www.nseindia.com" , "User-Agent":"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"} , cookies = objNSECookieJar)
arrNSEGSMList = map(lambda arrNSEGSMRecord : arrNSEGSMRecord.replace("\"","").split(","), objNSEResponse.content.decode("utf-8-sig").replace("\n\"","\"").split("\n"))
print(list(arrNSEGSMList))
# delete unnecessary values
arrNSEGSMList = numpy.array(list(arrNSEGSMList))
arrNSEGSMList = numpy.delete(arrNSEGSMList, [0], axis=0) # delete header row
arrNSEGSMList = numpy.delete(arrNSEGSMList, [2,3,4], axis=1) # delete all columns except stock symbol and serial number
Whenever I insert the statement print(list(arrNSEGSMList)), I get "IndexError: index 0 is out of bounds for axis 0 with size 0" at the statement arrNSEGSMList = numpy.delete(arrNSEGSMList, [0], axis=0). The print statement somehow causes a zero-length NumPy array to be created. If I remove the print statement, everything executes without any problem and arrNSEGSMList contains the data I need. I am not able to figure out the reason for it.
Answer:
The use of map in the statement
arrNSEGSMList = map(lambda arrNSEGSMRecord : arrNSEGSMRecord.replace("\"","").split(","), objNSEResponse.content.decode("utf-8-sig").replace("\n\"","\"").split("\n"))
makes arrNSEGSMList an iterator, which can be iterated through only once. The first call to list(arrNSEGSMList) (inside your print) consumes the iterator, so when list(arrNSEGSMList) runs a second time (inside numpy.array(...)), all the items have already been consumed and you get an empty list. numpy.array of an empty list is a zero-length array, so numpy.delete(arrNSEGSMList, [0], axis=0) then fails because there is no row 0 to delete.
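To see this with the question's own parsing expression, here is a minimal sketch that swaps the NSE response for a small hypothetical CSV string (the header and rows are made up for illustration, and no network request is made):

```python
# Hypothetical stand-in for objNSEResponse.content.decode("utf-8-sig")
csv_text = 'Sr No,Symbol,Col3\n1,"ADROITINFO",x\n2,"ALPSINDUS",y'

# Same parsing expression as in the question: map() returns a lazy, one-shot iterator
rows = map(lambda record: record.replace("\"", "").split(","),
           csv_text.replace("\n\"", "\"").split("\n"))

first = list(rows)   # what print(list(arrNSEGSMList)) does: consumes every item
second = list(rows)  # what numpy.array(list(arrNSEGSMList)) sees afterwards

print(first)   # [['Sr No', 'Symbol', 'Col3'], ['1', 'ADROITINFO', 'x'], ['2', 'ALPSINDUS', 'y']]
print(second)  # []
```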
Here's a similar example:
In [18]: m = map(len, ['abc', 'def', 'gh', 'ij']) # Create a map iterator
In [19]: m
Out[19]: <map at 0x11efb9730>
In [20]: type(m)
Out[20]: map
In [21]: list(m) # Create a list by iterating through the iterator.
Out[21]: [3, 3, 2, 2]
In [22]: list(m) # The iterator has already been consumed.
Out[22]: []
A fix is to write
arrNSEGSMList = list(map(lambda arrNSEGSMRecord : arrNSEGSMRecord.replace("\"","").split(","), objNSEResponse.content.decode("utf-8-sig").replace("\n\"","\"").split("\n")))
so you convert the map to a list just once. Or, even better, replace the use of map with a list comprehension (as suggested by @AKX in a comment):
arrNSEGSMList = [record.replace("\"","").split(",") for record in objNSEResponse.content.decode("utf-8-sig").replace("\n\"","\"").split("\n")]
From then on, just use arrNSEGSMList wherever you need the list, e.g.
print(arrNSEGSMList)
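With the list comprehension the result is an ordinary list, so printing it and then building the NumPy array from it both see the same rows. A sketch using a hypothetical sample string in place of the real response (no network request):

```python
# Hypothetical stand-in for objNSEResponse.content.decode("utf-8-sig")
csv_text = 'Sr No,Symbol,Col3\n1,"ADROITINFO",x\n2,"ALPSINDUS",y'

# Evaluated eagerly: the rows are stored in a real list
arrNSEGSMList = [record.replace("\"", "").split(",")
                 for record in csv_text.replace("\n\"", "\"").split("\n")]

print(arrNSEGSMList)              # first use
rows_again = list(arrNSEGSMList)  # second use still sees every row
print(len(rows_again))            # 3
```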
Answer:
Beyond the actual issue caused by map being an iterator (as explained in the other answer):
- Since you're parsing CSV data, you should probably use the built-in csv module instead of stripping quotes and splitting lines by hand.
- Also, if you use a Requests Session to make your requests, you don't need to deal with cookies yourself. It's also easier to set default headers.
- Thirdly, there's no need to use NumPy arrays to slice the data; regular list comprehensions and slices will do.
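To illustrate the first and third points offline, here is a small sketch. The line with a comma inside a quoted field is hypothetical (it is not known whether the NSE data ever contains one), but it shows where hand-splitting goes wrong. The five-column rows are likewise made up, to show plain slicing standing in for the two numpy.delete calls from the question:

```python
import csv

line = '1,"ABC, LTD",X'

# Hand-rolled approach: strip quotes, then split on every comma
naive = line.replace("\"", "").split(",")
print(naive)   # ['1', 'ABC', ' LTD', 'X']  (the quoted field is broken in two)

# csv.reader understands quoting, so the embedded comma stays in one field
parsed = next(csv.reader([line]))
print(parsed)  # ['1', 'ABC, LTD', 'X']

# Plain-list equivalent of numpy.delete(arr, [0], axis=0) followed by
# numpy.delete(arr, [2, 3, 4], axis=1): drop the header row and columns 2-4
rows = [["Sr No", "Symbol", "c2", "c3", "c4"],   # hypothetical header
        ["1", "ADROITINFO", "a", "b", "c"],
        ["2", "ALPSINDUS", "d", "e", "f"]]
dropped = {2, 3, 4}
trimmed = [[v for i, v in enumerate(r) if i not in dropped] for r in rows[1:]]
print(trimmed)  # [['1', 'ADROITINFO'], ['2', 'ALPSINDUS']]
```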
Here's a reformulation of your code using these:
import requests
import csv
headers = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Referer": "https://www.nseindia.com",
}
with requests.Session() as sess:
    sess.headers.update(headers)
    # The first request sets the cookies; the Session keeps them for the next one.
    sess.get("https://www.nseindia.com/").raise_for_status()
    resp = sess.get("https://www.nseindia.com/api/reportGSM?csv=true")
    resp.raise_for_status()
    # The encoding of the response is actually utf-8-sig,
    # so we force the attribute here so `resp.iter_lines()`
    # works as it should.
    resp.encoding = "utf-8-sig"
    content = list(csv.reader(resp.iter_lines(decode_unicode=True)))
# `[1:]` slices the header line off.
# Column 0 holds the serial number and column 1 the stock symbol.
serial_and_symbol = [(r[0], r[1]) for r in content[1:]]
for serial, symbol in serial_and_symbol:
    print(serial, symbol)
This prints out
1 ADROITINFO
2 ALPSINDUS
3 ANTGRAPHIC
4 ASIL
etc.
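The resp.encoding = "utf-8-sig" line matters because utf-8-sig means UTF-8 preceded by a byte order mark (BOM). A minimal offline sketch with hypothetical bytes (header names made up) shows what happens to the first header cell under each decoding:

```python
import csv

# Hypothetical response body: a UTF-8 BOM followed by two CSV lines
raw = b'\xef\xbb\xbfSr No,Symbol\r\n1,ADROITINFO\r\n'

plain = list(csv.reader(raw.decode("utf-8").splitlines()))
sig = list(csv.reader(raw.decode("utf-8-sig").splitlines()))

# With plain "utf-8" the BOM stays glued to the first field
print(plain[0][0].startswith("\ufeff"))  # True
# "utf-8-sig" strips the BOM, so the header cell is clean
print(sig[0][0])                         # Sr No
print(sig[1])                            # ['1', 'ADROITINFO']
```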