I'm trying to parse timetable content with goquery so I can work with it later, but I've run into a problem.
I have two functions. The first takes an HTML document and searches it for a token (csrfmiddlewaretoken); the second sends a request using this token and extracts information from the response. After extracting everything I need from the page, I search it for the token again and store it for use in future requests.
But for some reason the found token turns into an empty string by the time it reaches if len(foundCsrfToken) == 0 {. If I print the length of the token just before that statement, it prints this:
...
64
0
...
I've removed all goroutines in case they were the problem.
func findCsrfMiddlewareToken(responseBody io.Reader) (string, error) {
	document, err := goquery.NewDocumentFromReader(responseBody)
	if err != nil {
		return "", err
	}
	var foundCsrfToken string
	document.Find("script").Each(func(_ int, scrpt *goquery.Selection) {
		scriptText := scrpt.Text()
		if funcDefIndex := strings.Index(scriptText, "function Filter"); funcDefIndex != -1 {
			csrfTokenValueStart := strings.Index(scriptText, "csrfmiddlewaretoken: '")
			offset := csrfTokenValueStart + len("csrfmiddlewaretoken: '")
			foundCsrfToken = scriptText[offset : offset+csrfMiddlewareTokenLength]
		}
	})
	if len(foundCsrfToken) == 0 {
		return "", errNoCsrfMiddlewareToken
	}
	return foundCsrfToken, nil
}
func (parser *TimetableParser) ParseTimetable(timetableFilterInfo internal.TimetableInfo) (internal.Timetable, error) {
	timetable := internal.Timetable{}
	requestBody := makeFormValues(timetableFilterInfo, parser.csrfMiddlewareToken).Encode()
	request, err := http.NewRequest("POST", baseUrl, strings.NewReader(requestBody))
	if err != nil {
		return timetable, err
	}
	request.Header.Add("Content-Type", "application/x-www-form-urlencoded")
	request.Header.Add("Content-Length", strconv.Itoa(len(requestBody)))
	request.Header.Add("Referer", baseUrl)
	response, err := parser.client.Do(request)
	if err != nil {
		return timetable, err
	}
	defer response.Body.Close()
	document, err := goquery.NewDocumentFromReader(response.Body)
	if err != nil {
		return timetable, err
	}
	document.Find("table#schedule").Find("tr").Each(func(rowIndex int, row *goquery.Selection) {
		subjectTimeElement := row.Closest("td")
		subjectTimeElement.NextAll().Each(func(columnIndex int, cell *goquery.Selection) {
			subjectInfo := extractSubjectInfoFromCell(cell)
			subjectInfo.Order = rowIndex
			timetable.Subjects[columnIndex][rowIndex] = subjectInfo
		})
	})
	parser.csrfMiddlewareToken, err = findCsrfMiddlewareToken(response.Body)
	if err != nil {
		log.Println("csrfMiddlewareToken: " + err.Error())
	}
	return timetable, nil
}
Go version: go1.17.1 windows/amd64
goquery version: 1.7.1
CodePudding user response:
I've just realized what's wrong. An io.Reader is a stream, so once I have read from it, there is nothing left to read. As you can see, ParseTimetable first reads the response body to gather the timetable data and only then passes that same body to findCsrfMiddlewareToken, by which point it has already been consumed.
When findCsrfMiddlewareToken is called for the first time, it works as expected and prints the token length (64). But the second call receives the already drained response body, so it prints 0.
Possible solution: How to read multiple times from same io.Reader