Getting an error whenever the string is too long while passing it back to go from python script with

Time:11-13

I am parsing a PDF file with Python and sending the text string back to a Go server. With a smaller PDF file the code works properly, but with large PDF files it returns exit status 1.

Here is the code I am using:

func parsePdf(path string) string {
    cmd := exec.Command("python", "pdf_parser.py", path)
    output, err := cmd.Output() //this line throws error
    if err != nil {
        fmt.Println(err)
    }
    f, _ := os.Create("go-pdf-output.txt")
    _, err2 := f.WriteString(string(output))
    if err2 != nil {
        fmt.Println(err2)
    }
    return string(output)
}

This is the error I get when it runs:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0xfc00e6]

This is my python script where I print the string after parsing:

import fitz
import sys

path = sys.argv[1]
doc = fitz.open(path)
pages = []  # renamed from "list" to avoid shadowing the built-in

for page in doc:
    text = page.get_text("text")
    pages.append(text)

outputString = ' '.join(pages)
print(outputString)

If I run the Python script separately it works perfectly. The error is thrown at the line output, err := cmd.Output(). If the PDF file is small it works fine, but if it is larger (e.g. a book) it fails.

I think the error is related to the number of bytes that cmd.Output() can return. Is there a better way to transfer the data from the Python script to Go?
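A note on the size theory: cmd.Output() collects the child's whole stdout in memory and has no fixed size cap, and exit status 1 is reported by the Python process itself. One cause worth ruling out (an assumption, since the post does not show the Python traceback) is that a piped stdout on Windows often uses a legacy code page such as cp1252, and a large PDF is more likely to contain a character that code page cannot encode. The sketch below mimics that situation; print_to_pipe is a hypothetical helper, not part of the original script.

```python
import io


def print_to_pipe(text: str, encoding: str) -> bytes:
    """Mimic print() writing to a piped stdout that uses the given encoding.

    Raises UnicodeEncodeError if the text holds a character the encoding
    cannot represent, which in a real script ends the process with a
    non-zero exit status.
    """
    raw = io.BytesIO()
    pipe = io.TextIOWrapper(raw, encoding=encoding)  # errors="strict" by default
    print(text, file=pipe)
    pipe.flush()
    return raw.getvalue()


if __name__ == "__main__":
    sample = "de\ufb01ne"  # contains the "fi" ligature, common in typeset PDFs
    print(len(print_to_pipe(sample, "utf-8")), "bytes written as UTF-8")
    try:
        print_to_pipe(sample, "cp1252")  # assumed Windows pipe encoding
    except UnicodeEncodeError as exc:
        print("cp1252 failed:", exc.reason)
```

If this turns out to be the cause, the traceback appears on the script's stderr, which the Go code above never prints.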

CodePudding user response:

I solved it on my own. It's simple: instead of printing outputString directly, print it wrapped in json.dumps(). The whole code is below:

main.go file

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "os"
    "os/exec"
)

type ParseText struct {
    Text string `json:"text"`
}

func main() {
    fmt.Println("Running...")

    pdfPath := "./Y2V7 Full With SS-2.pdf"
    _, err := parsePdf(pdfPath)
    if err != nil {
        fmt.Println(err)
    }
}

func parsePdf(path string) (string, error) {
    cmd := exec.Command("python", "pdf_parser.py", path)
    var stdout, stderr bytes.Buffer

    cmd.Stdout = &stdout
    cmd.Stderr = &stderr
    err := cmd.Run()
    if err != nil {
        log.Printf("Error when executing python: %s\n", stderr.Bytes())
        return "", fmt.Errorf("Error executing python: %w", err)
    }

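    res := ParseText{}
    // Stop here if the script's output is not valid JSON.
    if err := json.Unmarshal(stdout.Bytes(), &res); err != nil {
        return "", fmt.Errorf("decoding python output: %w", err)
    }
    writeToFile("go-pdf.txt", res.Text)
    return res.Text, nil
}

func writeToFile(fileName, text string) {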
    f, err := os.Create(fileName)

    if err != nil {
        log.Fatal(err)
    }

    defer f.Close()

    _, err2 := f.WriteString(text)

    if err2 != nil {
        log.Fatal(err2)
    }
}

pdf_parser.py file

import fitz
import sys
import json

URL = sys.argv[1]
doc = fitz.open(URL)
pages = []  # renamed from "list" to avoid shadowing the built-in

for page in doc:
    text = page.get_text("text")
    pages.append(text)

outputString = ' '.join(pages)
print(json.dumps({"text": outputString}))
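A plausible reason this version works (an assumption, the answer does not state the cause): json.dumps() escapes every non-ASCII character by default (ensure_ascii=True), so the printed payload is plain ASCII and can be written to stdout whatever encoding the pipe uses. Go's json.Unmarshal decodes the \uXXXX escapes back into the original characters. A minimal sketch with made-up sample text:

```python
import json

# Sample text with characters a legacy code page may not be able to encode.
text = "de\ufb01ne \u2192 caf\u00e9"

# ensure_ascii=True is the default, so non-ASCII becomes \uXXXX escapes.
payload = json.dumps({"text": text})

# Succeeds because the payload holds only ASCII characters.
ascii_bytes = payload.encode("ascii")

# Any JSON decoder restores the original text from the escapes.
restored = json.loads(payload)["text"]

if __name__ == "__main__":
    print(payload)
    print(restored == text)
```

The trade-off is a somewhat larger payload for text that is heavy in non-ASCII characters, since each escaped character takes six bytes.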