I am parsing a PDF file with Python and sending the extracted text back to a Go server. With smaller PDF files the code works properly, but with large PDF files the command returns exit status 1.
Here is the code I am using:
func parsePdf(path string) string {
	cmd := exec.Command("python", "pdf_parser.py", path)
	output, err := cmd.Output() // this line returns the error
	if err != nil {
		fmt.Println(err)
	}
	f, _ := os.Create("go-pdf-output.txt")
	// Use a separate name here: err is already declared above.
	_, err2 := f.WriteString(string(output))
	if err2 != nil {
		fmt.Println(err2)
	}
	return string(output)
}
This is the error I get:
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0xfc00e6]
This is my python script where I print the string after parsing:
import fitz
import sys

path = sys.argv[1]
doc = fitz.open(path)
pages = []
for page in doc:
    text = page.get_text("text")
    pages.append(text)
outputString = ' '.join(pages)
print(outputString)
If I run the Python script separately, it works perfectly. The error comes from this line: output, err := cmd.Output()
If the PDF file is small it works fine, but if it is larger (e.g. a book) it fails.
I think the problem is the number of bytes that cmd.Output() can return. Is there a better way to transfer the data from the Python script to Go?
CodePudding user response:
I solved it on my own. It's simple: instead of printing outputString directly, print the result of json.dumps(). I'll provide the whole code below:
main.go file
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"os/exec"
)

// ParseText mirrors the JSON object printed by pdf_parser.py.
type ParseText struct {
	Text string `json:"text"`
}

func main() {
	fmt.Println("Running...")
	pdfPath := "./Y2V7 Full With SS-2.pdf"
	_, err := parsePdf(pdfPath)
	if err != nil {
		fmt.Println(err)
	}
}

// parsePdf runs pdf_parser.py on path and returns the extracted text.
func parsePdf(path string) (string, error) {
	cmd := exec.Command("python", "pdf_parser.py", path)
	// Capture stdout and stderr separately so Python errors stay visible.
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		log.Printf("Error when executing python: %s\n", stderr.String())
		return "", fmt.Errorf("error executing python: %w", err)
	}
	var res ParseText
	// Check the decode error before using the result.
	if err := json.Unmarshal(stdout.Bytes(), &res); err != nil {
		return "", fmt.Errorf("error decoding python output: %w", err)
	}
	writeToFile("go-pdf.txt", res.Text)
	return res.Text, nil
}

// writeToFile writes text to fileName, replacing any existing file.
func writeToFile(fileName, text string) {
	f, err := os.Create(fileName)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if _, err := f.WriteString(text); err != nil {
		log.Fatal(err)
	}
}
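pdf_parser.py file
import fitz
import sys
import json

# Path to the PDF, passed as the first argument by the Go program.
path = sys.argv[1]
doc = fitz.open(path)
pages = []
for page in doc:
    text = page.get_text("text")
    pages.append(text)
outputString = ' '.join(pages)
# Emit a single JSON object so the Go side can decode it.
print(json.dumps({"text": outputString}))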
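A possible explanation for why this helps (my assumption, it is not confirmed anywhere in this thread): the failure may be about encoding rather than size. When stdout is a pipe on Windows, Python can fall back to a legacy code page, and print() then raises UnicodeEncodeError on characters that code page lacks, which gives exit status 1. A large book is simply more likely to contain such a character. json.dumps() uses ensure_ascii=True by default, so everything it emits is plain ASCII. The sketch below shows that effect with a made-up sample string and cp1252 standing in for the legacy code page:

```python
import json

# Hypothetical sample standing in for extracted PDF text.
# It contains characters (pi, "almost equal") that cp1252 cannot encode.
sample = "caf\u00e9 \u03c0 \u2248 3.14"

# Same call as in pdf_parser.py; ensure_ascii defaults to True,
# so non-ASCII characters are written as \uXXXX escapes.
payload = json.dumps({"text": sample})

# The JSON payload is pure ASCII, so any code page can encode it.
payload.encode("cp1252")

# The raw text is not encodable in cp1252 (the assumed pipe encoding).
try:
    sample.encode("cp1252")
    raw_encodable = True
except UnicodeEncodeError:
    raw_encodable = False

# Decoding the JSON restores the original text unchanged.
restored = json.loads(payload)["text"]

print(payload)
print("raw text encodable in cp1252:", raw_encodable)
print("round trip ok:", restored == sample)
```

If this is indeed the cause, the stderr captured by the Go code above should show a UnicodeEncodeError traceback when the original script is run against the large file.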