Extract plot lines from chart

Time:10-05

I am trying to extract some plot lines from the image below. As you can see one line is quite thin. My idea was to remove the noise around the lines (grid lines and text).

[Image: input chart]

So far I came up with this code to remove the grid:

import numpy as np
import cv2

gray = cv2.imread('test.png')

edges = cv2.Canny(gray,50,150,apertureSize = 3)

lines = cv2.HoughLinesP(image=edges,rho=0.01,theta=np.pi/90, threshold=100,lines=np.array([]), minLineLength=100,maxLineGap=80)
a,b,c = lines.shape
for i in range(a):
    cv2.line(gray, (lines[i][0][0], lines[i][0][1]), (lines[i][0][2], lines[i][0][3]), (255, 255, 255), 3, cv2.LINE_AA)
cv2.imwrite('result.png',gray)

After that I get the following result:

[Image: result after grid removal]

As you can see, there is still some noise around the plots, and they are a bit cut off (the result doesn't have to be perfect). Does anyone have a better solution or some tips on how I can improve this? Maybe remove the words first? Maybe detect the lines directly instead of removing the grid?

CodePudding user response:

You can segment both plots using the HSV color space, looking for the blue and orange colors. This results in a pretty clean binary mask. Let's check out the code:

# Imports:
import numpy as np
import cv2

# Set image path
path = "D://opencvImages//"
fileName = "graphs.png"

# Read the image in default (BGR) mode:
inputImage = cv2.imread(path + fileName)

# BGR to HSV:
hsvImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2HSV)

The first portion of the script converts the BGR image to the HSV color space. Next, you need the color values to apply a simple binary threshold. The tricky part is getting the correct HSV values. For the blue graph, a suitable Hue range seems to be [85, 179], while the other two channels are left open at [0, 255]. For the orange color, a possible Hue range is [11, 30]. I create a list with these two thresholds:

# Array with HSV values:
hsvValues = []

# Blue range:
lowThreshold = [85, 0, 0]
highThreshold = [179, 255, 255]
# Into the list:
hsvValues.append((lowThreshold, highThreshold))

# Orange range:
lowThreshold = [11, 0, 0]
highThreshold = [30, 255, 255]
# Into the list:
hsvValues.append((lowThreshold, highThreshold))
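
One way to sanity-check hue ranges like these is to convert a known plot color to OpenCV's 8-bit hue scale, where H runs from 0 to 179 (degrees divided by two). The sketch below uses only the standard library. The two RGB triples are assumptions (Matplotlib's default blue and orange), not values measured from the image in the question, and opencv_hue is a hypothetical helper name:

```python
import colorsys

def opencv_hue(r, g, b):
    """Return the hue of an 8-bit RGB color on OpenCV's 0-179 scale."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # colorsys returns hue in [0, 1); OpenCV stores hue as degrees / 2
    return int(round(h * 180)) % 180

# Assumed sample colors (Matplotlib defaults), not taken from the question's chart
blue_hue = opencv_hue(31, 119, 180)
orange_hue = opencv_hue(255, 127, 14)

print("blue hue:", blue_hue, "inside [85, 179]:", 85 <= blue_hue <= 179)
print("orange hue:", orange_hue, "inside [11, 30]:", 11 <= orange_hue <= 30)
```

If the colors in your own chart differ, sampling a few pixels from each plot line and running them through the same conversion should suggest where to place the bounds.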

Now, let's create the mask. Just iterate over the HSV list and apply the thresholding using the cv2.inRange function:

# Create mask:
for i in range(len(hsvValues)):
    # Get current thresholds:
    currentLowThres = np.array(hsvValues[i][0])
    currentHighThres = np.array(hsvValues[i][1])

    # Apply HSV threshold:
    hsvMask = cv2.inRange(hsvImage, currentLowThres, currentHighThres)
    cv2.imshow("Hsv Mask", hsvMask)
    cv2.waitKey(0)
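
To make the thresholding step less of a black box, here is a rough NumPy sketch of the per-pixel test that cv2.inRange performs: a pixel is set to 255 only if all three channels lie inside the inclusive bounds. The function name in_range and the tiny 1x3 HSV array are made up for illustration; in practice you would keep using cv2.inRange on hsvImage:

```python
import numpy as np

def in_range(hsv, low, high):
    """Return a 0/255 mask of pixels whose three channels all lie in [low, high]."""
    low = np.asarray(low)
    high = np.asarray(high)
    inside = np.all((hsv >= low) & (hsv <= high), axis=-1)
    return inside.astype(np.uint8) * 255

# Hypothetical 1x3 HSV image: a blue-ish, an orange-ish and a gray pixel
hsv = np.array([[[102, 200, 180], [14, 240, 255], [0, 0, 128]]], dtype=np.uint8)

blue_mask = in_range(hsv, [85, 0, 0], [179, 255, 255])
orange_mask = in_range(hsv, [11, 0, 0], [30, 255, 255])

print(blue_mask)    # only the first pixel is set
print(orange_mask)  # only the second pixel is set
```

The gray pixel in this toy example has hue 0, so it falls outside both hue ranges even though the saturation and value bounds are left fully open.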

These are the two masks:

[Image: blue plot mask]

[Image: orange plot mask]

Now, do you want to create actual line models using this information? That's another problem. I'd be cautious about using Hough's line transform for detection. Although the masks are pretty clean, Hough's parameter tuning is notoriously capricious (i.e., difficult and non-scalable) if you run images that are not very similar through the algorithm. Additionally, I guess you may be more interested in multiple line segments per plot than in one continuous line, so I'd be on the lookout for a more ad-hoc approach.
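
As one possible ad-hoc approach (an assumption on my part, not something tested on the image in the question), you could scan a mask column by column and take the mean row of the set pixels as the plot's position at that x. The helper name mask_to_points and the synthetic mask below are hypothetical:

```python
import numpy as np

def mask_to_points(mask):
    """Return (x, y) points: for each column with mask pixels, the mean row index."""
    points = []
    for x in range(mask.shape[1]):
        rows = np.flatnonzero(mask[:, x])
        if rows.size:  # skip columns where the plot is absent
            points.append((x, float(rows.mean())))
    return points

# Synthetic 5x4 mask standing in for one of the HSV masks
mask = np.zeros((5, 4), dtype=np.uint8)
mask[4, 0] = 255
mask[2:4, 1] = 255   # a two-pixel-thick stretch
mask[1, 3] = 255     # column 2 stays empty (a gap in the plot)

points = mask_to_points(mask)
print(points)
```

Image rows grow downward, so the y values would still need to be flipped and scaled to the chart's axis range to recover data values. Near-vertical stretches put many rows in one column, so averaging may not represent them well.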
