Convert image chart to array

I'm new to OpenCV, and I want to convert multiple chart images to arrays.

Any scale is fine: we only need the values of the graph displayed in the image as data, so that plotting them again gives the same display.

Any ideas on how to get started?

CodePudding user response：

Let's give it a try:

import cv2
import numpy as np
from matplotlib import pyplot as plt

im = cv2.imread('in.jpg')  # Your image
if im is None:  # cv2.imread returns None when the file cannot be read
    raise FileNotFoundError('in.jpg could not be read')
H, W, D = im.shape  # Height, width and number of channels of your image
whiteness = im.sum(axis=2)  # Sum of the three colour channels (0 to 765)
# For each x, find the first pixel (starting from the top) that is not white.
# (255*3 = 765, so I tolerate a small margin. Visible pixels would be way
# under 760.)
# Note that argmax stops at the first True it finds.
# Note also that if the whole column is white, it returns 0.
# We assume that 0 is not a valid value.
# (If it is, we could also check whether whiteness[y[x], x] is really < 760.)
y = np.array([np.argmax(whiteness[:, x] < 760) for x in range(W)])
xmin = np.argmax(y > 0)  # Ignore the left margin
y = y[xmin:]
y = y.max() - y  # Flip the values (row 0 is at the top in images, at the bottom in charts)
plt.plot(y)
plt.show()

But then, you need a lot of automation:

My removal of the margin is rudimentary and probably would not work on other examples.
You said you didn't care about scaling, but you may care about where the 0 is. Here it is assumed to be the lowest point of the curve (the largest row index in the image, hence the minimum in chart coordinates). You may want to detect the x axis instead.
As you can see, the result is not very smooth when you look closely. That is due to some antialiasing in your picture (pixel values don't go straight from white to black), which makes the exact point where they cross the 760 threshold a little bit random. You may want to experiment with other thresholds, or to add a post-processing smoothing step.
You could also take advantage of this antialiasing to get subpixel resolution, so that instead of being a disadvantage it actually helps. But that is quite tricky, because you have to "reverse engineer" the antialiasing a little (influence of neighbours, minimum value, since nothing says the curve is strictly black, ...).
Edit: on second look, you have a more serious problem than antialiasing in the image: some "echo" pixels from the JPEG encoding. That should be solved easily by lowering the threshold a lot.
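
Two of the points above, locating the x axis and smoothing the extracted values, could be sketched roughly as follows. This is only a sketch under assumptions that are not from the question: the helper names `find_x_axis_row` and `median_smooth` are made up for illustration, the x axis is assumed to be the longest dark horizontal line in the picture, the threshold of 300 and the window of 5 samples are arbitrary starting points, and `sliding_window_view` needs NumPy 1.20 or later.

```python
import numpy as np

def find_x_axis_row(whiteness, threshold=300):
    """Guess the x-axis row: the image row holding the most dark pixels.

    Assumption: the axis is the longest dark horizontal line in the picture.
    """
    dark_per_row = (whiteness < threshold).sum(axis=1)
    return int(np.argmax(dark_per_row))

def median_smooth(values, window=5):
    """Sliding median over an odd-sized window, repeating the edge values."""
    half = window // 2
    padded = np.pad(np.asarray(values, dtype=float), half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)
```

If `rows` holds the raw row indices (before they are flipped), then `find_x_axis_row(whiteness) - rows` would put the zero on the detected axis rather than on the lowest point of the curve, and `median_smooth(rows)` would remove isolated one-pixel jumps.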

But, well, that is the start you wanted, I guess.

Also, note that I am committing the worst crime in numpy: iterating over pixels. Sure, only over columns, so it is not totally awful (the worst would be to iterate over both x and y). Still, I am pretty sure there is a way to do the same thing with pure numpy operations. At the very least with np.vectorize (although that is essentially a loop in disguise), but I wanted to keep the code clear, since it is more pedagogical code than real code.
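
For what it's worth, the per-column loop can probably be replaced by a single `np.argmax` call along axis 0. A minimal sketch, in which the function name `first_dark_rows` and the extra `has_dark` mask are illustrative additions rather than part of the code above:

```python
import numpy as np

def first_dark_rows(whiteness, threshold=760):
    """Row index of the first pixel below `threshold` in every column.

    Also returns a boolean mask telling which columns contain such a pixel,
    because argmax alone returns 0 for an all-white column.
    """
    dark = whiteness < threshold      # boolean mask of shape (H, W)
    rows = np.argmax(dark, axis=0)    # index of the first True per column
    has_dark = dark.any(axis=0)       # False where the column is entirely white
    return rows, has_dark
```

The `has_dark` mask removes the need to assume that 0 is never a valid value: all-white columns can be dropped with `rows[has_dark]` instead.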