Is there a way to fill a "donut hole" of 0 values in a pandas DataFrame?

I'm working with some images, using gray intensity values in data frames. I manipulate the raw values a handful of ways, and let's just say I end up with a data frame that looks like this:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 1 2 1 0 0 0 0 0 0

0 0 0 0 0 3 1 1 1 1 0 0 0 0 0 

0 0 0 0 0 1 2 1 3 1 5 0 0 0 0

0 0 0 0 1 3 1 1 1 4 3 0 0 0 0

0 0 0 0 1 1 5 2 **0** 1 1 0 0 0 0

0 0 0 0 0 5 1 1 **0** 2 1 0 0 0 0

0 0 0 0 0 1 1 1 2 1 3 0 0 0 0

0 0 0 0 0 0 1 2 1 0 0 0 0 0 0

0 0 0 0 0 0 0 1 1 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Note the bolded 0 values, which are the only 0s surrounded by non-zero values on all sides. What I'm hoping to do is convert the 0s in those locations, and only those locations, to 1 (for example), without disrupting all the outer 0s.

I've looked through the documentation but I'm not sure how--if it's possible--to tell pd.replace what conditions must be met to replace a given 0. Does anyone have any suggestions? Sorry for the formatting - new here.

CodePudding user response：

I wouldn't bother with a pandas DataFrame for this. I would convert it to a two-dimensional Python list and work with that.
First I converted your DataFrame to a list: df.values.tolist()
Then I used this question to find all neighbouring values, and from there it was smooth sailing:

def checkNeighboursForNull(arrayList, rowIdx, colIdx) -> bool:
    """Return True if none of the cells adjacent to (rowIdx, colIdx) is 0."""
    values = []
    for index_y, value_y in enumerate(arrayList):
        for index_x, value_x in enumerate(value_y):
            # Skip the cell itself; keep only cells at most one step away
            # (horizontally, vertically, or diagonally)
            if (rowIdx, colIdx) != (index_y, index_x):
                if (abs(index_x - colIdx) < 2) and (abs(index_y - rowIdx) < 2):
                    values.append(value_x)
    return 0 not in values

This function returns True when the 0 it is called on has no neighbouring zeros (horizontally, vertically, or diagonally), and False otherwise. It can be used like this:

arrayList = [[1, 2, 3, 4],
             [4, 4, 0, 0],
             [7, 8, 9, 2]]

for rowIdx, row in enumerate(arrayList):
    for colIdx, col in enumerate(row):
        if col == 0:
            # True only when no adjacent cell is 0
            hasNoZeroNeighbours = checkNeighboursForNull(arrayList, rowIdx, colIdx)
            arrayList[rowIdx][colIdx] = 1 if hasNoZeroNeighbours else 0

print(arrayList)
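
The check above walks the whole grid for every zero it tests. As a rough sketch of the same idea, the variant below only visits the (up to eight) adjacent cells and builds a new grid instead of modifying the input while iterating. The name has_no_zero_neighbour and the sample grid are made up for illustration; they are not part of the answer above.

```python
def has_no_zero_neighbour(grid, row_idx, col_idx):
    """Return True if none of the (up to 8) adjacent cells is 0."""
    n_rows, n_cols = len(grid), len(grid[0])
    # Clamp the 3x3 window so it never leaves the grid
    for r in range(max(0, row_idx - 1), min(n_rows, row_idx + 2)):
        for c in range(max(0, col_idx - 1), min(n_cols, col_idx + 2)):
            if (r, c) != (row_idx, col_idx) and grid[r][c] == 0:
                return False
    return True


# Hypothetical sample data: one isolated 0 and two 0s that touch each other
grid = [[1, 2, 3, 4],
        [4, 0, 5, 0],
        [7, 8, 9, 0]]

# Decide from the original values, then build a new grid
result = [[1 if v == 0 and has_no_zero_neighbour(grid, r, c) else v
           for c, v in enumerate(row)]
          for r, row in enumerate(grid)]

print(result)
```

Only the isolated 0 is replaced; the two zeros in the last column stay 0 because each has the other as a neighbour.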

CodePudding user response：

Unless I'm missing some important context, this is really not what data frames in general, and pandas in particular, are made for.

This is a task for an image-processing library such as OpenCV. The good news is that OpenCV images are just NumPy arrays, and you can easily get one of those from a pandas DataFrame.

For example, you could try something to the effect of:

import cv2
import numpy as np

# Binary 8-bit image: 1 where the DataFrame holds 0, 0 elsewhere
image = (df.values == 0).astype(np.uint8)
# Returns the number of connected areas of 0s, + 1 for the non-0 area (which is
# not necessarily connected), and a label image: 0 in the non-0 areas (this sounds
# paradoxical, but only because I asked for the components of the df.values == 0
# image) and the id of the area everywhere else. Just try it to see.
ncomp, mask = cv2.connectedComponents(image)
# Iterate over all areas to find the ones that are not connected to the border
# (which is the same as being entirely surrounded by non-0 values).
# Note that component 0 is the non-0 area, so we skip it.
for i in range(1, ncomp):
    if (mask[0] == i).any() or (mask[-1] == i).any() or (mask[:, 0] == i).any() or (mask[:, -1] == i).any():
        continue  # This component is connected to a border
    # Assign through the DataFrame: writing into df.values is not guaranteed to update df
    df[mask == i] = 1

That is, if I understood correctly what you meant by "surrounded by non-0". I took it to mean that the area has to be surrounded by non-0 values, not each individual 0. (Otherwise your example would not be valid, since the two central 0s have each other as neighbours.)
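
For readers who want to see the "zero area not connected to the border" idea without OpenCV, here is a minimal pure-Python sketch using a breadth-first flood fill on a nested list (for instance the result of df.values.tolist()). The function name fill_enclosed_zeros, its fill_value parameter, and the sample grid are assumptions made for illustration. It treats zeros as connected only through their four direct neighbours, whereas cv2.connectedComponents uses 8-connectivity by default, so the two can disagree on holes that touch the outside diagonally.

```python
from collections import deque


def fill_enclosed_zeros(grid, fill_value=1):
    """Return a copy of grid where zeros unreachable from the border are replaced."""
    n_rows, n_cols = len(grid), len(grid[0])
    reachable = [[False] * n_cols for _ in range(n_rows)]
    queue = deque()

    # Seed the search with every zero lying on the outer border
    for r in range(n_rows):
        for c in range(n_cols):
            on_border = r in (0, n_rows - 1) or c in (0, n_cols - 1)
            if on_border and grid[r][c] == 0:
                reachable[r][c] = True
                queue.append((r, c))

    # Spread through 4-connected zeros (up, down, left, right)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < n_rows and 0 <= nc < n_cols
                    and grid[nr][nc] == 0 and not reachable[nr][nc]):
                reachable[nr][nc] = True
                queue.append((nr, nc))

    # Zeros never reached from the border are the enclosed ones
    return [[fill_value if v == 0 and not reachable[r][c] else v
             for c, v in enumerate(row)]
            for r, row in enumerate(grid)]


# Hypothetical sample: an L-shaped hole of three zeros inside a non-zero blob
grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 2, 1, 3, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 5, 1, 0, 2, 0],
    [0, 1, 1, 2, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
filled = fill_enclosed_zeros(grid)
for row in filled:
    print(row)
```

The outer zeros are left alone because they are all reachable from the border; only the three enclosed zeros change.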

Another possibility (depending on what you are really trying to do) is, still with OpenCV, to use mathematical morphology, specifically the closing operator (or opening, if you prefer to work on the binary image of 0 values, as before). It is not exactly the same as the approach above, but if your 0 areas are in practice small holes in big non-zero areas, maybe this is what you are looking for.