Creating DataFrame of groups by pixel values in Python (number, size, etc.)-CodePudding

I have the following data (simple representation of black particles on a white filter):

data = [
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 255, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 255, 255, 255, 0, 0, ],
    [0, 0, 0, 0, 255, 0, 0, 0],
    [0, 255, 255, 0, 0, 255, 0, 0],
    [0, 255, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0]
]

And I have counted the number of particles (groups) and assigned them each a number using the following code:

arr = np.array(data)
groups, group_count = measure.label(arr > 0, return_num = True, connectivity = 1)
print('Groups: \n', groups)

With the Output:

Groups: 
 [[0 0 0 1 1 1 0 0]
 [0 2 0 1 1 1 0 0]
 [0 0 0 1 1 1 0 0]
 [0 0 0 0 1 0 0 0]
 [0 3 3 0 0 4 0 0]
 [0 3 0 0 0 4 0 0]
 [0 0 0 0 0 4 0 0]
 [0 0 0 0 0 4 0 0]]

I then have four (4) particles (groups) of different sizes.

I am looking to create a DataFrame representing each particle. Like this:

Any help is much appreciated!

CodePudding user response：

You can use Counter from the collection library:

import collections
import pandas as pd

groups = [[0, 0, 0, 1, 1, 1, 0, 0],
          [0, 2, 0, 1, 1, 1, 0, 0],
          [0, 0, 0, 1, 1, 1, 0, 0],
          [0, 0, 0, 0, 1, 0, 0, 0],
          [0, 3, 3, 0, 0, 4, 0, 0],
          [0, 3, 0, 0, 0, 4, 0, 0],
          [0, 0, 0, 0, 0, 4, 0, 0],
          [0, 0, 0, 0, 0, 4, 0, 0]]

counter = collections.Counter(groups[0])
for i in range(1,len(groups)):
    counter = counter   collections.Counter(groups[i]) 
    
df = pd.DataFrame.from_dict(counter, orient='index', columns=["Size"]).drop(0)

>>> df
   Size
1    10
2     1
3     3
4     4

CodePudding user response：

There should be a more elegant approach, but here is what I have come up with:

import pandas as pd
customDict = {}
for group in groups:
  for value in group:
    if str(value) not in customDict:
      customDict[str(value)] = [0]
    customDict[str(value)][0]  = 1
df = pd.DataFrame.from_dict(customDict, orient="index").reset_index()
df.rename(columns={"index": "particle #", 0: "size"}, inplace=True)
df.drop(0, inplace=True)
df

Output

	particle #	size
1	1	10
2	2	1
3	3	3
4	4	4

CodePudding user response：

Use numpy.unique with return_counts=True:

df = (pd.DataFrame(np.c_[np.unique(groups, return_counts=True)],
                   columns=['Particle #', 'Size'])
        .loc[lambda d: d['Particle #'].ne(0)] # only to filter out the 0s
     )

or, different syntax:

df = (pd.DataFrame(dict(zip(['Particle #', 'Size'],
                            np.unique(groups, return_counts=True))))
        .loc[lambda d: d['Particle #'].ne(0)] # only to filter out the 0s
     )

output:

   Particle #  Size
1           1    10
2           2     1
3           3     3
4           4     4