How to calculate the depth of each group in a column as the maximal number of non-empty columns?

Time:04-20

I have a dataset:

id1     id2    id3    id4
a1      a2     NaN    NaN
a1      a2     a5     NaN
b1      b2     b3     b4
b1      NaN    NaN    NaN
c1      c2     NaN    NaN

I want to calculate the maximal number of non-NaN columns per row for each id1. So the desired result is:

id1   depth
a1     3
b1     4
c1     2

How to do that?

CodePudding user response:

I created some mock data and stored it in df. To compute the depth of each row, labelled by the first column, we do the following.

import pandas as pd
import numpy as np

# create a sample dataframe
columns = ['A', 'B', 'C', 'D']
data = {
    'A': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'B': ['Male', 'Female', 'Female', np.nan, 'Male'],
    'C': [27, 24, 29, np.nan, 25],
    'D': ['Accounting', np.nan, 'Accounting', np.nan, 'Sales']
}
df = pd.DataFrame(data)

nrow = len(df)

# for each row, count the leading non-NaN values; rows are labelled by column 'A'
print('A', '\t', 'depth')
for i in range(nrow):
    count = 0
    for col in columns:
        # stop counting at the first NaN in this row
        if pd.isna(df[col][i]):
            break
        count += 1

    print(df['A'][i], '\t', count)

The steps are:

  1. For each row
  2. Start a counter at 0
  3. For each element in that row
  4. Add 1 to the counter if it is not NaN. If it is NaN, break out of the loop (stop counting).

Output:

A        depth
E1       4
E2       3
E3       4
E4       1
E5       4
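
The counting steps above can also be sketched without explicit loops. This is only an illustration on the same mock data, not part of the original answer; the names `present` and `leading_depth` are made up for the example. It relies on the fact that a running product of 0/1 flags drops to 0 at the first NaN and stays there.

```python
import numpy as np
import pandas as pd

# same mock data as in the loop version
df = pd.DataFrame({
    'A': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'B': ['Male', 'Female', 'Female', np.nan, 'Male'],
    'C': [27, 24, 29, np.nan, 25],
    'D': ['Accounting', np.nan, 'Accounting', np.nan, 'Sales'],
})

# 1 where a value is present, 0 where it is NaN
present = df.notna().astype(int)

# the running product along each row becomes 0 at the first NaN and stays 0,
# so its row sum counts only the leading run of non-NaN values
leading_depth = present.cumprod(axis=1).sum(axis=1)

result = pd.DataFrame({'A': df['A'], 'depth': leading_depth})
print(result.to_string(index=False))
```

Like the loop, this stops at the first NaN in a row, so a value that appears after a gap is not counted.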

CodePudding user response:

Is this the result you want?

df.groupby('id1').apply(lambda x: len(x.dropna(axis=1,how='all').columns))
'''
id1
a1    3
b1    4
c1    2
dtype: int64
'''
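
A possible alternative, sketched here on the question's data as an assumption rather than as the answerer's method: count the non-NaN values in each row first, then take the maximum per id1. This follows the wording of the question directly (maximal number of non-NaN columns per row) and does not depend on whether `groupby(...).apply` hands the grouping column to the function, which as far as I know differs between pandas versions.

```python
import numpy as np
import pandas as pd

# the dataset from the question
df = pd.DataFrame({
    'id1': ['a1', 'a1', 'b1', 'b1', 'c1'],
    'id2': ['a2', 'a2', 'b2', np.nan, 'c2'],
    'id3': [np.nan, 'a5', 'b3', np.nan, np.nan],
    'id4': [np.nan, np.nan, 'b4', np.nan, np.nan],
})

depth = (
    df.notna().sum(axis=1)   # non-NaN values per row (id1 itself is counted)
      .groupby(df['id1'])    # group the row counts by the id1 value
      .max()                 # keep the deepest row of each group
      .rename('depth')
      .reset_index()
)
print(depth)
```

On this data the two approaches agree because NaN values only appear at the end of a row; they could differ if a group had gaps in different columns across its rows.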