I have a dataset:
id1 id2 id3 id4
a1 a2 NaN NaN
a1 a2 a5 NaN
b1 b2 b3 b4
b1 NaN NaN NaN
c1 c2 NaN NaN
I want to calculate the maximum number of non-NaN columns for each id1. So the desired result is:
id1 depth
a1 3
b1 4
c1 2
How to do that?
CodePudding user response:
I created some mock data and stored it in df. To count the depth of each row, labelled by the first column, we do the following.
import pandas as pd
import numpy as np
# create a sample dataframe
columns = ['A', 'B', 'C', 'D']
data = {
    'A': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'B': ['Male', 'Female', 'Female', np.nan, 'Male'],
    'C': [27, 24, 29, np.nan, 25],
    'D': ['Accounting', np.nan, 'Accounting', np.nan, 'Sales']
}
df = pd.DataFrame(data)
nrow = len(df)
# depth of each row, labelled by the first column
print('A', '\t', 'depth')
for i in range(nrow):
    count = 0
    for col in columns:
        # stop counting at the first NaN in the row
        if pd.isna(df[col][i]):
            break
        count += 1
    print(df['A'][i], '\t', count)
- For each row:
  - Start a counter at 0.
  - For each element in that row:
    - Add 1 to the counter if it is not a NaN. If it is a NaN, break out of the loop (stop counting).
Output:
A depth
E1 4
E2 3
E3 4
E4 1
E5 4
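The same "count until the first NaN" steps can also be sketched without explicit loops. This is only an illustration on the mock data above, not part of the original answer; the names present and depth are made up for the example. It relies on a cumulative minimum along each row, which turns everything after the first NaN into 0.

```python
import numpy as np
import pandas as pd

# same mock data as in the answer above
df = pd.DataFrame({
    'A': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'B': ['Male', 'Female', 'Female', np.nan, 'Male'],
    'C': [27, 24, 29, np.nan, 25],
    'D': ['Accounting', np.nan, 'Accounting', np.nan, 'Sales'],
})

# 1 where a value is present, 0 where it is NaN
present = df.notna().astype(int)

# cummin along each row zeroes out every cell after the first NaN,
# so the row sum is the number of leading non-NaN values
depth = present.cummin(axis=1).sum(axis=1)

result = pd.DataFrame({'A': df['A'], 'depth': depth})
print(result.to_string(index=False))
```

If NaNs in the middle of a row should not stop the count, dropping the cummin step and summing present directly would count all non-NaN cells instead.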
CodePudding user response:
Is this the result you want?
df.groupby('id1').apply(lambda x: len(x.dropna(axis=1,how='all').columns))
Output:
id1
a1 3
b1 4
c1 2
dtype: int64
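A possible alternative, sketched here under the assumption that "depth" means the largest number of non-NaN cells in any single row of the group (with id1 itself counted, as in the desired output). It avoids groupby(...).apply, where, depending on the pandas version, the grouping column may or may not be passed to the function. The DataFrame below is just the question's data typed in by hand; row_depth and depth are illustrative names.

```python
import numpy as np
import pandas as pd

# the dataset from the question
df = pd.DataFrame({
    'id1': ['a1', 'a1', 'b1', 'b1', 'c1'],
    'id2': ['a2', 'a2', 'b2', np.nan, 'c2'],
    'id3': [np.nan, 'a5', 'b3', np.nan, np.nan],
    'id4': [np.nan, np.nan, 'b4', np.nan, np.nan],
})

# number of non-NaN cells in each row (id1 included)
row_depth = df.notna().sum(axis=1)

# largest row depth within each id1 group
depth = row_depth.groupby(df['id1']).max().rename('depth')

print(depth.reset_index())
```

For this data the per-row maximum and the "columns with at least one value in the group" count agree; they can differ when different rows of a group fill different columns.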