Home > Software engineering >  How to count unique ids if certain columns have no missing values
How to count unique ids if certain columns have no missing values

Time:05-03

I have the following df

ID   date    v1   v2  v3   v4  v5  v6
A     ..      1   2   3   NaN NaN NaN 
A     ..      0   2  NaN  NaN NaN NaN
B     ..      0   2   4    5   3   9
B     ..      2   6   6    6   3  NaN
D     ..      2  NaN NaN  NaN NaN NaN
D     ..      9   2   2   NaN NaN NaN
D     ..      2   2   6    6   1   2

I want to extract rows whose columns (from v1-v5) aren't missing.

desired output

ID   date    v1   v2  v3   v4  v5  v6
B     ..      0   2   4    5   3   9
B     ..      2   6   6    6   3  NaN
D     ..      2   2   6    6   1   2

Thanks!

CodePudding user response:

Use dropna with a subset:

cols = ['v1', 'v2', 'v3', 'v4', 'v5']
df2 = df.dropna(subset=cols)

Or, use notna combined with all for boolean indexing:

cols = ['v1', 'v2', 'v3', 'v4', 'v5']

df2 = df[df[cols].notna().all(axis=1)]

Alternative if v1-v5 are consecutive:

df2 = df[df.loc[:, 'v1':'v5'].notna().all(axis=1)]

output:

  ID date  v1   v2   v3   v4   v5   v6
2  B   ..   0  2.0  4.0  5.0  3.0  9.0
3  B   ..   2  6.0  6.0  6.0  3.0  NaN
6  D   ..   2  2.0  6.0  6.0  1.0  2.0
  • Related