How to count NaN rows per ID in a dataframe, where NaN is determined by checking specific columns


Context:

I have a Plantcube file with 7 columns. The file is generated from the responses of a device that reports temperature and humidity every second; cube_id and timestamp are always present and are never missing across all 400k records...

Question:

I want to count, per ID, how many times the device sent a response without temperature or humidity. Finding each ID and its count will help me trace the cubes that have problems sending their responses.

[Screenshot of the dataframe]

If you look at row 3 (Cube ID 48) and row 5 (Cube ID 90), they do not have any sensor information, so I want to count how many times IDs 48, 90, and the others are in the same situation.

Expected output, e.g.:

Cube ID -> Missing Count

48 -> 1030

90 -> 790

400286 -> 36

File link : https://drive.google.com/file/d/1xZST8n27IcVsFor1qqu90jZ1E2cJ6pHb/view?usp=sharing

Thanks

CodePudding user response:

You can use isna and all:

df.loc[df.drop('Cube ID', axis=1).isna().all(axis=1), 'Cube ID'].value_counts()
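Note: the question says the timestamp column is never missing, so after dropping only the ID column, .all(axis=1) may never be True for any row. A minimal sketch that restricts the check to the sensor columns instead (column names taken from the next answer; the file path is hypothetical):

import pandas as pd

df = pd.read_csv('Plantcube.csv')  # hypothetical path to the shared file

# Assumed sensor column names; adjust to the real headers.
sensor_cols = ['Temperature Layer A', 'Temperature Layer B',
               'Humidity Layer A', 'Humidity Layer B']

# Rows where every sensor reading is NaN, counted per Cube ID.
df.loc[df[sensor_cols].isna().all(axis=1), 'Cube ID'].value_counts()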

CodePudding user response:

# Each mask marks rows where that sensor column is NaN
mask1 = df['Temperature Layer A'].isna()
mask2 = df['Temperature Layer B'].isna()
mask3 = df['Humidity Layer A'].isna()
mask4 = df['Humidity Layer B'].isna()
# Keep rows where all four readings are missing and count them per Cube ID
df[mask1 & mask2 & mask3 & mask4]['Cube ID'].value_counts()

Output:

16    1564
20    1561
45    1561
75    1560
21    1560
      ...
70    1537
40    1537
37    1536
10    1533
46     613
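
To match the "Cube ID -> Missing Count" layout asked for in the question, the same result can be reshaped into a two-column frame. A minimal sketch reusing df and the masks defined above:

# Reuses df and mask1..mask4 from the answer above.
missing_counts = (
    df.loc[mask1 & mask2 & mask3 & mask4, 'Cube ID']
      .value_counts()
      .rename_axis('Cube ID')             # index name becomes the ID column
      .reset_index(name='Missing Count')  # values column named as requested
)
print(missing_counts)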