Home > Software design >  Assign labels from columns in dataframe
Assign labels from columns in dataframe

Time:01-31

I have the following problem, from a 5 column dataframe I need to assign a label to it but only if any of the columns has more than 2 non nan values, for example for the following dataframe:

Name   Orange  Blue  Red  Black
A       nan     nan   ok   nan
A       nan     nan  nan   nan
A       nan     ok    ok   nan
A        ok     nan   ok    ok

In this case A would have to be label as Red, I tried with the elimination of the columns with only nan values:

df[df[~df.isnull().all()]]

but i can't assign the label properly, any idea to optimize the process? the output can be a df with the column with more non nan vales like:

Name   Red  
A      ok   
A      nan   
A      ok   
A      ok   

CodePudding user response:

Here is one way to do it by using notna and boolean indexing :

N = 2 # <- adjust here the threshold

sl = df.set_index("Name").notna().sum().loc[lambda s: s.gt(N)]
​
out = df[["Name"]].join(df.iloc[:, sl])

Another variant (better one) :

out = df.loc[:, df.notna().sum() >= N]

Output :

print(out)

  Name  Red
0    A   ok
1    A  NaN
2    A   ok
3    A   ok
  • Related