Pandas dropping na within conditions

Time:11-07

If I have the following input:

participant  trials  Correct_Choice  Opt_in
1            1       NaN             1
1            1       1               NaN
2            1       NaN             1
2            1       0               NaN

My desired output is:

participant  trials  Correct_Choice  Opt_in
1            1       1               1
2            1       0               1
  

What is the best way to do this in pandas?

In R (using data.table), the code is as follows:

setDT(df)[, lapply(.SD, na.omit) , by = list(participant,trials)]

For example, here is a sample data set in R:

structure(list(participant = c("612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", 
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", 
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", 
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", 
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", 
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", 
"612550d21d30a44cb0d2d579"), block = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), trials = c(0L, 
0L, 0L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L), opt_in.keys = c(1, NA, NA, 1, NA, NA, NA, 1, NA, NA, 1, 
NA, NA, NA, 1, NA, NA, NA), correct_chosen = c(NA, 1L, NA, NA, 
1L, NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, 1L, NA, NA)), row.names = c(NA, 
-18L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x7fd1a20102e0>)

CodePudding user response:

Assuming that you have a dataframe (df) that looks like this:

   participant  trials  Correct_Choice Opt_in
0            1       1             NaN      1
1            1       1             1.0    NaN
2            2       1             NaN      1
3            2       1             0.0    NaN


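Then you could get the desired output with the following Python code. It works because groupby(...).first() returns, for each column, the first non-null value within each group, so the NaN entries are skipped: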

output = df.groupby(['participant', 'trials']).first().reset_index()
print(output)


Output:

   participant  trials  Correct_Choice Opt_in
0            1       1             1.0      1
1            2       1             0.0      1
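
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is built by hand as a DataFrame (the construction below is an assumption, not part of the original answer) and that pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# Sample input from the question, built by hand (assumed construction).
df = pd.DataFrame({
    "participant": [1, 1, 2, 2],
    "trials": [1, 1, 1, 1],
    "Correct_Choice": [np.nan, 1, np.nan, 0],
    "Opt_in": [1, np.nan, 1, np.nan],
})

# first() takes the first non-null value of every column within each group,
# which collapses the partially filled rows into one row per group.
output = df.groupby(["participant", "trials"]).first().reset_index()
print(output)
```

One caveat: first() keeps only a single value per column and group. If a group can contain several non-null values in the same column, the later ones are dropped, which is not the same as what na.omit does in the R code.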