If I have the following input:
participant trials Correct_Choice Opt_in
1 1 NaN 1
1 1 1 NaN
2 1 NaN 1
2 1 0 NaN
My desired output is:
participant trials Correct_Choice Opt_in
1 1 1 1
2 1 0 1
What is the best way to do this in pandas?
In R (using data.table), the code is as follows:
setDT(df)[, lapply(.SD, na.omit) , by = list(participant,trials)]
For example, here is a sample data set in R:
structure(list(participant = c("612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579",
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579",
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579",
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579",
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579",
"612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579", "612550d21d30a44cb0d2d579",
"612550d21d30a44cb0d2d579"), block = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), trials = c(0L,
0L, 0L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L), opt_in.keys = c(1, NA, NA, 1, NA, NA, NA, 1, NA, NA, 1,
NA, NA, NA, 1, NA, NA, NA), correct_chosen = c(NA, 1L, NA, NA,
1L, NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, 1L, NA, NA)), row.names = c(NA,
-18L), class = c("data.table", "data.frame"))
CodePudding user response:
Assuming that you have a dataframe (df) that looks like this:
participant trials Correct_Choice Opt_in
0 1 1 NaN 1
1 1 1 1.0 NaN
2 2 1 NaN 1
3 2 1 0.0 NaN
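Then you can get the desired output with the following Python code. Within each (participant, trials) group, first() returns the first non-null value of each remaining column: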
output = df.groupby(['participant', 'trials']).first().reset_index()
print(output)
Output:
participant trials Correct_Choice Opt_in
0 1 1 1.0 1
1 2 1 0.0 1
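For anyone who wants to reproduce this, here is a minimal self-contained sketch. The way the dataframe is built is an assumption based on the example input in the question, not the asker's real data. Because Opt_in contains NaN, pandas stores it as a float column, so it may print as 1.0 rather than 1.

```python
import numpy as np
import pandas as pd

# Example input reconstructed from the question (assumed, not the real data set).
df = pd.DataFrame({
    "participant": [1, 1, 2, 2],
    "trials": [1, 1, 1, 1],
    "Correct_Choice": [np.nan, 1, np.nan, 0],
    "Opt_in": [1, np.nan, 1, np.nan],
})

# first() skips NaN, so each group collapses to one row that holds
# the first non-null value found in every column.
output = df.groupby(["participant", "trials"]).first().reset_index()
print(output)
```

Note that if a group has more than one non-null value in the same column, first() keeps only the earliest one and silently drops the rest, so it is worth checking that each group really has at most one value per column.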