How to remove NAs from a certain column in data frames in a list?

Time:06-17

I have a list (my.list) that looks like this:

> my.list
$S1
       A   B   C         D
1 101027  NA  C1        NA
2 101031 1.5 PTA 0.8666667
3 101032 1.4  C1 0.5571429
4 101127  NA PTA        NA
5 101220 9.3  C1 0.7849462

$S2
       A    B   C         D
1 102142   NA  C1        NA
2 102143 0.70 PTA 1.7142857
3 102144   NA  C1 2.7257000
4 102148 0.45 PTA        NA
5 102151 0.91  C1 0.7032967
6 102152 0.78 PTA        NA

I want to remove the rows where column D is NA, but only if column C is 'PTA' in the same row.

My desired output would look like this:

> my.list
$S1
       A   B   C         D
1 101027  NA  C1        NA
2 101031 1.5 PTA 0.8666667
3 101032 1.4  C1 0.5571429
4 101220 9.3  C1 0.7849462

$S2
       A    B   C         D
1 102142   NA  C1        NA
2 102143 0.70 PTA 1.7142857
3 102144   NA  C1 2.7257000
4 102151 0.91  C1 0.7032967

How can I go about doing this?

Reproducible Data:

my.list <- structure(list(S1 = structure(list(A = c(101027L, 101031L, 101032L, 
101127L, 101220L), B = c(NA, 1.5, 1.4, NA, 9.3), C = c("C1", "PTA", "C1", "PTA", "C1"), D = c(NA, 0.8666667, 0.5571429, NA, 0.7849462
)), .Names = c("A", "B", "C", "D"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5")), S2 = structure(list(A = c(102142L, 102143L, 
102144L, 102148L, 102151L, 102152L), B = c(NA, 0.7, NA, 0.45, 
0.91, 0.78), C = c("C1", "PTA", "C1", "PTA", "C1", "PTA"), D = c(NA, 
1.7142857, 2.7257, NA, 0.7032967, NA)), .Names = c("A", "B", "C", 
"D"), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6"))), .Names = c("S1", "S2"))

CodePudding user response:

Using lapply, and subsetting with simple logical tests:

lapply(my.list, function(x) x[!(is.na(x$D) & x$C == "PTA"),])
#> $S1
#>        A   B   C         D
#> 1 101027  NA  C1        NA
#> 2 101031 1.5 PTA 0.8666667
#> 3 101032 1.4  C1 0.5571429
#> 5 101220 9.3  C1 0.7849462
#> 
#> $S2
#>        A    B   C         D
#> 1 102142   NA  C1        NA
#> 2 102143 0.70 PTA 1.7142857
#> 3 102144   NA  C1 2.7257000
#> 5 102151 0.91  C1 0.7032967

Created on 2022-06-16 by the reprex package (v2.0.1)
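
Note that lapply returns a new list and leaves my.list unchanged, and that the surviving rows keep their original row names (1, 2, 3, 5), whereas the desired output in the question is numbered 1 to 4. If the renumbering matters, a minimal sketch along these lines should work. The name `cleaned` is just illustrative, and only S1 is rebuilt here to keep the example short:

```r
# Toy list in the same shape as the question's my.list (values copied from S1).
my.list <- list(
  S1 = data.frame(
    A = c(101027L, 101031L, 101032L, 101127L, 101220L),
    B = c(NA, 1.5, 1.4, NA, 9.3),
    C = c("C1", "PTA", "C1", "PTA", "C1"),
    D = c(NA, 0.8666667, 0.5571429, NA, 0.7849462)
  )
)

# Drop rows where D is NA and C is "PTA", then renumber the remaining rows.
cleaned <- lapply(my.list, function(x) {
  kept <- x[!(is.na(x$D) & x$C == "PTA"), ]
  rownames(kept) <- NULL  # reset row names to 1..n
  kept
})

cleaned$S1
```

Assigning the result back with `my.list <- cleaned` would overwrite the original list.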

CodePudding user response:

Or with subset:

lapply(my.list, subset, subset = !(is.na(D) & C == 'PTA'))

-output

$S1
       A   B   C         D
1 101027  NA  C1        NA
2 101031 1.5 PTA 0.8666667
3 101032 1.4  C1 0.5571429
5 101220 9.3  C1 0.7849462

$S2
       A    B   C         D
1 102142   NA  C1        NA
2 102143 0.70 PTA 1.7142857
3 102144   NA  C1 2.7257000
5 102151 0.91  C1 0.7032967
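
The two answers give the same result on this data, but they can differ if column C itself contains NA. Logical indexing with `[` turns an NA condition into a row of NAs, while subset treats an NA condition as FALSE and drops the row. The sketch below uses a hypothetical data frame (not from the question) to show the difference, plus a `%in%` variant that keeps such rows:

```r
# Hypothetical data: row 2 has NA in both C and D.
df <- data.frame(
  C = c("C1", NA, "PTA", "PTA"),
  D = c(NA, NA, NA, 1.2)
)

# The condition is NA for row 2, because TRUE & NA is NA.
cond <- is.na(df$D) & df$C == "PTA"

by_bracket <- df[!cond, ]        # row 2 comes back as an all-NA row
by_subset  <- subset(df, !cond)  # row 2 is dropped (NA counts as FALSE)

# %in% never returns NA, so row 2 is kept unchanged.
by_in <- df[!(is.na(df$D) & df$C %in% "PTA"), ]
```

Which behaviour is right depends on what an NA in C should mean for your data.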