I cannot share the dataset but I will explain it as best as I can.
The dataset has 50 columns 48 of them are in Y/m/d h:m:s
format. also the data has many NA
, but it must not be removed.
Let's say there is a column B
. I want to remove the rows if the value of B
is not the earliest in that row.
How can I do this in R? For example, the original would be like this:
df <- data.frame(
A = c(11,19,17,6,13),
B = c(18,9,5,16,12),
C = c(14,15,8,87,16))
A B C
1 11 18 14
2 19 9 15
3 17 5 8
4 6 16 87
5 13 12 16
but I want this:
A B C
1 19 9 15
2 17 5 8
3 13 12 16
CodePudding user response:
You could use apply()
to find the minimum for each row.
df |> subset(B == apply(df, 1, min, na.rm = TRUE))
# A B C
# 2 19 9 15
# 3 17 5 8
# 5 13 12 16
The tidyverse
equivalent is
library(tidyverse)
df %>% filter(B == pmap(across(A:C), min, na.rm = TRUE))
CodePudding user response:
If you are willing to use data.table
, you could do the following.
library(data.table)
df <- as.data.table(df)
df[(B < A & B < C)]
A B C
1: 19 9 15
2: 17 5 8
3: 13 12 16