I have a problem with a database. I have a bank with the following format. These are test results from 3 people and the date they were performed on:
Name DATE VALUE
ASX 2001-02-25 25
CDS 2001-02-25 32
ASX 2001-03-25 28
ERS 2001-03-26 40
CDS 2001-03-30 26
ASX 2001-03-30 30
I need, in cases where the test was performed more than once, to consider only the first value in the other evaluations
Name DATE VALUE
ASX 2001-02-25 25
CDS 2001-02-25 32
ASX 2001-03-25 25
ERS 2001-03-26 40
CDS 2001-03-30 32
ASX 2001-03-30 25
CodePudding user response:
data.table
option using @Darren Tsai df (thanks!):
library(data.table)
setDT(df)
df[ , c("VALUE") := df[!duplicated(Name)][.SD, on = .(Name), .(VALUE), roll = Inf]]
df
Output:
Name DATE VALUE
1: ASX 2001-02-25 25
2: CDS 2001-02-25 32
3: ASX 2001-03-25 25
4: ERS 2001-03-26 40
5: CDS 2001-03-30 32
6: ASX 2001-03-30 25
CodePudding user response:
A possible solution:
library(tidyverse)
df %>%
group_by(Name) %>%
mutate(VALUE = first(VALUE)) %>%
ungroup
#> # A tibble: 6 × 3
#> Name DATE VALUE
#> <chr> <chr> <int>
#> 1 ASX 2001-02-25 25
#> 2 CDS 2001-02-25 32
#> 3 ASX 2001-03-25 25
#> 4 ERS 2001-03-26 40
#> 5 CDS 2001-03-30 32
#> 6 ASX 2001-03-30 25
CodePudding user response:
Assume that your data has been sorted by DATE
, then you could use ave()
to extract the first value from VALUE
in each Name
group.
within(df, {
VALUE <- ave(VALUE, Name, FUN = \(x) x[1])
})
# Name DATE VALUE
# 1 ASX 2001-02-25 25
# 2 CDS 2001-02-25 32
# 3 ASX 2001-03-25 25
# 4 ERS 2001-03-26 40
# 5 CDS 2001-03-30 32
# 6 ASX 2001-03-30 25
Data
df <- structure(list(Name = c("ASX", "CDS", "ASX", "ERS", "CDS", "ASX"),
DATE = structure(c(11378, 11378, 11406, 11407, 11411, 11411), class = "Date"),
VALUE = c(25L, 32L, 28L, 40L, 26L, 30L)), row.names = c(NA, -6L),
class = "data.frame")