Is the a R function so replace data in different dates?-CodePudding

I have a problem with a database. I have a bank with the following format. These are test results from 3 people and the date they were performed on:

Name       DATE              VALUE
ASX        2001-02-25           25
CDS        2001-02-25           32
ASX        2001-03-25           28
ERS        2001-03-26           40
CDS        2001-03-30           26
ASX        2001-03-30           30

I need, in cases where the test was performed more than once, to consider only the first value in the other evaluations

Name       DATE               VALUE
ASX        2001-02-25            25
CDS        2001-02-25            32
ASX        2001-03-25            25
ERS        2001-03-26            40
CDS        2001-03-30            32
ASX        2001-03-30            25

CodePudding user response：

data.table option using @Darren Tsai df (thanks!):

library(data.table)
setDT(df)
df[ , c("VALUE") := df[!duplicated(Name)][.SD, on = .(Name), .(VALUE), roll = Inf]]
df

Output:

   Name       DATE VALUE
1:  ASX 2001-02-25    25
2:  CDS 2001-02-25    32
3:  ASX 2001-03-25    25
4:  ERS 2001-03-26    40
5:  CDS 2001-03-30    32
6:  ASX 2001-03-30    25

CodePudding user response：

A possible solution:

library(tidyverse)

df %>% 
  group_by(Name) %>% 
  mutate(VALUE = first(VALUE)) %>% 
  ungroup

#> # A tibble: 6 × 3
#>   Name  DATE       VALUE
#>   <chr> <chr>      <int>
#> 1 ASX   2001-02-25    25
#> 2 CDS   2001-02-25    32
#> 3 ASX   2001-03-25    25
#> 4 ERS   2001-03-26    40
#> 5 CDS   2001-03-30    32
#> 6 ASX   2001-03-30    25

CodePudding user response：

Assume that your data has been sorted by DATE, then you could use ave() to extract the first value from VALUE in each Name group.

within(df, {
  VALUE <- ave(VALUE, Name, FUN = \(x) x[1])
})

#   Name       DATE VALUE
# 1  ASX 2001-02-25    25
# 2  CDS 2001-02-25    32
# 3  ASX 2001-03-25    25
# 4  ERS 2001-03-26    40
# 5  CDS 2001-03-30    32
# 6  ASX 2001-03-30    25

Data

df <- structure(list(Name = c("ASX", "CDS", "ASX", "ERS", "CDS", "ASX"),
DATE = structure(c(11378, 11378, 11406, 11407, 11411, 11411), class = "Date"),
VALUE = c(25L, 32L, 28L, 40L, 26L, 30L)), row.names = c(NA, -6L),
class = "data.frame")