How to remove duplicate values per day in R?

Time:11-21

I have a simple question, but I couldn't find a working solution.

My dataframe looks like this:

dput(prec)
structure(list(date = structure(c(19091, 19091, 19092, 19092, 
19093, 19093, 19094, 19094, 19095, 19095, 19096, 19096, 19097, 
19097, 19098, 19098, 19099, 19099, 19100, 19100, 19101, 19101, 
19102, 19102, 19103, 19103, 19104, 19104, 19105, 19105, 19106, 
19106, 19107, 19107, 19109, 19109, 19110, 19110, 19111, 19111, 
19112, 19112, 19113, 19113, 19114, 19114), class = "Date"), target = c("grass", 
"tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass", 
"tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass", 
"tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass", 
"tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass", 
"tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass", 
"tree", "grass", "tree", "grass", "tree"), Precip_Tot = c(0, 
0, 0, 0, 0.0464, 0.0464, 0.0362, 0.0362, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.131, 0.131, 0, 0, 0, 0, 0, 
0, 0.016, 0.016, 0, 0, 0, 0, 0, 0, 0, 0, 0.4506, 0.4506)), row.names = c(NA, 
46L), class = "data.frame")

The column Precip_Tot contains duplicated values because each date appears more than once.

How can I turn the duplicated values of Precip_Tot within each date into NA? Note: I don't want to remove any rows.

Any help is much appreciated.

CodePudding user response:

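Try this. duplicated() flags the second and later rows for each date, so only those rows are set to NA (note that this modifies prec in place):

prec[duplicated(prec$date), "Precip_Tot"] <- NA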

         date target Precip_Tot
1  2022-04-09  grass     0.0000
2  2022-04-09   tree         NA
3  2022-04-10  grass     0.0000
4  2022-04-10   tree         NA
5  2022-04-11  grass     0.0464
6  2022-04-11   tree         NA
7  2022-04-12  grass     0.0362
8  2022-04-12   tree         NA
9  2022-04-13  grass     0.0000
    ...
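
To make the mechanism visible, here is a minimal sketch on a small made-up data frame (the name df and its values are illustrative assumptions, not the asker's data). It shows the logical vector that duplicated() returns and applies it to a copy, so the original stays untouched:

```r
# Small illustrative data frame (hypothetical values, not the asker's data)
df <- data.frame(
  date = as.Date(c("2022-04-09", "2022-04-09", "2022-04-10", "2022-04-10")),
  target = c("grass", "tree", "grass", "tree"),
  Precip_Tot = c(0, 0, 0.0464, 0.0464)
)

# TRUE for every row whose date has already appeared in an earlier row
dup <- duplicated(df$date)

# Work on a copy so df itself is not modified
out <- df
out$Precip_Tot[dup] <- NA
```

Because duplicated() only looks at earlier rows, the first row for each date always keeps its value, whichever target it belongs to.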

CodePudding user response:

Using dplyr, set Precip_Tot to NA in every row after the first for each date:

library(dplyr)

prec %>%
  group_by(date) %>%
  mutate(Precip_Tot = if_else(row_number() == 1, Precip_Tot, NA_real_)) %>%
  ungroup()
# A tibble: 46 × 3
   date       target Precip_Tot
   <date>     <chr>       <dbl>
 1 2022-04-09 grass      0     
 2 2022-04-09 tree      NA     
 3 2022-04-10 grass      0     
 4 2022-04-10 tree      NA     
 5 2022-04-11 grass      0.0464
 6 2022-04-11 tree      NA     
 7 2022-04-12 grass      0.0362
 8 2022-04-12 tree      NA     
 9 2022-04-13 grass      0     
10 2022-04-13 tree      NA     
# … with 36 more rows
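
If dplyr is not available, the same "row number within each date" idea can be sketched in base R with ave(). This is an illustrative alternative rather than part of the answer above; the data frame df and its values are made up, and the rows are deliberately not sorted by date to show that the grouping does not depend on row order:

```r
# Hypothetical data with dates interleaved rather than sorted
df <- data.frame(
  date = as.Date(c("2022-04-09", "2022-04-10", "2022-04-09", "2022-04-10")),
  target = c("grass", "grass", "tree", "tree"),
  Precip_Tot = c(0, 0.0464, 0, 0.0464)
)

# Position of each row within its date group (1 = first row for that date)
idx <- ave(seq_len(nrow(df)), df$date, FUN = seq_along)

# Keep the first value per date and blank the rest, on a copy
out <- df
out$Precip_Tot[idx > 1] <- NA
```

Like the dplyr pipeline, this keeps every row and only blanks Precip_Tot; assign the result back to the original name if the change should persist.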