I've a dynamic dataframe of the following pattern:
structure(list(Date = structure(c(19304, 19305, 19311,
19311, 19312), class = "Date"), Category = c("4",
"6", "1", "0", "3"), Units_Sold = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), Raised = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), Method = c("Trad",
"Trad", "Unknown", "Trad", "Unknown"), Day = c(8, 9, 15, 15, 16)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
As you can probably see, there's two categories that have the same date. What I'd like to do is create a condition: if there are two rows with the same date, the df will be subsetted (say call it df_copy), and in that new df, one of the rows will be dropped and the contents of the "Category" column will be changed to say "Check Dataframe", and the "Method" column will be changed to say "Attention". Any advice most appreciated.
In answer to the question, I'd the dataframe to look something like this:
tibble [5 x 6] (S3: tbl_df/tbl/data.frame)
$ Date : Date[1:5], format: "2022-11-08" "2022-11-09" "2022-11-15" "2022-11-16"
$ Category: chr [1:5] "4" "6" "Check Dataframe" "3"
$ Units_Sold: num [1:5] NA NA NA NA
$ Raised: num [1:5] NA NA NA NA
$ Method : chr [1:5] "Trade" "Trad" "Attention" "Unknown"
$ Day: num [1:5] 8 9 15 15 16
CodePudding user response:
Using dplyr, group by Date
, add your warning flags to rows where n() > 1
, then use distinct()
to drop duplicate rows:
library(dplyr)
df_copy <- df_orig %>%
group_by(Date) %>%
mutate(
Category = ifelse(n() > 1, "Check Dataframe", Category),
Method = ifelse(n() > 1, "Attention", Method)
) %>%
ungroup() %>%
distinct(Date, .keep_all = TRUE)
df_copy
Output:
# A tibble: 4 × 6
Date Category Units_Sold Raised Method Day
<date> <chr> <dbl> <dbl> <chr> <dbl>
1 2022-11-08 4 NA NA Trad 8
2 2022-11-09 6 NA NA Trad 9
3 2022-11-15 Check Dataframe NA NA Attention 15
4 2022-11-16 3 NA NA Unknown 16
CodePudding user response:
We may group by 'Date', use an if/else
condtion to change the values in 'Category', 'Method' columns, slice
the first row and ungroup
library(dplyr)
df2 <- df1 %>%
group_by(Date) %>%
mutate(Category = if(n() > 1) "Check Dataframe" else Category,
Method = if(n() > 1) "Attention" else Method) %>%
slice(n=1) %>%
ungroup
-output
df2
# A tibble: 4 × 6
Date Category Units_Sold Raised Method Day
<date> <chr> <dbl> <dbl> <chr> <dbl>
1 2022-11-08 4 NA NA Trad 8
2 2022-11-09 6 NA NA Trad 9
3 2022-11-15 Check Dataframe NA NA Attention 15
4 2022-11-16 3 NA NA Unknown 16
Or with data.table
library(data.table)
setDT(df1)[, c("Category", "Method") := if(.N > 1)
.("Check Dataframe", "Attention") else .(Category, Method), Date]
df2 <- unique(df1, by = 'Date')
-output
> df2
Date Category Units_Sold Raised Method Day
<Date> <char> <num> <num> <char> <num>
1: 2022-11-08 4 NA NA Trad 8
2: 2022-11-09 6 NA NA Trad 9
3: 2022-11-15 Check Dataframe NA NA Attention 15
4: 2022-11-16 3 NA NA Unknown 16