Removing bad data (same device ID and timestamp, 2 different values) from a dataframe-CodePudding

I'm pulling data that has a device ID, a timestamp, and a value. I'll have lots of different device ID's and data points for each hour of the day across an entire month. If I have two pieces of data with the same ID and the same Time, but different Values, how can I remove the lower of the two (which would just be bad data)? Alternatively, I can do this step further below. I'm not sure which would be easier.

Here's an example of the data I'd be working with.

       ID        Time                   Value
1234   64058     2022-03-23 08:00:00    1014.98
1235   64059     2022-03-23 08:00:00    1001.34
1236   64060     2022-03-23 08:00:00    1001.52
1237   64061     2022-03-23 08:00:00    1483.11
1238   64062     2022-03-23 08:00:00    1342.23
1239   64063     2022-03-23 08:00:00    1201.34
1242   64058     2022-03-23 08:00:00    10.55

To make the data easier to see, I merge it with another dataframe that has actual device names rather than just ID's and then pivot it to a wider format with this.

av_device_merge <- merge(av_data, device_list, by.av_data = ID, by.device_list = ID) %>%  select(-c(ID)

pivoted <- av_device_merge %>% pivot_wider(names_from = Device_Short_Name, values_from = Value)

Which yields this result (just including the one hour as an example):

Time                    Device 1              Device 2    Device 3    Device 4    Device 5    Device 6
2022-03-23 08:00:00     c(1014.98, 10.55)     1001.34     1001.52     1483.11     1342.23     1201.34
2022-03-23 09:00:00
...

This is the part where I could also look at each column, and if the data present is a vector, then keep the highest value only and change to a numeric value.

Which of the two would be easier to do? I'm guessing the first option? Any ideas on how to accomplish this goal?

Thanks in advance!

CodePudding user response：

Multiple ways to do this

data %>%
    group_by(ID, Time) %>%
    slice_max(n = 1, order_by = Value, with_ties = FALSE)

data %>%
    group_by(ID, Time)%>%
    top_n(1, wt = Value)

I prefer slice_max and the with_ties = FALSE in case of identical values.