Trying to use ddply to subset a dataframe by two column variables, then find the maximum of a third

Time:03-09

I have a dataframe called data with variables for date, time, temperature, and a group number called Box #. I'm trying to subset the data to find the maximum temperature for each day, for each box, along with the time that temperature occurred at. Ideally I could place this data into a new dataframe with the date, box number, maximum temperature, and the time it occurred at.

I tried using ddply, but the code only returns one line of output:

ddply(data, .('Box #', 'Date'), summarize, max('Temp'))

I was able to find the maximum temperatures for each day using tapply on separate dataframes that only contain the values for individual groups

mx_day_2 <- tapply(box2$Temp, box2$Date, max)

I was unable to apply this to the larger dataframe with all groups and cannot figure out how to also get time from this code.

Is it possible to have ddply subset by both Box # and Date, then return two separate outputs of both maximum temperature and time, or do I need to use a different function here?

Edit: I managed to get the maximum temperatures using a version of the code in the answer below, but still haven't figured out how to find the time at which each maximum occurs in the same data. The code that worked for the first part was

max_data <- data %>%
    group_by(`Box #`, Date)
max_values <- summarise(max_data, max_temp = max(Temp, na.rm = TRUE))

CodePudding user response:

I would use dplyr/tidyverse instead of plyr; dplyr is the successor to plyr. I would also clean the column names with janitor, since spaces and symbols in column names are awkward to work with (clean_names() changes 'Box #' to box_number).

library(tidyverse)
library(janitor)

mx_day2 <- data %>%
  clean_names() %>%
  group_by(date, box_number) %>%
  summarise(max_temp = max(temp, na.rm = TRUE))
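
To also get the time at which each maximum occurs, the summarise() call above can probably be extended with which.max(). This is only a sketch: the sample data is made up, and it assumes the original column names (`Box #`, Date, Time, Temp) rather than the cleaned ones. It also assumes every group has at least one non-missing temperature.

```r
library(dplyr)

# Hypothetical sample data; column names are assumed to match the question
data <- tibble(
  `Box #` = c(1, 1, 1, 1, 2, 2),
  Date    = c("03-01", "03-01", "03-02", "03-02", "03-01", "03-01"),
  Time    = c("09:00", "14:00", "11:00", "16:00", "10:00", "15:00"),
  Temp    = c(18.5, 24.1, 20.3, NA, 17.9, 22.4)
)

max_values <- data %>%
  group_by(`Box #`, Date) %>%
  summarise(
    max_temp    = max(Temp, na.rm = TRUE),
    # which.max() ignores NA and returns the position of the first maximum
    time_of_max = Time[which.max(Temp)],
    .groups = "drop"
  )
```

If two readings tie for the maximum, this keeps the time of the earlier row in the group.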

CodePudding user response:

I found a solution that pulls the full rows containing each group's maximum temperature from the initial dataframe into a new dataframe. Full code for the solution below:

max_data_v2 <- data %>%
  group_by(`Box #`, Date) %>%
  filter(Temp == max(Temp, na.rm = TRUE))
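
One thing to watch with the filter() approach: if several readings in a group tie for the maximum, all of them are kept. If exactly one row per box and day is wanted, slice_max() with with_ties = FALSE might be an option. A minimal sketch, using made-up sample data and assuming the same column names as in the question:

```r
library(dplyr)

# Hypothetical sample data with a tie for the maximum in the first group
data <- tibble(
  `Box #` = c(1, 1, 1, 2, 2),
  Date    = c("03-01", "03-01", "03-01", "03-01", "03-01"),
  Time    = c("09:00", "13:00", "14:00", "10:00", "15:00"),
  Temp    = c(18.5, 24.1, 24.1, 17.9, 22.4)
)

max_rows <- data %>%
  group_by(`Box #`, Date) %>%
  # keep a single row per group even when several rows share the maximum
  slice_max(Temp, n = 1, with_ties = FALSE) %>%
  ungroup()
```

The result keeps every original column, so the time of the maximum comes along with the temperature.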