I have data that is merged down but on every hour mark, there is a row in my data with an elevation value and a row in my data with a NA. I want to be able to just merge those rows together, or in other words, take out the row that has the NA value. It's okay that all of the other values in the table are NA.
Here's my data:
> dput(stackoverflow[1:100,])
structure(list(Time = structure(c(1432425600, 1432426500, 1432427400,
1432428300, 1432429200, 1432430100, 1432431000, 1432431900, 1432432800,
1432433700, 1432434600, 1432435500, 1432436400, 1432437300, 1432438200,
1432439100, 1432440000, 1432440900, 1432441800, 1432442700, 1432443600,
1432444500, 1432445400, 1432446300, 1432447200, 1432448100, 1432449000,
1432449900, 1432450800, 1432450800, 1432451700, 1432452600, 1432453500,
1432454400, 1432454400, 1432455300, 1432456200, 1432457100, 1432458000,
1432458000, 1432458900, 1432459800, 1432460700, 1432461600, 1432461600,
1432462500, 1432463400, 1432464300, 1432465200, 1432465200, 1432466100,
1432467000, 1432467900, 1432468800, 1432468800, 1432469700, 1432470600,
1432471500, 1432472400, 1432472400, 1432473300, 1432474200, 1432475100,
1432476000, 1432476000, 1432476900, 1432477800, 1432478700, 1432479600,
1432479600, 1432480500, 1432481400, 1432482300, 1432483200, 1432483200,
1432484100, 1432485000, 1432485900, 1432486800, 1432486800, 1432487700,
1432488600, 1432489500, 1432490400, 1432490400, 1432491300, 1432492200,
1432493100, 1432494000, 1432494000, 1432494900, 1432495800, 1432496700,
1432497600, 1432498500, 1432499400, 1432500300, 1432501200, 1432502100,
1432503000), tzone = "UTC", class = c("POSIXct", "POSIXt")),
Turtle = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), levels = c("R3L1", "R3L11", "R3L12", "R3L2", "R3L4",
"R3L8", "R3L9", "R4L8", "R8L1", "R8L4", "R8NAT123"), class = "factor"),
elevation = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 282.27, NA, NA, NA, NA, 283.21, NA, NA, NA, NA, 282.14,
NA, NA, NA, NA, 281.63, NA, NA, NA, NA, 281.63, NA, NA, NA,
NA, 281.63, NA, NA, NA, NA, 282.63, NA, NA, NA, NA, 281.63,
NA, NA, NA, NA, 282.14, NA, NA, NA, NA, 281.63, NA, NA, NA,
NA, 282.14, NA, NA, NA, NA, 281.36, NA, NA, NA, NA, 282.14,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
Here's also a picture of an example of rows I would like to merge.
Let me know if you have any questions.
CodePudding user response:
Since the data frame (here, df
) has the same Time
and Turtle
values and the NA
values are the only ones repeating and always come after the Elevation
value, a dplyr
approach would be:
library(dplyr)
df %>%
group_by(Time, Turtle) %>%
slice(1)
# or
df %>% # thanks @neilfws
group_by(Time, Turtle) %>%
slice_head()
Which groups the Time
and Turtle
values and takes the first value.
CodePudding user response:
with dplyr
library(dplyr)
df1 %>%
group_by(Time, Turtle) %>%
summarise(elevation = elevation[!is.na(elevation)][1], .groups = 'drop')
-output
# A tibble: 87 × 3
# Groups: Time [87]
Time Turtle elevation
<dttm> <fct> <dbl>
1 2015-05-24 00:00:00 R3L1 NA
2 2015-05-24 00:15:00 R3L1 NA
3 2015-05-24 00:30:00 R3L1 NA
4 2015-05-24 00:45:00 R3L1 NA
5 2015-05-24 01:00:00 R3L1 NA
6 2015-05-24 01:15:00 R3L1 NA
7 2015-05-24 01:30:00 R3L1 NA
8 2015-05-24 01:45:00 R3L1 NA
9 2015-05-24 02:00:00 R3L1 NA
10 2015-05-24 02:15:00 R3L1 NA
Or use distinct
df1 %>%
arrange(Time, Turtle, is.na(elevation)) %>%
distinct()