Home > OS >  Merging specific rows in R where there is a NA value in one
Merging specific rows in R where there is a NA value in one

Time:01-12

I have data that is merged down but on every hour mark, there is a row in my data with an elevation value and a row in my data with a NA. I want to be able to just merge those rows together, or in other words, take out the row that has the NA value. It's okay that all of the other values in the table are NA.

Here's my data:

> dput(stackoverflow[1:100,])
structure(list(Time = structure(c(1432425600, 1432426500, 1432427400, 
1432428300, 1432429200, 1432430100, 1432431000, 1432431900, 1432432800, 
1432433700, 1432434600, 1432435500, 1432436400, 1432437300, 1432438200, 
1432439100, 1432440000, 1432440900, 1432441800, 1432442700, 1432443600, 
1432444500, 1432445400, 1432446300, 1432447200, 1432448100, 1432449000, 
1432449900, 1432450800, 1432450800, 1432451700, 1432452600, 1432453500, 
1432454400, 1432454400, 1432455300, 1432456200, 1432457100, 1432458000, 
1432458000, 1432458900, 1432459800, 1432460700, 1432461600, 1432461600, 
1432462500, 1432463400, 1432464300, 1432465200, 1432465200, 1432466100, 
1432467000, 1432467900, 1432468800, 1432468800, 1432469700, 1432470600, 
1432471500, 1432472400, 1432472400, 1432473300, 1432474200, 1432475100, 
1432476000, 1432476000, 1432476900, 1432477800, 1432478700, 1432479600, 
1432479600, 1432480500, 1432481400, 1432482300, 1432483200, 1432483200, 
1432484100, 1432485000, 1432485900, 1432486800, 1432486800, 1432487700, 
1432488600, 1432489500, 1432490400, 1432490400, 1432491300, 1432492200, 
1432493100, 1432494000, 1432494000, 1432494900, 1432495800, 1432496700, 
1432497600, 1432498500, 1432499400, 1432500300, 1432501200, 1432502100, 
1432503000), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    Turtle = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), levels = c("R3L1", "R3L11", "R3L12", "R3L2", "R3L4", 
    "R3L8", "R3L9", "R4L8", "R8L1", "R8L4", "R8NAT123"), class = "factor"), 
    elevation = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, 282.27, NA, NA, NA, NA, 283.21, NA, NA, NA, NA, 282.14, 
    NA, NA, NA, NA, 281.63, NA, NA, NA, NA, 281.63, NA, NA, NA, 
    NA, 281.63, NA, NA, NA, NA, 282.63, NA, NA, NA, NA, 281.63, 
    NA, NA, NA, NA, 282.14, NA, NA, NA, NA, 281.63, NA, NA, NA, 
    NA, 282.14, NA, NA, NA, NA, 281.36, NA, NA, NA, NA, 282.14, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

enter image description here Here's also a picture of an example of rows I would like to merge.

Let me know if you have any questions.

CodePudding user response:

Since the data frame (here, df) has the same Time and Turtle values and the NA values are the only ones repeating and always come after the Elevation value, a dplyr approach would be:

library(dplyr)

df %>% 
  group_by(Time, Turtle) %>% 
  slice(1)

# or

df %>% # thanks @neilfws
  group_by(Time, Turtle) %>% 
  slice_head()

Which groups the Time and Turtle values and takes the first value.

CodePudding user response:

with dplyr

library(dplyr)
df1 %>% 
  group_by(Time, Turtle) %>%
  summarise(elevation = elevation[!is.na(elevation)][1], .groups = 'drop')

-output

# A tibble: 87 × 3
# Groups:   Time [87]
   Time                Turtle elevation
   <dttm>              <fct>      <dbl>
 1 2015-05-24 00:00:00 R3L1          NA
 2 2015-05-24 00:15:00 R3L1          NA
 3 2015-05-24 00:30:00 R3L1          NA
 4 2015-05-24 00:45:00 R3L1          NA
 5 2015-05-24 01:00:00 R3L1          NA
 6 2015-05-24 01:15:00 R3L1          NA
 7 2015-05-24 01:30:00 R3L1          NA
 8 2015-05-24 01:45:00 R3L1          NA
 9 2015-05-24 02:00:00 R3L1          NA
10 2015-05-24 02:15:00 R3L1          NA

Or use distinct

df1 %>% 
   arrange(Time, Turtle, is.na(elevation)) %>%
   distinct()
  •  Tags:  
  • r
  • Related