I have a df that looks like this:
df <- data.frame("Logger" = c("119_1", "1","2","3","119_2","5","6","7","119_3","7","8","9"),
"Temp" =c (4.5, 5.7, 3.8, 8.9, 8.6, 10.5, 11.0, 7.8, 5.6, 7.8, 9.9, 17.3),
"RH" = c(6.5, 2.7, 11.8, 4.9, 3.6, 12.5, 115.0, 3.8, 9.6, 1.8, 3.9,5.3))
However, I want to "pad up" the names of the variable "Logger" so that all numbers get replaced by the last name until the next. So basically I want an output like this.
df_desired <- data.frame("Logger" = c("119_1", "119_1","119_1","119_1","119_2","119_2","119_2","119_2","119_3","119_3","119_3","119_3"),
"Temp" =c (4.5, 5.7, 3.8, 8.9, 8.6, 10.5, 11.0, 7.8, 5.6, 7.8, 9.9, 17.3),
"RH" = c(6.5, 2.7, 11.8, 4.9, 3.6, 12.5, 115.0, 3.8, 9.6, 1.8, 3.9,5.3))
How do I do that? I actually have no clue. I know that the padr package exists, but it just works with dates and NAs I guess. Can you guys help me out here? Thanks a lot!! Cheers
CodePudding user response:
You can replace the values that do not have _
with NA
and then use fill
.
library(dplyr)
library(tidyr)
df %>%
mutate(Logger = replace(Logger, !grepl('_', Logger), NA)) %>%
fill(Logger)
# Logger Temp RH
#1 119_1 4.5 6.5
#2 119_1 5.7 2.7
#3 119_1 3.8 11.8
#4 119_1 8.9 4.9
#5 119_2 8.6 3.6
#6 119_2 10.5 12.5
#7 119_2 11.0 115.0
#8 119_2 7.8 3.8
#9 119_3 5.6 9.6
#10 119_3 7.8 1.8
#11 119_3 9.9 3.9
#12 119_3 17.3 5.3
CodePudding user response:
We can use a group by approach i.e. create a group based on the presence of _
in the 'Logger' column, get the cumulative sum of logical vector, update the 'Logger' by taking the first
element and ungroup
library(dplyr)
library(stringr)
df %>%
group_by(grp = cumsum(str_detect(Logger, "_"))) %>%
mutate(Logger = first(Logger)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 12 x 3
Logger Temp RH
<chr> <dbl> <dbl>
1 119_1 4.5 6.5
2 119_1 5.7 2.7
3 119_1 3.8 11.8
4 119_1 8.9 4.9
5 119_2 8.6 3.6
6 119_2 10.5 12.5
7 119_2 11 115
8 119_2 7.8 3.8
9 119_3 5.6 9.6
10 119_3 7.8 1.8
11 119_3 9.9 3.9
12 119_3 17.3 5.3
Or may also use na.locf
after replace
ing the elements with NA
library(zoo)
df %>%
mutate(Logger = na.locf0(replace(Logger,
str_detect(Logger, "_", negate = TRUE), NA)))
-output
Logger Temp RH
1 119_1 4.5 6.5
2 119_1 5.7 2.7
3 119_1 3.8 11.8
4 119_1 8.9 4.9
5 119_2 8.6 3.6
6 119_2 10.5 12.5
7 119_2 11.0 115.0
8 119_2 7.8 3.8
9 119_3 5.6 9.6
10 119_3 7.8 1.8
11 119_3 9.9 3.9
12 119_3 17.3 5.3
CodePudding user response:
You can use ave
and group by cumsum(grepl("_", df$Logger))
and return the first element.
df$Logger <- ave(df$Logger, cumsum(grepl("_", df$Logger)), FUN=function(x) x[1])
df
# Logger Temp RH
#1 119_1 4.5 6.5
#2 119_1 5.7 2.7
#3 119_1 3.8 11.8
#4 119_1 8.9 4.9
#5 119_2 8.6 3.6
#6 119_2 10.5 12.5
#7 119_2 11.0 115.0
#8 119_2 7.8 3.8
#9 119_3 5.6 9.6
#10 119_3 7.8 1.8
#11 119_3 9.9 3.9
#12 119_3 17.3 5.3
Or using rep
:
i <- grep("_", df$Logger)
df$Logger <- rep(df$Logger[i], diff(c(i, 1 nrow(df))))