Home > Back-end >  How can I "pad" names
How can I "pad" names

Time:09-22

I have a df that looks like this:

df <- data.frame("Logger" = c("119_1", "1","2","3","119_2","5","6","7","119_3","7","8","9"),
                  "Temp"  =c (4.5, 5.7, 3.8, 8.9, 8.6, 10.5, 11.0, 7.8, 5.6, 7.8, 9.9, 17.3),
                  "RH"    = c(6.5, 2.7, 11.8, 4.9, 3.6, 12.5, 115.0, 3.8, 9.6, 1.8, 3.9,5.3))

However, I want to "pad up" the names of the variable "Logger" so that all numbers get replaced by the last name until the next. So basically I want an output like this.

df_desired <- data.frame("Logger" = c("119_1", "119_1","119_1","119_1","119_2","119_2","119_2","119_2","119_3","119_3","119_3","119_3"),
                 "Temp"  =c (4.5, 5.7, 3.8, 8.9, 8.6, 10.5, 11.0, 7.8, 5.6, 7.8, 9.9, 17.3),
                 "RH"    = c(6.5, 2.7, 11.8, 4.9, 3.6, 12.5, 115.0, 3.8, 9.6, 1.8, 3.9,5.3))

How do I do that? I actually have no clue. I know that the padr package exists, but it just works with dates and NAs I guess. Can you guys help me out here? Thanks a lot!! Cheers

CodePudding user response:

You can replace the values that do not have _ with NA and then use fill.

library(dplyr)
library(tidyr)

df %>%
  mutate(Logger = replace(Logger, !grepl('_', Logger), NA)) %>%
  fill(Logger)

#    Logger Temp    RH
#1   119_1  4.5   6.5
#2   119_1  5.7   2.7
#3   119_1  3.8  11.8
#4   119_1  8.9   4.9
#5   119_2  8.6   3.6
#6   119_2 10.5  12.5
#7   119_2 11.0 115.0
#8   119_2  7.8   3.8
#9   119_3  5.6   9.6
#10  119_3  7.8   1.8
#11  119_3  9.9   3.9
#12  119_3 17.3   5.3

CodePudding user response:

We can use a group by approach i.e. create a group based on the presence of _ in the 'Logger' column, get the cumulative sum of logical vector, update the 'Logger' by taking the first element and ungroup

library(dplyr)
library(stringr)
df %>% 
   group_by(grp = cumsum(str_detect(Logger, "_"))) %>%
   mutate(Logger = first(Logger)) %>% 
   ungroup %>% 
   select(-grp)

-output

# A tibble: 12 x 3
   Logger  Temp    RH
   <chr>  <dbl> <dbl>
 1 119_1    4.5   6.5
 2 119_1    5.7   2.7
 3 119_1    3.8  11.8
 4 119_1    8.9   4.9
 5 119_2    8.6   3.6
 6 119_2   10.5  12.5
 7 119_2   11   115  
 8 119_2    7.8   3.8
 9 119_3    5.6   9.6
10 119_3    7.8   1.8
11 119_3    9.9   3.9
12 119_3   17.3   5.3

Or may also use na.locf after replaceing the elements with NA

library(zoo)
df %>%
     mutate(Logger = na.locf0(replace(Logger, 
         str_detect(Logger, "_", negate = TRUE), NA)))

-output

Logger Temp    RH
1   119_1  4.5   6.5
2   119_1  5.7   2.7
3   119_1  3.8  11.8
4   119_1  8.9   4.9
5   119_2  8.6   3.6
6   119_2 10.5  12.5
7   119_2 11.0 115.0
8   119_2  7.8   3.8
9   119_3  5.6   9.6
10  119_3  7.8   1.8
11  119_3  9.9   3.9
12  119_3 17.3   5.3

CodePudding user response:

You can use ave and group by cumsum(grepl("_", df$Logger)) and return the first element.

df$Logger <- ave(df$Logger, cumsum(grepl("_", df$Logger)), FUN=function(x) x[1])

df
#   Logger Temp    RH
#1   119_1  4.5   6.5
#2   119_1  5.7   2.7
#3   119_1  3.8  11.8
#4   119_1  8.9   4.9
#5   119_2  8.6   3.6
#6   119_2 10.5  12.5
#7   119_2 11.0 115.0
#8   119_2  7.8   3.8
#9   119_3  5.6   9.6
#10  119_3  7.8   1.8
#11  119_3  9.9   3.9
#12  119_3 17.3   5.3

Or using rep:

i <- grep("_", df$Logger)
df$Logger <- rep(df$Logger[i], diff(c(i, 1 nrow(df))))
  •  Tags:  
  • r
  • Related