R replace string based on other string in column

I have a column in a data frame with the following values in its rows:

first cycle
first cycle
shifting cycle
shifting cycle
shifting cycle
2nd cycle 
2nd cycle
2nd cycle
shifting cycle
shifting cycle
3rd cycle
3rd cycle

I want to replace every row in the first block of "shifting cycle" rows with "shifting cycle 1", and every row in the second block with "shifting cycle 2". Basically it is a string operation, but I don't know how to do it. Right now I am doing it based on the value in another column, but finding that value manually is impractical because it varies across many files.

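My code:

library(dplyr)
library(stringr)

# label every "shifting cycle" row as cycle 2 first...
df$column <- str_replace(df$column, "shifting cycle", "shifting cycle 2")
# ...then relabel the rows identified through another column as cycle 1
df <- df %>% mutate(column = case_when(other_column == 30 ~ "shifting cycle 1", TRUE ~ column))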

So the desired output is:

first cycle
first cycle
shifting cycle 1
shifting cycle 1
shifting cycle 1
2nd cycle
2nd cycle 
2nd cycle
shifting cycle 2
shifting cycle 2
3rd cycle
3rd cycle

Answer:

Use run-length encoding (rle()) to detect runs, expand the run indices so there is one per row of the data frame, and apply integer division by 2
(run indices 1 1 2 2 2 3 3 3 4 4 5 5 become 0 0 1 1 1 1 1 1 2 2 2 2).

The very first run is coded as 0. To handle datasets that start with a shifting cycle, we add sc_runs$values[1] (TRUE, i.e. 1, in that case; FALSE, i.e. 0, otherwise), which avoids a "shifting cycle 0".

library(dplyr)
sc_str <- "shifting cycle"

sc_runs <- rle(df2$column == sc_str)
sc_runs
#> Run Length Encoding
#>   lengths: int [1:5] 2 3 3 2 2
#>   values : logi [1:5] FALSE TRUE FALSE TRUE FALSE

sc <- rep(seq_along(sc_runs$lengths), sc_runs$lengths) %/% 2 + sc_runs$values[1]
sc
#>  [1] 0 0 1 1 1 1 1 1 2 2 2 2

df2 %>%
  mutate(
    column =
      case_when(
        column == sc_str ~ paste(column, sc),
        TRUE ~ column
      )
  )
#> # A tibble: 12 × 1
#>    column          
#>    <chr>           
#>  1 first.cycle     
#>  2 first cycle     
#>  3 shifting cycle 1
#>  4 shifting cycle 1
#>  5 shifting cycle 1
#>  6 2nd cycle       
#>  7 2nd cycle       
#>  8 2nd cycle       
#>  9 shifting cycle 2
#> 10 shifting cycle 2
#> 11 3rd cycle       
#> 12 3rd cycle
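To check the sc_runs$values[1] offset on data that starts with a shifting cycle, the same rle() + %/% 2 logic can be wrapped in a small base-R function and applied to one vector per file. This is a minimal sketch: number_runs() and the two test vectors are hypothetical, and it assumes a non-empty character vector.

```r
# Hypothetical helper (not from the answer): number each run of `target`
# in a character vector, using the rle() + %/% 2 trick described above.
number_runs <- function(x, target = "shifting cycle") {
  runs <- rle(x == target)
  # one run index per element, halved so each target run and the following
  # non-target run share a number; adding runs$values[1] avoids "0"
  # when the vector starts with a target run
  run_no <- rep(seq_along(runs$lengths), runs$lengths) %/% 2 + runs$values[1]
  ifelse(x == target, paste(x, run_no), x)
}

# hypothetical inputs: one starting with another cycle, one with a shifting cycle
starts_plain <- c("first cycle", "shifting cycle", "shifting cycle",
                  "2nd cycle", "shifting cycle", "3rd cycle")
starts_shift <- c("shifting cycle", "shifting cycle", "1st cycle",
                  "shifting cycle", "2nd cycle")

number_runs(starts_plain)
number_runs(starts_shift)
```

Applied as df$column <- number_runs(df$column), it needs no value looked up from another column, so the same call works for every file.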

Input data:

df2 <- tibble::tribble(
           ~column,
     "first.cycle",
     "first cycle",
  "shifting cycle",
  "shifting cycle",
  "shifting cycle",
       "2nd cycle",
       "2nd cycle",
       "2nd cycle",
  "shifting cycle",
  "shifting cycle",
       "3rd cycle",
       "3rd cycle"
  )

df <- tibble::tribble(
           ~column,
     "first.cycle",
     "first cycle",
  "shifting cycle",
       "2nd cycle",
       "2nd cycle",
       "2nd cycle",
  "shifting cycle",
       "3rd cycle",
       "3rd cycle"
  )

# earlier answer, targeting the first variant of the example data (one row per shifting cycle)
df %>%
  mutate(
    column =
      case_when(
        column == "shifting cycle" ~ paste(column, cumsum(column == "shifting cycle")),
        TRUE ~ column
      )
  )

Created on 2023-01-30 with reprex v2.0.2
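The cumsum() variant counts every "shifting cycle" row, so it is only correct when each shifting cycle is a single row; on the 12-row data it would label the shifting rows 1 2 3 and 4 5. One possible fix (an alternative sketch, not part of the original answer) is to count only the first row of each shifting run, detected with dplyr::lag(). df_multi is hypothetical test data.

```r
library(dplyr)

# hypothetical test data: shifting cycles spanning several rows
df_multi <- tibble::tibble(
  column = c("first cycle", "first cycle",
             "shifting cycle", "shifting cycle", "shifting cycle",
             "2nd cycle", "2nd cycle", "2nd cycle",
             "shifting cycle", "shifting cycle",
             "3rd cycle", "3rd cycle")
)

result_c <- df_multi %>%
  mutate(
    is_shift = column == "shifting cycle",
    # TRUE only on the first row of each run of shifting-cycle rows
    run_start = is_shift & !lag(is_shift, default = FALSE),
    # cumsum() of the run starts gives 1 for the first run, 2 for the second, ...
    column = case_when(
      is_shift ~ paste(column, cumsum(run_start)),
      TRUE ~ column
    )
  ) %>%
  select(column)

result_c
```

Because it only looks at where runs begin, the same code also handles the one-row-per-cycle data and data that starts with a shifting cycle.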

Answer:

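library(tidyverse)

# assumes the column is named cycles and each shifting cycle is a single row
df %>%
  # TRUE for "shifting cycle" rows, FALSE otherwise
  group_by(group = str_detect(cycles, "shifting cycle")) %>%
  # number the rows of the shifting-cycle group 1, 2, ...
  mutate(cycles = case_when(
    group ~ str_c(cycles, seq_len(n()), sep = " "),
    TRUE ~ cycles)) %>% 
  ungroup() %>% 
  select(-group)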

# A tibble: 9 × 1
  cycles          
  <chr>           
1 first cycle     
2 first cycle     
3 shifting cycle 1
4 2nd cycle       
5 2nd cycle       
6 2nd cycle       
7 shifting cycle 2
8 3rd cycle       
9 3rd cycle
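This answer numbers rows rather than runs, so on the 12-row example each of the five shifting rows would get its own number (1 to 5). A sketch that keeps the same group_by() structure but numbers runs instead: build a run id with cumsum() over value changes (via dplyr::lag()), then dense_rank() that id within the shifting group so the first run becomes 1 and the second 2. df_multi is hypothetical test data; dplyr and stringr are loaded instead of the full tidyverse.

```r
library(dplyr)
library(stringr)

# hypothetical test data: shifting cycles spanning several rows
df_multi <- tibble::tibble(
  cycles = c("first cycle", "first cycle",
             "shifting cycle", "shifting cycle", "shifting cycle",
             "2nd cycle", "2nd cycle", "2nd cycle",
             "shifting cycle", "shifting cycle",
             "3rd cycle", "3rd cycle")
)

result_d <- df_multi %>%
  # run id: increases by 1 whenever the value differs from the previous row
  mutate(run = cumsum(cycles != lag(cycles, default = ""))) %>%
  group_by(group = str_detect(cycles, "shifting cycle")) %>%
  # within the shifting group, dense_rank() maps run ids (3, 3, 3, 5, 5 here
  # would be 2, 2, 2, 4, 4) to consecutive numbers 1, 1, 1, 2, 2
  mutate(cycles = case_when(
    group ~ str_c(cycles, dense_rank(run), sep = " "),
    TRUE ~ cycles)) %>%
  ungroup() %>%
  select(-group, -run)

result_d
```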