Use a columns data to get the column before it's data in R-CodePudding

I haven't be able to find this anywhere. I want to be able to create a column that uses the data from the column before the column that contains "end".

I may not even be explaining that well.

For Example:

df =

V1     V2  V3     V4     V5   V6
 0  start   1    end  ended    0
 3    end   0  start      5    0
 2  start   3   next      6  end

I want the new column to be the number before the following column says "end".

V1     V2  V3     V4     V5   V6  end_num
 0  start   1    end  ended    0        1
 3    end   0  start      5    0        3
 2  start   3   next      6  end        6

CodePudding user response：

A base solution using max.col() to find the position of "end" for each row:

df$end_num <- df[cbind(1:nrow(df), max.col(df == "end") - 1)]

df
#   V1    V2 V3    V4    V5  V6 end_num
# 1  0 start  1   end ended   0       1
# 2  3   end  0 start     5   0       3
# 3  2 start  3  next     6 end       6

Data

df <- structure(list(V1 = c(0L, 3L, 2L), V2 = c("start", "end", "start"),
V3 = c(1L, 0L, 3L), V4 = c("end", "start", "next"), V5 = c("ended", "5", "6"),
V6 = c("0", "0", "end")), class = "data.frame", row.names = c(NA, -3L))

CodePudding user response：

A possible solution in base R:

df$end_num <- apply(df, 1, \(x) x[which(x == "end") - 1])
df

#>   V1    V2 V3    V4    V5  V6 end_num
#> 1  0 start  1   end ended   0       1
#> 2  3   end  0 start     5   0       3
#> 3  2 start  3  next     6 end       6

CodePudding user response：

Here is a some kind of laborious tidyverse solution: :-)

library(dplyr)
library(tidyr)
library(readr)

df %>% 
  mutate(across(everything(), ~case_when(. == "end" ~ cur_column()), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(New_Col = paste0("V", parse_number(New_Col)-1)) %>% 
  mutate(end_num = purrr::map2_chr(row_number(),New_Col,~df[.x,.y]), .keep="unused")

  V1    V2 V3    V4    V5  V6 end_num
1  0 start  1   end ended   0       1
2  3   end  0 start     5   0       3
3  2 start  3  next     6 end       6

CodePudding user response：

Tidyverse solution: pivoting your dataframe into a long format, creating a new grouping column to ID the first and second columns, then summarise by row.

One nice feature about this approach is that you can identify if any rows have multiple instances of "end".

df <- data.frame(
  V1 = c(0, 3, 2),
  V2 = c("s", "e", "s"),
  V3 = c(1, 0, 3),
  V4 = c("e", "s", "n"),
  V5 = c("en", 5, 6),
  V6 = c(0, 0, "e")
)

library(dplyr)
library(tidyr)

df_pivot <- df |> 
  mutate(id = row_number()) |> # need a row number to unpivot
  pivot_longer(
    cols = c(everything(), -id), names_pattern = "(\\d )", # don't pivot id
    names_transform = list(name = as.integer), values_transform = as.list
  ) |> 
  mutate(
    col_rem = name %% 2,
    col_group = (name   col_rem) / 2, # round up to lowest divisible by 2
    col_type = ifelse(col_rem == 0, "second", "first")
  ) |> 
  select(-col_rem, -name) |> 
  pivot_wider(names_from = col_type, values_from = value) |> 
  group_by(id) |> 
  summarise(
    new = first[second == "e"]
  )