I haven't be able to find this anywhere. I want to be able to create a column that uses the data from the column before the column that contains "end".
I may not even be explaining that well.
For Example:
df =
V1 V2 V3 V4 V5 V6
0 start 1 end ended 0
3 end 0 start 5 0
2 start 3 next 6 end
I want the new column to be the number before the following column says "end".
V1 V2 V3 V4 V5 V6 end_num
0 start 1 end ended 0 1
3 end 0 start 5 0 3
2 start 3 next 6 end 6
CodePudding user response:
A base
solution using max.col()
to find the position of "end" for each row:
df$end_num <- df[cbind(1:nrow(df), max.col(df == "end") - 1)]
df
# V1 V2 V3 V4 V5 V6 end_num
# 1 0 start 1 end ended 0 1
# 2 3 end 0 start 5 0 3
# 3 2 start 3 next 6 end 6
Data
df <- structure(list(V1 = c(0L, 3L, 2L), V2 = c("start", "end", "start"),
V3 = c(1L, 0L, 3L), V4 = c("end", "start", "next"), V5 = c("ended", "5", "6"),
V6 = c("0", "0", "end")), class = "data.frame", row.names = c(NA, -3L))
CodePudding user response:
A possible solution in base R
:
df$end_num <- apply(df, 1, \(x) x[which(x == "end") - 1])
df
#> V1 V2 V3 V4 V5 V6 end_num
#> 1 0 start 1 end ended 0 1
#> 2 3 end 0 start 5 0 3
#> 3 2 start 3 next 6 end 6
CodePudding user response:
Here is a some kind of laborious tidyverse
solution: :-)
library(dplyr)
library(tidyr)
library(readr)
df %>%
mutate(across(everything(), ~case_when(. == "end" ~ cur_column()), .names = 'new_{col}')) %>%
unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(New_Col = paste0("V", parse_number(New_Col)-1)) %>%
mutate(end_num = purrr::map2_chr(row_number(),New_Col,~df[.x,.y]), .keep="unused")
V1 V2 V3 V4 V5 V6 end_num
1 0 start 1 end ended 0 1
2 3 end 0 start 5 0 3
3 2 start 3 next 6 end 6
CodePudding user response:
Tidyverse solution: pivoting your dataframe into a long format, creating a new grouping column to ID the first and second columns, then summarise by row.
One nice feature about this approach is that you can identify if any rows have multiple instances of "end".
df <- data.frame(
V1 = c(0, 3, 2),
V2 = c("s", "e", "s"),
V3 = c(1, 0, 3),
V4 = c("e", "s", "n"),
V5 = c("en", 5, 6),
V6 = c(0, 0, "e")
)
library(dplyr)
library(tidyr)
df_pivot <- df |>
mutate(id = row_number()) |> # need a row number to unpivot
pivot_longer(
cols = c(everything(), -id), names_pattern = "(\\d )", # don't pivot id
names_transform = list(name = as.integer), values_transform = as.list
) |>
mutate(
col_rem = name %% 2,
col_group = (name col_rem) / 2, # round up to lowest divisible by 2
col_type = ifelse(col_rem == 0, "second", "first")
) |>
select(-col_rem, -name) |>
pivot_wider(names_from = col_type, values_from = value) |>
group_by(id) |>
summarise(
new = first[second == "e"]
)