I have this dataset in R:
id = 1:5
col1 = c("12 ABC", "123", "AB", "123344567", "1345677.")
col2 = c("gggw", "12", "567", "abc 123", "p")
col3 = c("abw", "abi", "klo", "poy", "17df")
col4 = c("13 AB", "344", "Huh8", "98", "b")
my_data = data.frame(id, col1, col2, col3, col4)
  id      col1    col2 col3  col4
1  1    12 ABC    gggw  abw 13 AB
2  2       123      12  abi   344
3  3        AB     567  klo  Huh8
4  4 123344567 abc 123  poy    98
5  5  1345677.       p 17df     b
I then used the following code to check whether each cell contains at least one digit:
my_data$col1_check = grepl("\\d", my_data$col1)
my_data$col2_check = grepl("\\d", my_data$col2)
my_data$col3_check = grepl("\\d", my_data$col3)
my_data$col4_check = grepl("\\d", my_data$col4)
  id      col1    col2 col3  col4 col1_check col2_check col3_check col4_check
1  1    12 ABC    gggw  abw 13 AB       TRUE      FALSE      FALSE       TRUE
2  2       123      12  abi   344       TRUE       TRUE      FALSE       TRUE
3  3        AB     567  klo  Huh8      FALSE       TRUE      FALSE       TRUE
4  4 123344567 abc 123  poy    98       TRUE       TRUE      FALSE       TRUE
5  5  1345677.       p 17df     b       TRUE      FALSE       TRUE      FALSE
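As a side note, the four grepl() calls can probably be collapsed into a single assignment. This is only a sketch: value_cols is a name made up here, and it assumes every column except id should be checked.

```r
# Rebuild the data from the question so the sketch is self-contained.
my_data <- data.frame(
  id   = 1:5,
  col1 = c("12 ABC", "123", "AB", "123344567", "1345677."),
  col2 = c("gggw", "12", "567", "abc 123", "p"),
  col3 = c("abw", "abi", "klo", "poy", "17df"),
  col4 = c("13 AB", "344", "Huh8", "98", "b")
)

# Assumption: every column except id holds values to be checked.
value_cols <- setdiff(names(my_data), "id")

# lapply() runs grepl() once per column; the resulting list is assigned
# to new columns named col1_check, col2_check, and so on.
my_data[paste0(value_cols, "_check")] <-
  lapply(my_data[value_cols], function(x) grepl("\\d", x))
```

The benefit is mostly that adding a col5 later would not need another copy-pasted line.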
What I am trying to do is this: for each row, I would like to take all columns whose check value is FALSE and paste their contents (separated by a space) into a single cell.
This would look something like this:
id new_col
1 1 gggw abw
2 2 abi
3 3 AB klo
4 4 poy
5 5 p b
I have been trying to read about "conditional concatenation" (e.g. conditional concatenation in R), but so far nothing I have read matches the problem I am working on.
Can someone please suggest what to do from here?
Thanks!
CodePudding user response:
Here is one option in tidyverse: loop across() the columns col1 to col4, get the corresponding value from the logical column by pasting _check onto the column name (cur_column()), convert the values whose check is TRUE to NA in case_when(), and unite() those columns into new_col
library(stringr)
library(dplyr)
library(tidyr)
my_data %>%
transmute(id, across(col1:col4,
~ case_when(!get(str_c(cur_column(), "_check"))~ .x))) %>%
unite(new_col, col1:col4, sep = " ", na.rm = TRUE)
-output
id new_col
1 1 gggw abw
2 2 abi
3 3 AB klo
4 4 poy
5 5 p b
If we want to skip creating the _check columns, it is easier, as we can directly convert the elements that are not needed to NA and unite() the rest
my_data %>%
mutate(across(col1:col4,
~ case_when(str_detect(.x, "\\d", negate = TRUE) ~ .x))) %>%
unite(new_col, col1:col4, sep = " ", na.rm = TRUE)
-output
id new_col
1 1 gggw abw
2 2 abi
3 3 AB klo
4 4 poy
5 5 p b
Or using base R
cbind(my_data[1], new_col = gsub("\\s{2,}", " ",
trimws(do.call(paste, replace(my_data[2:5],
as.matrix(my_data[6:9]), '')))))
-output
id new_col
1 1 gggw abw
2 2 abi
3 3 AB klo
4 4 poy
5 5 p b
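The base R one-liner above is fairly dense, so here is the same idea unpacked step by step. This is a sketch rather than a replacement: the intermediate names (value_cols, has_digit, blanked, pasted) are made up for illustration, and the logical matrix is computed directly instead of being read from the _check columns.

```r
# Rebuild the data from the question so the sketch is self-contained.
my_data <- data.frame(
  id   = 1:5,
  col1 = c("12 ABC", "123", "AB", "123344567", "1345677."),
  col2 = c("gggw", "12", "567", "abc 123", "p"),
  col3 = c("abw", "abi", "klo", "poy", "17df"),
  col4 = c("13 AB", "344", "Huh8", "98", "b")
)
value_cols <- paste0("col", 1:4)

# Step 1: logical matrix (5 x 4), TRUE where a cell contains a digit.
has_digit <- sapply(my_data[value_cols], function(x) grepl("\\d", x))

# Step 2: blank out every cell flagged in step 1.
blanked <- replace(my_data[value_cols], has_digit, "")

# Step 3: paste the columns row by row (default separator is a space).
pasted <- do.call(paste, blanked)

# Step 4: drop leading/trailing spaces, then squeeze runs of spaces
# left behind by the blanked cells.
new_col <- gsub("\\s{2,}", " ", trimws(pasted))

result <- cbind(my_data["id"], new_col = new_col)
```

Steps 3 and 4 are the reason for the trimws()/gsub() pair: blanking a cell still leaves its separator behind, so the extra spaces have to be cleaned up afterwards.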
CodePudding user response:
A base R approach
data.frame(id = my_data$id, new_col = apply(my_data[,-1], 1, function(x)
paste(x[!grepl("[[:digit:]]", x)], collapse=" ")))
id new_col
1 1 gggw abw
2 2 abi
3 3 AB klo
4 4 poy
5 5 p b
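One caveat with my_data[,-1]: it assumes my_data holds only id and col1 to col4. If the _check columns from the question have already been added, apply() converts the whole data frame to a character matrix, so the logical values become text without digits and would end up pasted into new_col too. A small sketch that selects the value columns by name instead (value_cols and keep_non_digit are names made up here):

```r
# Rebuild the data from the question so the sketch is self-contained.
my_data <- data.frame(
  id   = 1:5,
  col1 = c("12 ABC", "123", "AB", "123344567", "1345677."),
  col2 = c("gggw", "12", "567", "abc 123", "p"),
  col3 = c("abw", "abi", "klo", "poy", "17df"),
  col4 = c("13 AB", "344", "Huh8", "98", "b")
)

# Simulate a check column already being present, as in the question.
my_data$col1_check <- grepl("\\d", my_data$col1)

# Select the value columns explicitly so extra columns are ignored.
value_cols <- paste0("col", 1:4)

# Keep the entries of one row that contain no digit and join them.
keep_non_digit <- function(x) {
  paste(x[!grepl("[[:digit:]]", x)], collapse = " ")
}

result <- data.frame(
  id      = my_data$id,
  new_col = apply(my_data[value_cols], 1, keep_non_digit)
)
```

Selecting by name also means the result does not change depending on whether the check columns were created beforehand.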
CodePudding user response:
Starting from my_data, you could use
library(dplyr)
library(tidyr)
library(stringr)
my_data %>%
pivot_longer(-id) %>%
filter(!str_detect(value, "\\d")) %>%
group_by(id) %>%
summarise(new_col = paste(value, collapse = " "))
This returns
# A tibble: 5 × 2
id new_col
<int> <chr>
1 1 gggw abw
2 2 abi
3 3 AB klo
4 4 poy
5 5 p b
CodePudding user response:
Updated improved code (thanks to @Martin Gal)
my_data %>%
transmute(across(-id, ~case_when(!str_detect(., '\\d') ~ .))) %>%
unite("New_col", col1:col4, na.rm = TRUE, sep = " ")
One more: Similar to @akrun's solution but not identical:
library(dplyr)
library(tidyr)
library(stringr)
my_data %>%
transmute(across(-id, ~case_when(!str_detect(., '\\d') ~ .), .names = 'new_{col}')) %>%
unite(New_col, starts_with('new'), na.rm = TRUE, sep = ' ')
New_col
1 gggw abw
2 abi
3 AB klo
4 poy
5 p b
CodePudding user response:
You can do this without any packages. It might look tedious, but it is easy to follow if you know the apply function:
data.frame(id, new_col = apply(my_data[, -1], 1, FUN = function(x) {
paste(x[!grepl("\\d", x)], collapse = " ") }))
id new_col
1 1 gggw abw
2 2 abi
3 3 AB klo
4 4 poy
5 5 p b