I have a dataset that looks something like this:
id col1 col2 col3 col4
1 1 12 ABC Henry Alex 13 AB
2 2 123 12 David 344
3 3 John 567 Luke Huh8
4 4 123344567 abc 123 Paul 98
5 5 1345677. Sam 17df Tom
Goal: For each row, I would like to take every cell that does not contain a numerical value, and create new columns from the existing values of that row:
Name col1 col2 col3 col4
1 Henry 12 ABC <NA> <NA> 13 AB
2 Alex 12 ABC <NA> <NA> 13 AB
3 David 123 12 <NA> 344
4 John <NA> 567 <NA> Huh8
5 Luke <NA> 567 <NA> Huh8
6 Paul 123344567 abc 123 <NA> 98
7 Sam 1345677 <NA> 17df <NA>
8 Tom 1345677 <NA> 17df <NA>
Based on the nature of this question, I think the two following concepts can be used:
To determine if a column contains a numerical value, the following code can be used:
grepl("\\d", my_data$col1)
I think some form of "pivot_wider" and "pivot_longer" might be applicable, but I am not sure exactly how to do this.
Can someone please show me how to do this?
Data
my_data <- structure(list(id = 1:5, col1 = c("12 ABC", "123", "John", "123344567",
"1345677."), col2 = c("Henry", "12", "567", "abc 123", "Sam"),
col3 = c("Alex", "David", "Luke", "Paul", "17df"), col4 = c("13 AB",
"344", "Huh8", "98", "Tom")), class = "data.frame", row.names = c(NA,
-5L))
CodePudding user response:
Making a boolean matrix bo
using grepl
in apply
. Then in Map
cbind
the name identified by which
and replace
with NA
.
bo <- apply(my_data, 1:2, \(x) !grepl('\\d', x))
Map(\(x, y, z) {
lapply(y, \(i) cbind(Name=my_data[x, i], replace(my_data[x, ], y, NA))) |>
do.call(what=rbind)
},
seq_len(nrow(my_data)),
apply(bo, 1, which)) |>
c(make.row.names=FALSE) |>
do.call(what=rbind)
# Name id col1 col2 col3 col4
# 1 Henry 1 12 ABC <NA> <NA> 13 AB
# 2 Alex 1 12 ABC <NA> <NA> 13 AB
# 3 David 2 123 12 <NA> 344
# 4 John 3 <NA> 567 <NA> Huh8
# 5 Luke 3 <NA> 567 <NA> Huh8
# 6 Paul 4 123344567 abc 123 <NA> 98
# 7 Sam 5 1345677. <NA> 17df <NA>
# 8 Tom 5 1345677. <NA> 17df <NA>
CodePudding user response:
It's pretty messy and I think there must be more simple way, but you may try
library(tidyverse)
Name <- unlist(t(my_data), use.names = F)[!grepl("\\d", unlist(t(my_data)))]
key <- unname(sapply(Name, function(x) {c(1:nrow(my_data))[apply(my_data, 1, function(y) any(y %in% x))]}))
cbind(Name, my_data[key,]) %>%
mutate(across(-Name, ~ifelse(grepl("\\d", .), ., NA))) %>%
select(-id) %>%
`rownames<-`(1:length(key))
Name col1 col2 col3 col4
1 Henry 12 ABC <NA> <NA> 13 AB
2 Alex 12 ABC <NA> <NA> 13 AB
3 David 123 12 <NA> 344
4 John <NA> 567 <NA> Huh8
5 Luke <NA> 567 <NA> Huh8
6 Paul 123344567 abc 123 <NA> 98
7 Sam 1345677. <NA> 17df <NA>
8 Tom 1345677. <NA> 17df <NA>
CodePudding user response:
A little inefficient, but simpler to understand,
new_data <- cbind(Names="", my_data)
new_data <- new_data[0,]
for (row in 1:nrow(my_data)) {
temp_row <- my_data[row,]
names <- vector()
for (val in 1:ncol(temp)) {
if (!grepl("\\d", temp_row[val])) {
names <- append(temp_row[val], names)
temp_row[val] <- NA
}
}
for (name in 1:length(names)) {
new_data[nrow(new_data) 1,] <- temp_row
new_data[nrow(new_data),]["Names"] <- names[name]
}
}