Home > Back-end >  How to extract row features, multiply respective rows and add single feature as column?
How to extract row features, multiply respective rows and add single feature as column?

Time:11-21

I have a dataset that looks something like this:

  id      col1    col2  col3  col4
1  1    12 ABC   Henry  Alex 13 AB
2  2       123      12 David   344
3  3      John     567  Luke  Huh8
4  4 123344567 abc 123  Paul    98
5  5  1345677.     Sam  17df   Tom
    

Goal: For each row, I would like to take every cell that does not contain a numerical value, and create new columns from the existing values of that row:

   Name      col1    col2 col3  col4
1 Henry    12 ABC    <NA> <NA> 13 AB
2  Alex    12 ABC    <NA> <NA> 13 AB
3 David       123      12 <NA>   344
4  John      <NA>     567 <NA>  Huh8
5  Luke      <NA>     567 <NA>  Huh8
6  Paul 123344567 abc 123 <NA>    98
7   Sam   1345677    <NA> 17df  <NA>
8   Tom   1345677    <NA> 17df  <NA>

Based on the nature of this question, I think the two following concepts can be used:

  • To determine if a column contains a numerical value, the following code can be used: grepl("\\d", my_data$col1)

  • I think some form of "pivot_wider" and "pivot_longer" might be applicable, but I am not sure exactly how to do this.

Can someone please show me how to do this?

Data

my_data <- structure(list(id = 1:5, col1 = c("12 ABC", "123", "John", "123344567", 
"1345677."), col2 = c("Henry", "12", "567", "abc 123", "Sam"), 
    col3 = c("Alex", "David", "Luke", "Paul", "17df"), col4 = c("13 AB", 
    "344", "Huh8", "98", "Tom")), class = "data.frame", row.names = c(NA, 
-5L))

CodePudding user response:

Making a boolean matrix bo using grepl in apply. Then in Map cbind the name identified by which and replace with NA.

bo <- apply(my_data, 1:2, \(x) !grepl('\\d', x))

Map(\(x, y, z) {
  lapply(y, \(i) cbind(Name=my_data[x, i], replace(my_data[x, ], y, NA))) |>
    do.call(what=rbind)
}, 
seq_len(nrow(my_data)), 
apply(bo, 1, which)) |>
  c(make.row.names=FALSE) |>
  do.call(what=rbind)
#    Name id      col1    col2 col3  col4
# 1 Henry  1    12 ABC    <NA> <NA> 13 AB
# 2  Alex  1    12 ABC    <NA> <NA> 13 AB
# 3 David  2       123      12 <NA>   344
# 4  John  3      <NA>     567 <NA>  Huh8
# 5  Luke  3      <NA>     567 <NA>  Huh8
# 6  Paul  4 123344567 abc 123 <NA>    98
# 7   Sam  5  1345677.    <NA> 17df  <NA>
# 8   Tom  5  1345677.    <NA> 17df  <NA>

CodePudding user response:

It's pretty messy and I think there must be more simple way, but you may try

library(tidyverse)
Name <- unlist(t(my_data), use.names = F)[!grepl("\\d", unlist(t(my_data)))]
key <- unname(sapply(Name, function(x) {c(1:nrow(my_data))[apply(my_data, 1, function(y) any(y %in% x))]}))
cbind(Name, my_data[key,]) %>%
  mutate(across(-Name, ~ifelse(grepl("\\d", .), ., NA))) %>%
  select(-id) %>%
  `rownames<-`(1:length(key))

   Name      col1    col2 col3  col4
1 Henry    12 ABC    <NA> <NA> 13 AB
2  Alex    12 ABC    <NA> <NA> 13 AB
3 David       123      12 <NA>   344
4  John      <NA>     567 <NA>  Huh8
5  Luke      <NA>     567 <NA>  Huh8
6  Paul 123344567 abc 123 <NA>    98
7   Sam  1345677.    <NA> 17df  <NA>
8   Tom  1345677.    <NA> 17df  <NA>

CodePudding user response:

A little inefficient, but simpler to understand,

new_data <- cbind(Names="", my_data)
new_data <- new_data[0,]

for (row in 1:nrow(my_data)) {
    temp_row <- my_data[row,]
    names <- vector()
    for (val in 1:ncol(temp)) {
        if (!grepl("\\d", temp_row[val])) {
            names <- append(temp_row[val], names)
            temp_row[val] <- NA
        }
    }
    for (name in 1:length(names)) {
        new_data[nrow(new_data) 1,] <- temp_row
        new_data[nrow(new_data),]["Names"] <- names[name]
    }
}
  • Related