Home > database >  Scanning and Replacing Values of Rows in R
Scanning and Replacing Values of Rows in R

Time:07-01

I have this dataset:

sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA",  "125 ocean road") )

   col1            col2              col3           col4
 james 123 forest road                NA             NA
 john           jason 124 valley street             NA
 henry             tim             peter 125 ocean road

I want to try and figure out a way in which the second column always contains the "address" - the final product would look this would look something like this:

# code to show sample of desired result
 desired_result = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "124 valley street", "125 ocean road"))

   col1              col2
 james   123 forest road
  john 124 valley street
 henry    125 ocean road

I have been trying to think of and research functions in R that are able to "scan" if the value contained within a row/column starts with a number, and make a decision accordingly.

I had the following idea - I can check to see if a given column starts with a number or not:

sample_data$is_col2_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col2,1,1))
sample_data$is_col3_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col3,1,1))
sample_data$is_col4_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col4,1,1))

   col1            col2              col3           col4 is_col2_a_number is_col3_a_number is_col4_a_number
1 james 123 forest road                NA             NA             TRUE            FALSE            FALSE
2  john           jason 124 valley street             NA            FALSE             TRUE            FALSE
3 henry             tim             peter 125 ocean road            FALSE            FALSE             TRUE

Next, I would try to figure out how to code the following logic:

  • For a given row, find the first cell that contains the value TRUE
  • Keep the column corresponding to that condition

I tried this row-by-row:

first_row = sample_data[1,]

ifelse(first_row$is_col2_a_number == "TRUE", first_row[,c(1,2)], ifelse(first_row$is_col3_a_number, first_row[, c(1,3)], first_row[, c(1,4)]))

But I think I have made this unnecessarily complicated. Can someone please give me a hand and suggest how I can continue solving this problem?

Thank you!

CodePudding user response:

This should work:

library(dplyr)
library(tidyr)
library(stringr)
sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA",  "125 ocean road") )

tmp <- sample_data %>% 
  mutate(across(col2:col4, ~case_when(str_detect(.x, "^\\d") ~ .x, 
                                      TRUE ~ NA_character_)), 
  address = coalesce(col2, col3, col4)) %>% 
  select(col1, address)
tmp
#>    col1           address
#> 1 james   123 forest road
#> 2  john 124 valley street
#> 3 henry    125 ocean road

Created on 2022-06-30 by the reprex package (v2.0.1)

CodePudding user response:

I thought of a (very ineffective) way to solve my own problem!

sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA",  "125 ocean road") )


sample_data$is_col2_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col2,1,1))
sample_data$is_col3_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col3,1,1))
sample_data$is_col4_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col4,1,1))

a1 <- sample_data[which(sample_data$is_col2_a_number == "TRUE"), ]
a1 <- a1[,c(1,2)]
colnames(a1)[2] <- "i"

b1 <- sample_data[which(sample_data$is_col3_a_number == "TRUE"), ]
b1 <- b1[,c(1,3)]
colnames(b1)[2] <- "i"

c1 <- sample_data[which(sample_data$is_col4_a_number == "TRUE"), ]
c1 <- c1[,c(1,4)]
colnames(c1)[2] <- "i"

final = rbind(a1,b1,c1)

Here is the desired output:

   col1                 i
1 james   123 forest road
2  john 124 valley street
3 henry    125 ocean road
  • Related