Home > Software engineering >  Search rows with condition and change them
Search rows with condition and change them

Time:04-13

i have following dataframe

head(ca.spe)

 Dataset recording_date grid_cell
1       2 September 1987   7914/2 
2       1 September 1937   8025/3 
3       1 September 1933   8114/1 
4       1 September 1926   8021/4 
5       1 September 1926   8021/4 
6       1 September 1865   7522/21

In this dataframe are some rows, that I want to change.

Rows with grid cells that have two or three numbers after the / need to be changed.

for example row 6 has 7522/21

I want to converte this row into two rows with 7522/1 and 7522/2 every other information of the row should be left untouched.

The result of the operation should look like this:

head(ca.spe)

 Dataset recording_date grid_cell
1       2 September 1987   7914/2 
2       1 September 1937   8025/3 
3       1 September 1933   8114/1 
4       1 September 1926   8021/4 
5       1 September 1926   8021/4 
6       1 September 1865   7522/1
7       1 September 1865   7522/2

Has somebody a clue how to do this?

Cheers Mike

CodePudding user response:

Here is one option with tidyverse - separate the column 'grid_cell' into two by splitting at the /, then create rows by splitting between each character (or digit) in the 'grid_cell2' column and finally unite the 'grid_cell' columns to create the 'grid_cell' column

library(dplyr)
library(tidyr)
ca.spe %>%
   separate(grid_cell, into = c("grid_cell1", "grid_cell2"), sep = "/") %>% 
   separate_rows(grid_cell2, sep = "(?<=.)(?=.)") %>% 
   unite(grid_cell, starts_with('grid_cell'), sep = "/")

-output

# A tibble: 7 × 3
  Dataset recording_date grid_cell
    <int> <chr>          <chr>    
1       2 September 1987 7914/2   
2       1 September 1937 8025/3   
3       1 September 1933 8114/1   
4       1 September 1926 8021/4   
5       1 September 1926 8021/4   
6       1 September 1865 7522/2   
7       1 September 1865 7522/1  

data

ca.spe <- structure(list(Dataset = c(2L, 1L, 1L, 1L, 1L, 1L), 
recording_date = c("September 1987", 
"September 1937", "September 1933", "September 1926", "September 1926", 
"September 1865"), grid_cell = c("7914/2", "8025/3", "8114/1", 
"8021/4", "8021/4", "7522/21")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

CodePudding user response:

Another option using base R, where we first split on /, then create new columns with the part before and after. Then, we can split on the additional numbers (i.e., 21 into 2 and 1). Then, we create a new dataframe by replicating the length for the additional rows, and paste the grid columns back together.

ca.spe$grid_cell1 <- sapply(strsplit(ca.spe$grid_cell, split = "/"), "[", 1 )
ca.spe$grid_cell2 <- sapply(strsplit(ca.spe$grid_cell, split = "/"), "[", 2 )

s <- strsplit(ca.spe$grid_cell2, split = "")

ca.spe <- data.frame(Dataset = rep(ca.spe$Dataset, sapply(s, length)),
           recording_date = rep(ca.spe$recording_date, sapply(s, length)),
           grid_cell = paste0(rep(ca.spe$grid_cell1, sapply(s, length)), "/", unlist(s)))

Output

  Dataset   recording_date grid_cell
1       1 2 September 1987    7914/2
2       2 1 September 1937    8025/3
3       3 1 September 1933    8114/1
4       4 1 September 1926    8021/4
5       5 1 September 1926    8021/4
6       6 1 September 1865    7522/2
7       6 1 September 1865    7522/1

Data

ca.spe <- structure(list(Dataset = 1:6, recording_date = c("2 September 1987", 
"1 September 1937", "1 September 1933", "1 September 1926", "1 September 1926", 
"1 September 1865"), grid_cell = c("7914/2", "8025/3", "8114/1", 
"8021/4", "8021/4", "7522/21")), class = "data.frame", row.names = c(NA, 
-6L))
  • Related