Checking that rows form complete left/right pairs in R

Time:09-06

I have a column named file, a character variable that identifies the left and right wing images of each insect (L.dw.png and R.dw.png), along with some other attributes.

I would like to check whether any file entry is missing its left or right partner. Every odd row should be a left wing and every even row a right wing.

wings <- read.table("https://zenodo.org/record/6950928/files/AT-raw-coordinates.csv", header = TRUE, sep = ";")

The first six rows are as follows:

                          file  sample country  x1  y1  x2  y2  x3  y3  x4  y4
 1 AT-0001-031-003678-L.dw.png AT-0001      AT 219 191 238 190 292 270 287 216
 2 AT-0001-031-003678-R.dw.png AT-0001      AT 213 190 234 189 289 268 281 211
 3 AT-0001-031-003679-L.dw.png AT-0001      AT 218 182 235 181 284 262 286 210
 4 AT-0001-031-003679-R.dw.png AT-0001      AT 214 185 234 183 283 264 285 211
 5 AT-0001-031-003680-L.dw.png AT-0001      AT 207 181 225 178 276 261 273 206
 6 AT-0001-031-003680-R.dw.png AT-0001      AT 203 181 222 180 271 261 267 206

I have not been able to write a working script: I tried a few code snippets found through search engines, but none of them answered my question.

If anyone can point me in the right direction, I would be very grateful.

CodePudding user response:

This is one way to find the ids for which only one row exists, i.e. only L or only R. I added a seventh row to the sample data to show that case:

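library(dplyr)

# df is the sample data defined under "Data:" below
df %>% 
  mutate(code = substr(file, 13, 18),      # specimen number, e.g. "003678"
         wing = substr(file, 20, 20)) %>%  # wing side, "L" or "R"
  group_split(code) %>%                    # one data frame per specimen
  purrr::keep(~ nrow(.) == 1) %>%          # keep specimens with a single row
  bind_rows()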

Output:

 file                        sample  country    x1    y1    x2    y2    x3    y3    x4    y4 code   wing 
  <chr>                       <chr>   <chr>   <int> <int> <int> <int> <int> <int> <int> <int> <chr>  <chr>
1 AT-0001-031-003681-R.dw.png AT-0001 AT        203   181   222   180   271   261   267   206 003681 R    

Data:

df <- read.table(text = "                          file  sample country  x1  y1  x2  y2  x3  y3  x4  y4
           1 AT-0001-031-003678-L.dw.png AT-0001      AT 219 191 238 190 292 270 287 216
           2 AT-0001-031-003678-R.dw.png AT-0001      AT 213 190 234 189 289 268 281 211
           3 AT-0001-031-003679-L.dw.png AT-0001      AT 218 182 235 181 284 262 286 210
           4 AT-0001-031-003679-R.dw.png AT-0001      AT 214 185 234 183 283 264 285 211
           5 AT-0001-031-003680-L.dw.png AT-0001      AT 207 181 225 178 276 261 273 206
           6 AT-0001-031-003680-R.dw.png AT-0001      AT 203 181 222 180 271 261 267 206
           7 AT-0001-031-003681-R.dw.png AT-0001      AT 203 181 222 180 271 261 267 206", h = TRUE)
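
In case fixed character positions are too fragile for other file names, here is a minimal base R sketch of the same check using a regular expression instead. It assumes every file name ends in -L.dw.png or -R.dw.png; find_unpaired is a hypothetical helper name, not something from the question or from a package.

```r
# Hypothetical helper (an assumption, not part of the original answers):
# returns the file names whose specimen does not have exactly one L image
# and exactly one R image.
find_unpaired <- function(files) {
  # Specimen id = file name without the trailing "-L.dw.png" / "-R.dw.png"
  id <- sub("-[LR]\\.dw\\.png$", "", files)
  # Wing side = the single letter just before ".dw.png"
  side <- sub("^.*-([LR])\\.dw\\.png$", "\\1", files)
  # Count L and R images per specimen; fixed factor levels keep both
  # columns present even if one side never occurs
  counts <- table(id, factor(side, levels = c("L", "R")))
  bad_ids <- rownames(counts)[counts[, "L"] != 1 | counts[, "R"] != 1]
  files[id %in% bad_ids]
}

# Small made-up input in the same naming scheme as the sample data
files <- c("AT-0001-031-003678-L.dw.png", "AT-0001-031-003678-R.dw.png",
           "AT-0001-031-003679-L.dw.png", "AT-0001-031-003679-R.dw.png",
           "AT-0001-031-003681-R.dw.png")
unpaired <- find_unpaired(files)
print(unpaired)
```

With the real data this could be called as find_unpaired(df$file). Unlike a plain row count per id, it also flags a specimen that has two L images and no R image.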

CodePudding user response:

Another solution with base R:

wings <- read.table("https://zenodo.org/record/6950928/files/AT-raw-coordinates.csv", 
                    header = TRUE, sep = ";")

wings$wing_side <- ifelse(grepl("L", wings$file), "L", "R")
wings$file_name <- substr(wings$file, 1, 19)

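counts <- as.data.frame.matrix(table(wings$file_name, wings$wing_side))
# Rows of counts are specimens, not rows of wings, so match on file_name
incomplete <- rownames(counts)[counts$L != counts$R]
wings$file[wings$file_name %in% incomplete]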

Output:

 character(0) # only complete pairs in data.frame
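
The question also states that odd rows are left wings and even rows are right wings. A rough sketch of how that ordering could be checked separately is below; out_of_order is a hypothetical helper name, and it assumes the file names end in -L.dw.png or -R.dw.png.

```r
# Hypothetical helper (an assumption, not part of the original answer):
# positions where the wing side in the file name differs from the
# expected L, R, L, R, ... order.
out_of_order <- function(files) {
  expected <- rep(c("L", "R"), length.out = length(files))
  actual <- sub("^.*-([LR])\\.dw\\.png$", "\\1", files)
  which(actual != expected)
}

# Made-up input: the L image of specimen 003679 is missing
files <- c("AT-0001-031-003678-L.dw.png", "AT-0001-031-003678-R.dw.png",
           "AT-0001-031-003679-R.dw.png",
           "AT-0001-031-003680-L.dw.png", "AT-0001-031-003680-R.dw.png")
breaks <- out_of_order(files)
print(breaks)  # positions 3, 4 and 5
```

The first reported position is where the alternation breaks; every later row is shifted by one, so it shows up as well. An empty result means the rows alternate as described.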

Intentional removal of two arbitrary rows:

wings <- read.table("https://zenodo.org/record/6950928/files/AT-raw-coordinates.csv", 
                    header = TRUE, sep = ";")
wings <- wings[-c(1,5), ] # remove two arbitrary rows
wings$wing_side <- ifelse(grepl("L", wings$file), "L", "R")
wings$file_name <- substr(wings$file, 1, 19)

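counts <- as.data.frame.matrix(table(wings$file_name, wings$wing_side))
# Rows of counts are specimens, not rows of wings, so match on file_name
incomplete <- rownames(counts)[counts$L != counts$R]
wings$file[wings$file_name %in% incomplete]

Output:

"AT-0001-031-003678-R.dw.png" "AT-0001-031-003680-R.dw.png" # files without a partner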