I have a column named file, a character variable that identifies the left or right wing of an insect (names ending in L.dw.png or R.dw.png), along with some other attributes.
I would like to check whether any file entry is missing its left or right partner. Every odd row should be a left wing and every even row a right wing.
wings <- read.table("https://zenodo.org/record/6950928/files/AT-raw-coordinates.csv", header = TRUE, sep = ";")
The first six entries are as follows:
file sample country x1 y1 x2 y2 x3 y3 x4 y4
1 AT-0001-031-003678-L.dw.png AT-0001 AT 219 191 238 190 292 270 287 216
2 AT-0001-031-003678-R.dw.png AT-0001 AT 213 190 234 189 289 268 281 211
3 AT-0001-031-003679-L.dw.png AT-0001 AT 218 182 235 181 284 262 286 210
4 AT-0001-031-003679-R.dw.png AT-0001 AT 214 185 234 183 283 264 285 211
5 AT-0001-031-003680-L.dw.png AT-0001 AT 207 181 225 178 276 261 273 206
6 AT-0001-031-003680-R.dw.png AT-0001 AT 203 181 222 180 271 261 267 206
I have not been able to write a working script: the few snippets I found through search engines did not solve my problem.
I would be very grateful for any pointers.
CodePudding user response:
This is a way to keep only the ids for which a single row exists, i.e. either L or R but not both. I added one row to the data to show that case:
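library(dplyr)

df %>%
  mutate(code = substr(file, 13, 18),       # specimen number, e.g. "003678"
         wing = substr(file, 20, 20)) %>%   # wing side, "L" or "R"
  group_split(code) %>%                     # one data frame per specimen
  purrr::keep(~ nrow(.) == 1) %>%           # keep specimens with a single row
  bind_rows()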
Output:
file sample country x1 y1 x2 y2 x3 y3 x4 y4 code wing
<chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <chr> <chr>
1 AT-0001-031-003681-R.dw.png AT-0001 AT 203 181 222 180 271 261 267 206 003681 R
Data:
df <- read.table(text = " file sample country x1 y1 x2 y2 x3 y3 x4 y4
1 AT-0001-031-003678-L.dw.png AT-0001 AT 219 191 238 190 292 270 287 216
2 AT-0001-031-003678-R.dw.png AT-0001 AT 213 190 234 189 289 268 281 211
3 AT-0001-031-003679-L.dw.png AT-0001 AT 218 182 235 181 284 262 286 210
4 AT-0001-031-003679-R.dw.png AT-0001 AT 214 185 234 183 283 264 285 211
5 AT-0001-031-003680-L.dw.png AT-0001 AT 207 181 225 178 276 261 273 206
6 AT-0001-031-003680-R.dw.png AT-0001 AT 203 181 222 180 271 261 267 206
7 AT-0001-031-003681-R.dw.png AT-0001 AT 203 181 222 180 271 261 267 206", h = TRUE)
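The question also states that odd rows should be left wings and even rows right wings. A minimal base R sketch for checking that ordering directly is shown below. It assumes every name ends in "-L.dw.png" or "-R.dw.png" as in the sample rows. The vector `files` is made-up test input built from the sample names, not the real dataset. Only the first mismatch is meaningful, because every row after a missing wing is shifted by one.

```r
# Sample names from the question, with one left wing deliberately left out
files <- c("AT-0001-031-003678-L.dw.png",
           "AT-0001-031-003678-R.dw.png",
           "AT-0001-031-003679-R.dw.png",   # its L partner is missing
           "AT-0001-031-003680-L.dw.png",
           "AT-0001-031-003680-R.dw.png")

# Extract the wing side ("L" or "R") from the end of each name
side <- sub("^.*-([LR])\\.dw\\.png$", "\\1", files)

# Expected pattern: L on odd rows, R on even rows
expected <- rep(c("L", "R"), length.out = length(files))

# First row where the alternation breaks (NA if it never breaks)
first_break <- which(side != expected)[1]
print(files[first_break])
```

With the real data, `files` would be `wings$file`; an `NA` result means the L/R alternation holds for every row.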
CodePudding user response:
Another solution with base R:
wings <- read.table("https://zenodo.org/record/6950928/files/AT-raw-coordinates.csv",
header = TRUE, sep = ";")
wings$wing_side <- ifelse(grepl("L", wings$file), "L", "R")
wings$file_name <- substr(wings$file, 1, 19)
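counts <- as.data.frame.matrix(table(wings$file_name, wings$wing_side))
# counts has one row per file_name, so match on the name rather than the row position
incomplete <- rownames(counts)[counts$L != counts$R]
wings$file[wings$file_name %in% incomplete]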
Output:
character(0) # only complete pairs in data.frame
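Both grepl("L", ...) and substr(..., 1, 19) depend on the exact layout of these file names: an "L" anywhere else in a name, or an id of a different length, would give wrong results. As a hedged alternative, the sketch below parses the specimen id and wing side with a regular expression. The helper name `parse_wing_file` and the pattern are assumptions based on the sample names in the question, and `files` is a small made-up input.

```r
# Hypothetical helper: split a wing file name into specimen id and side.
# Assumes names end in "-L.dw.png" or "-R.dw.png", as in the sample rows.
parse_wing_file <- function(file) {
  pattern <- "^(.*)-([LR])\\.dw\\.png$"
  ok <- grepl(pattern, file)
  data.frame(
    file     = file,
    specimen = ifelse(ok, sub(pattern, "\\1", file), NA_character_),
    wing     = ifelse(ok, sub(pattern, "\\2", file), NA_character_),
    stringsAsFactors = FALSE
  )
}

files <- c("AT-0001-031-003678-L.dw.png",
           "AT-0001-031-003678-R.dw.png",
           "AT-0001-031-003681-R.dw.png")   # no matching L file
parsed <- parse_wing_file(files)

# Number of distinct wing sides seen for each specimen
n_sides <- tapply(parsed$wing, parsed$specimen, function(w) length(unique(w)))

# Files belonging to specimens that have only one side
unpaired <- parsed$file[parsed$specimen %in% names(n_sides)[n_sides < 2]]
print(unpaired)
```

Names that do not match the pattern get NA for both fields, so they can be inspected separately instead of being silently labelled "R".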
Intentional removal of two arbitrary rows:
wings <- read.table("https://zenodo.org/record/6950928/files/AT-raw-coordinates.csv",
header = TRUE, sep = ";")
wings <- wings[-c(1,5), ] # remove two arbitrary rows
wings$wing_side <- ifelse(grepl("L", wings$file), "L", "R")
wings$file_name <- substr(wings$file, 1, 19)
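counts <- as.data.frame.matrix(table(wings$file_name, wings$wing_side))
# counts has one row per file_name, so match on the name rather than the row position
incomplete <- rownames(counts)[counts$L != counts$R]
wings$file[wings$file_name %in% incomplete]
Output:
"AT-0001-031-003678-R.dw.png" "AT-0001-031-003680-R.dw.png" # files whose partner (removed rows 1 and 5) is missing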