How to select one row from multiple rows with the same value in a column, keeping only rows without null values

Assume I have an ID column, a Gene_ID column, and a Value column. More than one row can have the same Gene_ID, and some rows have no Gene_ID value.

I'd like to keep only the rows that have a non-null Gene_ID, with just one row for each Gene_ID. For example, I have the data frame below:

```
  ID Gene_ID     Value
   6   26470  1.137318
   7   10878 -1.051181
   8      "" -1.316229
   9   26470 -1.015734
```

And I want the result to be:

```
  ID Gene_ID     Value
   6   26470  1.137318
   7   10878 -1.051181
```

Answer:

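```r
library(tidyverse)

df %>%
  filter(Gene_ID != '') %>%   # drop rows with an empty Gene_ID (rows where the test is NA are dropped too)
  group_by(Gene_ID) %>%       # one group per Gene_ID
  slice(1) %>%                # keep the first row of each group
  ungroup()                   # return an ungrouped data frame
```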

This keeps the first row for each Gene_ID.
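A self-contained sketch of the answer run on the sample data from the question. It assumes Gene_ID is stored as character, with an empty string marking the row that has no ID (the question does not state the column types). Only dplyr is loaded, since filter(), group_by(), slice() and ungroup() all come from it. The trailing arrange(ID) is an addition of this sketch, there only so the rows come out in ID order.

```r
library(dplyr)

# Sample data from the question; Gene_ID is assumed to be character,
# with "" marking the row that has no ID
df <- data.frame(
  ID      = c(6, 7, 8, 9),
  Gene_ID = c("26470", "10878", "", "26470"),
  Value   = c(1.137318, -1.051181, -1.316229, -1.015734),
  stringsAsFactors = FALSE
)

result <- df %>%
  filter(Gene_ID != "") %>%  # drop the row with an empty Gene_ID
  group_by(Gene_ID) %>%      # one group per Gene_ID
  slice(1) %>%               # keep the first row of each group
  ungroup() %>%
  arrange(ID)                # sort by ID for display (addition of this sketch)

print(result)
```

On this data the result should contain the rows with ID 6 and 7, matching the expected output in the question: row 8 is removed by filter(), and row 9 is the second row of the 26470 group, so slice(1) drops it.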

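Note that the filter() condition depends on how the missing IDs are stored in your Gene_ID column. `Gene_ID != ''` targets empty strings, as in the example; if the missing IDs are stored as `NA` instead, `!is.na(Gene_ID)` expresses the intent explicitly.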
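If you are not sure whether missing IDs are stored as `NA` or as empty strings, one option is to test for both. A minimal sketch, assuming a hypothetical data frame `df2` whose Gene_ID column mixes both forms (the extra row with ID 10 and its Value are made up for illustration). It also swaps group_by() + slice(1) for dplyr's distinct() with `.keep_all = TRUE`, which keeps the first row for each Gene_ID along with all other columns.

```r
library(dplyr)

# Hypothetical data: one missing ID stored as "", another as NA
df2 <- data.frame(
  ID      = c(6, 7, 8, 9, 10),
  Gene_ID = c("26470", "10878", "", "26470", NA),
  Value   = c(1.137318, -1.051181, -1.316229, -1.015734, 0.5),
  stringsAsFactors = FALSE
)

result2 <- df2 %>%
  filter(!is.na(Gene_ID), Gene_ID != "") %>%  # drop both NA and empty-string IDs
  distinct(Gene_ID, .keep_all = TRUE)          # first row per Gene_ID, other columns kept

print(result2)
```

Unlike group_by() + slice(), distinct() keeps the rows in their original order, so no extra sorting step is needed here.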