Home > Enterprise >  R trying to bin with group_by and summarise, but output full of NAs?
R trying to bin with group_by and summarise, but output full of NAs?

Time:10-10

I have a long tibble with 3 columns. Example:

Spectrum Mass Intensity
Spectrum A 14.1 5
Spectrum A 14.2 3
Spectrum A 14.1 20
Spectrum A 20.3 7
Spectrum B 11.7 4
Spectrum B 13.4 5
Spectrum B 33.0 5
Spectrum B 33.1 10
Spectrum Z n n

For each Spectrum, I wish to round the Mass values to the nearest integer. Then, if there are multiple entries of the same integer Mass, only keep the row with the highest Intensity value.

Example desired output:

Spectrum Mass Intensity
Spectrum A 14 20
Spectrum A 20 7
Spectrum B 12 4
Spectrum B 13 5
Spectrum B 33 10
Spectrum Z n n

To do this, I thought this code should work:

df_binned <- df %>%
  mutate(Mass = round(Mass, 0)) %>%
  group_by(Spectrum, Mass) %>%
  summarise(Intensity = max(Intensity)) %>%
  ungroup()

Unfortunately, the actual output's Intensity is almost fully NAs. For many Spectrums, it's 100% NAs. I have verified that the Mass and Intensity columns are indeed numerics, not characters.

Does anyone know why this is happening? And/or how can I implement this "binning" transformation?

CodePudding user response:

You need to do

df %>%
  mutate(Mass = round(Mass, 0)) %>%
  group_by(Spectrum, Mass) %>%
  summarise(Intensity = max(Intensity, na.rm = TRUE)) %>%
  ungroup()

Otherwise even a single missing value in a group will cause the whole group to be NA

  • Related