R trying to bin with group_by and summarise, but output full of NAs?-CodePudding

I have a long tibble with 3 columns. Example:

Spectrum	Mass	Intensity
Spectrum A	14.1	5
Spectrum A	14.2	3
Spectrum A	14.1	20
Spectrum A	20.3	7
Spectrum B	11.7	4
Spectrum B	13.4	5
Spectrum B	33.0	5
Spectrum B	33.1	10
…	…	…
Spectrum Z	n	n

For each Spectrum, I wish to round the Mass values to the nearest integer. Then, if there are multiple entries of the same integer Mass, only keep the row with the highest Intensity value.

Example desired output:

Spectrum	Mass	Intensity
Spectrum A	14	20
Spectrum A	20	7
Spectrum B	12	4
Spectrum B	13	5
Spectrum B	33	10
…	…	…
Spectrum Z	n	n

To do this, I thought this code should work:

df_binned <- df %>%
  mutate(Mass = round(Mass, 0)) %>%
  group_by(Spectrum, Mass) %>%
  summarise(Intensity = max(Intensity)) %>%
  ungroup()

Unfortunately, the actual output's Intensity is almost fully NAs. For many Spectrums, it's 100% NAs. I have verified that the Mass and Intensity columns are indeed numerics, not characters.

Does anyone know why this is happening? And/or how can I implement this "binning" transformation?

CodePudding user response：

You need to do

df %>%
  mutate(Mass = round(Mass, 0)) %>%
  group_by(Spectrum, Mass) %>%
  summarise(Intensity = max(Intensity, na.rm = TRUE)) %>%
  ungroup()

Otherwise even a single missing value in a group will cause the whole group to be NA