R: summarise a dataframe with NAN in columns-CodePudding

I have the following data set:

  Data1      Data2      Data3
      3        NAN        NAN
      2        NAN        NAN
      3        NAN        NAN
    NAN          3        NAN
    NAN          5        NAN
    NAN          3        NAN
    NAN        NAN          7
    NAN        NAN          5
    NAN        NAN          1

I'd like to summarise it into this:

  Data1   Data2      Data3
      3       3          7
      2       5          5
      3       3          1

I've trying creating a row number, grouping by row number and applying summarise, but it's just filling the NaNs with 0.

data = data %>% 
  mutate(row = row_number()) %>% 
  dplyr::group_by(row) %>% 
  dplyr::summarise(across(c("Data1","Data2","Data3"), ~sum(., na.rm=T))) %>% 
  distinct(.)

CodePudding user response：

If you have same number of NaN's in each column as shown in the example you can use na.omit to drop those values.

library(dplyr)

df %>% summarise(across(.fns = na.omit))
#If in your data values are string 'NAN' then use the below
#df %>% summarise(across(.fns = ~.x[.x!= 'NAN']))

#  Data1 Data2 Data3
#1     3     3     7
#2     2     5     5
#3     3     3     1

In base R -

as.data.frame(sapply(df, na.omit))

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(Data1 = c(3, 2, 3, NaN, NaN, NaN, NaN, NaN, NaN
), Data2 = c(NaN, NaN, NaN, 3, 5, 3, NaN, NaN, NaN), Data3 = c(NaN, 
NaN, NaN, NaN, NaN, NaN, 7, 5, 1)), row.names = c(NA, -9L), class = "data.frame")

CodePudding user response：

If you just want to compute the sums then this will do:

data %>% 
  summarise(across(c("Data1","Data2","Data3"), ~sum(., na.rm=T)))

EDIT:

You can ge rid of the NaNvalues by using a combination of pivot_longer and pivot_wider:

data %>%
  pivot_longer(starts_with('Data'), values_drop_na = TRUE) %>%
  arrange(name) %>%
  pivot_wider(names_from = name, values_from = value, values_fn = list) %>%
  unnest()
# A tibble: 3 x 3
  Data1 Data2 Data3
  <dbl> <dbl> <dbl>
1     3     3     7
2     2     5     5
3     1     3     1

Or even more nicely:

library(purrr)
map_dfr(df, na.omit)