Home > Mobile >  Reducing dataframe by removing NAs in column R
Reducing dataframe by removing NAs in column R

Time:11-01

I have a dataframe where each respondent provides a score to two profiles. Here is how the data is formated. I am looking to remove the NAs such that score_1 and score_2 are only scores, without NAs. Obviously this would reduce the number of rows, eliminate the profile column, and reduce the id column to one index per respondent (instead of the actual 2). Is there anyway to do this?

df <- data.frame(
  score_1 = c(1, NA, 4, NA, 9, NA, 12, NA),
  score_2 = c(NA, 4, NA, 29, NA, 12, NA, 9),
  profile = c(1, 2, 1, 2, 1, 2, 1, 2),
  id = c(1, 1, 2, 2, 3, 3, 4, 4), 
  id_attribute = c(2, 2, 4, 4, 8, 8, 5, 5)
)

  score_1 score_2 profile id id_attribute
1       1      NA       1  1            2
2      NA       4       2  1            2
3       4      NA       1  2            4
4      NA      29       2  2            4
5       9      NA       1  3            8
6      NA      12       2  3            8
7      12      NA       1  4            5
8      NA       9       2  4            5

This is how I see the final df:

df_final <- data.frame(
  score_1 = c(1, 4, 9, 12),
  score_2 = c(4, 29, 12, 9),
  id = c(1, 2, 3, 4), 
  id_attribute = c(2, 4, 8, 5)
)

score_1 score_2 id id_attribute
1       1       4  1            2
2       4      29  2            4
3       9      12  3            8
4      12       9  4            5

My intention is to substract each score with the id_attribute, which I can only do if NAs are removed.

CodePudding user response:

Using tidyr::fill():

library(tidyr)
library(dplyr)

df %>%
  fill(score_1) %>%
  drop_na() %>%
  select(!profile)
  score_1 score_2 id id_attribute
1       1       4  1            2
2       4      29  2            4
3       9      12  3            8
4      12       9  4            5

CodePudding user response:

df %>% as_tibble() %>% pivot_longer(cols = contains("score"), names_to = c(".value", "rep"), names_sep = "_") %>% drop_na() %>% select(-profile) %>% pivot_wider(names_from = rep, values_from = score

I tried to use "pivot longer" and "pivot wider" to drop NA, but fill() seems more convenient.

  • Related