Home > Software engineering >  Is there R codes to organise these data in R?
Is there R codes to organise these data in R?

Time:11-21

I would like to remove NA from my data set and then organise them by IDs.

My dataset is similar to this:

  

  df<-read.table (text="ID  Name    Surname Group   A1  A2  A3  Goal    Sea
    21  Goal    Robi    A   4   4   4   G   No
    21  Goal    Robi    B   NA  NA  NA  NA  NA
    21  Goal    Robi    C   NA  NA  NA  NA  NA
    21  Goal    Robi    D   3   4   4   G   No
    33  Nami    Si  O   NA  NA  NA  NA  NA
    33  Nami    Si  P   NA  NA  NA  NA  NA
    33  Nami    Si  Q   3   4   4   G   No
    33  Nami    Si  Z   3   3   3   S   No
    98  Sara    Bat MT  4   4   4   S   No
    98  Sara    Bat NC  4   3   2   D   No
    98  Sara    Bat MF  NA  NA  NA  NA  NA
    98  Sara    Bat LC  NA  NA  NA  NA  NA
    66  Noor    Shor    MF  NA  NA  NA  NA  NA
    66  Noor    Shor    LC  NA  NA  NA  NA  NA
    66  Noor    Shor    MT1 4   4   4   G   No
    66  Noor    Shor    NC1 2   3   3   D   No

    ", header=TRUE)
<iframe name="sif1" sandbox="allow-forms allow-modals allow-scripts" frameborder="0"></iframe>

By removing NA, rows and columns get a datframe with a lack of NA. So I would like to get this table

ID  Name    Surname Group_1 A1  A2  A3  Goal_1  Sea_1   Group_2 A1_1    A2_2    A3_3    Goal_2  Sea_2
21  Goal    Robi    A   4   4   4   G   No  D   3   4   4   G   No
33  Nami    Si  Q   3   4   4   G   No  Z   3   3   3   S   No
98  Sara    Bat MT  4   4   4   S   No  NC  4   3   2   D   No
66  Noor    Shor    Mt1 4   4   4   G   No  NC1 2   3   3   D   No
<iframe name="sif2" sandbox="allow-forms allow-modals allow-scripts" frameborder="0"></iframe>

Is it possible to get it. It seems we could do it using pivot_longer, but I do not know ho to get it

CodePudding user response:

search for complete.cases()

final = final[complete.cases(final), ]

CodePudding user response:

A possible solution with the Tidyverse:

df <- structure(list(ID = c(21L, 21L, 21L, 21L, 33L, 33L, 33L, 33L, 
98L, 98L, 98L, 98L, 66L, 66L, 66L, 66L), Name = c("Goal", "Goal", 
"Goal", "Goal", "Nami", "Nami", "Nami", "Nami", "Sara", "Sara", 
"Sara", "Sara", "Noor", "Noor", "Noor", "Noor"), Surname = c("Robi", 
"Robi", "Robi", "Robi", "Si", "Si", "Si", "Si", "Bat", "Bat", 
"Bat", "Bat", "Shor", "Shor", "Shor", "Shor"), Group = c("A", 
"B", "C", "D", "O", "P", "Q", "Z", "MT", "NC", "MF", "LC", "MF", 
"LC", "MT1", "NC1"), A1 = c(4L, NA, NA, 3L, NA, NA, 3L, 3L, 4L, 
4L, NA, NA, NA, NA, 4L, 2L), A2 = c(4L, NA, NA, 4L, NA, NA, 4L, 
3L, 4L, 3L, NA, NA, NA, NA, 4L, 3L), A3 = c(4L, NA, NA, 4L, NA, 
NA, 4L, 3L, 4L, 2L, NA, NA, NA, NA, 4L, 3L), Goal = c("G", NA, 
NA, "G", NA, NA, "G", "S", "S", "D", NA, NA, NA, NA, "G", "D"
), Sea = c("No", NA, NA, "No", NA, NA, "No", "No", "No", "No", 
NA, NA, NA, NA, "No", "No")), class = "data.frame", row.names = c(NA, 
-16L))

new_df <- df %>% 
  drop_na() %>% 
  group_by(ID) %>% 
  mutate(n = row_number()) %>% 
  pivot_wider(
    names_from = n, 
    values_from= c(Group, A1, A2, A3, Goal, Sea)
  ) %>% 
  relocate(ends_with("2"), .after= last_col())

print(new_df)

CodePudding user response:

We can group_by the ID columns and then filter out rows with all NAs in the target columns:

df %>% group_by(ID, Name, Surname) %>%
    filter(!if_all(A1:Sea, is.na))%>%
    slice_head(n=1)

# A tibble: 4 × 9
# Groups:   ID, Name, Surname [4]
     ID Name  Surname Group    A1    A2    A3 Goal  Sea  
  <int> <chr> <chr>   <chr> <int> <int> <int> <chr> <chr>
1    21 Goal  Robi    A         4     4     4 G     No   
2    33 Nami  Si      Q         3     4     4 G     No   
3    66 Noor  Shor    MT1       4     4     4 G     No   
4    98 Sara  Bat     MT        4     4     4 S     No   
  •  Tags:  
  • r
  • Related