How can I add NA values conditionally in a vector in R?

Time:11-19

Let's say my data is df <- c("Author1","Reference1","Abstract1","Author2","Reference2","Abstract2","Author3","Reference3","Author4","Reference4","Abstract4").

This is a series in which the order is Author, Reference and Abstract. But in some cases, the Abstract data is missing. (In this example, the third Abstract is missing.) So, how can I add NA values in place of Abstract, when Abstract is missing?

In other words, if an element in the vector starts with the word "Reference" but the next element doesn't start with the word "Abstract", I want to insert an NA value right after the element starting with "Reference". The result vector should be result <- c("Author1","Reference1","Abstract1","Author2","Reference2","Abstract2","Author3","Reference3",NA,"Author4","Reference4","Abstract4"). How can I do it?

I have tried the append() function in R, but it needs the index of the position where the NA should go, so I would have to enter that index by hand for every missing Abstract.

CodePudding user response:

Here's an approach.

Basically, you build two logical vectors and combine them:

  1. One vector tests whether each element contains "Reference"; the other tests whether each element does not contain "Abstract".
  2. Offset the second vector by 1, because you want to test whether an Abstract follows each Reference.
  3. Take the logical AND of the two.
  4. Insert NA with append() at the positions where an Abstract should be but isn't.
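ab_missing <- grepl("Reference", df) & c(!grepl("Abstract", df)[-1], FALSE)
# append() takes a single `after` index, so insert one NA at a time,
# working backwards so that earlier positions are not shifted
for (i in rev(which(ab_missing))) df <- append(df, NA, after = i)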

df
 [1] "Author1"    "Reference1" "Abstract1"  "Author2"    "Reference2" "Abstract2"  "Author3"    "Reference3" NA           "Author4"   
[11] "Reference4" "Abstract4" 
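
The same steps can also be done without append() at all. The sketch below is one possible variant, not the answerer's code: the helper name insert_missing_abstracts is made up for illustration, the patterns are anchored with ^ so that only elements starting with the word match, and it assumes that a Reference at the very end of the vector also counts as missing its Abstract. It computes the new position of every element and fills a pre-allocated vector of NAs, so it should cope with zero, one, or several missing Abstracts.

```r
# Hypothetical helper; the name and the end-of-vector behaviour are assumptions.
insert_missing_abstracts <- function(x) {
  is_ref <- grepl("^Reference", x)
  # TRUE where the *next* element is an Abstract; the last element has no successor
  next_is_abs <- c(grepl("^Abstract", x)[-1], FALSE)
  ab_missing <- is_ref & !next_is_abs
  # Each element moves right by the number of gaps that occur before it
  shift <- c(0, head(cumsum(ab_missing), -1))
  out <- rep(NA_character_, length(x) + sum(ab_missing))
  out[seq_along(x) + shift] <- x
  out
}

x <- c("Author1", "Reference1", "Abstract1", "Author2", "Reference2", "Abstract2",
       "Author3", "Reference3", "Author4", "Reference4", "Abstract4")
insert_missing_abstracts(x)
```

Because the positions are computed up front, no index shifts while the NAs are being placed.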

CodePudding user response:

One way (and the only way I get these things done) is to think in tibbles or data frames, so this is probably not the best approach:

  1. We create a tibble with one column called x,
  2. then we group by the number in each element (e.g. 1, 1, 1) using the parse_number() function from readr (I love parse_number()),
  3. with summarise(cur_data()[seq(3),]) we expand each group to 3 rows (see "Expand each group to the max n of rows"; in dplyr 1.1.0 and later this pattern is deprecated in favour of reframe() and pick()). Stop here and pull x if NA is desired, otherwise continue,
  4. finally, we use paste0() with R's recycling to rebuild the labels and pull the vector:

1. In case NA is desired:

library(dplyr)
library(readr)

my_vector <- tibble(x = c("Author1","Reference1","Abstract1","Author2","Reference2",
        "Abstract2","Author3","Reference3","Author4","Reference4","Abstract4")) %>% 
  group_by(group= parse_number(x)) %>% 
  summarise(cur_data()[seq(3),]) %>% 
  pull(x)

[1] "Author1"    "Reference1" "Abstract1"  "Author2"    "Reference2" "Abstract2"  "Author3"   
 [8] "Reference3" NA           "Author4"    "Reference4" "Abstract4"

2. In case the missing label should be filled in instead:

library(dplyr)
library(readr)
my_vector <- tibble(x = c("Author1","Reference1","Abstract1","Author2","Reference2",
        "Abstract2","Author3","Reference3","Author4","Reference4","Abstract4")) %>% 
  group_by(group= parse_number(x)) %>% 
  summarise(cur_data()[seq(3),]) %>% 
  mutate(group = paste0(c("Author", "Reference", "Abstract"), group)) %>% 
  pull(group)
 [1] "Author1"    "Reference1" "Abstract1"  "Author2"    "Reference2" "Abstract2"  "Author3"   
 [8] "Reference3" "Abstract3"  "Author4"    "Reference4" "Abstract4" 
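
Both variants can also be sketched in base R by building the full expected sequence first and then looking each label up in the data. This is an alternative under assumptions, not the approach above: it assumes every element is exactly one of the three prefixes followed by its record number, and the variable names (ids, expected, with_na) are only illustrative.

```r
x <- c("Author1", "Reference1", "Abstract1", "Author2", "Reference2", "Abstract2",
       "Author3", "Reference3", "Author4", "Reference4", "Abstract4")

# Record numbers in order of first appearance: 1 2 3 4
ids <- unique(as.integer(gsub("\\D", "", x)))

# Variant 2: every label that should exist, built with paste0() recycling
expected <- paste0(c("Author", "Reference", "Abstract"), rep(ids, each = 3))

# Variant 1: look each expected label up in x; labels that are not found become NA
with_na <- x[match(expected, x)]
```

Since the lookup is by label rather than by position, this version would also mark a missing Author or Reference, not only a missing Abstract.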

CodePudding user response:

A slightly different approach, with x being the original vector (df in the question), might be:

c(sapply(split(x, cumsum(grepl("Author", x))), function(x) head(c(x, NA_character_), 3)))

 [1] "Author1"    "Reference1" "Abstract1"  "Author2"    "Reference2" "Abstract2"  "Author3"   
 [8] "Reference3" NA           "Author4"    "Reference4" "Abstract4" 
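
To see what the one-liner does, it can be unpacked into steps. The sketch below is a rewrite for illustration only (the variable names are assumptions); it uses vapply() and indexing with 1:3 in place of sapply() and head(), which also pads a record that is missing more than one trailing element.

```r
x <- c("Author1", "Reference1", "Abstract1", "Author2", "Reference2", "Abstract2",
       "Author3", "Reference3", "Author4", "Reference4", "Abstract4")

# Start a new record at every "Author": 1 1 1 2 2 2 3 3 4 4 4
record_id <- cumsum(grepl("^Author", x))

# One character vector per record
records <- split(x, record_id)

# Indexing past the end of a vector yields NA, so every record comes back with length 3
padded <- vapply(records, function(r) r[1:3], character(3))

# padded is a matrix with one column per record; as.vector() reads it column by column
result <- as.vector(padded)
```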