Add column to dataset

Time:06-27

I have a dataset with 2 columns: the name of a director and a certain award he or she has achieved.

Here is my data:

df <- structure(list(Name = c("Mark", "Joseph", "Lucas"), Achievement = c("Cyber Award", 
"Biology Award", "Co-author of 'New are of technology safety'"
)), class = "data.frame", row.names = c(NA, -3L))
    Name                                 Achievement
1   Mark                                 Cyber Award
2 Joseph                               Biology Award
3  Lucas Co-author of 'New are of technology safety'

Now I want to add a third column that indicates whether the achievement contains any of the strings in a vector:

my_vector <- c("cyber", "Cyber", "technology", "Technology", "computer", "Computer")

(so three keywords, each in capitalized and lowercase form).

Desired output:

    Name                                 Achievement Cyber Achievement
1   Mark                                 Cyber Award                 1
2 Joseph                               Biology Award                 0
3  Lucas Co-author of 'New are of technology safety'                 1

I have no clue where to start; I hope someone can help me.

CodePudding user response:

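First, build a single regex pattern by joining the strings with paste and collapse = "|", so that the pattern matches if any one of them occurs.

Then use str_detect to check whether the pattern matches the string in the Achievement column.

If it matches, return 1, otherwise 0: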

library(dplyr)
library(stringr)

pattern <- paste(c("cyber", "Cyber", "technology", "Technology", "computer", "Computer"), collapse = "|")


df %>% 
  mutate(`Cyber Achievement` = ifelse(str_detect(Achievement, pattern), 1, 0))

Or in base R, using grepl:

df$`Cyber Achievement` <- ifelse(grepl(pattern, df$Achievement), 1, 0)
    Name                                 Achievement Cyber Achievement
1   Mark                                 Cyber Award                 1
2 Joseph                               Biology Award                 0
3  Lucas Co-author of 'New are of technology safety'                 1
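
As a variation on the grepl approach above, the capitalized and lowercase duplicates can be dropped by matching case-insensitively. This is a minimal base R sketch, not part of the original answer; the names keywords and pattern_ci are made up for illustration. Note that ignore.case = TRUE also matches other spellings such as "CYBER", and that the pattern still matches substrings (for example "cybernetics"), which may or may not be what you want.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Name = c("Mark", "Joseph", "Lucas"),
  Achievement = c("Cyber Award", "Biology Award",
                  "Co-author of 'New are of technology safety'")
)

# Hypothetical names: one lowercase entry per keyword is enough,
# because ignore.case = TRUE covers the capitalized forms.
keywords <- c("cyber", "technology", "computer")
pattern_ci <- paste(keywords, collapse = "|")

# grepl() returns TRUE/FALSE per row; as.integer() turns that into 1/0.
df$`Cyber Achievement` <- as.integer(
  grepl(pattern_ci, df$Achievement, ignore.case = TRUE)
)
print(df)
```

Using as.integer() on the logical result is equivalent to ifelse(..., 1, 0) here, except that the column is stored as integer rather than double.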


CodePudding user response:

Another option:

library(dplyr)
library(stringr)
condition <- c("Cyber", "cyber", "Technology", "technology", "Computer", "computer")
df %>% 
  rowwise() %>% 
  # any() keeps the flag at 0/1 even when several keywords match the same row
  mutate(`Cyber Achievement` = as.integer(any(str_detect(Achievement, condition))))

Output:

# A tibble: 3 × 3
# Rowwise: 
  Name   Achievement                                 `Cyber Achievement`
  <chr>  <chr>                                                     <int>
1 Mark   Cyber Award                                                   1
2 Joseph Biology Award                                                 0
3 Lucas  Co-author of 'New are of technology safety'                   1
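
The per-keyword check can also be sketched in base R without rowwise(), assuming the keywords should be treated as literal text rather than regular expressions (fixed = TRUE). The variable hits is a hypothetical name used only in this sketch.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Name = c("Mark", "Joseph", "Lucas"),
  Achievement = c("Cyber Award", "Biology Award",
                  "Co-author of 'New are of technology safety'")
)

condition <- c("Cyber", "cyber", "Technology", "technology", "Computer", "computer")

# For each keyword, test every row for a literal (non-regex) match,
# then combine the per-keyword logical vectors with an element-wise OR.
hits <- Reduce(`|`, lapply(condition, function(k) {
  grepl(k, df$Achievement, fixed = TRUE)
}))

# A row is flagged when at least one keyword was found in it.
df$`Cyber Achievement` <- as.integer(hits)
print(df)
```

Because the results are combined with OR rather than summed, a row that contains several keywords is still flagged with 1 rather than a count.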