Home > Mobile >  Assign new variable (from different df) to dataframe based on dates
Assign new variable (from different df) to dataframe based on dates

Time:04-15

I have two dataframes that I want to combine, but I'm having trouble. My first (and main) df consist of 6600 observations and approx. 235 columns. One of these columns is a date-variable, that describes on what date the respondent filled out the survey.

I want to add an extra column to the main df that shows the daily percentage of positive covid-tests on the relevant date ("PosPct"). I have another df that just consists of two columns; date PosPct. Therefore this df has much fewer rows than the main one, since the data only covers 311 dates (2 columns, 311 rows)

Date PosPct
2020-09-13 0.7
2020-09-14 0.7
2020-09-15 0.8
2020-09-16 1.0

I have tried merging the two datasets in many different ways now (with "merge", "cbind", "left_join"), but none of them can recognize that I want the values assigned to the dates in second df to be added to EVERY observation with the same date in the main df. For some dates in the main df I have more than a 100 observations (since more than 100 respondents filled out the survey on that date), and for some dates I have 0.

So I'm not even sure that merging the two dataframes is the best way to go? Maybe I should create a new variable in the main df with some sort of reference to the second df?

Ideally I want the main df to look sort of like this after the merger:

Date Respondent_ID Optimism_score Fear_score PosPct
2020-09-13 3022 High degree Some degree 0.7
2020-09-14 1906 Some degree Low degree 0.7
2020-09-16 14417 High degree No degree 1.0
2020-09-16 16836 Some degree Low degree 1.0
2020-09-16 1333 No degree High degree 1.0

So basically I just want every respondent ("Respondent_ID") to be assigned a numeric value ("PosPct") based on the day they filled out the survey ("Date").

I'm not the strongest user of R, so I'm not sure if I missed out an obvious way to fix this.

Can anyone help me?

EDIT: Tried adding some sample data below:

Main df:

          Id             q3_fear         q4_optimism date_string
157198  3022    To a high degree To a certain degree    20200913
157199  6984 To a certain degree To a certain degree    20200913
157200 12182 To a certain degree    To a high degree    20200913
157201  7744 To a certain degree  To a lesser degree    20200913
157202  3935 To a certain degree  To a lesser degree    20200913
157203  1906 To a certain degree To a certain degree    20200913

Small df:

  Date       PosPct
  <chr>       <dbl>
1 2020-09-13    0.7
2 2020-09-14    0.7
3 2020-09-15    0.8
4 2020-09-16    1  
5 2020-09-17    1  
6 2020-09-18    0.9

CodePudding user response:

full_join will keep all data from both tables. You need to convert the dates to the same format prior joining:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

a <- tribble(
  ~Id, ~q3_fear, ~q4_optimism, ~date_string,
  3022, "To a high degree", "To a certain degree", 20200913,
  6984, "To a certain degree", "To a certain degree", 20200913,
  12182, "To a certain degree", "To a high degree", 20200913,
  7744, "To a certain degree", "To a lesser degree", 20200913,
  3935, "To a certain degree", "To a lesser degree", 20200913,
  1906, "To a certain degree", "To a certain degree", 20200913,
)

b <- tribble(
  ~Date, ~PosPct,
  "2020-09-13", 0.7,
  "2020-09-14", 0.7,
  "2020-09-15", 0.8,
  "2020-09-16", 1,
  "2020-09-17", 1,
  "2020-09-18", 0.9,
)

a %>%
  mutate(
    Date = date_string %>% as.character() %>% parse_date_time("ymd") %>% as.Date()
  ) %>%
  full_join(
    b %>% type_convert()
  )
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   Date = col_date(format = "")
#> )
#> Joining, by = "Date"
#> # A tibble: 11 × 6
#>       Id q3_fear             q4_optimism         date_string Date       PosPct
#>    <dbl> <chr>               <chr>                     <dbl> <date>      <dbl>
#>  1  3022 To a high degree    To a certain degree    20200913 2020-09-13    0.7
#>  2  6984 To a certain degree To a certain degree    20200913 2020-09-13    0.7
#>  3 12182 To a certain degree To a high degree       20200913 2020-09-13    0.7
#>  4  7744 To a certain degree To a lesser degree     20200913 2020-09-13    0.7
#>  5  3935 To a certain degree To a lesser degree     20200913 2020-09-13    0.7
#>  6  1906 To a certain degree To a certain degree    20200913 2020-09-13    0.7
#>  7    NA <NA>                <NA>                         NA 2020-09-14    0.7
#>  8    NA <NA>                <NA>                         NA 2020-09-15    0.8
#>  9    NA <NA>                <NA>                         NA 2020-09-16    1  
#> 10    NA <NA>                <NA>                         NA 2020-09-17    1  
#> 11    NA <NA>                <NA>                         NA 2020-09-18    0.9

Created on 2022-04-14 by the reprex package (v2.0.0)

CodePudding user response:

library(data.table)

sample data

dt1 <- fread('Id             q3_fear         q4_optimism date_string
                3022    "To a high degree" "To a certain degree"    20200913
                6984 "To a certain degree" "To a certain degree"    20200913
               12182 "To a certain degree"    "To a high degree"    20200913
                7744 "To a certain degree"  "To a lesser degree"    20200913
                3935 "To a certain degree"  "To a lesser degree"    20200913
                1906 "To a certain degree" "To a certain degree"    20200913')

dt2 <- fread("  Date       PosPct
 2020-09-13    0.7
 2020-09-14    0.7
 2020-09-15    0.8
 2020-09-16    1  
 2020-09-17    1  
 2020-09-18    0.9")

code

# set dates to date format, so we can join
dt1[, Date := as.Date(as.character(date_string), format = "%Y%m%d")]
dt2[, date := as.Date(Date)]
# join PosPct value to dt1
dt1[dt2, PosPct := i.PosPct, on = .(Date)]

output

#       Id             q3_fear         q4_optimism date_string       Date PosPct
# 1:  3022    To a high degree To a certain degree    20200913 2020-09-13    0.7
# 2:  6984 To a certain degree To a certain degree    20200913 2020-09-13    0.7
# 3: 12182 To a certain degree    To a high degree    20200913 2020-09-13    0.7
# 4:  7744 To a certain degree  To a lesser degree    20200913 2020-09-13    0.7
# 5:  3935 To a certain degree  To a lesser degree    20200913 2020-09-13    0.7
# 6:  1906 To a certain degree To a certain degree    20200913 2020-09-13    0.7
  •  Tags:  
  • r
  • Related