Home > OS >  index a dataframe with repeated values according to vector
index a dataframe with repeated values according to vector

Time:07-22

I am trying to average values in different months over vectors of dates. Basically, I have a dataframe with monthly values of a variable, and I'm trying to get a representative average of the experienced values for samples that sometimes span month boundaries.

I've ended up with a dataframe of monthly values, and vectors of the representative number of "month-year" combinations of every sampling duration (e.g. if a sample was out from Jan 28, 2000 to Feb 1, 2000, the vector would show 4 values of Jan 2000, 1 value of Feb 2000). Later I'm going to average the values with these weights, so it's important that the returned variable values appear in representative numbers.

I am having trouble figuring out how to index the dataframe pulling the representative value repeatedly. See below.

# data frame of monthly values
reprex_df <- 
  tribble(
    ~my,   ~value,
    "2000-01",  10,
    "2000-02",  11,
    "2000-03",  15,
    "2000-04",  9,
    "2000-05",  13
  ) %>%
  as.data.frame()

# vector of month-year dates from Jan 28 to Feb 1:
reprex_vec <- c("2000-01","2000-01","2000-01","2000-01","2000-02")

# I want to index the df using the vector to get a return vector of 
# January value*4, Feb value*1, or 10, 10, 10, 10, 11

# I tried this:

reprex_df[reprex_df$my %in% reprex_vec,"value"]

# but %in% only returns each value once ("10 11", not "10 10 10 10 11").
# is there a different way I should be indexing to account for repeated values?

# eventually I will take an average, e.g.:

mean(reprex_df[reprex_df$my %in% reprex_vec,"value"])

# but I want this average to equal 10.2 for mean(c(10,10,10,10,11)), not 10.5 for mean(c(10,11))

CodePudding user response:

Simple tidy solution with inner_join:

dplyr::inner_join(reprex_df, data.frame(my = reprex_vec), by = "my")$value

CodePudding user response:

in base R:

merge(reprex_df, list(my = reprex_vec))
       my value
1 2000-01    10
2 2000-01    10
3 2000-01    10
4 2000-01    10
5 2000-02    11

CodePudding user response:

Perhaps use match from base R to get the index

 reprex_df[match(reprex_vec, reprex_df$my),]
         my value
1   2000-01    10
1.1 2000-01    10
1.2 2000-01    10
1.3 2000-01    10
2   2000-02    11

CodePudding user response:

Another base R option using setNames

with(
  reprex_df,
  data.frame(
    my = reprex_vec,
    value = setNames(value, my)[reprex_vec]
  )
)

gives

       my value
1 2000-01    10
2 2000-01    10
3 2000-01    10
4 2000-01    10
5 2000-02    11
  • Related