How to adjust the numeric sequence in a column by group in R?-CodePudding

Before getting to the question, I browsed StackOverflow and found similar related questions to my own, though I could not find something that is specific to my issue.

I recently used R to pull some satellite data from an API that provide monthly summary statistics based on locations derived from a shapefile. Unfortunately, I could not submit the entire shapefile to the API as the request was simply too large, resulting in having to split the shapefile into 200 row segments.

The data returned from the API identifies each location in the "location" column with values labeled "location0001", "location0002", "location0003", etc. The dataset looks as follows:

location           date       data.stats1      data.stats2
------------   ------------  -------------   --------------
location0001    2012-01-01       data            data
location0001    2012-02-01       data            data
location0001    2012-03-01       data            data
location0001    2012-04-01       etc.            etc.
location0001    2012-05-01
location0001    2012-06-01
location0001    2012-07-01
location0001    2012-08-01
location0001    2012-09-01
location0001    2012-10-01
location0001    2012-11-01
location0001    2012-12-01
location0002    2012-01-01
location0002    2012-02-01
location0002    2012-03-01

You can see there are 12 iterations of "location0001" for each month of data throughout the year, followed by "location0002" for the next set of monthly data. This continues up to "location0200." This is fine for the first dataset, however, for subsequent datasets, I need the values to continue past 200 (e.g. I need "location0201", "location0202", "location0203" etc.).

What I would like to do is update the values in the "location" column to start on "0201" all the way to "0400" so that it looks like this:

location           date       data.stats1      data.stats2
------------   ------------  -------------   --------------
location0201    2012-01-01       data            data
location0201    2012-02-01       data            data
location0201    2012-03-01       data            data
location0201    2012-04-01       etc.            etc.
location0201    2012-05-01
location0201    2012-06-01
location0201    2012-07-01
location0201    2012-08-01
location0201    2012-09-01
location0201    2012-10-01
location0201    2012-11-01
location0201    2012-12-01
location0202    2012-01-01
location0202    2012-02-01
location0202    2012-03-01

I have tried mutating a new column using a combination of paste0(), seq(), and str_pad():

my.dataset <- my.dataset %>%
    group_by(location) %>%
    mutate(location_2 = paste0('location', str_pad(seq(201,400), 4, side = 'left', pad = '0'))) %>%
    ungroup()

This returns an error stating that "location_2 must be size 12 or 1, not 200" which seems to indicate that problem is occurring in the seq(201,400) portion (length is 200).

Is there a way I can update these values by group to reflect the numeric sequence I need?

Thanks!

CodePudding user response：

One option with tidyverse would be to separate the number, then change the numbers (here I just add 200), then put the columns back together.

library(tidyverse)

df %>%
  separate(location, c("location", "number"), "(?<=[A-Za-z])(?=[0-9])") %>%
  mutate(number = paste0("0", as.numeric(number)   200)) %>%
  unite("location", location:number, sep = "")

Output

       location       date
1  location0201 2012-01-01
2  location0201 2012-02-01
3  location0201 2012-03-01
4  location0201 2012-04-01
5  location0201 2012-05-01
6  location0201 2012-06-01
7  location0201 2012-07-01
8  location0201 2012-08-01
9  location0201 2012-09-01
10 location0201 2012-10-01
11 location0201 2012-11-01
12 location0201 2012-12-01
13 location0202 2012-01-01
14 location0202 2012-02-01
15 location0202 2012-03-01

Update

If you would like to turn this into a function and have a different starting number for each dataframe in a list, then we can use map2 to pass a vector of starting numbers.

sep_unite <- function(x, y) {
  x %>%
    separate(location, c("location", "number"), "(?<=[A-Za-z])(?=[0-9])") %>%
    mutate(number = paste0("0", as.numeric(number)   y)) %>%
    unite("location", location:number, sep = "")
}

df.list <- list(df, df, df, df)
start_num <- seq(200, 800, 200)

output.list <- map2(.x = df.list, .y = start_num, .f = sep_unite)

Output

map(output.list, head, 2)

[[1]]
      location       date
1 location0201 2012-01-01
2 location0201 2012-02-01

[[2]]
      location       date
1 location0401 2012-01-01
2 location0401 2012-02-01

[[3]]
      location       date
1 location0601 2012-01-01
2 location0601 2012-02-01

[[4]]
      location       date
1 location0801 2012-01-01
2 location0801 2012-02-01

Data

df <- structure(list(location = c("location0001", "location0001", "location0001", 
"location0001", "location0001", "location0001", "location0001", 
"location0001", "location0001", "location0001", "location0001", 
"location0001", "location0002", "location0002", "location0002"
), date = c("2012-01-01", "2012-02-01", "2012-03-01", "2012-04-01", 
"2012-05-01", "2012-06-01", "2012-07-01", "2012-08-01", "2012-09-01", 
"2012-10-01", "2012-11-01", "2012-12-01", "2012-01-01", "2012-02-01", 
"2012-03-01")), class = "data.frame", row.names = c(NA, -15L))