fill column of dataframes within a list with substring of dataframes names in R

Time:07-06

I have a list of dataframes that look like this:


crops_1990.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(1,2,3),
                                year=NA)

crops_1991.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(4,5,6),
                                year=NA)

crops_1992.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(7,8,9),
                                year=NA)


df_list <- list(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor)

I would like to fill the column 'year' with the year information that is in the name of each df within the list (1990, 1991 and 1992, respectively in this example).

I thought it would be very easy but I'm struggling a lot!

I've tried stuff like:

df_list <- lapply(df_list, function(x) {x$year <- as.character(x$year); x}) 
 
df_list <- lapply(df_list, function(x) {x$year <- substring(names(df_list), 7,10); x}) # add years from object name in list

but nothing seems to work. My expected result would be the dataframes within the list looking like this:


crops_1990.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(1,2,3),
                                year=c("1990", "1990", "1990"))

crops_1991.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(4,5,6),
                                year=c("1991", "1991", "1991"))

crops_1992.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(7,8,9),
                                year=c("1992", "1992", "1992"))

CodePudding user response:

Using tidyverse (lst names the list automatically*) you could do:

library(tidyverse)

lst(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor) |>
  imap(~ .x |> mutate(year = .y |> str_extract("\\d+")))

Alternatively, you could put all of the objects of your environment containing crops_ into a list using mget and ls (faster if you have many data frames!):

mget(ls(pattern = "crops_")) |>
  imap(~ .x |> mutate(year = .y |> str_extract("\\d+")))

Output:

$crops_1990.tempor
  study_unit cropp area year
1      unit1 crop1    1 1990
2      unit2 crop2    2 1990
3      unit3 crop3    3 1990

$crops_1991.tempor
  study_unit cropp area year
1      unit1 crop1    4 1991
2      unit2 crop2    5 1991
3      unit3 crop3    6 1991

$crops_1992.tempor
  study_unit cropp area year
1      unit1 crop1    7 1992
2      unit2 crop2    8 1992
3      unit3 crop3    9 1992
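
One caveat with the mget(ls(...)) variant: pattern is a regular expression matched anywhere in the object name, so every object whose name contains crops_ is collected, whether it is a data frame or not. The following is a hedged base R sketch of a stricter selection; the environment e, the extra object crops_total, and the anchored pattern are made up for illustration.

```r
# Illustrative environment standing in for the global environment
e <- new.env()
assign("crops_1990.tempor", data.frame(area = c(1, 2, 3), year = NA), envir = e)
assign("crops_1991.tempor", data.frame(area = c(4, 5, 6), year = NA), envir = e)
assign("crops_total", 21, envir = e)  # hypothetical object that is not a data frame

# Unanchored pattern: also matches crops_total
loose <- ls(envir = e, pattern = "crops_")

# Anchored pattern that requires four digits, then keep data frames only
strict <- ls(envir = e, pattern = "^crops_[0-9]{4}")
df_list <- Filter(is.data.frame, mget(strict, envir = e))

names(df_list)
```

With this selection, the imap() step shown earlier can be applied to df_list unchanged, since mget() returns a named list.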

NB! You should consider putting your data into a list in the first place, when you load it. For the reasons why, see e.g.: How do I make a list of data frames?

(*) One of the reasons why your approach isn't working is that the list is not named.
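
To make the footnote concrete, here is a minimal base R sketch (no tidyverse), assuming the three data frames from the question. list() does not keep the object names, so names(df_list) is NULL; naming the list explicitly and walking over data frames and names together with Map() fills the column. The helper name add_year and the use of regmatches() are illustrative choices, not part of the original answers.

```r
# Recreate the question's data frames (year starts out as NA)
crops_1990.tempor <- data.frame(study_unit = c("unit1", "unit2", "unit3"),
                                cropp = c("crop1", "crop2", "crop3"),
                                area = c(1, 2, 3), year = NA)
crops_1991.tempor <- data.frame(study_unit = c("unit1", "unit2", "unit3"),
                                cropp = c("crop1", "crop2", "crop3"),
                                area = c(4, 5, 6), year = NA)
crops_1992.tempor <- data.frame(study_unit = c("unit1", "unit2", "unit3"),
                                cropp = c("crop1", "crop2", "crop3"),
                                area = c(7, 8, 9), year = NA)

# list() drops the object names, so there is nothing to take a substring of
df_list <- list(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor)
is.null(names(df_list))

# Name the elements explicitly
names(df_list) <- c("crops_1990.tempor", "crops_1991.tempor", "crops_1992.tempor")

# Hypothetical helper: copy the first run of digits in one name into year
add_year <- function(df, nm) {
  df$year <- regmatches(nm, regexpr("[0-9]+", nm))
  df
}

# Map() pairs each data frame with its own name and keeps the list names
df_list <- Map(add_year, df_list, names(df_list))
```

A second reason the original lapply() call fails: the function only ever sees one data frame at a time, while names(df_list) refers to the whole list. Even with a named list, substring(names(df_list), 7, 10) would return all three years at once rather than the year of the current element.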

CodePudding user response:

Another potential way is:

## Loading stringr for str_extract()
library(stringr)

## Getting the names of all dataframes stored in R's global environment,
## sorted so that the order is predictable
names_of_dataframes <- sort(names(which(unlist(eapply(.GlobalEnv, is.data.frame)))))

## Creating a named list of dataframes in that same order
df_list <- mget(names_of_dataframes, envir = .GlobalEnv)

## Inserting the values in the year column
for (i in seq_along(df_list)) {
    df_list[[i]]$year <- as.numeric(str_extract(names_of_dataframes[i], "[0-9]+"))
}

## Unlisting all dataframes from the df_list
for (i in seq_along(df_list))
    assign(names_of_dataframes[i], df_list[[i]])

Output

> crops_1990.tempor
  study_unit cropp area year
1      unit1 crop1    1 1990
2      unit2 crop2    2 1990
3      unit3 crop3    3 1990
> crops_1991.tempor
  study_unit cropp area year
1      unit1 crop1    4 1991
2      unit2 crop2    5 1991
3      unit3 crop3    6 1991
> crops_1992.tempor
  study_unit cropp area year
1      unit1 crop1    7 1992
2      unit2 crop2    8 1992
3      unit3 crop3    9 1992