R Studio: Creating a factor column based on unknown numbers in another column

Time:08-12

I'm trying to add a column to a data frame with factor levels based on the values in another column. Doing this in a pipe is easy enough, and I have a script that does largely what I want:

Dai3D_evening_allBNR<- list.files(path = 'Z:/fishproj/Cambodia Dai project/Analytic/Flux/River_Width/Dai3C',                               #identifies all .csv files associated with Dai15 full water column Sv measurements and compiles them into one data frame
                                         pattern = "^Dai3D_ABC_10mbin_20211209_fullwatercolumn_evening_BNR*.*csv", full.names = TRUE) %>%
  map_dfr(read_csv) %>%
  mutate(BNR = case_when(
    Region_ID == 10 ~ "BNR1",
    Region_ID == 13 ~ "BNR2",
    Region_ID == 15 ~ "BNR3",
    TRUE ~ as.character(Region_ID)))

It produces a dataframe that looks like this:

     Region_ID  Sv_mean  BNR
1        10    -64.01115 BNR1
2        10    -64.96363 BNR1
3        10    -67.98841 BNR1
4        13    -66.88734 BNR2
5        13    -69.79789 BNR2
6        13    -69.94071 BNR2
7        15    -66.04855 BNR3
8        15    -68.31167 BNR3
9        15    -68.67383 BNR3

The 'mutate' call creates the label column with those 3 levels. The issue is that the numbers in the 'Region_ID' column are randomly generated for each file (in this instance they are 10, 13, and 15), so I have to edit the numbers manually with each iteration. The nice part is that there are only ever 3 different numbers. I want the script to recognize the three different numbers automatically and apply factor levels based on them. Some type of ordering is necessary: for example, the first number is always 'BNR1', the second is always 'BNR2', etc. I tried this using conditional variables but had no luck. Perhaps someone else knows this better than I do.

Answer:

We can do this automatically in either of two ways. The first is to use match() against the unique values of 'Region_ID' to return each value's index and paste that index onto the 'BNR' prefix. The second is to convert 'Region_ID' to a factor with levels specified as unique(Region_ID), coerce it to integer with as.integer, and paste that onto the prefix. Using match():

library(tidyverse)  # provides %>%, map_dfr(), read_csv(), mutate(), str_c()

# identifies all .csv files associated with Dai15 full water column Sv measurements and compiles them into one data frame
list.files(path = 'Z:/fishproj/Cambodia Dai project/Analytic/Flux/River_Width/Dai3C',
           pattern = "^Dai3D_ABC_10mbin_20211209_fullwatercolumn_evening_BNR*.*csv", full.names = TRUE) %>%
  map_dfr(read_csv) %>%
  # number each Region_ID by its position among the unique values, then prefix with "BNR"
  mutate(BNR = str_c("BNR", match(Region_ID, unique(Region_ID))))
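
For anyone who wants to try both options without the original files, here is a minimal sketch on a toy data frame. The names toy, labelled, BNR_match and BNR_factor are made up for illustration, and the values are copied from the output shown in the question. It assumes the IDs should be numbered in order of first appearance, which is what unique() gives.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the combined CSV data (hypothetical object; values from the question)
toy <- data.frame(
  Region_ID = c(10, 10, 10, 13, 13, 13, 15, 15, 15),
  Sv_mean   = c(-64.01115, -64.96363, -67.98841,
                -66.88734, -69.79789, -69.94071,
                -66.04855, -68.31167, -68.67383)
)

labelled <- toy %>%
  mutate(
    # Option 1: index of each value among the unique values (order of first appearance)
    BNR_match  = str_c("BNR", match(Region_ID, unique(Region_ID))),
    # Option 2: factor with levels = unique values, coerced to its integer codes
    BNR_factor = str_c("BNR", as.integer(factor(Region_ID, levels = unique(Region_ID)))),
    # str_c() returns character, so wrap in factor() if a true factor column is needed
    BNR        = factor(BNR_match, levels = str_c("BNR", seq_along(unique(Region_ID))))
  )

print(labelled)
```

Both options give the same labels here. If the numbering should follow ascending Region_ID rather than order of appearance, swapping unique(Region_ID) for sort(unique(Region_ID)) should do it. Note also that the numbering is computed over the whole combined data frame, so this relies on each run containing only the three IDs, as described in the question.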