How to apply code to dataframe by condition?

Time:04-23

I have the following dataframe:

library(dplyr)
library(tidyverse)
library(concordance)
Year <- c(2016,2016,2017,2019,2020,2020,2020,2013,2010,2010)
Pf <- c("HS4","HS4","HS4","HS5","HS5","HS5","HS5","HS4","HS3","HS3")
Code <- c("391890","440929","851660","732399","720839","050510","830241","321590","010210","010210")
Slen <- c("6","6","6","6","6","6","6","6","6","6")
df <-  data.frame(Year,Pf,Code,Slen)

The 'Pf' column contains three different values: "HS3", "HS4" and "HS5". I want to apply the concord() function to the 'Code' column in a vectorized way. For that to work, 'Pf' must hold a single value within each call, so I first subset the dataframe into pieces where 'Pf' is unique:

# Subset the data so that each piece has a single Pf value
df.H5 <- subset(df, Pf == "HS5")
df.H4  <- subset(df, Pf == "HS4")
df.H3  <- subset(df, Pf == "HS3")

Now I apply the function to each dataframe. Here concord() takes the 'Code' column and converts its values to codes in the destination classification. However, it does not work when the destination argument and the value in 'Pf' are the same. For instance, if Pf = "HS3" in df and destination = "HS3", the code does not run. That is why I don't apply it to df.H3.

# Apply function to df.H5
df.H5<- df.H5 %>% 
  group_by(Pf, Slen) %>%
  mutate(
    Code2 = concord(Code, origin = unique(Pf), dest.digit = unique(Slen), destination = "HS3", all = FALSE)
  ) %>%
  ungroup()

# Apply function to df.H4
df.H4<- df.H4 %>% 
  group_by(Pf, Slen) %>%
  mutate(
    Code2 = concord(Code, origin = unique(Pf), dest.digit = unique(Slen), destination = "HS3", all = FALSE)
  ) %>%
  ungroup()

# Add the column to df.H3 so that the three dataframes can be merged
df.H3$Code2 <- df.H3$Code

#merge
df2 <- rbind(df.H4, df.H5, df.H3)

My goal is to automate this process. For instance, with destination = "HS3", the code should run on the whole dataframe without pre-subsetting. For rows where 'Pf' matches the destination argument, concord() should be skipped and the values simply copied from 'Code' into the new 'Code2' column.

CodePudding user response:

You could put the logic in a function and use it in a by() approach, which splits the data by group and applies the function to each piece. Inside the function, handle the case Pf == 'HS3' (the destination), which should not be processed. Finally, unsplit() to put the rows back in their original order.

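cf <- \(x) {
  # x is the piece of df for one Pf value
  Code2 <- if (!any(x$Pf == 'HS3')) {
    # Origin differs from the destination, so convert the codes
    concordance::concord(x$Code, origin=x$Pf[1], dest.digit=x$Slen[1], 
                         destination="HS3", all=FALSE)
  } else {
    # Already HS3: copy the codes unchanged
    x$Code
  }
  cbind(x, Code2)
}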

by(df, df$Pf, cf) |>
  unsplit(df$Pf)
#    Year  Pf   Code Slen  Code2
# 1  2016 HS4 391890    6 391890
# 2  2016 HS4 440929    6 440929
# 3  2017 HS4 851660    6 851660
# 4  2019 HS5 732399    6 732399
# 5  2020 HS5 720839    6 720839
# 6  2020 HS5 050510    6 050510
# 7  2020 HS5 830241    6 830241
# 8  2013 HS4 321590    6 321590
# 9  2010 HS3 010210    6 010210
# 10 2010 HS3 010210    6 010210
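
To try the split, apply and unsplit pattern without installing concordance, here is a minimal sketch. It assumes a made-up stand-in converter, fake_convert(), in place of concordance::concord(); that function and convert_group() are hypothetical names for illustration only. The point it shows is that unsplit() returns the rows in their original order even though the groups are processed separately.

```r
# Small example data in the same shape as the question's dataframe
df <- data.frame(
  Year = c(2016, 2019, 2010, 2013),
  Pf   = c("HS4", "HS5", "HS3", "HS4"),
  Code = c("391890", "732399", "010210", "321590"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for concordance::concord(): it only tags the code
# with its origin so the effect of the conversion step is visible
fake_convert <- function(code, origin) paste0(origin, ":", code)

# Convert one group, or copy Code unchanged if the group is already
# in the destination classification
convert_group <- function(x, destination = "HS3") {
  x$Code2 <- if (x$Pf[1] == destination) {
    x$Code
  } else {
    fake_convert(x$Code, origin = x$Pf[1])
  }
  x
}

# Split by Pf, process each piece, then restore the original row order
pieces <- lapply(split(df, df$Pf), convert_group)
res <- unsplit(pieces, df$Pf)
print(res)
```

Swapping fake_convert() for the real concord() call gives the same structure as cf above.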

Data:

df <- structure(list(Year = c(2016, 2016, 2017, 2019, 2020, 2020, 2020, 
2013, 2010, 2010), Pf = c("HS4", "HS4", "HS4", "HS5", "HS5", 
"HS5", "HS5", "HS4", "HS3", "HS3"), Code = c("391890", "440929", 
"851660", "732399", "720839", "050510", "830241", "321590", "010210", 
"010210"), Slen = c("6", "6", "6", "6", "6", "6", "6", "6", "6", 
"6")), class = "data.frame", row.names = c(NA, -10L))
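
If you would rather stay with the dplyr pipeline from the question, the same case handling can go inside a grouped mutate(). The following is only a sketch under assumptions: add_code2() and stand_in() are hypothetical names, and stand_in() is a placeholder for the real concord() call, so the block runs without the concordance package.

```r
library(dplyr)

df <- data.frame(
  Year = c(2016, 2019, 2010),
  Pf   = c("HS4", "HS5", "HS3"),
  Code = c("391890", "732399", "010210"),
  Slen = c("6", "6", "6"),
  stringsAsFactors = FALSE
)

# Hypothetical placeholder for concordance::concord(); replace it with a
# wrapper around the real call once the pattern is in place
stand_in <- function(code, origin) paste0(code, "_from_", origin)

# Add Code2 group by group: copy Code when the group's Pf already equals
# the destination, otherwise run the converter on that group
add_code2 <- function(data, dest = "HS3", convert = stand_in) {
  data %>%
    group_by(Pf, Slen) %>%
    mutate(Code2 = if (Pf[1] == dest) Code else convert(Code, origin = Pf[1])) %>%
    ungroup()
}

out <- add_code2(df)
print(out)
```

Because mutate() evaluates the expression once per group, the if () sees a single Pf value each time and the row order of the input is kept.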