Home > Software design >  R loop to create datasets and modify variables based on loop number
R loop to create datasets and modify variables based on loop number

Time:12-07

I have the following dataset with number of rows in the millions:

| operating aln | operating fltnum | aln1 | flt1 | aln2| flt2| aln3| flt3|...|aln8|flt8|

Each operating airline and flight may have up to 8 code share flights (aln1-8 and flt 1-8). So some columns (aln2- aln8) may contain data, and some may not. My goal is to get a unique airline and flight number from all the columns combined. I wrote a code that I know is very inefficient (but it does the trick)

#separate each airline and flight number into its own dataframe

t1 <- df[,c("aln1","flt1")]
t2 <- df[,c("aln2","flt2")]
t3 <- df[,c("aln3","flt3")]
t4 <- df[,c("aln4","flt4")]
t5 <- df[,c("aln5","flt5")]
t6 <- df[,c("aln6","flt6")]
t7 <- df[,c("aln7","flt7")]
t8 <- df[,c("aln8","flt8")]

# get rid of all rows with NA in them

t1 <- na.omit(t1)
t2 <- na.omit(t2)
t3 <- na.omit(t3)
t4 <- na.omit(t4)
t5 <- na.omit(t5)
t6 <- na.omit(t6)
t7 <- na.omit(t7)
t8 <- na.omit(t8)

# rename all variables so I may combine them together in one dataframe

t1 <- rename(t1,aln = aln1, fltnum = flt1)
t2 <- rename(t2,aln = aln2, fltnum = flt2)
t3 <- rename(t3,aln = aln3, fltnum = flt3)
t4 <- rename(t4,aln = aln4, fltnum = flt4)
t5 <- rename(t5,aln = aln5, fltnum = flt5)
t6 <- rename(t6,aln = aln6, fltnum = flt6)
t7 <- rename(t7,aln = aln7, fltnum = flt7)
t8 <- rename(t8,aln = aln8, fltnum = flt8)

# combine and get a unique sort
cshares_010 <- rbind(t1,t2,t3,t4,t5,t6,t7,t8)
cshares_010<- distinct(cshares_010, aln, fltnum)

My question is if anyone knows a more 'efficient' way of doing this?

I tried creating a function that will use a loop to do all the above but that can't seem to make it work (when I worked with SAS I created a macro and used the loop number in {} to create and use variables).

For example to create tables t1-t8:

names <- list("t1,"t2,"t3","t4","t5","t6","t7","t8")
pos <- list(1,2,3,4,5,6,7,8)


for (j in 1:8)
{
t[[j]] <- df[, c(3 (pos[[j]]*2),(4 (pos[[j]]*2)] # so that I get a c(5,6) for t1, c(7,8) for t2,etc..
}

The code seems to choose the correct columns in each loop (so the equations work when i test them separately), but R gives an error when running the function stating "more elements supplied than they are to replace"

Appreciate your thoughts and advice.

CodePudding user response:

The following should work.
You can assign the values to a new dataframe, then work on this new dataframe. Finally you can assign the correct dataframe name at the end. It is sometimes complex to assign the name of your dataframe at the start of the loop because harder to work on it.

You can also use the function get(paste("t", i, sep="")) to output a dataframe in a loop.

### Initiating empty list
listOfDf <- list()

### Loop
for(i in 1:8){
  # Assign you variables to a dummy variable, simpler when you want to work on a df in a loop
  assign("newDf", df[, c(paste("aln", i, sep=""), paste("flt", i, sep=""))])
  
  # NA omit
  newDf <- na.omit(newDf)
  
  # Rename variables
  newDf <- rename(newDf, 
                  aln=paste("aln", i, sep=""), 
                  flt=paste("flt", i, sep=""))
  
  # Assign the reworked dataframe to ti
  assign(paste("t", i, sep=""), newDf)
  
  # Store each dataframe in a list
  listOfDf[[i]] <- newDf
}

### Rbind all your dataframes
do.call(rbind, listOfDf)

CodePudding user response:

Maybe build a new data.frame after unlisting.

unique(
  na.omit(
    data.frame(
      aln = unlist(df[,grep("aln", colnames(df))], 0, 0),
      flt = unlist(df[,grep("flt", colnames(df))], 0, 0)
    )
  )
)
#>   aln flt
#> 1   A   1
#> 2   B   2
#> 3   C   3
#> 4   D   4
#> 5   E   5
#> 6   F   6
#> 7   G   7
#> 8   H   8

Data

df <- as.data.frame(
  c(
    as.list(LETTERS[1:8]),
    as.list(1:8)
  )[rep(1:8, each = 2)   c(0, 8)]
)
df <- setNames(df, paste0(rep(c("aln", "flt"), 8), rep(1:8, each = 2)))
df
#>   aln1 flt1 aln2 flt2 aln3 flt3 aln4 flt4 aln5 flt5 aln6 flt6 aln7 flt7 aln8
#> 1    A    1    B    2    C    3    D    4    E    5    F    6    G    7    H
#>   flt8
#> 1    8
  • Related