How to trim the beginning of the dataframe

Time:10-21

I have a dataframe like this (scroll down for dummy data):

         Date    col1  col2
1  24/06/2002   20000     0
2  25/06/2002 1000000     0
3  26/06/2002   10000     0
4  27/06/2002  100000     0
5  28/06/2002  100000     0
6  02/07/2002    1000     0
7  03/07/2002  500000     0
8  24/07/2002    1000     0
9  28/07/2002   12000     0
10 29/07/2002    1200  1200
11 15/08/2002   12000 12000
12 17/08/2002   12000 12000
13 22/08/2002    2000  2000
14 23/08/2002   56700 56700
15 24/08/2002   56700 56700
16 29/08/2002     200   200

Code I use:

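  # loop over the data columns only; Date is paired with each of them
  column_names <- setdiff(colnames(df), "Date")
  for (i in column_names) {
    # keep a data frame (cbind on two vectors would return a character matrix)
    test_df <- df[, c("Date", i)]
    # remove outliers
    write.csv(test_df, "...")
  }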

So on each iteration I have the Date column together with one data column.

  • col1 will be saved as a full dataframe, from row 1 to the end.
  • col2 only has values starting at an arbitrary row. Is there a way to trim test_df so that it begins where the col2 values start, so that I can save the data to csv accordingly?

Also, how can I remove outliers in each column before saving?

Dummy Data:

 df <- structure(list(Date = c("24/06/2002", "25/06/2002", "26/06/2002","27/06/2002", "28/06/2002", 
                                   "02/07/2002","03/07/2002","24/07/2002", "28/07/2002",
                                   "29/07/2002", "15/08/2002", "17/08/2002", 
                                   "22/08/2002", "23/08/2002", "24/08/2002", "29/08/2002"), 
                          col1 = c(20000, 1000000, 10000, 100000, 100000,
                                      1000,500000,1000, 12000,
                                      1200, 12000, 12000,
                                      2000, 56700, 56700, 200), 
                          col2 = c(0, 0, 0, 0, 0,
                                      0,0,0, 0,
                                      1200, 12000, 12000,
                                      2000, 56700, 56700, 200)), row.names = c(NA, -16L), class = "data.frame")

Answer:

To keep only the rows from the first positive col2 value onward, we may use

library(dplyr)
df %>%
   slice(which(col2 > 0)[1]:n())
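
The same trimming idea can be sketched in base R for any column name, which may be convenient inside a loop. The helper name `trim_leading` and the small `demo` data frame are hypothetical, introduced only for illustration. The sketch also assumes that a column with no positive value should yield zero rows, a case the slice() call above would fail on because the start index would be NA.

```r
# Hypothetical helper (not from the original answer): drop the rows that come
# before the first positive value in column `col`.
trim_leading <- function(data, col) {
  first <- which(data[[col]] > 0)[1]
  # Assumption: if the column never becomes positive, return zero rows
  if (is.na(first)) return(data[0, , drop = FALSE])
  data[first:nrow(data), , drop = FALSE]
}

# Small stand-in for the dummy data in the question
demo <- data.frame(
  Date = c("28/07/2002", "29/07/2002", "15/08/2002"),
  col2 = c(0, 1200, 12000)
)
trimmed <- trim_leading(demo, "col2")
```

Because the column is passed as a string, the helper can be called with the loop variable directly, for example `trim_leading(df, i)`.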

To loop over every column except 'Date', use map over the column names. For each name, select 'Date' and that column, slice the rows only when the column name is 'col2', and then write the result to a csv file. The path here is built with getwd(); change it to save in the desired directory.

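library(dplyr)
library(purrr)
map(names(df)[-1], ~ {
  # keep the Date column plus the column currently being looped over
  tmp <- df %>%
    select(Date, all_of(.x))
  # trim the leading rows only for col2, starting at its first positive value
  if (.x == "col2") {
    tmp <- tmp %>%
      slice(which(col2 > 0)[1]:n())
  }
  # write one csv per column into the working directory
  write.csv(tmp, file.path(getwd(), paste0(.x, ".csv")), row.names = FALSE)
  tmp
})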
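
The question also asks about removing outliers, which the code above does not cover. The question does not say what counts as an outlier, so the following is only a minimal sketch under one common convention, the 1.5 × IQR rule. The helper name `remove_outliers`, the multiplier `k`, and the `demo` data frame are assumptions made for illustration.

```r
# Hypothetical helper: keep rows whose value in `col` lies within
# [Q1 - k * IQR, Q3 + k * IQR]. Rows with NA in `col` are dropped.
# k = 1.5 is a common convention, not something stated in the question.
remove_outliers <- function(data, col, k = 1.5) {
  x <- data[[col]]
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  spread <- q[[2]] - q[[1]]
  keep <- !is.na(x) & x >= q[[1]] - k * spread & x <= q[[2]] + k * spread
  data[keep, , drop = FALSE]
}

# First five rows of col1 from the dummy data in the question
demo <- data.frame(
  Date = c("24/06/2002", "25/06/2002", "26/06/2002", "27/06/2002", "28/06/2002"),
  col1 = c(20000, 1000000, 10000, 100000, 100000)
)
cleaned <- remove_outliers(demo, "col1")
```

If used inside the loop, it is probably better to filter after the leading zeros have been trimmed, since those zeros would otherwise shift the quartiles.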