I have a dataframe like this (scroll down for dummy data):
Date col1 col2
1 24/06/2002 20000 0
2 25/06/2002 1000000 0
3 26/06/2002 10000 0
4 27/06/2002 100000 0
5 28/06/2002 100000 0
6 02/07/2002 1000 0
7 03/07/2002 500000 0
8 24/07/2002 1000 0
9 28/07/2002 12000 0
10 29/07/2002 1200 1200
11 15/08/2002 12000 12000
12 17/08/2002 12000 12000
13 22/08/2002 2000 2000
14 23/08/2002 56700 56700
15 24/08/2002 56700 56700
16 29/08/2002 200 200
Code I use:
column_names <- setdiff(colnames(df), "Date")
for (i in column_names) {
  # keep the Date column plus the current data column
  test_df <- df[, c("Date", i)]
  # remove outliers
  write.csv(test_df, "...")
}
So on every iteration I have the Date column together with one data column.
- col1 will be saved as a full dataframe, from row 1 to the end.
- col2 only has values starting from some arbitrary row. Is there any way I can trim test_df so that it starts where the col2 values start, so that I can save the data accordingly to csv?
Also, how can I remove outliers in each column before saving?
Dummy Data:
df <- structure(list(Date = c("24/06/2002", "25/06/2002", "26/06/2002","27/06/2002", "28/06/2002",
"02/07/2002","03/07/2002","24/07/2002", "28/07/2002",
"29/07/2002", "15/08/2002", "17/08/2002",
"22/08/2002", "23/08/2002", "24/08/2002", "29/08/2002"),
col1 = c(20000, 1000000, 10000, 100000, 100000,
1000,500000,1000, 12000,
1200, 12000, 12000,
2000, 56700, 56700, 200),
col2 = c(0, 0, 0, 0, 0,
0,0,0, 0,
1200, 12000, 12000,
2000, 56700, 56700, 200)), row.names = c(NA, -16L), class = "data.frame")
CodePudding user response:
We may use slice to keep only the rows from the first positive col2 value onward:
library(dplyr)
df %>%
slice(which(col2 > 0)[1]:n())
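The same trimming can also be sketched in base R, with a guard for a column that never rises above 0 (in that case which(...)[1] is NA and the range in slice would fail). The helper name trim_from_first_positive and the small toy data frame are hypothetical, used only for illustration:

```r
# Hypothetical helper: keep rows from the first positive value of `col` onward.
trim_from_first_positive <- function(data, col) {
  first <- which(data[[col]] > 0)[1]
  # If the column never exceeds 0, return zero rows instead of failing on NA.
  if (is.na(first)) return(data[0, , drop = FALSE])
  data[first:nrow(data), , drop = FALSE]
}

# Small illustrative data, not the full data set from the question.
toy <- data.frame(Date = c("28/07/2002", "29/07/2002", "15/08/2002"),
                  col2 = c(0, 1200, 12000))
trimmed <- trim_from_first_positive(toy, "col2")
```

Because the column name is passed as a string, the helper can be applied to any data column, not only col2.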
If we need to loop over the columns except 'Date', use map to loop over the column names, select 'Date' and the current column, slice the rows only if the column name is 'col2', and then write the result to a csv file (change the path as needed: here we used getwd(), but the file can be saved in any desired directory).
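library(dplyr)
library(purrr)
map(names(df)[-1], ~ {
  # keep 'Date' plus the column currently looped over
  tmp <- df %>%
    select(Date, all_of(.x))
  # trim only 'col2', from its first positive value onward
  if (.x == 'col2') {
    tmp <- tmp %>%
      slice(which(col2 > 0)[1]:n())
  }
  # one csv per column, written to the working directory
  write.csv(tmp, file.path(getwd(), paste0(.x, ".csv")), row.names = FALSE)
  tmp
})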
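For the outlier part of the question, one possible approach is the common 1.5 x IQR rule. The question does not say what counts as an outlier, so the rule itself, the helper name remove_outliers_iqr, the multiplier k and the toy data below are all assumptions made for this sketch:

```r
# Hypothetical helper: drop rows whose value in `col` lies outside
# [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is a conventional default.
remove_outliers_iqr <- function(data, col, k = 1.5) {
  q <- quantile(data[[col]], c(0.25, 0.75), na.rm = TRUE)
  spread <- q[[2]] - q[[1]]
  keep <- data[[col]] >= q[[1]] - k * spread &
    data[[col]] <= q[[2]] + k * spread
  # which() also drops rows where the value is NA
  data[which(keep), , drop = FALSE]
}

# Small illustrative data with one obviously extreme value.
toy <- data.frame(Date = sprintf("%02d/08/2002", 1:6),
                  col1 = c(10, 12, 11, 13, 12, 1000))
cleaned <- remove_outliers_iqr(toy, "col1")
```

Such a helper could be called on the per-column data frame just before write.csv. Whether the rule is appropriate depends on the data: with heavily skewed values it may discard legitimate observations.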