I have a for loop that I'm trying to convert to mapply, as I have read that it is faster than a for loop (the loop takes about 2 minutes).
The loop does this: for each unique value of the column "OrdenFab" it takes the matching subset and keeps only the first row for each value of the "Valor" column. It then appends that filtered subset to a new data frame, and keeps appending as the loop goes on. The result is a filtered data frame with no repeated values in column "Valor" within each unique value of the column "OrdenFab".
i <- unique(datapesomolde$OrdenFab)
datapesomoldefiltered <- data.frame()
for (j in i) {
  datapesomoldetemp <- datapesomolde %>%
    filter(OrdenFab == j) %>%
    filter(!duplicated(Valor))
  datapesomoldefiltered <- rbind(datapesomoldefiltered, datapesomoldetemp)
}
The original data frame is this one (first 20 rows; it has 20626 in total):
> datapesomolde
PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
1 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
2 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
3 11013610 226548648 47.30000 2022-04-25 05:52:00 42.38 49.26 45.82
4 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
5 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
6 11013047 226548234 15.20000 2022-04-23 02:48:00 14.43 16.77 15.60
7 11013047 226548234 15.20000 2022-04-23 02:48:00 14.43 16.77 15.60
8 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
9 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
10 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
11 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
12 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
13 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
14 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
15 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
16 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
17 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
18 11013943 226548154 134.00000 2022-04-23 00:07:00 131.76 153.13 142.44
19 11013943 226547066 144.00000 2022-04-22 23:31:00 131.76 153.13 142.44
20 11013050 226547200 15.10000 2022-04-22 23:27:00 14.34 16.66 15.50
The filtered result is this one (first 10 rows):
>datapesomoldefiltered
PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
1 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
2 11013610 226548648 47.30000 2022-04-25 05:52:00 42.38 49.26 45.82
3 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
4 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
5 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
6 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
7 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
8 11013943 226548154 134.00000 2022-04-23 00:07:00 131.76 153.13 142.44
9 11013943 226547066 144.00000 2022-04-22 23:31:00 131.76 153.13 142.44
10 11013050 226547200 15.10000 2022-04-22 23:27:00 14.34 16.66 15.50
I'm struggling to convert it to mapply; I am getting a matrix, not a data frame. I have tried this:
i<-unique(datapesomolde$OrdenFab)
datapesomoldefiltered<-data.frame()
limpiarof<-function(i){
subset<-filter(datapesomolde,OrdenFab==i)
datapesomoldetemp<-filter(subset,!duplicated(subset$Valor))
return(datapesomoldefiltered<-rbind(datapesomoldefiltered,datapesomoldetemp))
}
datapesomoldefiltered<-mapply(limpiarof,i)
With this attempt I get a matrix of 2.2 GB; it just has the values of all the columns for each unique value of the "OrdenFab" column.
Can you help me please? Thanks in advance.
CodePudding user response:
Here are two ways. The difference is that the first solution keeps the original row order in the final result. If the order doesn't matter, the second solution skips the creation of the temporary list sp.
x <- " PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
1 11012501 226549204 14.50000 '2022-04-25 07:18:00' 12.65 14.71 13.68
2 11012501 226549204 14.50000 '2022-04-25 07:18:00' 12.65 14.71 13.68
3 11013610 226548648 47.30000 '2022-04-25 05:52:00' 42.38 49.26 45.82
4 11013047 226548234 15.20000 '2022-04-23 02:47:00' 14.43 16.77 15.60
5 11013047 226548234 15.20000 '2022-04-23 02:47:00' 14.43 16.77 15.60
6 11013047 226548234 15.20000 '2022-04-23 02:48:00' 14.43 16.77 15.60
7 11013047 226548234 15.20000 '2022-04-23 02:48:00' 14.43 16.77 15.60
8 11013052 226548332 16.30000 '2022-04-23 01:49:00' 15.63 18.17 16.90
9 11013052 226548332 16.30000 '2022-04-23 01:49:00' 15.63 18.17 16.90
10 11013052 226548332 16.30000 '2022-04-23 01:49:00' 15.63 18.17 16.90
11 11013052 226548332 16.30000 '2022-04-23 01:49:00' 15.63 18.17 16.90
12 11012501 226548204 14.70000 '2022-04-23 01:44:00' 12.65 14.71 13.68
13 11012501 226548204 14.70000 '2022-04-23 01:44:00' 12.65 14.71 13.68
14 11012501 226548200 14.55000 '2022-04-23 01:43:00' 12.65 14.71 13.68
15 11012501 226548200 14.55000 '2022-04-23 01:43:00' 12.65 14.71 13.68
16 11012501 226548201 14.65000 '2022-04-23 01:42:00' 12.65 14.71 13.68
17 11012501 226548201 14.65000 '2022-04-23 01:42:00' 12.65 14.71 13.68
18 11013943 226548154 134.00000 '2022-04-23 00:07:00' 131.76 153.13 142.44
19 11013943 226547066 144.00000 '2022-04-22 23:31:00' 131.76 153.13 142.44
20 11013050 226547200 15.10000 '2022-04-22 23:27:00' 14.34 16.66 15.50"
datapesomolde <- read.table(textConnection(x), header = TRUE)
suppressPackageStartupMessages({
  library(dplyr)
  library(purrr)
})
datapesomolde$Fecha_Registro <- as.POSIXct(datapesomolde$Fecha_Registro)
sp <- split(datapesomolde, datapesomolde$OrdenFab)
sp %>%
  map_dfr( ~ .x %>% filter(!duplicated(Valor))) %>%
  arrange(as.integer(row.names(.)))
#> PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
#> 1 11012501 226549204 14.50 2022-04-25 07:18:00 12.65 14.71 13.68
#> 3 11013610 226548648 47.30 2022-04-25 05:52:00 42.38 49.26 45.82
#> 4 11013047 226548234 15.20 2022-04-23 02:47:00 14.43 16.77 15.60
#> 8 11013052 226548332 16.30 2022-04-23 01:49:00 15.63 18.17 16.90
#> 12 11012501 226548204 14.70 2022-04-23 01:44:00 12.65 14.71 13.68
#> 14 11012501 226548200 14.55 2022-04-23 01:43:00 12.65 14.71 13.68
#> 16 11012501 226548201 14.65 2022-04-23 01:42:00 12.65 14.71 13.68
#> 18 11013943 226548154 134.00 2022-04-23 00:07:00 131.76 153.13 142.44
#> 19 11013943 226547066 144.00 2022-04-22 23:31:00 131.76 153.13 142.44
#> 20 11013050 226547200 15.10 2022-04-22 23:27:00 14.34 16.66 15.50
rm(sp) # tidy up
Created on 2022-06-01 by the reprex package (v2.0.1)
datapesomolde %>%
  group_split(OrdenFab) %>%
  map_dfr( ~ .x %>% filter(!duplicated(Valor)))
#> # A tibble: 10 × 7
#> PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
#> <int> <int> <dbl> <dttm> <dbl> <dbl> <dbl>
#> 1 11013943 226547066 144 2022-04-22 23:31:00 132. 153. 142.
#> 2 11013050 226547200 15.1 2022-04-22 23:27:00 14.3 16.7 15.5
#> 3 11013943 226548154 134 2022-04-23 00:07:00 132. 153. 142.
#> 4 11012501 226548200 14.6 2022-04-23 01:43:00 12.6 14.7 13.7
#> 5 11012501 226548201 14.6 2022-04-23 01:42:00 12.6 14.7 13.7
#> 6 11012501 226548204 14.7 2022-04-23 01:44:00 12.6 14.7 13.7
#> 7 11013047 226548234 15.2 2022-04-23 02:47:00 14.4 16.8 15.6
#> 8 11013052 226548332 16.3 2022-04-23 01:49:00 15.6 18.2 16.9
#> 9 11013610 226548648 47.3 2022-04-25 05:52:00 42.4 49.3 45.8
#> 10 11012501 226549204 14.5 2022-04-25 07:18:00 12.6 14.7 13.7
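As for why the mapply attempt in the question returns a matrix: mapply defaults to SIMPLIFY = TRUE, so it tries to collapse the per-group data frames into a matrix or array instead of handing back a list. Below is a minimal base R sketch of the same split-and-bind idea. The data frame toy and the names limpiar_of, ids, piezas and resultado are made up for illustration; only the column names come from the question.

```r
# Toy data: hypothetical values, same column names as in the question
toy <- data.frame(
  OrdenFab = c(1L, 1L, 2L, 2L, 2L, 3L),
  Valor    = c(14.5, 14.5, 15.2, 15.2, 16.3, 14.5)
)

# Keep the first row for each Valor within one OrdenFab subset
limpiar_of <- function(of) {
  sub <- toy[toy$OrdenFab == of, , drop = FALSE]
  sub[!duplicated(sub$Valor), , drop = FALSE]
}

ids <- unique(toy$OrdenFab)

# SIMPLIFY = FALSE (or plain lapply) returns a list of data frames,
# not a matrix
piezas <- mapply(limpiar_of, ids, SIMPLIFY = FALSE)

# Bind the pieces once at the end instead of growing a data frame
# inside the function or loop
resultado <- do.call(rbind, piezas)
resultado
```

Binding once at the end also avoids the repeated rbind calls of the original loop, which copy the accumulated data frame on every iteration.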
CodePudding user response:
I would suggest solving this problem with a more abstract approach, e.g. using the tidyverse. This should be much faster and clearer:
library(tidyverse)
datapesomoldefiltered <-
  datapesomolde |>
  group_by(OrdenFab) |>  # group by OrdenFab, the column the question's loop filters on
  distinct(Valor, .keep_all = TRUE) |>
  ungroup()
datapesomoldefiltered
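Since "no repeated Valor within each OrdenFab" is the same as "no repeated (OrdenFab, Valor) pair", the grouping step can probably be dropped altogether. Here is a minimal base R sketch on made-up data; toy, keep and filtrado are hypothetical names, and only the column names come from the question.

```r
# Toy data: hypothetical values, same column names as in the question
toy <- data.frame(
  OrdenFab = c(1L, 1L, 2L, 2L, 2L, 3L),
  Valor    = c(14.5, 14.5, 15.2, 15.2, 16.3, 14.5)
)

# duplicated() on a two-column data frame flags every row whose
# (OrdenFab, Valor) pair has already appeared further up
keep <- !duplicated(toy[c("OrdenFab", "Valor")])

# One vectorised subset: no loop, no split, original row order kept
filtrado <- toy[keep, , drop = FALSE]
filtrado
```

With dplyr loaded, distinct(datapesomolde, OrdenFab, Valor, .keep_all = TRUE) should express the same thing. Either way, Valor 14.5 survives under both OrdenFab 1 and 3 in the toy data, because uniqueness is checked per pair and not across the whole column.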