I'm getting duplicated data after doing some transformations.
grouping:
paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
revenue_usd = sum(transaction_revenue)/3.8,
bounceRate = bounces/sessions,
transactions = sum(transactions))
original data:
df <- structure(list(date = structure(c(19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186,
19186, 19186, 19186, 19186, 19186, 19186, 19186), class = "Date"),
source = c("(direct)", "(not set)", "127.0.0.1:20489", "13d612549ae98efa4eebca49fbda5aa9.safeframe.googlesyndication.com",
"214791a1533c2a50e91a0172b59da402.safeframe.googlesyndication.com",
"408b8fdfaa1d06bb140607437bd90f10.safeframe.googlesyndication.com",
"5e1452d421694493662f87729e4e130d.safeframe.googlesyndication.com",
"8042dc59e10779732adc12d6359c3ab9.safeframe.googlesyndication.com",
"84c4aa5f9449946d1cb5812ca3ea633a.safeframe.googlesyndication.com",
"89a3e733827acf8133fab396ec550646.safeframe.googlesyndication.com",
"9ad91f3bc7d10dce0ba6fb1c18d32c0e.safeframe.googlesyndication.com",
"9d81febecab8a704ce297781effecc07.safeframe.googlesyndication.com",
"accounts.google.com", "actitudfem.com", "admin.pagoefectivo.pe",
"ads.us.criteo.com", "adsintegrity.net", "anmosugoi.com",
"beneficios.bbva.pe", "biggestchef.com", "bing", "content.jwplatform.com",
"criteo", "criteo", "Criteo", "Criteo,criteo", "cse.google.com",
"cuotealo.viabcp.com", "cyberwow.pe", "diariocorreo.pe",
"dinersclub.pe", "docs.google.com", "duckduckgo", "ecosia.org",
"edmmkt", "elcomercio", "elcomercio", "estudiantes.samsung.com.pe",
"facebook", "facebook", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network", "facebook-network",
"facebook-network", "facebook-network"), medium = c("(none)",
"(not set)", "referral", "referral", "referral", "referral",
"referral", "referral", "referral", "referral", "referral",
"referral", "referral", "referral", "referral", "referral",
"referral", "referral", "referral", "referral", "organic",
"referral", "consideration", "retargeting", "(not set)",
"consideration", "referral", "referral", "referral", "referral",
"referral", "referral", "organic", "organic", "image", "cpm",
"cpm", "referral", "cpm", "social", "(not set)", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa",
"cpa", "cpa", "cpa", "cpa"), sessions = c(1746, 4, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 40, 1, 1, 13, 1, 40, 1,
3636, 2921, 4, 1, 4, 54, 20, 1, 66, 1, 1, 3, 1, 1, 29, 9,
1, 86, 75, 1, 1, 3, 1, 1, 7, 1, 3, 1, 1, 3, 1, 2, 3, 1, 1,
2, 1, 4, 6, 1, 2, 6, 12, 5, 73, 2, 7, 1, 131, 22, 5, 14,
69, 8, 1, 4, 72, 2, 3, 1, 6, 21, 20, 72, 17, 7, 1, 15, 218,
148, 278, 159, 30, 2, 373, 1, 1, 3, 5, 1, 3, 1, 1, 1, 1,
1, 5, 1, 1, 2, 2, 2, 2, 1, 2, 12, 1, 1, 2, 1, 1, 1, 2, 1,
144, 14, 1, 4, 54, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 5, 1,
1, 5, 1, 10, 156, 2, 9, 2, 19, 34, 179, 1, 2, 4, 447, 1,
4, 2, 321, 1, 1, 5, 163, 3, 4, 318, 6, 2, 9, 162, 2, 59,
1, 5, 414, 1, 4, 3, 312, 7, 701, 2, 2, 2, 147, 1, 4, 30,
137, 28, 1, 14, 1, 1, 1, 451), transactions = c(17, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
0, 2, 0, 0, 0, 29, 0, 0, 3, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 3), transaction_revenue = c(44183, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1115.52, 0, 0, 1375.67,
0, 0, 0, 0, 4198, 0, 0, 0, 59269, 0, 0, 6147, 0, 0, 0, 0,
0, 0, 1999, 0, 599, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 2599, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 4199, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2599, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7397), bounces = c(1278,
4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 1, 1, 6, 1,
23, 1, 3294, 2333, 2, 1, 3, 4, 12, 0, 32, 1, 1, 3, 1, 1,
29, 7, 1, 66, 49, 0, 1, 3, 1, 1, 5, 1, 0, 0, 1, 2, 1, 2,
3, 0, 1, 2, 1, 4, 6, 0, 1, 4, 9, 5, 61, 2, 6, 1, 115, 20,
4, 12, 63, 7, 1, 4, 62, 2, 2, 1, 5, 17, 16, 59, 17, 6, 1,
10, 194, 132, 235, 136, 27, 2, 274, 1, 1, 3, 4, 1, 3, 1,
1, 0, 1, 1, 3, 1, 1, 1, 2, 1, 2, 0, 2, 7, 0, 1, 1, 1, 1,
1, 1, 1, 129, 12, 1, 4, 49, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1,
0, 5, 1, 1, 2, 1, 3, 133, 2, 5, 2, 16, 27, 150, 0, 1, 4,
377, 1, 4, 1, 254, 1, 1, 3, 120, 3, 3, 248, 6, 1, 9, 123,
2, 47, 1, 4, 340, 1, 4, 3, 244, 7, 557, 2, 2, 2, 117, 1,
4, 23, 100, 22, 1, 12, 0, 1, 0, 325)), totals = list(list(
sessions = "89600", transactions = "247", transactionRevenue = "547126.09",
bounces = "73354")), minimums = list(list(sessions = "1",
transactions = "0", transactionRevenue = "0.0", bounces = "0")), maximums = list(
list(sessions = "8355", transactions = "65", transactionRevenue = "145010.41",
bounces = "7614")), isDataGolden = TRUE, rowCount = 737L, row.names = c(NA,
200L), class = "data.frame")
sessioninfo:
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Peru.utf8 LC_CTYPE=Spanish_Peru.utf8 LC_MONETARY=Spanish_Peru.utf8
[4] LC_NUMERIC=C LC_TIME=Spanish_Peru.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] openxlsx_4.2.5 janitor_2.1.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[6] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6
[11] tidyverse_1.3.1 googleAnalyticsR_1.0.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 lubridate_1.8.0 assertthat_0.2.1 digest_0.6.29
[5] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0 backports_1.4.1
[9] reprex_2.0.1 httr_1.4.3 pillar_1.7.0 rlang_1.0.3
[13] curl_4.3.2 readxl_1.4.0 rstudioapi_0.13 whisker_0.4
[17] googleAuthR_2.0.0 munsell_0.5.0 broom_1.0.0 compiler_4.2.1
[21] modelr_0.1.8 askpass_1.1 pkgconfig_2.0.3 openssl_2.0.2
[25] tidyselect_1.1.2 fansi_1.0.3 crayon_1.5.1 tzdb_0.3.0
[29] dbplyr_2.2.1 withr_2.5.0 rappdirs_0.3.3 grid_4.2.1
[33] jsonlite_1.8.0 gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.3
[37] magrittr_2.0.3 scales_1.2.0 zip_2.2.0 cli_3.3.0
[41] stringi_1.7.6 cachem_1.0.6 fs_1.5.2 snakecase_0.11.0
[45] xml2_1.3.3 ellipsis_0.3.2 generics_0.1.3 vctrs_0.4.1
[49] tools_4.2.1 measurementProtocol_0.1.0 glue_1.6.2 hms_1.1.1
[53] fastmap_1.1.0 colorspace_2.0-3 gargle_1.2.0 rvest_1.0.2
[57] memoise_2.0.1 haven_2.5.0 usethis_2.1.6
CodePudding user response:
As Ritchie's comment suggests, this is happening because bounces/sessions
returns a vector.
Your example data only contains one date, so the output of the following only gives one row. However, if you create the bounceRate
after the summarise I think you will get what you want.
paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
revenue_usd = sum(transaction_revenue)/3.8,
bounces = sum(bounces),
transactions = sum(transactions)) %>%
mutate(bounceRate = bounces/sessions)
CodePudding user response:
Almost the same as duncanw but there's a simpler solution which gives the same output columns as your original:
paid <- df%>%
group_by(transaction_date) %>%
summarise(sessions=sum(sessions),
revenue_usd= sum(transaction_revenue)/3.8,
bounceRate=sum(bounces)/sum(sessions),
transactions = sum(transactions))
CodePudding user response:
I tested it from my side and it resulted same as yours. I am quite amaze and started looking into it. This is what I found,
The calculation of column bounceRate does not have any aggregating function, it just a simple divide, which should be happening/executing at row level and as there is no function given for summarization the result remains at row level.
Let me explain,
paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
revenue_usd = sum(transaction_revenue)/3.8,
bounceRate = bounces/sessions,
transactions = sum(transactions))
This code result with all rows intact as bounceRate = bounces/sessions is row operation.
If you comment this row and calculate this will summaries the data into single row as there is only one date 2022-07-13
paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
revenue_usd = sum(transaction_revenue)/3.8,
# bounceRate = bounces/sessions,
transactions = sum(transactions))
So in my opinion you can use either sum over this ratio, and if you feel its incorrect you can summaries first then take the ratio. My suggestion would be to use 2nd approach take ratio after summarization. I am putting the code for both below,
Approch 01 :
paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
revenue_usd = sum(transaction_revenue)/3.8,
bounceRate = sum(bounces/sessions),
transactions = sum(transactions))
This will give you result
Approch 02 :
paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
revenue_usd = sum(transaction_revenue)/3.8,
transactions = sum(transactions),
total_bounces = sum(bounces),
total_sessions = sum(sessions)) %>%
mutate(bounceRate = total_bounces/total_sessions)
This will result into below result, which I think is more appropriate.
Like/Up vote if this resolves your problem. Happy coding !