Home > Enterprise >  same values for all raws after grouping?
same values for all raws after grouping?

Time:07-15

I'm getting duplicated data after doing some transformations.

enter image description here

grouping:


paid <- df%>%
                 group_by(date) %>%
                 summarise(sessions = sum(sessions),
                           revenue_usd = sum(transaction_revenue)/3.8,
                           bounceRate = bounces/sessions,
                           transactions = sum(transactions))

original data:

df <- structure(list(date = structure(c(19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
19186, 19186, 19186, 19186, 19186, 19186, 19186), class = "Date"), 
    source = c("(direct)", "(not set)", "127.0.0.1:20489", "13d612549ae98efa4eebca49fbda5aa9.safeframe.googlesyndication.com", 
    "214791a1533c2a50e91a0172b59da402.safeframe.googlesyndication.com", 
    "408b8fdfaa1d06bb140607437bd90f10.safeframe.googlesyndication.com", 
    "5e1452d421694493662f87729e4e130d.safeframe.googlesyndication.com", 
    "8042dc59e10779732adc12d6359c3ab9.safeframe.googlesyndication.com", 
    "84c4aa5f9449946d1cb5812ca3ea633a.safeframe.googlesyndication.com", 
    "89a3e733827acf8133fab396ec550646.safeframe.googlesyndication.com", 
    "9ad91f3bc7d10dce0ba6fb1c18d32c0e.safeframe.googlesyndication.com", 
    "9d81febecab8a704ce297781effecc07.safeframe.googlesyndication.com", 
    "accounts.google.com", "actitudfem.com", "admin.pagoefectivo.pe", 
    "ads.us.criteo.com", "adsintegrity.net", "anmosugoi.com", 
    "beneficios.bbva.pe", "biggestchef.com", "bing", "content.jwplatform.com", 
    "criteo", "criteo", "Criteo", "Criteo,criteo", "cse.google.com", 
    "cuotealo.viabcp.com", "cyberwow.pe", "diariocorreo.pe", 
    "dinersclub.pe", "docs.google.com", "duckduckgo", "ecosia.org", 
    "edmmkt", "elcomercio", "elcomercio", "estudiantes.samsung.com.pe", 
    "facebook", "facebook", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network", "facebook-network", 
    "facebook-network", "facebook-network"), medium = c("(none)", 
    "(not set)", "referral", "referral", "referral", "referral", 
    "referral", "referral", "referral", "referral", "referral", 
    "referral", "referral", "referral", "referral", "referral", 
    "referral", "referral", "referral", "referral", "organic", 
    "referral", "consideration", "retargeting", "(not set)", 
    "consideration", "referral", "referral", "referral", "referral", 
    "referral", "referral", "organic", "organic", "image", "cpm", 
    "cpm", "referral", "cpm", "social", "(not set)", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
    "cpa", "cpa", "cpa", "cpa"), sessions = c(1746, 4, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 40, 1, 1, 13, 1, 40, 1, 
    3636, 2921, 4, 1, 4, 54, 20, 1, 66, 1, 1, 3, 1, 1, 29, 9, 
    1, 86, 75, 1, 1, 3, 1, 1, 7, 1, 3, 1, 1, 3, 1, 2, 3, 1, 1, 
    2, 1, 4, 6, 1, 2, 6, 12, 5, 73, 2, 7, 1, 131, 22, 5, 14, 
    69, 8, 1, 4, 72, 2, 3, 1, 6, 21, 20, 72, 17, 7, 1, 15, 218, 
    148, 278, 159, 30, 2, 373, 1, 1, 3, 5, 1, 3, 1, 1, 1, 1, 
    1, 5, 1, 1, 2, 2, 2, 2, 1, 2, 12, 1, 1, 2, 1, 1, 1, 2, 1, 
    144, 14, 1, 4, 54, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 5, 1, 
    1, 5, 1, 10, 156, 2, 9, 2, 19, 34, 179, 1, 2, 4, 447, 1, 
    4, 2, 321, 1, 1, 5, 163, 3, 4, 318, 6, 2, 9, 162, 2, 59, 
    1, 5, 414, 1, 4, 3, 312, 7, 701, 2, 2, 2, 147, 1, 4, 30, 
    137, 28, 1, 14, 1, 1, 1, 451), transactions = c(17, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 
    0, 2, 0, 0, 0, 29, 0, 0, 3, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 3), transaction_revenue = c(44183, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1115.52, 0, 0, 1375.67, 
    0, 0, 0, 0, 4198, 0, 0, 0, 59269, 0, 0, 6147, 0, 0, 0, 0, 
    0, 0, 1999, 0, 599, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 2599, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 4199, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2599, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7397), bounces = c(1278, 
    4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 1, 1, 6, 1, 
    23, 1, 3294, 2333, 2, 1, 3, 4, 12, 0, 32, 1, 1, 3, 1, 1, 
    29, 7, 1, 66, 49, 0, 1, 3, 1, 1, 5, 1, 0, 0, 1, 2, 1, 2, 
    3, 0, 1, 2, 1, 4, 6, 0, 1, 4, 9, 5, 61, 2, 6, 1, 115, 20, 
    4, 12, 63, 7, 1, 4, 62, 2, 2, 1, 5, 17, 16, 59, 17, 6, 1, 
    10, 194, 132, 235, 136, 27, 2, 274, 1, 1, 3, 4, 1, 3, 1, 
    1, 0, 1, 1, 3, 1, 1, 1, 2, 1, 2, 0, 2, 7, 0, 1, 1, 1, 1, 
    1, 1, 1, 129, 12, 1, 4, 49, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 
    0, 5, 1, 1, 2, 1, 3, 133, 2, 5, 2, 16, 27, 150, 0, 1, 4, 
    377, 1, 4, 1, 254, 1, 1, 3, 120, 3, 3, 248, 6, 1, 9, 123, 
    2, 47, 1, 4, 340, 1, 4, 3, 244, 7, 557, 2, 2, 2, 117, 1, 
    4, 23, 100, 22, 1, 12, 0, 1, 0, 325)), totals = list(list(
    sessions = "89600", transactions = "247", transactionRevenue = "547126.09", 
    bounces = "73354")), minimums = list(list(sessions = "1", 
    transactions = "0", transactionRevenue = "0.0", bounces = "0")), maximums = list(
    list(sessions = "8355", transactions = "65", transactionRevenue = "145010.41", 
        bounces = "7614")), isDataGolden = TRUE, rowCount = 737L, row.names = c(NA, 
200L), class = "data.frame")

sessioninfo:

> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Peru.utf8  LC_CTYPE=Spanish_Peru.utf8    LC_MONETARY=Spanish_Peru.utf8
[4] LC_NUMERIC=C                  LC_TIME=Spanish_Peru.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] openxlsx_4.2.5         janitor_2.1.0          forcats_0.5.1          stringr_1.4.0          dplyr_1.0.9           
 [6] purrr_0.3.4            readr_2.1.2            tidyr_1.2.0            tibble_3.1.7           ggplot2_3.3.6         
[11] tidyverse_1.3.1        googleAnalyticsR_1.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3              lubridate_1.8.0           assertthat_0.2.1          digest_0.6.29            
 [5] utf8_1.2.2                R6_2.5.1                  cellranger_1.1.0          backports_1.4.1          
 [9] reprex_2.0.1              httr_1.4.3                pillar_1.7.0              rlang_1.0.3              
[13] curl_4.3.2                readxl_1.4.0              rstudioapi_0.13           whisker_0.4              
[17] googleAuthR_2.0.0         munsell_0.5.0             broom_1.0.0               compiler_4.2.1           
[21] modelr_0.1.8              askpass_1.1               pkgconfig_2.0.3           openssl_2.0.2            
[25] tidyselect_1.1.2          fansi_1.0.3               crayon_1.5.1              tzdb_0.3.0               
[29] dbplyr_2.2.1              withr_2.5.0               rappdirs_0.3.3            grid_4.2.1               
[33] jsonlite_1.8.0            gtable_0.3.0              lifecycle_1.0.1           DBI_1.1.3                
[37] magrittr_2.0.3            scales_1.2.0              zip_2.2.0                 cli_3.3.0                
[41] stringi_1.7.6             cachem_1.0.6              fs_1.5.2                  snakecase_0.11.0         
[45] xml2_1.3.3                ellipsis_0.3.2            generics_0.1.3            vctrs_0.4.1              
[49] tools_4.2.1               measurementProtocol_0.1.0 glue_1.6.2                hms_1.1.1                
[53] fastmap_1.1.0             colorspace_2.0-3          gargle_1.2.0              rvest_1.0.2              
[57] memoise_2.0.1             haven_2.5.0               usethis_2.1.6      

CodePudding user response:

As Ritchie's comment suggests, this is happening because bounces/sessions returns a vector.

Your example data only contains one date, so the output of the following only gives one row. However, if you create the bounceRate after the summarise I think you will get what you want.

paid <- df%>%
  group_by(date) %>%
  summarise(sessions = sum(sessions),
            revenue_usd = sum(transaction_revenue)/3.8,
            bounces = sum(bounces),
            transactions = sum(transactions)) %>% 
  mutate(bounceRate = bounces/sessions) 

CodePudding user response:

Almost the same as duncanw but there's a simpler solution which gives the same output columns as your original:

    paid <- df%>%
      group_by(transaction_date) %>% 
      summarise(sessions=sum(sessions),
          revenue_usd= sum(transaction_revenue)/3.8,
          bounceRate=sum(bounces)/sum(sessions),
          transactions = sum(transactions))

CodePudding user response:

I tested it from my side and it resulted same as yours. I am quite amaze and started looking into it. This is what I found,

The calculation of column bounceRate does not have any aggregating function, it just a simple divide, which should be happening/executing at row level and as there is no function given for summarization the result remains at row level.

Let me explain,

paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
        revenue_usd = sum(transaction_revenue)/3.8,
        bounceRate = bounces/sessions,
        transactions = sum(transactions))

This code result with all rows intact as bounceRate = bounces/sessions is row operation.

If you comment this row and calculate this will summaries the data into single row as there is only one date 2022-07-13

paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
        revenue_usd = sum(transaction_revenue)/3.8,
        # bounceRate = bounces/sessions,
        transactions = sum(transactions))

So in my opinion you can use either sum over this ratio, and if you feel its incorrect you can summaries first then take the ratio. My suggestion would be to use 2nd approach take ratio after summarization. I am putting the code for both below,

Approch 01 :

   paid <- df%>%
   group_by(date) %>%
   summarise(sessions = sum(sessions),
        revenue_usd = sum(transaction_revenue)/3.8,
        bounceRate = sum(bounces/sessions),
        transactions = sum(transactions))

This will give you result

enter image description here

Approch 02 :

paid <- df%>%
group_by(date) %>%
summarise(sessions = sum(sessions),
        revenue_usd = sum(transaction_revenue)/3.8,
        transactions = sum(transactions),
        total_bounces = sum(bounces),
        total_sessions = sum(sessions)) %>%
mutate(bounceRate = total_bounces/total_sessions)

This will result into below result, which I think is more appropriate.

enter image description here

Like/Up vote if this resolves your problem. Happy coding !

  •  Tags:  
  • r
  • Related