assume my original dataframe is :
a b d e
1 1 1 2 1
2 20 30 40 30
3 1 2 6 2
4 40 50 40 50
5 5 5 3 5
6 60 60 60 60
I want to add a percentage row below each row.
a b d e
1 1.00 1.00 2.00 1.00
2 0.79 0.66 1.57 0.66
3 20.00 30.00 40.00 30.00
4 13.51 20.27 27.03 20.27
5 1.00 2.00 6.00 2.00
6 0.66 1.57 3.97 1.57
7 40.00 50.00 40.00 50.00
8 27.03 33.78 27.03 33.78
9 5.00 5.00 3.00 5.00
10 3.94 3.31 2.36 3.31
11 60.00 60.00 60.00 60.00
12 40.54 40.54 40.54 40.54
but as you see, my odd rows get .00
which I do not want.
library(dplyr)
df <- data.frame(a=c(1,20,1,40,5,60),
b=c(1,30,2,50,5,60),
d=c(2,40,6,40,3,60),
e = c(1,30,2,50,5,60))
df <- df %>% slice(rep(1:n(), each=2))
df[seq_len(nrow(df)) %% 2 ==0, ] <- round(100*df[seq_len(nrow(df)) %% 2 ==0,
]/colSums(df[seq_len(nrow(df)) %% 2 ==0, ]),2)
how can I keep my odd rows without decimals?
CodePudding user response:
The problem is that columns in data frames can only hold one type of data. If some of the columns in your data frame have decimals, then the whole column must be of type double
. The only way to change how your data frame appears is via its print method.
Fortunately, you can easily turn your data frame into a tibble
. This is a type of data frame, but prints in such a way that the integers don't have decimal points afterwards.
df
#> a b d e
#> 1 1.00 1.00 2.00 1.00
#> 2 0.79 0.66 1.57 0.66
#> 3 20.00 30.00 40.00 30.00
#> 4 13.51 20.27 27.03 20.27
#> 5 1.00 2.00 6.00 2.00
#> 6 0.66 1.57 3.97 1.57
#> 7 40.00 50.00 40.00 50.00
#> 8 27.03 33.78 27.03 33.78
#> 9 5.00 5.00 3.00 5.00
#> 10 3.94 3.31 2.36 3.31
#> 11 60.00 60.00 60.00 60.00
#> 12 40.54 40.54 40.54 40.54
dplyr::tibble(df)
#> # A tibble: 12 x 4
#> a b d e
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 2 1
#> 2 0.79 0.66 1.57 0.66
#> 3 20 30 40 30
#> 4 13.5 20.3 27.0 20.3
#> 5 1 2 6 2
#> 6 0.66 1.57 3.97 1.57
#> 7 40 50 40 50
#> 8 27.0 33.8 27.0 33.8
#> 9 5 5 3 5
#> 10 3.94 3.31 2.36 3.31
#> 11 60 60 60 60
#> 12 40.5 40.5 40.5 40.5
Created on 2022-04-26 by the reprex package (v2.0.1)
CodePudding user response:
Allan Cameron is right, that a tibble
prints better and does what you want. To offer another solution, though, if you're trying to print something that you might send to a text file (rather than just look at on the screen), you could print the values to character strings as follows:
library(dplyr)
df <- data.frame(a=c(1,20,1,40,5,60),
b=c(1,30,2,50,5,60),
d=c(2,40,6,40,3,60),
e = c(1,30,2,50,5,60))
df %>%
mutate(obs = row_number(),
across(-obs, ~.x/sum(.x)),
type = "pct") %>%
bind_rows(df %>% mutate(obs = row_number(),
type = "raw")) %>%
mutate(type = factor(type, levels=c("raw", "pct"))) %>%
arrange(obs, type) %>%
mutate(across(a:e, ~case_when(
type == "raw" ~ sprintf("%.0f", .x),
TRUE ~ sprintf("%.2f%%", .x*100)))) %>%
select(-c(obs, type))
#> a b d e
#> 1 1 1 2 1
#> 2 0.79% 0.68% 1.32% 0.68%
#> 3 20 30 40 30
#> 4 15.75% 20.27% 26.49% 20.27%
#> 5 1 2 6 2
#> 6 0.79% 1.35% 3.97% 1.35%
#> 7 40 50 40 50
#> 8 31.50% 33.78% 26.49% 33.78%
#> 9 5 5 3 5
#> 10 3.94% 3.38% 1.99% 3.38%
#> 11 60 60 60 60
#> 12 47.24% 40.54% 39.74% 40.54%
Created on 2022-04-26 by the reprex package (v2.0.1)
Also note, I think the percentages you calculated are wrong. When I used your data, I get:
sum(df$a[c(2,4,6,8,10,12)])
#> [1] 86.47
And when I use mine, that are different from yours, I get 100 (if we turn them back into numbers from strings).