Plot percentage of clients that earn

Time:04-14

I have a data set that shows how much each client earns. Generating synthetic data:

library(dplyr)    # %>%, group_by, summarise, mutate
library(ggplot2)  # used for the charts below
set.seed(100)
ddata <- data.frame(amount = rbeta(10000,2,20) * 1000)
ddata <- ddata %>%
  group_by(amount) %>%
  summarise(proportion = n()) %>%
  mutate(Perc = cumsum(100*proportion/sum(proportion)), 
        reverse = length(.)-Perc)

The histogram shows the distribution of the data, which is right-skewed (most amounts are low, with a long tail toward higher values):

hist(ddata$amount)
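
The direction of the skew can also be checked numerically by comparing the mean with the median. This is a minimal sketch in base R. It regenerates the raw sample under the same seed, and the variable name `amount` is only for illustration:

```r
# Recreate the synthetic sample (same seed and parameters as above)
set.seed(100)
amount <- rbeta(10000, 2, 20) * 1000

# A mean above the median suggests a longer tail on the right
c(mean = mean(amount), median = median(amount))
mean(amount) > median(amount)
```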

I have created a cumulative percentage column, 'Perc', that shows the percentage of clients that earn X or less. Below is the code for the chart:

ddata %>% 
  ggplot() +
  geom_line(aes(x = amount, y = Perc, color = '#EF010C')) +
  ylab("% Clients") +
  xlab("Amount earned")

I need the reverse of this chart: one showing the percentage of clients that earn X or more. I created the column 'reverse' for this purpose. The curve shape seems correct, but the percentages on the y-axis are negative. How can I fix this? Any help will be appreciated. Finally, the curve should follow a shape similar to the histogram shown earlier.

Chart with the wrong axis:

ddata %>% 
  ggplot() +
  geom_line(aes(x = amount, y = reverse, color = '#EF010C')) +
  ylab("% Clients") +
  xlab("Amount earned")

CodePudding user response:

library(dplyr)
library(ggplot2)
set.seed(100)
ddata <- data.frame(amount = rbeta(10000,2,20) * 1000)
ddata <- ddata %>%
  group_by(amount) %>%
  summarise(proportion = n()) %>%
  mutate(Perc = cumsum(100*proportion/sum(proportion)), 
         reverse = 100 - Perc) # Changed this: 100 - Perc instead of length(.) - Perc

ddata %>% 
  ggplot() +
  geom_line(aes(x = amount, y = Perc, color = '#EF010C')) +
  ylab("% Clients") +
  xlab("Amount earned")

ddata %>% 
  ggplot() +
  geom_line(aes(x = amount, y = reverse, color = '#EF010C')) +
  ylab("% Clients") +
  xlab("Amount earned")

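Like this? In the original code, `length(.)` is the number of columns of the piped data frame (2 here), so `reverse` was 2 - Perc, which is negative almost everywhere. Subtracting Perc from 100 keeps the values between 0 and 100.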
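
The two `reverse` definitions can be compared on a tiny table. This is a sketch in base R with made-up values. The data frame `df` and the names `n_cols`, `wrong` and `right` are hypothetical and stand in for the summarised data:

```r
# Hypothetical stand-in for the summarised table (amount, proportion)
df <- data.frame(amount = c(10, 20, 30), proportion = c(1, 2, 1))

# A data frame is a list of columns, so length() counts columns, not rows
n_cols <- length(df)

# Cumulative percentage, as in the question
df$Perc <- cumsum(100 * df$proportion / sum(df$proportion))

wrong <- n_cols - df$Perc  # what length(.) - Perc amounts to: negative values
right <- 100 - df$Perc     # percentage of clients above each amount

print(data.frame(amount = df$amount, Perc = df$Perc, wrong, right))
```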

CodePudding user response:

You can get the empirical cumulative distribution of your values with `ecdf`. The reverse (the complementary cumulative distribution) is just 100% minus the ECDF:

library(dplyr)
library(ggplot2)
set.seed(100)
ddata <- data.frame(amount = rbeta(10000,2,20) * 1000)
ddata <- ddata %>%
  mutate(Perc = ecdf(amount)(amount) * 100,
         reverse = 100 - Perc)

ddata %>% 
  ggplot() +
  geom_line(aes(x = amount, y = Perc), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")


ddata %>% 
  ggplot() +
  geom_line(aes(x = amount, y = reverse), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")
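
One boundary detail: 100 minus the ECDF is the percentage earning strictly more than X, while the question asks for "X or more". For continuous data such as the `rbeta` sample, ties are very unlikely, so the two should coincide in practice. With rounded or discrete amounts they can differ. The sketch below uses a made-up vector `x` with a tie to show the difference. The names `strictly_more` and `at_least` are illustrative only:

```r
# Hypothetical toy values with a tie at 20
x <- c(10, 20, 20, 30)

perc          <- ecdf(x)(x) * 100   # % earning x or less
strictly_more <- 100 - perc         # % earning more than x
at_least      <- sapply(x, function(v) mean(x >= v)) * 100  # % earning x or more

print(data.frame(x, perc, strictly_more, at_least))
```

If "X or more" is what the chart should show, `at_least` is the column to plot on the y-axis.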
