I have a data set that shows how much each client earns. Generating synthetic data:
library(dplyr)
library(ggplot2)
set.seed(100)
ddata <- data.frame(amount = rbeta(10000, 2, 20) * 1000)
ddata <- ddata %>%
  group_by(amount) %>%
  summarise(proportion = n()) %>%
  mutate(Perc = cumsum(100 * proportion / sum(proportion)),
         reverse = length(.) - Perc)
A histogram shows the distribution of the data; it is right skewed (long tail toward higher amounts):
hist(ddata$amount)
I have created a percentage rank column, 'Perc', that shows the percentage of clients that earn X or less. Below is the code for the chart:
ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = Perc, color = '#EF010C')) +
  ylab("% Clients") +
  xlab("Amount earned")
I need to create the reverse of this chart: a chart showing the percentage of clients that earn X or more. I created the column 'reverse' for this purpose. The curve shape seems to be correct, but the percentages on the y-axis are negative. How can I fix this? Any help will be appreciated. Finally, the curve should follow a shape similar to the histogram shown earlier.
Chart with the wrong axis:
ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = reverse, color = '#EF010C')) +
  ylab("% Clients") +
  xlab("Amount earned")
CodePudding user response:
library(dplyr)
library(ggplot2)
set.seed(100)
ddata <- data.frame(amount = rbeta(10000, 2, 20) * 1000)
ddata <- ddata %>%
  group_by(amount) %>%
  summarise(proportion = n()) %>%
  mutate(Perc = cumsum(100 * proportion / sum(proportion)),
         reverse = 100 - Perc) # Changed this
ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = Perc, color = '#EF010C')) +
  ylab("% Clients") +
  xlab("Amount earned")
ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = reverse, color = '#EF010C')) +
  ylab("% Clients") +
  xlab("Amount earned")
Like this?
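For what it's worth, the negative values in the original most likely come from `length(.)`. On a data frame, `length()` returns the number of columns, and at that point in the pipe the table has only two (amount and proportion), so the column ends up as roughly 2 - Perc. A small base R sketch of the effect, using a made-up four-row table (`toy` and the other names are only for illustration):

```r
# Hypothetical toy data standing in for the summarised table
toy <- data.frame(amount = c(10, 20, 30, 40), proportion = c(1, 1, 1, 1))

# length() of a data frame counts columns, not rows
n_cols <- length(toy)
n_rows <- nrow(toy)

# Cumulative percentage, computed the same way as 'Perc' in the question
perc <- cumsum(100 * toy$proportion / sum(toy$proportion))

wrong_reverse <- n_cols - perc  # what length(.) - Perc amounts to: all negative
right_reverse <- 100 - perc     # complement on the percentage scale

print(data.frame(amount = toy$amount, perc, wrong_reverse, right_reverse))
```

Since Perc is already on a 0 to 100 scale, subtracting it from 100 is what keeps the reversed curve between 0 and 100.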
CodePudding user response:
You can get the empirical cumulative distribution of your values by using ecdf. The reverse is just 100% minus the ecdf:
library(dplyr)
library(ggplot2)
set.seed(100)
ddata <- data.frame(amount = rbeta(10000, 2, 20) * 1000)
ddata <- ddata %>%
  mutate(Perc = ecdf(amount)(amount) * 100,
         reverse = 100 - Perc)
ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = Perc), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")
ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = reverse), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")
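One small caveat, in case it matters for your use: 100 minus the ecdf gives the share of clients earning strictly more than X, while the question asks for "X or more". With 10,000 continuous values the two differ by only 0.01 percentage points, but here is a rough base R sketch of both versions. The names `perc_le`, `perc_gt` and `perc_ge` are made up for this example, and the rank-based formula is just one way to count ">= X":

```r
set.seed(100)
amount <- rbeta(10000, 2, 20) * 1000
n <- length(amount)

Fn <- ecdf(amount)            # empirical CDF, returned as a function
perc_le <- Fn(amount) * 100   # % of clients earning X or less
perc_gt <- 100 - perc_le      # % of clients earning strictly more than X

# "X or more" also counts the client sitting exactly at X.
# rank(..., ties.method = "min") is 1 + the number of strictly smaller values,
# so n - rank + 1 is the number of values >= X.
perc_ge <- 100 * (n - rank(amount, ties.method = "min") + 1) / n

# Spot check against a direct count for the first observation
x0 <- amount[1]
print(c(direct = 100 * mean(amount >= x0), via_rank = perc_ge[1]))
```

Either column can be plotted against amount in the same way as 'reverse' above.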