I would like to add my own data into a dataframe, and eventually use that data to generate a scatter plot.
But I am having trouble adding 2D points in R. It seems to me that the data for x-axis and y-axis has to be added separately. However, I have multiple points with the same x-coordinate but different y-coordinates.
I have figured out a way to input my data as shown below. But surely there has to be a more efficient way?
i1 <- seq (85,85, length.out=5)
c1 <- c(55, 62, 61, 73, 76)
i2 <- seq (105,105, length.out=6)
c2 <- c(64, 72, 73, 82, 87, 88)
CodePudding user response:
It's easiest to put the data into long format for plotting. Here, I convert all columns to numeric, pivot to long format, and drop the rows that are NA (i.e., the cells that were blank). I also do not keep the original column names, but you can keep them if they matter, for example for color-coding the points.
library(tidyverse)
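df %>%
  # blank strings become NA when converted to numeric
  mutate(across(everything(), ~ as.numeric(.))) %>%
  # one row per (x, y) pair; NA rows are dropped
  pivot_longer(-x, names_to = NULL, values_to = "y", values_drop_na = TRUE) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point()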
Output
Data
df <- structure(list(x = c(85, 105, 122), y1 = c(55, 64, 79), y2 = c(62,
72, 84), y3 = c("", "88", "")), class = "data.frame", row.names = c(NA,
-3L))
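If the points are typed in by hand, the long format can also be built directly, without a reshaping step. The following is a minimal base R sketch using the values from the question; the names x_values, n_per_x and points_df are hypothetical and chosen only for illustration. rep() with the times argument repeats each x-coordinate once for every y-value that belongs to it.

```r
# Sketch only: variable names are hypothetical; values come from the question.
x_values <- c(85, 105)   # distinct x-coordinates
n_per_x  <- c(5, 6)      # number of y-values observed at each x

points_df <- data.frame(
  # rep(..., times = ...) repeats 85 five times, then 105 six times
  x = rep(x_values, times = n_per_x),
  y = c(55, 62, 61, 73, 76,        # y-values at x = 85
        64, 72, 73, 82, 87, 88)    # y-values at x = 105
)
```

Because each row is already one (x, y) point, plot(points_df$x, points_df$y) can be called on it as-is.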
CodePudding user response:
The most common approach is to enter the data by column:
# several ways to create the observations: c(), 1:10, rep()
df = data.frame(X = c(85,105,122,143,162,182,203,224,242,262),
Y_1 = 1:10,
Y_2 = rep(5,10),
Y_3 = c(NA,88,NA,113,125,140,NA,160,189,182))
     X Y_1 Y_2 Y_3
1   85   1   5  NA
2  105   2   5  88
3  122   3   5  NA
4  143   4   5 113
5  162   5   5 125
6  182   6   5 140
7  203   7   5  NA
8  224   8   5 160
9  242   9   5 189
10 262  10   5 182
Then we can reshape it to long format:
library(tidyr)
df_longer = df %>% pivot_longer(!X)
# A tibble: 30 × 3
       X name  value
   <dbl> <chr> <dbl>
 1    85 Y_1       1
 2    85 Y_2       5
 3    85 Y_3      NA
 4   105 Y_1       2
 5   105 Y_2       5
 6   105 Y_3      88
 7   122 Y_1       3
 8   122 Y_2       5
 9   122 Y_3      NA
10   143 Y_1       4
# … with 20 more rows
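The long table above still contains the NA rows from Y_3. As a hedged variation, pivot_longer() can drop them while reshaping via values_drop_na = TRUE, and the output columns can be given explicit names. The names series, y and df_long_clean below are assumptions made for this sketch, not part of the answer.

```r
library(tidyr)

# Same data as above, repeated so the sketch is self-contained
df <- data.frame(X = c(85, 105, 122, 143, 162, 182, 203, 224, 242, 262),
                 Y_1 = 1:10,
                 Y_2 = rep(5, 10),
                 Y_3 = c(NA, 88, NA, 113, 125, 140, NA, 160, 189, 182))

# Hypothetical output names: "series" holds the old column name, "y" the value
df_long_clean <- pivot_longer(df, !X,
                              names_to = "series",
                              values_to = "y",
                              values_drop_na = TRUE)

# 30 cells minus the 3 NA entries in Y_3
nrow(df_long_clean)
```

Dropping the NA rows up front avoids the "removed rows containing missing values" warning that ggplot2 would otherwise emit when plotting.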
Scatter plot using base plot:
plot(df_longer$X, df_longer$value)
Using ggplot:
library(ggplot2)
ggplot(df_longer) +
  geom_point(aes(x = X, y = value, col = name))
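When each x-coordinate has a different number of y-values, as in the question, a wide data frame needs NA padding. One alternative, sketched here in base R, is to keep the y-values in a named list and let stack() produce the long table. The names groups and long_df are hypothetical; the x-coordinates are stored as list names, so they have to be converted back to numbers afterwards.

```r
# Sketch only: y-values from the question, keyed by their x-coordinate
groups <- list("85"  = c(55, 62, 61, 73, 76),
               "105" = c(64, 72, 73, 82, 87, 88))

# stack() returns a data frame with columns `values` and `ind` (a factor)
long_df <- stack(groups)

# Convert the factor labels back to numeric x-coordinates
long_df$x <- as.numeric(as.character(long_df$ind))
```

The result can be plotted with plot(long_df$x, long_df$values), or passed to ggplot with x = x and y = values.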