I have a text file like this.
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
Toyota 3.5 4.3 6.8 5.3 5.2 4.0 4.6 4.5 4.2 3.7
Honda 7.7 7.4 7.3 7.9 7.0 6.0 6.5 6.5 5.9 5.9
Audi 3.4 4.1 5.3 6.2 5.2 5.1 4.6 4.5 4.2 5.9
Ford 7.7 7.4 7.3 7.2 7.0 6.6 6.5 6.1 5.9 5.7
The first column holds the variable names (the car makes), so I don't know how to read it in. I also have to make a graph where the x-axis is Year and the y-axis covers the range of the values.
CodePudding user response:
You want to use the sep argument of read.table().
car_data <- read.table("path/to/file.txt", sep = " ")
Then, you probably want to transpose this data and pivot it for use in ggplot2.
library(magrittr) # provides the %>% pipe
car_data <- car_data %>% t() %>% as.data.frame() # transpose
colnames(car_data) <- as.character(unlist(car_data[1, ])) # assign column names from the first row
rownames(car_data) <- NULL # remove row names
car_data <- car_data[-1, ] # remove first line
car_data <- car_data %>%
  tidyr::pivot_longer(!Year, names_to = "make") %>% # pivot for grouped graph
  dplyr::mutate(
    Year = as.integer(.data$Year), # t() returns text, so convert both columns
    value = as.numeric(.data$value)
  )
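The reason for the numeric conversion is that t() turns a data frame into a matrix, and a matrix with a text column in it holds only strings. A minimal base R sketch of the same transposition, assuming a three-year subset of the data and reading every field as text up front (car_raw, m and car_wide are illustrative names, not part of the answer above):

```r
# Read the file contents as text so that nothing is reformatted on the way in.
car_raw <- read.table(text = "Year 2000 2001 2002
Toyota 3.5 4.3 6.8
Honda 7.7 7.4 7.3", sep = " ", colClasses = "character")

m <- t(car_raw)                        # character matrix; row 1 holds the labels
car_wide <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(car_wide) <- unname(m[1, ])      # Year, Toyota, Honda
rownames(car_wide) <- NULL
car_wide[] <- lapply(car_wide, as.numeric)  # convert every column back to numbers
```

After this, car_wide has one row per year and one numeric column per make, which is the shape the pivot step expects.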
The graph looks like this:
# graph
ggplot2::ggplot(
  car_data,
  ggplot2::aes(
    x = .data$Year,
    y = .data$value,
    group = .data$make,
    color = .data$make
  )
) + # the layers must be joined with +
  ggplot2::geom_point()
CodePudding user response:
There are a few things to look at here.
Reading this in should be simple, with read.table(filename, header=TRUE, check.names=FALSE). For this example, I'll use
dat <- read.table(text = "
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
Toyota 3.5 4.3 6.8 5.3 5.2 4.0 4.6 4.5 4.2 3.7
Honda 7.7 7.4 7.3 7.9 7.0 6.0 6.5 6.5 5.9 5.9
Audi 3.4 4.1 5.3 6.2 5.2 5.1 4.6 4.5 4.2 5.9
Ford 7.7 7.4 7.3 7.2 7.0 6.6 6.5 6.1 5.9 5.7", header = TRUE, check.names = FALSE)
Without check.names=FALSE, the column names would instead be X2000, X2001, etc. This is because R is geared to prefer column names that start with one or more letters before any numbers, to facilitate such things as the $-operator. For instance, using the dat above, you cannot do dat$2000; that will err. Instead you can do either dat$`2000` or dat[["2000"]]. (Had we not used that argument, dat$X2000 works just fine.)
We can easily omit check.names=FALSE and, after the first stage of reshaping below, remove the leading X before converting to integers (using sub("^X", "", ...)). Both paths lead to the same result; stick with what you understand/prefer.
From here, many tools prefer/recommend to have data in a "long" format. Especially since it appears that your column names (2000, etc) are really intended to be integer data, we can reshape this into the long format:
longdat <- reshape2::melt(dat, id.vars = "Year")
head(longdat, 2)
#     Year variable value
# 1 Toyota     2000   3.5
# 2  Honda     2000   7.7
names(longdat)[1:2] <- c("Manu", "Year")
# melt() returns this column as a factor, so go through as.character() first
longdat$Year <- as.integer(as.character(longdat$Year)) # I'm inferring you may want this, not required
longdat
#      Manu Year value
# 1  Toyota 2000   3.5
# 2   Honda 2000   7.7
# 3    Audi 2000   3.4
# 4    Ford 2000   7.7
# 5  Toyota 2001   4.3
# 6   Honda 2001   7.4
# ...
# 35   Audi 2008   4.2
# 36   Ford 2008   5.9
# 37 Toyota 2009   3.7
# 38  Honda 2009   5.9
# 39   Audi 2009   5.9
# 40   Ford 2009   5.7
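The other path mentioned above, leaving check.names at its default and stripping the X afterwards, might look like the sketch below. It assumes a three-year subset of the data and builds the long table with plain base R in place of reshape2::melt(), so it runs without extra packages; dat_x and long_x are illustrative names only.

```r
# Default check.names = TRUE: the year columns get an X prefix.
dat_x <- read.table(text = "Year 2000 2001 2002
Toyota 3.5 4.3 6.8
Honda 7.7 7.4 7.3", header = TRUE)
names(dat_x)  # Year, X2000, X2001, X2002

# Long format by hand: one row per manufacturer/year pair.
# unlist() walks the year columns one after another, so the manufacturer
# names repeat as a block and each year label repeats once per manufacturer.
long_x <- data.frame(
  Manu  = rep(as.character(dat_x$Year), times = ncol(dat_x) - 1),
  Year  = rep(names(dat_x)[-1], each = nrow(dat_x)),
  value = unlist(dat_x[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Strip the leading X, then convert the year labels to integers.
long_x$Year <- as.integer(sub("^X", "", long_x$Year))
```

Either way the Year column ends up as plain integers, so the choice is mostly a matter of which column names you would rather type in the meantime.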
If you want it in a "wide" format, we can reshape it again using dcast.
widedat <- reshape2::dcast(longdat, Year ~ Manu, value.var = "value")
widedat
#    Year Audi Ford Honda Toyota
# 1  2000  3.4  7.7   7.7    3.5
# 2  2001  4.1  7.4   7.4    4.3
# 3  2002  5.3  7.3   7.3    6.8
# 4  2003  6.2  7.2   7.9    5.3
# 5  2004  5.2  7.0   7.0    5.2
# 6  2005  5.1  6.6   6.0    4.0
# 7  2006  4.6  6.5   6.5    4.6
# 8  2007  4.5  6.1   6.5    4.5
# 9  2008  4.2  5.9   5.9    4.2
# 10 2009  5.9  5.7   5.9    3.7
The above uses reshape2 on top of base R. If you prefer a solution using dplyr and tidyr, then
library(dplyr)
library(tidyr)
dat %>%
  pivot_longer(-Year) %>%
  rename(Manu = Year, Year = name) %>%
  mutate(Year = as.integer(Year)) %>% # stop here for 'longdat'
  pivot_wider(id_cols = Year, names_from = "Manu", values_from = "value") # this gives 'widedat'
# # A tibble: 10 x 5
# Year Toyota Honda Audi Ford
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 2000 3.5 7.7 3.4 7.7
# 2 2001 4.3 7.4 4.1 7.4
# 3 2002 6.8 7.3 5.3 7.3
# 4 2003 5.3 7.9 6.2 7.2
# 5 2004 5.2 7 5.2 7
# 6 2005 4 6 5.1 6.6
# 7 2006 4.6 6.5 4.6 6.5
# 8 2007 4.5 6.5 4.5 6.1
# 9 2008 4.2 5.9 4.2 5.9
# 10 2009 3.7 5.9 5.9 5.7
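For the graph the question asks about, the long format is the convenient starting point. A minimal sketch with ggplot2, assuming a long table shaped like longdat (here rebuilt by hand from a small subset so the snippet runs on its own; the axis and legend labels are placeholders, since the question does not say what the numbers measure):

```r
library(ggplot2)

# Small stand-in for 'longdat': two makes, three years.
longdat <- data.frame(
  Manu  = rep(c("Toyota", "Honda"), times = 3),
  Year  = rep(2000:2002, each = 2),
  value = c(3.5, 7.7, 4.3, 7.4, 6.8, 7.3)
)

# One line per manufacturer, Year on the x-axis, value on the y-axis.
p <- ggplot(longdat, aes(x = Year, y = value, color = Manu)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(longdat$Year)) +  # one tick per year
  labs(x = "Year", y = "Value", color = "Manufacturer")
print(p)
```

Because Year is an integer here, ggplot2 treats the x-axis as continuous and the lines connect the years in order; with Year left as text you would need a group aesthetic as in the first answer.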