How to import text file in R?

Time:03-19

I have a text file like this.

Year       2000  2001  2002  2003  2004  2005  2006  2007  2008  2009 
Toyota     3.5   4.3   6.8   5.3   5.2   4.0   4.6   4.5   4.2   3.7
Honda      7.7   7.4   7.3   7.9   7.0   6.0   6.5   6.5   5.9   5.9
Audi       3.4   4.1   5.3   6.2   5.2   5.1   4.6   4.5   4.2   5.9
Ford       7.7   7.4   7.3   7.2   7.0   6.6   6.5   6.1   5.9   5.7

The first column holds the variable names, so I don't know how to read it in. I also have to make a graph where the x-axis is Year and the y-axis is the range of the values.

CodePudding user response:

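You want to use read.table(). Its default separator (sep = "") treats any run of whitespace as a single delimiter, which is what this file needs; sep = " " would split on every single space and produce empty columns.

car_data <- read.table("path/to/file.txt")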

Then, you probably want to transpose this data and pivot it to use in ggplot2.

library(magrittr) # provides the %>% pipe
car_data <- car_data %>% t() %>% as.data.frame() # transpose (everything becomes text)
colnames(car_data) <- unlist(car_data[1, ]) # assign column names from the first row
rownames(car_data) <- NULL # remove row names
car_data <- car_data[-1, ] # remove first line
car_data <- car_data %>%
  tidyr::pivot_longer(!Year, names_to = "make") %>% # pivot for grouped graph
  dplyr::mutate(
    Year = as.integer(.data$Year), # convert the text back to numbers
    value = as.numeric(.data$value)
  )

The graph can then be drawn like this:

# graph
ggplot2::ggplot(
  car_data,
  ggplot2::aes(
    x = .data$Year,
    y = .data$value,
    group = .data$make,
    color = .data$make)
  ) +
  ggplot2::geom_point()

CodePudding user response:

There are a few things to look at here.

  1. Reading this in should be simple, with read.table(filename, header=TRUE, check.names=FALSE). For this example, I'll use

    dat <- read.table(text = "
    Year       2000  2001  2002  2003  2004  2005  2006  2007  2008  2009 
    Toyota     3.5   4.3   6.8   5.3   5.2   4.0   4.6   4.5   4.2   3.7
    Honda      7.7   7.4   7.3   7.9   7.0   6.0   6.5   6.5   5.9   5.9
    Audi       3.4   4.1   5.3   6.2   5.2   5.1   4.6   4.5   4.2   5.9
    Ford       7.7   7.4   7.3   7.2   7.0   6.6   6.5   6.1   5.9   5.7",
    header = TRUE, check.names = FALSE)
    

    Without check.names=FALSE, the column names would instead be X2000, X2001, etc. This is because R prefers column names that start with a letter rather than a digit, to facilitate such things as the $-operator. For instance, using the dat above, you cannot do dat$2000 (that is a syntax error); instead you can do either dat$`2000` or dat[["2000"]]. (Had we not used that argument, dat$X2000 would work just fine.)

    We could just as easily omit check.names=FALSE and, after the first stage of reshaping below, remove the leading X before converting to integers (using sub("^X", "", ...)). Both paths lead to the same result; stick with whichever you understand or prefer.

  2. From here, many tools prefer/recommend to have data in a "long" format. Especially since it appears that your column names (2000, etc) are really intended to be integer-data, we can reshape this into the long format:

    longdat <- reshape2::melt(dat, id.vars = "Year")
    head(longdat, 2)
    #     Year variable value
    # 1 Toyota     2000   3.5
    # 2  Honda     2000   7.7
    names(longdat)[1:2] <- c("Manu", "Year")
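    # I'm inferring you may want this, not required. melt() returns a factor here,
    # so go through as.character() to get 2000 rather than the level code 1.
    longdat$Year <- as.integer(as.character(longdat$Year))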
    longdat
    #      Manu Year value
    # 1  Toyota 2000   3.5
    # 2   Honda 2000   7.7
    # 3    Audi 2000   3.4
    # 4    Ford 2000   7.7
    # 5  Toyota 2001   4.3
    # 6   Honda 2001   7.4
    # ...
    # 35   Audi 2008   4.2
    # 36   Ford 2008   5.9
    # 37 Toyota 2009   3.7
    # 38  Honda 2009   5.9
    # 39   Audi 2009   5.9
    # 40   Ford 2009   5.7
    
  3. If you want it in a "wide" format, we can reshape it again using dcast.

    widedat <- reshape2::dcast(longdat, Year ~ Manu, value.var = "value")
    widedat
    #    Year Audi Ford Honda Toyota
    # 1  2000  3.4  7.7   7.7    3.5
    # 2  2001  4.1  7.4   7.4    4.3
    # 3  2002  5.3  7.3   7.3    6.8
    # 4  2003  6.2  7.2   7.9    5.3
    # 5  2004  5.2  7.0   7.0    5.2
    # 6  2005  5.1  6.6   6.0    4.0
    # 7  2006  4.6  6.5   6.5    4.6
    # 8  2007  4.5  6.1   6.5    4.5
    # 9  2008  4.2  5.9   5.9    4.2
    # 10 2009  5.9  5.7   5.9    3.7
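
The other path from step 1 (reading without check.names=FALSE and stripping the leading X afterwards) might look like the sketch below. It sticks to base R, using stack() in place of reshape2::melt(); the names dat2 and long2 are made up for this illustration.

```r
# Read the same table, letting R "fix" the column names (2000 becomes X2000).
dat2 <- read.table(text = "
Year       2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
Toyota     3.5   4.3   6.8   5.3   5.2   4.0   4.6   4.5   4.2   3.7
Honda      7.7   7.4   7.3   7.9   7.0   6.0   6.5   6.5   5.9   5.9
Audi       3.4   4.1   5.3   6.2   5.2   5.1   4.6   4.5   4.2   5.9
Ford       7.7   7.4   7.3   7.2   7.0   6.6   6.5   6.1   5.9   5.7",
header = TRUE)
names(dat2)[2:3]
# [1] "X2000" "X2001"

# Long format in base R: stack() puts the numbers in 'values' and the
# originating column name in 'ind' (a factor), column by column.
long2 <- stack(dat2[-1])
# The first column of dat2 holds the makes; repeat it once per year column.
long2$Manu <- rep(dat2$Year, times = ncol(dat2) - 1)
# Strip the leading X, then convert the remaining text to integer years.
long2$Year <- as.integer(sub("^X", "", as.character(long2$ind)))
long2 <- long2[, c("Manu", "Year", "values")]
head(long2, 2)
#     Manu Year values
# 1 Toyota 2000    3.5
# 2  Honda 2000    7.7
```

Apart from the value column being called values, this should match the longdat shown in step 2.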
    

Steps 2 and 3 above use reshape2 (only the reading in step 1 is base R). If you prefer a solution using dplyr and tidyr, then

library(dplyr)
library(tidyr)
dat %>%
  pivot_longer(-Year) %>%
  rename(Manu = Year, Year = name) %>%
  mutate(Year = as.integer(Year)) %>%                           # stop here for 'longdat'
  pivot_wider(id_cols = Year, names_from = "Manu", values_from = "value") # this gives 'widedat'
# # A tibble: 10 x 5
#     Year Toyota Honda  Audi  Ford
#    <int>  <dbl> <dbl> <dbl> <dbl>
#  1  2000    3.5   7.7   3.4   7.7
#  2  2001    4.3   7.4   4.1   7.4
#  3  2002    6.8   7.3   5.3   7.3
#  4  2003    5.3   7.9   6.2   7.2
#  5  2004    5.2   7     5.2   7  
#  6  2005    4     6     5.1   6.6
#  7  2006    4.6   6.5   4.6   6.5
#  8  2007    4.5   6.5   4.5   6.1
#  9  2008    4.2   5.9   4.2   5.9
# 10  2009    3.7   5.9   5.9   5.7
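
For the graph the question asks for (Year on the x-axis, one series per manufacturer), the long format is the one ggplot2 works with most naturally. A minimal sketch, assuming dplyr, tidyr and ggplot2 are installed; the object names longdat2 and p and the axis labels are only illustrative:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

dat <- read.table(text = "
Year       2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
Toyota     3.5   4.3   6.8   5.3   5.2   4.0   4.6   4.5   4.2   3.7
Honda      7.7   7.4   7.3   7.9   7.0   6.0   6.5   6.5   5.9   5.9
Audi       3.4   4.1   5.3   6.2   5.2   5.1   4.6   4.5   4.2   5.9
Ford       7.7   7.4   7.3   7.2   7.0   6.6   6.5   6.1   5.9   5.7",
header = TRUE, check.names = FALSE)

longdat2 <- dat %>%
  rename(Manu = Year) %>%                      # the first column really holds the makes
  pivot_longer(-Manu, names_to = "Year") %>%   # one row per make and year
  mutate(Year = as.integer(Year))              # "2000" (text) -> 2000 (integer)

# One line (with points) per manufacturer, years along the x-axis.
p <- ggplot(longdat2, aes(x = Year, y = value, color = Manu)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 2000:2009) +
  labs(x = "Year", y = "Value", color = "Manufacturer")
print(p)
```

geom_line() connects the points here because Year is numeric; if Year is left as text, adding group = Manu inside aes() should be needed for the lines to be drawn.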