I am trying to import data from Excel template into R. Below you can see how is look like table
This table contain title (e.g 2021-2017 which is for Male and Female) for both columns and this makes problems when I try to import data into R. For uploading I used this code of lines
Export_I01 <- data.table(
read.xlsx("Export_I01.xlsx",startRow = 2, endRow = 4, sheetIndex=1, header =T))
and output look like pic below
This is not good input because there is suffix of (e.g Male is first column but also Thrid is Male.1). So can anybody help me how to input this data in format which is same like table from Excel ?
CodePudding user response:
R adds a sufix to differenciate columns with the same name. I'd suggest to change the names of the columns in excel to differenciate males and females by year, like "Male_2021_2017" and "Male_2016_2012".
You could also reformat the table to have only one "Male" and "Female" column and add a "date" column where you specify the years that correspond to each value.
CodePudding user response:
There is no easy way to do is but you can come up with some workarounds. This one uses readxl
, dplyr
, zoo
, and purrr
:
dt <- read_excel("test.xlsx", col_names = F)
dt <- dt %>% mutate(pmap_df(., ~ na.locf0(c(...))))
colnames(dt) <- gsub("\\.NA", "", paste(dt[1,], dt[2,], sep = "."))
dt <- dt %>% filter(row_number() > 2)
Which results in:
# A tibble: 2 x 6
English Country `2021-2017.Male` `2021-2017.Female` `2016-2012.Male` `2016-2012.Female`
<chr> <chr> <chr> <chr> <chr> <chr>
1 1 Czech Republic 5 55 5 5
2 2 Denmark 5 0 0 0
CodePudding user response:
You have to make a choice and that might depend on how flexible you are in changing the format of the excel file. If this is the way it must be formatted or is provided by you and not flexible you have to make a choice.
Choose between a wide or long format of your data.
Wide would mean you end up with 2 rows, with "Male_2021_2017" and "Male_2016_2012" as seperate columns (same for Female), while long would mean you end up with 4 rows with a new group column "year_range" and only having one column for Male and one for Female.
Either way is fine as there are several options to reshape in R between long and wide back and forth (melt/dcast)