I'm trying to import data from a table in SQL Server and then write it into a .txt
file. I'm doing it in the following way. However when I do that all numbers having leading 0 s seems to get trimmed.
For example if I have 000124
in the database, it's shown as 124
in the .txt
as well as if I check x_1 it's 124 in there as well.
How can I avoid this? I want to keep the leading 0 s in x_1 and also need them in the output .txt file.
library(RODBC)
library(lubridate)
library(data.table)
cn_1 <- odbcConnect('channel_name')
qry <- "
select
*
from table_name
"
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE)
rm(qry)
setDT(x_1)
fwrite(x=x_1, file=paste0(export_location, "file_name", date_today, ".txt"), sep="|", quote=TRUE, row.names=FALSE, na="")
CodePudding user response:
Let x_1
be the result data.table from your SQL query. Then you can convert numeric columns (e.g. value
) to formatted strings using sprintf
to get leading zeros:
library(data.table)
x_1 <- data.table(value = c(1,12,123,1234))
x_1
#> value
#> 1: 1
#> 2: 12
#> 3: 123
#> 4: 1234
x_1$value <- x_1$value |> sprintf(fmt = "d")
x_1
#> value
#> 1: 0001
#> 2: 0012
#> 3: 0123
#> 4: 1234
Created on 2021-10-08 by the reprex package (v2.0.1)
CodePudding user response:
Assuming that the underlying data in the DBMS is indeed "string"-like ...
RODBC::sqlQuery
has the as.is=
argument that can prevent it from trying to convert values. The default is FALSE
, and when false and not a clear type like "date"
or "timestamp"
, RODBC calls type.convert
which will see the number-like field and convert it to integers or numbers.
Try:
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = TRUE)
and that will stop auto-conversion of all columns.
That is a bit nuclear, to be honest, and will stop conversion of dates/times, and perhaps other columns that should be converted. We can narrow this down; ?sqlQuery
says that read.table
's documentation on as.is
is relevant, and it says:
as.is: controls conversion of character variables (insofar as they
are not converted to logical, numeric or complex) to factors,
if not otherwise specified by 'colClasses'. Its value is
either a vector of logicals (values are recycled if
necessary), or a vector of numeric or character indices which
specify which columns should not be converted to factors.
so if you know which column (by name or column index) is being unnecessarily converted, then you can include it directly. Perhaps
## by column name
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = "somename")
## or by column index
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = 7)
(Side note: while I use select * ...
on occasion as well, the presumption of knowing columns by-number is predicated on know all of the columns included in that table/query. If anything changes, perhaps it's actually a SQL view and somebody updates it ... or if somebody changes the order of columns, than your assumptions of column indices is a little fragile. All of my "production" queries in my internal packages have all columns spelled out, no use of select *
. I have been bitten once when I used it, which is why I'm a little defensive about it.)
If you don't know, a hastily-dynamic way (that double-taps the query, unfortunately) could be something like
qry10 <- "
select
*
from table_name
limit 10"
x_1 <- sqlQuery(channel=cn_1, query=qry10, stringsAsFactors=FALSE, as.is = TRUE)
leadzero <- sapply(x_1, function(z) all(grepl("^0 [1-9]", z)))
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = which(leadzero))
Caveat: I don't use RODBC
nor have I set up a temporary database with appropriately-fashioned values, so this untested.