Home > other >  import CSV in another language to SAS
import CSV in another language to SAS

Time:02-10

I am attempting to import a CSV file which is in French to my US based analysis. I have noticed several issues in the import related to the use of accents. I put the csv file into a text reader and found that the data look like this

example from text file

I am unsure how to get rid of the [sub] pieces and format this properly.

I am on SAS 9.3 and am unable to edit the CSV as it is a shared CSV with French researchers. I am also limited to what I can do in terms of additional languages within SAS because of admin rights.

I have tried the following fixes:

data want(encoding=asciiany);
set have;
comment= Compress(comment,'0D0A'x);
comment= TRANWRD(comment,'0D0A'x,'');
comment= TRANWRD(comment,'0D'x,'');
comment= TRANWRD(comment,"\u001a",''); 

How can I resolve these issues?

CodePudding user response:

While this would have been a major issue a few decades ago, nowadays, it's very simple to determine the encoding and then run your SAS in the right mode.

First, open the CSV in a text editor, not the basic Notepad but almost any other; Notepad is free, for example, or Ultraedit or Textpad, on Windows, or on the Mac, BBEdit, or several others will do. I'll assume Notepad for the rest of this answer, but all of them have some way of doing this. If you're in a restricted no-admin-rights environment, good news: Notepad can be installed in your user folder with no admin rights (or even on a USB!). (Also, an advanced text editor is a vital data science tool, so you should have one anyway.)

In Notepad , once you open the file there will be an encoding in the bottom right: "UTF-8", "WLATIN1", "ASCII", etc., depending on the encoding of the file. Look and see what that is, and write it down.

Once you have that, you can try starting SAS in that encoding. For the rest of this, I assume it is in UTF-8 as that is fairly standard, but replace UTF-8 with whatever the encoding you determined. earlier.

See this article for more details; the instructions are for 9.4, but they have been the same for years. If this doesn't work, you'll need to talk to your SAS administrator, and they may need to modify your SAS installation.

You can either:

  • Make a new shortcut (a copy of the one you run SAS with) and add -encoding UTF-8 to the command line
  • Create a new configuration file, point SAS to it, and include ENCODING=UTF-8 in the configuration file.

Note that this will have some other impacts - the datasets you create will be encoded in UTF-8, and while SAS is capable of handling that, it will add some extra notes to the log and some extra time if you later do work in non-UTF8 SAS with this, or if you use non-UTF8 SAS datasets in this mode.

CodePudding user response:

This worked:

data want;
  array f[8] $4 _temporary_ ('ä'  'ö'  'ü'  'ß'  'Ä'  'Ö'  'Ü'  'É');
  array t[8] $4 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue' 'E');
  set have;
  newvar=oldvar;

newvar = Compress(newvar,'0D0A'x);
newvar = TRANWRD(newvar,'0D0A'x,'');
newvar = TRANWRD(newvar,'0D'x,'');
newvar = TRANWRD(newvar,"\u001a",'');
newvar = compress(newvar, , 'kw');
  do _n_=1 to dim(f);
    d=tranwrd(d, trim(f[_n_]), trim(t[_n_]));
  end;
run;
  • Related