Reading in Data

Reading in SAS Formatted Data
Reading in Data Using read.table()

Data comes in many formats, and S-Plus has a number of different commands for reading them. I'll talk about two commands here: sas.get and read.table.

Reading in SAS formatted data

If you have data in SAS format, the command sas.get is handy because it involves no extra formatting problems. You can also use a package like DBMS/Copy to convert data from other formats to SAS format. To use the command, type:

sas.get("directoryname", "filename")

The only complicated part is that you have to specify the directory exactly. For example, to read in a SAS data file called data.ssd01 in the directory /u/fred/dss, type:

sas.get("/u/fred/dss", "data")

To find out what directory you are in, type pwd at the Unix prompt, or !pwd from within S-Plus. This command means print working directory (in Unix-ese, print means display on the screen). Also note that you do not include the entire file name of the SAS file, only the part before the extension .ssd01.

Reading in data using read.table()

Reading in data using read.table() is the most common way to get data. This command is used for reading in raw data stored as text in a file.

The data may be organised in any one of several formats, including free-format, fixed format, or delimited. Now you will learn to read in all three with the read.table() command.

Free-format Data

Free-format data is arranged in columns, with one or more spaces or tabs between the columns. This is the way the data looks in Figure 1.

Fig. 1

country         ud      sd	
Sweden		82.4	111.84 
Israel		80	73.17	
Iceland		74.3	17.25	
Finland		73.3	59.33	
Belgium		71.9	43.25	
Denmark		69.8	90.24	
Ireland		68.1	0	
Austria		65.6	48.67	
NZ		59.4	60	
Norway		58.9	83.08	
Australia	51.4	33.74	
Italy		50.6	0	
UK		48	43.67	
Germany		39.6	35.33	
Netherlands	37.7	31.50	
Switzerland	35.4	11.87	
Canada		31.2	0	
Japan		31	1.92	
France		28.2	8.67	
USA		24.5	0

If this data were in a file called freeformat.dat, you would read it with the command:

data <- read.table("freeformat.dat")

Now you should be able to type the name of the object, data, and see the data you've read in. You will note that S-Plus knows to treat ud and sd as column headers, and not as data values. In some cases, S-Plus will not recognize the first line as a header, but you can force it to do so:

data <- read.table("freeformat.dat", header=TRUE)

Because the first column contains only character strings, all different and of varying widths, S-Plus treats it not as a variable but as labels for the rows of the data set, each one corresponding to a country in this case.
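As a quick check after reading a file, the sketch below writes the first three rows of Figure 1 to a scratch file and reads them back. It is written in S syntax that should also run in R. The scratch file name freeformat_demo.dat and the object name demo are hypothetical, and whether the country column ends up as row labels or as an ordinary variable may differ between S-Plus and R.

```r
# Write a small free-format file (first rows of Figure 1) so the
# example is self-contained. "freeformat_demo.dat" is a hypothetical
# scratch file name, not part of the original tutorial.
cat("country ud sd\n",
    "Sweden 82.4 111.84\n",
    "Israel 80 73.17\n",
    "Iceland 74.3 17.25\n",
    file = "freeformat_demo.dat", sep = "")

# Read it back, stating explicitly that the first line is a header.
demo <- read.table("freeformat_demo.dat", header = TRUE)

demo           # print the whole data frame
names(demo)    # column names taken from the header line
demo$ud        # one column, as a numeric vector
mean(demo$sd)  # a simple summary of a column
```

Printing the object and a column or two right after reading is a cheap way to confirm that the header and the numeric columns were interpreted as intended.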

Fixed-format Data

Another very common way to get data is in fixed format. Fixed format means that the data has no column headings and that each variable occupies specific columns of a specific width. Imagine that the data from above looked like it does in Figure 2.

Fig. 2

Sweden82.4111.84
Israel80  73.17	
Icelan74.317.25	
Finlan73.359.33	
Belgiu71.943.25	
Denmar69.890.24	
Irelan68.10	
Austri65.648.67	
NZ    59.460	
Norway58.983.08	
Austra51.433.74	
Italy 50.60	
UK    48  43.67	
German39.635.33	
Nether37.731.50	
Switze35.411.87	
Canada31.20	
Japan 31  1.92	
France28.28.67	
USA   24.50

Fixed-format data can be read in with the read.table command as well. Here's an example using the above data.

data2 <- read.table("fixeddata.dat",
col.names=c("country", "ud", "sd"),
sep = c(1,7,11))

This identifies the data file for S-Plus, tells it that there are three columns, which get the three labels specified in col.names, and tells it that the variables start in columns 1, 7, and 11. That is, columns 1 through 6 contain the first variable, 7 through 10 the second variable, and 11 through the end of the line (column 16 at most) the last variable.

Note the use of the function c() when you're defining the variable names and the start columns. This is standard S-Plus usage. The name stands for concatenate, or combine, and it means just that: make the variable names for this data set a combination of these three names.
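To see what the start columns 1, 7, and 11 mean, the following sketch slices the first line of Figure 2 by hand with substring(). It only illustrates the layout and is not a description of how read.table works internally. The object names line, country, ud.value, and sd.value are chosen here for the example.

```r
line <- "Sweden82.4111.84"   # first line of Figure 2

# Each variable occupies a fixed range of character positions.
country  <- substring(line, 1, 6)               # columns 1-6
ud.value <- as.numeric(substring(line, 7, 10))  # columns 7-10
sd.value <- as.numeric(substring(line, 11, 16)) # columns 11-16

nchar(line)   # total width of this line
country
ud.value
sd.value
```

Counting characters this way on one or two lines is a useful sanity check before committing to a set of start columns for the whole file.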

Delimited Data

A final common way we might see data is in delimited format. Delimited format means that some delimiter (often a comma or tab) separates different variables and a newline indicates a new observation or case. Excel spreadsheets can be saved to this format: choose "Save file as" and then select "csv" from the drop-down list of file types. Figure 3 gives an example of this sort.

Fig. 3

country,ud,sd	
Sweden,82.4,111.84 
Israel,80,73.17
Iceland,74.3,17.25
Finland,73.3,59.33
Belgium,71.9,43.25
Denmark,69.8,90.24
Ireland,68.1,0
Austria,65.6,48.67
NZ,59.4,60
Norway,58.9,83.08
Australia,51.4,33.74
Italy,50.6,0
UK,48,43.67
Germany,39.6,35.33
Netherlands,37.7,31.50
Switzerland,35.4,11.87
Canada,31.2,0
Japan,31,1.92
France,28.2,8.67
USA,24.5,0

Here is the command syntax to read in this data.

data <- read.table("delimit.dat", sep = ",")

Note that the sep argument now tells S-Plus that each field is separated by a comma; each case (country) is on its own line. If you receive data that is delimited in any other way (a space, a tab, or some other character), simply substitute that character for the comma in the sep= argument. If any of this is confusing, type ?read.table to get help on the read.table() command.
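As an illustration of swapping delimiters, the sketch below writes the same two rows once with commas and once with semicolons, then reads each file with the matching sep value. It is written in S syntax that should also run in R. The scratch file names and object names are hypothetical, and header = TRUE is given explicitly rather than relying on automatic header detection.

```r
# Two hypothetical scratch files holding the same rows from Figure 3,
# one comma-delimited and one semicolon-delimited.
cat("country,ud,sd\n", "Sweden,82.4,111.84\n", "Israel,80,73.17\n",
    file = "comma_demo.dat", sep = "")
cat("country;ud;sd\n", "Sweden;82.4;111.84\n", "Israel;80;73.17\n",
    file = "semicolon_demo.dat", sep = "")

# Only the sep argument changes between the two calls.
comma.data <- read.table("comma_demo.dat", sep = ",", header = TRUE)
semi.data  <- read.table("semicolon_demo.dat", sep = ";", header = TRUE)

comma.data$ud
semi.data$ud
all(comma.data$sd == semi.data$sd)   # both files hold the same values
```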

Inputting Your Own Data

Before we go on, let's cover a couple of commands for inputting your own data which you might find useful.

c(): Concatenates values into a vector. Examples: c(1,5,10), c("me", "you", "them"), and c(T,F,F,T,T) will produce a numeric, character, and logical vector in turn.
n1:n2: Produces a vector with values ranging from n1 to n2 in steps of 1. For example, 1:5 produces the same values as c(1,2,3,4,5).
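Both commands can be tried directly at the prompt. The short sketch below builds one vector of each kind and checks its type with mode(); the object names nums, words, and flags are made up for the example.

```r
nums  <- c(1, 5, 10)              # numeric vector
words <- c("me", "you", "them")   # character vector
flags <- c(T, F, F, T, T)         # logical vector

mode(nums)
mode(words)
mode(flags)

1:5                               # the values 1 through 5
all(1:5 == c(1, 2, 3, 4, 5))      # element-by-element comparison
length(10:15)                     # how many values run from 10 to 15
```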
