Reading Other Text Data Formats - Infile and Infix
Infile
infile (free format) may be useful if you have downloaded a data file from the web and it conforms to these specifications:
There is another form of infile (fixed format) that is used with a "dictionary" file to read certain data files with complex formats. See below.
infix
A fixed-width text file has no "delimiter" (divider) characters. Instead, variable values are placed in the same "columns" (character positions on the line) for each line of data. Here's an example of how data containing variables for individuals' name, age, and weight would look in a fixed-width text file (fixed-width text files will not have a line for the variable names):
John Smith                    42250
Mary Johnson                  66120
Margaret Veryverylonglastnamed29110
Note that the data are all packed together. The blanks between the names and ages of the first two people are called "filler". They are there for two reasons: one, to leave room for people with longer names (like our third person), and two, to force each value of each variable to occupy the same columns. In this example, we would say that the variable "name" is in columns 1-30, "age" is in columns 31-32, and "weight" is in columns 33-35. Assuming it was named "mydata.txt", the command to read this file with infix would be:
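Given those column positions, a minimal sketch of that command follows. It assumes mydata.txt sits in Stata's current working directory. The str prefix, which tells infix to read name as text rather than as a number, and the list command used to check the result are additions for illustration.

```stata
* Read name from columns 1-30 (as a string), age from 31-32, weight from 33-35
infix str name 1-30 age 31-32 weight 33-35 using mydata.txt

* Inspect the first few observations to confirm the columns lined up
list in 1/3
```

If a value looks truncated or shifted after reading, the usual cause is a column range that is off by one, so checking the first few observations against the raw file is worthwhile.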
Dictionary files
Stata dictionary files can vary in their complexity, but all of them basically specify which data file to read, where the variables are located in the file, and what they are to be called. It is rare that you will really need to use a dictionary file; in most cases, including the example below, an infix statement will work just as well and is simpler to write. We present this basic information on dictionary files mainly so you will be familiar with them in case you are using data for which the producer provided one. Optionally, labels can also be provided in the dictionary file. Once the dictionary file is written, it is accessed by issuing the Stata infile command:
. infile using [dictionary_file]
where dictionary_file is the name of your Stata dictionary file. In this example, the raw data are saved in a file named "latimes.raw" and the dictionary file is saved in a different file named "latimes.dct". To read the data, you would run infile on the dictionary file:
infile using latimes.dct
Given below is a sample Stata dictionary file:
dictionary using latimes {
_column(13) wght %5f "Weight Value"
_column(22) groups %1f "Ethnic/racial grp"
_column(23) rdir %1f "Cntry in right dir.?"
_column(24) apbush %1f "Do You app. of Bush?"
_column(27) rvote %1f "Are you reg. to vote?"
_column(28) wvote %1f "Vote for/Lean toward?"
_column(29) whyvote %1f "Why this candidate?"
_newline
_column(48) location %1f "Area you live in?"
_column(49) views %1f "Views on political matters?"
_column(50) tparty %1f "Party views?"
_column(52) relign %1f "What religion?"
_column(55) faminc %1f "Family income"
_column(66) agerng %1f "Age range"
_column(69) educ %1f "Education"
_newline
}
The above dictionary starts with the words dictionary using, which define the file as a Stata dictionary. The name of the file that contains the actual data appears immediately after the word using. In this example, the data file is named "latimes.raw" and the name latimes appears in the dictionary file. The data file extension is assumed to be .raw unless a different file extension is specifically provided. The actual data dictionary is enclosed within curly brackets ({ }).
On each line of the dictionary, the _column command is used to move the pointer to the location where the data item starts. After the column locator is the variable name. After the variable name is the format with which the data item is to be read (see the Stata manual for possible formats). After the format, a quoted text string can be provided for a variable label. If there is more than one line of data for each record, a _newline command must be given to advance the pointer to the next line in the data file.
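To connect these pieces to the earlier fixed-width example, here is a sketch of a dictionary that would read the same mydata.txt file. The dictionary file name mydata.dct, the storage types (str30, int), and the label text are assumptions made for illustration; they are not part of the original example.

```stata
dictionary using mydata.txt {
* each entry: _column(start)  [type]  varname  %format  "label"
    _column(1)   str30  name    %30s  "Name"
    _column(31)  int    age     %2f   "Age"
    _column(33)  int    weight  %3f   "Weight"
}
```

Saved as mydata.dct, this would be read with infile using mydata.dct, the same pattern as for latimes.dct. Because each record occupies a single line, no _newline is needed here.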