Finding Data Citing data
DSS lab consultation schedule
*No appts. necessary during walk-in hrs.
Note: the DSS lab is open as long as Firestone is open, no appointments necessary to use the lab computers for your own analysis.
Converting to and from Excel and Stata
Spreadsheet and database programs, such as Excel and Datastream, can save and read files in tab- and comma-delimited format. These files contain one observation per line where the values are separated by tabs or commas. In addition, the first line of the file may contain variable names. Comma-delimited files often have an extension of .csv for "comma-separated values."
This is an example of a spreadsheet from Excel. It cannot be brought into Stata without some editing and saving it as a ".csv" file.
The following conventions must be followed for Stata to read your spreadsheet successfully:
Once you have completed the changes and you are in Stata, give the command:
to read in the file.
Here are step-by-step instructions for saving an Excel worksheet as a comma-delimited file and reading that file into Stata. Follow these instructions if you have saved your data as an Excel file (.xls) or as a comma-delimited file (.csv).
One particular thing to be aware of after insheet is that sometimes (for various quirky reasons) Stata will mistakenly read numeric variables as strings. Issue the command describe (abreviated d) and look at the output. The way you can tell from describe output is to look in the "storage type" column. Anything that starts with "str" is a string and anything else is a number. If you see that a variable that is supposed to be numeric has a storage type that starts with "str" you know Stata's messed up and you will need to fiddle with your text file to get it right.
This is what the data should look like in Excel.
Notice that variable names are in the first row only,
and variable names are 32 characters or less.
Notice also that the data values contain no
extraneous symbols and that cells with missing data
have been left blank.
Here is the same data saved in
The first row of the file contains the names of the variables.
Each subsequent row contains a value for each of the six variables.
Each value is separated by a comma.
The general syntax of the Stata command insheet is:
insheet using [filename]
where filename is the name of your comma-delimited file. For example:
insheet using mydata.csv
Since file extensions vary from program to program, you should provide the file extension as part of the filename on the insheet command. File extensions of .txt and .csv are common.
While it is generally easier to convert an Excel file to Stata with the insheet command, there are some cases where you may want to use DBMS/COPY to accomplish the conversion. You should consider using DBMS/COPY if you have a lot of missing data (see step #2 above) or you are converting multiple worksheets, i.e., a workbook with several spreadsheets.For more information please check Excel to Stata: copy-and-paste and Excel to Stata: insheet. For other formats see Data Preparation & Descriptive Statistics
Sometimes you may want to work with a dataset in Excel rather than Stata. To convert a Stata file to Excel, use the "outsheet" command:
outsheet using mydata.csv,c
This command will create a comma-delimited file called "mydata.csv". This file can be opened in Excel by simply double-clicking on the file's icon. Be sure to use the "csv" extension, as Excel recognizes this by default. Occasionally, when you open the file in Excel, some of the values will not look "right." For example, you may see scientific notation where there was none in the original Stata file. This is due to differences in how Excel and Stata format the values. To correct this, first try using the "Format...Cells" command in Excel. If that does not work, then you must go back to Stata and use Stata's format command. For more information about the format command, see Stata's online help, the Stata manual, or ask a DSS consultant.