Princeton University Library Data and Statistical 
Services

Search DSS





Finding Data Analyzing Data Citing data

About Us


DSS lab consultation schedule
(Monday-Friday)
Sep 4-Nov 3By appt. here
Nov 4-Dec 12Walk-in, 2-5 pm*
Dec 15-Jan 31By appt. here
Feb 2-May 3Walk-in, 1-5 pm*
May 4-May 12Walk-in, 2-5 pm*
May 13-Sep 3By appt.
For quick questions email data@princeton.edu.
*No appts. necessary during walk-in hrs.
Note: the DSS lab is open as long as Firestone is open, no appointments necessary to use the lab computers for your own analysis.

Follow DssData on Twitter
See DSS on Facebook

Home Online Help Statistical Packages Stata Data Management Converting to and from Excel files

Converting to and from Excel and Stata

Converting Excel to Stata
Converting Stata to Excel

Spreadsheet and database programs, such as Excel and Datastream, can save and read files in tab- and comma-delimited format. These files contain one observation per line where the values are separated by tabs or commas. In addition, the first line of the file may contain variable names. Comma-delimited files often have an extension of .csv for "comma-separated values."

Converting Excel to Stata

This is an example of a spreadsheet from Excel. It cannot be brought into Stata without some editing and saving it as a ".csv" file.

The following conventions must be followed for Stata to read your spreadsheet successfully:

  1. The first line should have Stata variable names (32 characters or less, no spaces, or "special characters" except the underscore [_], not starting with an underscore or number) and the second line begins the data.
  2. No blank rows or columns between data.
  3. Missing numeric data should be coded as an empty cell, not a space, dot, or any other non-numeric data. Often, 0, 9, or 99 is used to code missing numeric data; this is fine as long as these are not also valid values for that variable.
  4. Commas in numbers or text are particularly problematic because Stata thinks they are a delimiter and will not read the data properly. You must remove the commas from numeric values before saving the file.
  5. The file must be specifically saved as a "comma separated values" file in Excel. You can do this by going to "File", then "Save As…", then choosing "comma separated values." Simply giving it an extension of ".csv" will not work. When you close the spreadsheet and Excel asks if you want to save the changes, say "No." This is counter-intuitive, but the changes it's asking you about are the changes it needs to make the spreadsheet a regular Excel spreadsheet again.

Once you have completed the changes and you are in Stata, give the command:

    . insheet using filename.csv

to read in the file.

Here are step-by-step instructions for saving an Excel worksheet as a comma-delimited file and reading that file into Stata. Follow these instructions if you have saved your data as an Excel file (.xls) or as a comma-delimited file (.csv).

  1. Invoke Excel and read the file by selecting the File menu, Open.
  2. Prepare the data for conversion.

    a. Make sure that missing data values are coded as blank or as numeric values (e.g., 999 or -1). Do not use periods (.) or any other character values (e.g., N/A) to represent missing data. Simply leave the cells empty or code a number.

    b. Make sure that there are no commas in the numbers. You can change this under Format menu, then select Cells....

    c. Make sure that variable names are included only in the first row of your spreadsheet. There should be only one row of variable names (some files produced by databases have several header rows). Variable names should be 32 characters or less, start with a letter and contain no special characters, such as $ or &, except the underscore [_]. You should eliminate embedded blanks (spaces).

  3. Under the File menu, select Save As. Then Save as type 'CSV' (comma separated values). The file will be saved with a .csv extension, for example "country.csv."
  4. Start Stata. Then issue the following command:
         insheet using country.csv
    
    where country.csv is the name of the comma-delimited file.

One particular thing to be aware of after insheet is that sometimes (for various quirky reasons) Stata will mistakenly read numeric variables as strings. Issue the command describe (abreviated d) and look at the output. The way you can tell from describe output is to look in the "storage type" column. Anything that starts with "str" is a string and anything else is a number. If you see that a variable that is supposed to be numeric has a storage type that starts with "str" you know Stata's messed up and you will need to fiddle with your text file to get it right.

Sample Data

This is what the data should look like in Excel. Notice that variable names are in the first row only, and variable names are 32 characters or less. Notice also that the data values contain no extraneous symbols and that cells with missing data have been left blank.

   country        year    nomdepr   realdeva       infl  changegdp
     CHILE        1981             -.6033136      .1875   .1839487
     CHILE        1982   .8828205  -.6142148   .1052632  -.0267065
     CHILE        1983   .1920196  -.6747882   .2380952   .2571221
    MEXICO        1989   .1578255  -.0900467   .2006079   .2890742
    MEXICO        1990   .1152594  -.1551962   .2658228   .4331929
    MEXICO        1991   .0426428  -.2513676              .2072532
     SPAIN        1990   .1980529  -.3306062   .1549296   .1490039
     SPAIN        1991    .229653  -.3478578   .1463415   .1237474

Here is the same data saved in comma-delimited format. The first row of the file contains the names of the variables. Each subsequent row contains a value for each of the six variables. Each value is separated by a comma.

"country","year","nomdepr","realdeva","infl","changegdp"
"CHILE",1981,,-.6033136,.1875,.1839487
"CHILE",1982,.8828205,-.6142148,.1052632,-.0267065
"CHILE",1983,.1920196,-.6747882,.2380952,.2571221
"MEXICO",1989,.1578255,-.0900467,.2006079,.2890742
"MEXICO",1990,.1152594,-.1551962,.2658228,.4331929
"MEXICO",1991,.0426428,-.2513676,,.2072532
"SPAIN",1990,.1980529,-.3306062,.1549296,.1490039
"SPAIN",1991,.229653,-.3478578,.1463415,.1237474

Notes

The general syntax of the Stata command insheet is:

     insheet using [filename]

where filename is the name of your comma-delimited file. For example:

     insheet using mydata.csv

Since file extensions vary from program to program, you should provide the file extension as part of the filename on the insheet command. File extensions of .txt and .csv are common.

While it is generally easier to convert an Excel file to Stata with the insheet command, there are some cases where you may want to use DBMS/COPY to accomplish the conversion. You should consider using DBMS/COPY if you have a lot of missing data (see step #2 above) or you are converting multiple worksheets, i.e., a workbook with several spreadsheets.For more information please check Excel to Stata: copy-and-paste and Excel to Stata: insheet. For other formats see Data Preparation & Descriptive Statistics

Converting Stata files to Excel

Sometimes you may want to work with a dataset in Excel rather than Stata. To convert a Stata file to Excel, use the "outsheet" command:

     outsheet using mydata.csv,c

This command will create a comma-delimited file called "mydata.csv". This file can be opened in Excel by simply double-clicking on the file's icon. Be sure to use the "csv" extension, as Excel recognizes this by default. Occasionally, when you open the file in Excel, some of the values will not look "right." For example, you may see scientific notation where there was none in the original Stata file. This is due to differences in how Excel and Stata format the values. To correct this, first try using the "Format...Cells" command in Excel. If that does not work, then you must go back to Stata and use Stata's format command. For more information about the format command, see Stata's online help, the Stata manual, or ask a DSS consultant.

This page last updated on: