Princeton University Library Data and Statistical 

Getting Started

Planning Your Analysis

Base your choice of analysis on the question you want answered. When planning your analysis, start at the end and work backwards:
  • What conclusion are you trying to reach?
  • What type of analysis do you need to perform in order to demonstrate that conclusion?
  • What type of data do you need to perform that analysis?
You need to start by formulating your research question.

Research Questions

A research question can take many forms. Some research questions are descriptive whereas others focus on explanation. For example, one researcher might want to know,

How has federal funding for the arts in America changed between 1970 and 1990?

Another researcher might want to know,

What predicts individual support for federal funding for the arts in America? Is support for the arts associated with income, education, type of employment or other social, economic, or demographic indicators?

At DSS we can help you answer these types of questions. However, you need to formulate a clear question or set of questions before we can help you get started.

When looking for data, consider which variables you need, which time periods the data must cover, and how the data were collected. Time is an especially important factor in the analysis of economic and financial data. Time-dependent analyses draw on two basic types of data: cross-section time-series data and panel data.

  • Cross-section time-series data means that different people, companies, or other entities were sampled in each time period.
    For example, the Current Population Survey surveys a different random sample of the population each year.
  • Panel data means that the same people, companies, or other entities were sampled repeatedly.
    Stock exchange data is a good example, because the same companies are tracked over time.
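
One way to check which kind of data you have is to see whether the same identifiers recur across time periods. The sketch below is illustrative only: the function name classify_sample, the category labels, and the sample observations are all made up for this example, and it assumes each record carries a reliable entity identifier and a period.

```python
from collections import defaultdict


def classify_sample(observations):
    """Classify (entity_id, period) pairs by sampling design.

    Hypothetical helper, for illustration only.
    """
    periods_by_entity = defaultdict(set)
    all_periods = set()
    for entity_id, period in observations:
        periods_by_entity[entity_id].add(period)
        all_periods.add(period)

    if len(all_periods) < 2:
        return "single period"
    # Panel: every entity is observed in every period.
    if all(seen == all_periods for seen in periods_by_entity.values()):
        return "panel"
    # Repeated cross-section: no entity appears in more than one period.
    if all(len(seen) == 1 for seen in periods_by_entity.values()):
        return "repeated cross-section"
    # Anything else, e.g. entities that enter or leave the sample.
    return "mixed"


# Made-up observations for illustration.
survey = [("r1", 2001), ("r2", 2001), ("r3", 2002), ("r4", 2002)]
stocks = [("AAA", 2001), ("BBB", 2001), ("AAA", 2002), ("BBB", 2002)]

print(classify_sample(survey))
print(classify_sample(stocks))
```

Real data sets often fall into the "mixed" case, for instance when companies are listed or delisted partway through the period, so a check like this is a starting point rather than a final answer.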

Some common types of analyses:

Identify a Study/Data File (locate data, locate codebook)

Once you have identified your research question(s) and have some idea of what kind of analysis might help answer them, you need to find the data that will help you answer your question(s). You might find that you will have to reformulate your question(s) depending on the data that is available.

Different research questions require different types of data. Some research questions require data that you collect yourself through interviews, small surveys, or historical research (qualitative data). Other research questions require secondary analysis of large data sets.

Preparing Your Data

You will probably spend more time getting the data into a usable format than you will actually conducting the analysis. Trying to match data from different sources can be particularly time-consuming, for a variety of reasons:

  • Different record identifiers. For example, CUSIPs are not necessarily consistent across sources.
  • Different time periods. If you have daily data from one source and monthly data from another, your analyses may need to be done at the monthly level.
  • Different codings. If two studies code education differently, you will need to come up with a consistent scheme.
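
Two of these problems can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the function names, the education codes, and the common categories are hypothetical, the values are made up, and averaging is only one way to bring daily data to the monthly level (a month-end value may suit some analyses better).

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_to_monthly(daily_values):
    """Average daily (date, value) observations within each (year, month)."""
    by_month = defaultdict(list)
    for day, value in daily_values:
        by_month[(day.year, day.month)].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Hypothetical codebooks: each study's education codes mapped to one
# common scheme chosen by the analyst.
STUDY_A_EDUCATION = {
    1: "less than high school",
    2: "high school",
    3: "college or more",
}
STUDY_B_EDUCATION = {
    "HS-": "less than high school",
    "HS": "high school",
    "BA": "college or more",
    "MA+": "college or more",
}


def harmonize_education(code, codebook):
    """Translate a study-specific code into the common scheme."""
    return codebook[code]


# Made-up daily values for illustration.
daily = [
    (date(2020, 1, 2), 10.0),
    (date(2020, 1, 3), 12.0),
    (date(2020, 2, 3), 20.0),
]
monthly = daily_to_monthly(daily)
print(monthly)
print(harmonize_education(3, STUDY_A_EDUCATION))
print(harmonize_education("MA+", STUDY_B_EDUCATION))
```

Note that the common scheme can only be as detailed as the coarsest study allows: here the hypothetical "BA" and "MA+" codes collapse into a single category.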

Data management can include merging different data files, selecting sub-sets of observations, recoding variables, constructing new variables, or adjusting data for inflation across years.
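
As a rough sketch of several of these steps together, the example below merges a file of yearly amounts with a price index, constructs an inflation-adjusted variable, and selects a subset of years. Everything in it is an assumption for illustration: the index values and amounts are made up, the names to_real, price_index, and funding are hypothetical, and the adjustment uses the common formula real = nominal × (index in base year / index in observation year).

```python
def to_real(nominal, index_year, index_base):
    """Express a nominal amount in base-year prices."""
    return nominal * index_base / index_year


# Made-up price index and nominal amounts, for illustration only.
price_index = {1970: 50.0, 1980: 100.0, 1990: 150.0}
funding = [
    {"year": 1970, "nominal": 10.0},
    {"year": 1980, "nominal": 30.0},
    {"year": 1990, "nominal": 45.0},
]

base_year = 1990
merged = []
for row in funding:
    # Merge on year; skip rows with no matching index value.
    if row["year"] not in price_index:
        continue
    index = price_index[row["year"]]
    # Construct a new variable holding the inflation-adjusted amount.
    real = to_real(row["nominal"], index, price_index[base_year])
    merged.append({**row, "real": real})

# Select a subset of observations: 1980 onward.
recent = [row for row in merged if row["year"] >= 1980]

for row in merged:
    print(row["year"], row["nominal"], row["real"])
```

With these made-up numbers the nominal amount rises between 1980 and 1990 while the adjusted amount stays flat, which is the kind of difference that adjusting for inflation is meant to reveal.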
