
6. How to obtain information from surveys

28 February 2024

Of all the methods used for evaluation, the sample survey is probably the most common.  The reason is fairly simple: if program and project evaluation is concerned with changes in the beneficiary population, then the only way to evaluate is to obtain information from that population.  When the population is large, the information has to be obtained from a sample.  If the information is about personal behavior, or something that can only be reported by individuals, the main instrument will be a questionnaire.

In using surveys for evaluation, there are three considerations to address: who should be surveyed, how to obtain a representative sample, and what kinds of questions to ask.  These issues are addressed to different degrees in the readings.

Who should be surveyed?

There are two classes of persons (or institutions) who could be surveyed in an evaluation: the intended beneficiaries of a program (the people who are expected to experience the change) and the intermediaries who deliver the services and who can provide information on outcomes (since the outcomes are usually a change in their behavior or knowledge). Taken together, they are called stakeholders. If the program is designed carefully, who falls into each group should be fairly obvious.

Intended beneficiaries

Some examples of intended beneficiaries of different programs are:

The population of intended beneficiaries has to be clearly defined so that you can tell who should be included in it.

Service Providers

When the focus is on the service providers (who are expected to change as part of achieving outcomes), some examples might include:

In this case, for each expected outcome, you should be able to define who is expected to change.

How do you get a representative sample?

If there are only a few beneficiaries (for example, you are surveying institutions and there are only a limited number of them), you can survey all of them.  (Even this is a sample, although it is a 100% sample.)  Even if the number is large, you can send questionnaires to all of them, but the rate of return is usually lower.  For example, you could send questionnaires to all graduates of training classes, but not all of them might respond.

The dilemma with this approach is that you will not know how representative your sample (the returned questionnaires) is.  In a town I lived in for a number of years in upstate New York, I was involved with drafting a comprehensive plan to guide land use.  As part of the process, a previous committee sent a survey to all town residents.  Some 25 percent of the residents returned their questionnaires, which is a very high rate of return for a mailed-out questionnaire.  Someone then asked: how representative are the replies?  The answer is that they are representative of the people who cared to return the questionnaire, but you cannot in good faith tell whether they represent the views of those who did not return it. In that case, the people who returned the questionnaire were mostly weekend and summer residents who opposed a development project (a golf course and resort on top of a mountain) that was a focus of the plan. It turned out that those who did not return the questionnaire were long-term residents, mostly local in origin, who favored the development project. The survey did not expose this difference, which emerged in the debates about the plan that I, as chair of the planning committee, had to moderate. Not a pleasant experience.

In order to reduce cost, and to permit an estimate of probable error, you usually need to draw a sample from the universe (of those with whom you are concerned).  A sample can be treated as representative only if it is drawn with a random element, and selection is random if it is determined entirely by chance (i.e. there is no possibility of influencing it).

In sampling theory, probable error is computed for what are called simple random samples, where a number of cases are selected from the universe by a random means (a table of random numbers, or a random number generator, for example).  The error factors have been computed mathematically and, thanks to the wonders of the Internet, there are sites that can help you compute how large a sample you would need to achieve a given margin of error at a given confidence level.  For example, if you wanted a 5 percent margin of error (i.e. that a reported figure of 50 percent was actually between 45 and 55 percent) in 99 out of 100 samples, you would need a sample of 666 [that is not a devilish number].
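If you want to check such figures yourself, the standard sample-size formula for a simple random sample is n = z^2 * p * (1 - p) / e^2, where e is the margin of error and z is the normal-curve multiplier for the chosen confidence level.  Here is a minimal sketch in Python (the z-values and the worst-case assumption that p = 0.5 are standard statistical conventions, not something taken from the readings):

    # A minimal sketch of the sample-size formula for a simple random sample,
    # using the worst-case proportion p = 0.5. The z-values are the standard
    # normal multipliers for each confidence level.
    from math import ceil

    Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

    def sample_size(margin_of_error, confidence=0.99, p=0.5):
        """Smallest n that estimates a proportion within +/- margin_of_error."""
        z = Z[confidence]
        return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

    print(sample_size(0.05, confidence=0.99))  # about 664 (the 666 above comes from a slightly rounder z)
    print(sample_size(0.05, confidence=0.95))  # about 385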

In fact, almost no one uses simple random samples since, if you know something about the composition of your universe, you can draw your sample in proportion to those factors and either narrow the margin of error or keep the same precision with a much smaller sample size.  These are what are called stratified samples.

For example, if you know the breakdown of your universe by some factor that you consider important (like geography, or professional specialization), you break the universe into sub-universes and sample from each in proportion.  Examples can be found in the readings.
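As a sketch of what proportional allocation looks like in practice (the region names and sizes below are invented purely for illustration):

    # A sketch of proportional allocation: each stratum contributes to the
    # sample in proportion to its share of the universe. Strata are invented.
    import random

    universe = {
        "Region A": list(range(0, 600)),     # 600 members
        "Region B": list(range(600, 900)),   # 300 members
        "Region C": list(range(900, 1000)),  # 100 members
    }
    total = sum(len(members) for members in universe.values())
    target = 100  # overall sample size

    sample = []
    for stratum, members in universe.items():
        n = round(target * len(members) / total)   # proportional share
        sample.extend(random.sample(members, n))   # simple random draw within the stratum

    print(len(sample))  # about 100, drawn 60/30/10 from the three regions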

The key to random sampling is the use of some chance factor.  As noted, you can use sophisticated techniques like random number generators (which are found in most spreadsheet programs, like Microsoft Excel, these days), or you can use low-tech solutions.  For example, if you want to draw a sample of 100 out of a population of 1,000, and you have the list of names, you can take every 10th name, as long as you start at a random point (this is a systematic sample with a random start).  The easiest way to do that is to put the numbers 1 through 10 on individual pieces of paper, put them in a hat, shake it, and then have someone (preferably a small child -- representing innocence) pick one out and start from that point.
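The same "every 10th name" procedure can be written in a few lines of Python; the list of names below is a made-up placeholder:

    # The 'every 10th name' draw described above: pick a random starting point
    # between 0 and 9, then take every 10th name from the list.
    import random

    names = [f"Person {i:04d}" for i in range(1000)]

    step = len(names) // 100          # sample 100 out of 1,000 -> every 10th name
    start = random.randrange(step)    # the number drawn from the hat
    sample = names[start::step]

    print(start, len(sample))         # random start point, 100 names selected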

What kinds of questions should be asked?

Even if the sample selection is perfect, the value of information obtained from a survey will be determined by the quality of the questions asked.  All questionnaires are based on the principle of stimulus and response, where the question is designed to stimulate a response that can be analyzed.  The quality of the response will be constrained by the question that is asked.

A good question not only obtains the information that you think it does (this is called validity: it measures what it purports to measure), but it also discriminates between people (not everyone answers in the same category), and everyone who answers it understands it in the same way (this is called reliability).

Approaches range from the very simple and factual to the very complex.  On the simple side, the questions can be straightforward and factual: "How often did you use the techniques that you learned in the training course during the past year?"  "What was the most useful technique or approach you learned?"  The key to such questions is to make sure that the stimulus specifies exactly what you are asking about (the techniques from the training course) and the time frame (the past year).

If the range of possible answers is known, you can simplify recording by including the responses as a list. 

How often did you use the techniques that you learned in the training course during the past year?

1.      Every day

2.      At least once a week

3.      At least once every two weeks

4.      At least once a month

5.      Less than once a month

6.      Never

In preparing codes for open-ended questions, or in preparing fixed responses, the same rules you applied in a codebook have to be used:  the categories must be exhaustive (they include all of the responses you think possible, or want to track) and mutually exclusive (each response fits in only one category).
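As a small sketch of those two rules applied to the frequency question above (the raw answers are invented; an answer that fits no category flags a gap in the codebook):

    # Applying the two codebook rules: each response must fall into exactly
    # one category, and none may be left uncoded. Raw answers are invented.
    from collections import Counter

    CODES = {
        "every day": 1,
        "at least once a week": 2,
        "at least once every two weeks": 3,
        "at least once a month": 4,
        "less than once a month": 5,
        "never": 6,
    }

    raw_answers = ["Every day", "never", "At least once a week", "weekly"]

    coded, uncodable = [], []
    for answer in raw_answers:
        key = answer.strip().lower()
        if key in CODES:
            coded.append(CODES[key])
        else:
            uncodable.append(answer)   # flags a gap in the category list

    print(Counter(coded))   # frequency of each code
    print(uncodable)        # e.g. ['weekly'] -> the categories are not exhaustive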

Much survey research has sought to tap what are called "attitudes" which, rather than facts ("I used the technique every day"), reflect opinions or feelings ("I liked the training").  These are used much more sparingly in evaluation research than in general survey research.  For example, we are less likely to care whether a respondent liked the training course than whether she used the techniques in which she was trained.

Similarly, much of survey research tries to tap future behavior ("If the election were held today, for whom would you vote?"  "If you had a million dollars, what would be the first thing you would buy?").  These are not always good predictors of future behavior (some critics would say they rarely are), since they do not take into account changes in the context.  In evaluation research, we are looking at what has happened in the past or present, so these questions are almost never used.  (It is more important to know that a trainee used the technique than that she thinks she might use it in the future.)

Despite these limitations, you can use some imagination in designing questions.  An example is a type of question that I tested some years ago.  One of the objectives of community development programs supported by the United Nations was to promote "political efficacy" among poor people.  This was defined as the belief that they could influence government decisions.  We could have asked "do you believe you can influence government decisions?" but we felt that this blunt a question would be difficult to interpret (since it would be affected by the environment or the context).  For example, if I asked you, "could you influence the decision of the US government to go to war in Iran?", you might answer differently than if I asked you "could you influence the Town's comprehensive plan?"

The question would also not give us much information as to why you answered one way or another.  From cognitive theory, we assumed that a person who was able to identify a problem in a rational way was more likely to be able to find a way to address it than someone who could not do this.

So, we constructed a series of questions:

1.      What is the main problem that your community is facing?

2.      What is the main cause of that problem?

3.      What are the possible solutions that you have considered?

4.      What is the best solution for the problem?

5.      Can you contribute to the solution?

6.      What could you do?

While the specific problems mentioned were interesting in themselves, there could be a wide range of answers, and we were less interested in the content of the answers than in the extent to which they formed a complete thought process.  So we coded five of the six questions together as a logical chain:

  1. Complete chain (identified problem, identified cause, identified possible solutions, chose a solution and indicated possible contribution)
  2. Incomplete chain (one of the elements is missing)
  3. No chain (could not go beyond the first question)
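A sketch of how such a chain could be scored; the field names and the sample response are hypothetical, not the original survey instrument:

    # Scoring the logical chain: 1 = complete, 2 = incomplete, 3 = no chain.
    def code_chain(answers):
        steps = ["problem", "cause", "solutions", "best_solution", "contribution"]
        answered = [bool(answers.get(step, "").strip()) for step in steps]
        if all(answered):
            return 1                           # every link present
        if answered[0] and any(answered[1:]):
            return 2                           # started the chain but missing a link
        return 3                               # could not go beyond naming the problem

    respondent = {
        "problem": "No clean water in the village",
        "cause": "The well has collapsed",
        "solutions": "",                       # this link is missing
        "best_solution": "Rebuild the well",
        "contribution": "I could help dig",
    }
    print(code_chain(respondent))              # 2 -> incomplete chain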

We found that there was a correlation between having participated in community development programs and the ability to construct a complete chain and, as a consequence, to believe that one could influence the solution.  It was also correlated with broader indicators (like the question, "Do you believe you can influence government?").

Again, the key to questionnaire design in evaluation research is to know what, based on the objectives and intended outcomes of a program or project, you should observe in the beneficiary or service-provider population.

Analysis of surveys

The analysis of surveys is complex because, on the whole, they do not pick up context (since they are directed to individuals).  This can be overcome in part by including information about context (e.g. type of ministry, type of country, number of years a program has been running) in the questionnaire and correlating this with individual responses, but it is often insufficient.  One way to work around this is to use a "case study" method to compare individual responses with more complete data about context and process.  This is what we will take up next week.