5 February 2024
Once you have determined what ought to be evaluated in a program or project, your next stage is to determine what to measure and how to do it.
The essential concepts for evaluation are performance indicators. In results-based programming, these are supposed to be specified so that Member States can be shown what will be used to validate that results have been achieved.
They have proven to be remarkably difficult for many organizations to produce successfully. Part of the problem goes back to the issue of objectives and outcomes, where the programs really have not thought through what the end-state they are seeking at the end of a period would look like. Without that, there would be a disconnect between what was observed (the performance indicator) and the intended end-state.
Performance has often been thought of as the production of output. Managers assumed that they had performed if they produced books, ran training courses, sent inspection missions or spent the allotted funds. This was abetted in some organizations by monitoring systems that were based on counting outputs. In some respects, that was an example of the principle of the drunkard's search, which was popularized by Abraham Kaplan in his seminal work on the philosophy of science, The Conduct of Inquiry. It is stated as follows: "There is a story of a drunkard searching under a street lamp for his house key, which he had dropped some distance away. Asked why he didn't look where he dropped it, he replied 'It's lighter here!'" [1]
It was easier to count the output (over which the managers had control) than to measure whether the output had led to anything (over which the managers had influence but not control).
In fact, the reason that the concept of an outcome has come into use has been to focus the search for the house key out in the dark. Rather than looking at whether a document was produced, you look to see whether the produced document was used. Rather than looking at how many training courses were held, you look to see whether the persons trained used their training.
You should now be ready to fill in the objective, specific objective, outcome and output boxes in the logframe template.
Once the outcome has been defined, I have argued that establishing a performance indicator is easy: either the outcome has happened or it hasn't. And you can see whether it happened.
Performance Indicators (PIs)
When you have determined your outcomes, in effect you have also determined your performance indicators. Performance indicators simply tell you what you will have to observe to prove that the outcomes happened. Still, you have to submit them to certain tests.
This process brings us the answer to the larger question, “What do we have to know?” The substance of the evaluation will be finding answers to the questions derived from the official performance indicators. This part of the exercise deals with the scientific question of validity, being sure that you are measuring what you think you are. The rest of the planning phase consists of determining how to answer the questions. This deals with the scientific question of reliability. And we start by establishing the information acquisition strategy. This means deciding the source of the performance indicator, and then how to extract information from the source.
Of course, defining the performance measure is itself only a first step. The next step is to determine how and where to measure it.
Once you have determined what you have to measure, the fun begins. This is the stage of deciding how you are going to obtain information. There are a number of issues here:
If evaluation costs too much relative to the program, there is probably something wrong either with the program or the evaluation. While there are no clear rules of thumb, I've always assumed that for a large project you should set aside about 1 percent of the project's budget for monitoring and evaluation. (For a million dollar project, that would amount to $10,000; for a $10 million project, it would be $100,000.) That isn't a hard and fast rule, however, and how much you should devote to evaluation is also related to the importance of the activity. (For example, a pilot project might well justify having a more expensive evaluation to be able to prove the worth of the new approach. Or a project that was under attack for political reasons might justify a major expenditure to prove that it was working and how to improve it.)
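The 1 percent rule of thumb is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `evaluation_budget` and the `percent` parameter are made up for this example, and the default of 1 percent is the rough starting point described above, not a fixed standard.

```python
def evaluation_budget(project_budget, percent=1.0):
    """Amount to set aside for monitoring and evaluation.

    `percent` defaults to the 1 percent rule of thumb; raising it for a
    pilot or politically contested project is a judgment call.
    """
    if project_budget < 0:
        raise ValueError("project_budget must be non-negative")
    return project_budget * percent / 100


# The two cases mentioned in the text.
for budget in (1_000_000, 10_000_000):
    print(f"${budget:,} project: ${evaluation_budget(budget):,.0f} for evaluation")
```

Passing a larger `percent` is one way to represent the cases where the importance of the activity justifies spending more.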
The importance of evaluation in holding managers accountable in a results-based programming system also means that there are no hard and fast cost rules. Here, however, the choice of models (using the IAEA approach of simple, standard and extended) depends on the extent to which there are questions raised about effectiveness.
The PIs ask the question “What do we have to know?” The performance indicators tell you what information you will need to be able to observe and will point in the direction where the information can be found. The source has to be accessible in the sense that the manager can use it. For example, there may be very good data in a confidential national database, but if the organization doing the evaluation cannot have access to it, it is not very useful. Similarly, there might be very good information in a commercial database like LexisNexis or in Google Scholar, but if it costs money and the organization doesn’t have a subscription, it isn’t realistic to use it.
For most performance measures, the data source is almost self-defined (and, in fact, part of the definition of a performance measure is being assured that it can be observed). The issue becomes important when there are multiple possible sources. In that case, some common sense rules can be applied.
The first rule is that the data source should be directly related to the measure. For example, if the performance measure is whether a resolution is adopted, the source of data would be the texts of resolutions. It is more complex when the performance measure is "use", because you have to answer the question "where do I expect it to be used?"
The second rule is that the data source should be non-intrusive, if at all possible. (See the discussion of intrusive and non-intrusive measures in the IAEA Guide.) The least intrusive source is documents, which can be consulted and analyzed. This has a lot to do with cost. Non-intrusive sources are usually less expensive (although that is conditioned by how you mine the source).
Fortunately, for international programs, results are often seen in terms of the content of documents. At the international level, this can include resolutions and decisions of intergovernmental bodies. At the national level, it can be reflected in laws. Research work can be cited in professional journals. The existence of the Internet and powerful search engines like Google has made it much easier to do these kinds of searches, as have services like LexisNexis.
The third rule is that if your data source is national officials, you have to be very careful in determining who and what to ask. Many traditional UN evaluations include sending a questionnaire "to governments". While it can be assumed that the responses reflect the full power of sovereign states, in practice these questionnaires are filled out by a few, usually junior, officials and are only as good as their own sources of information. When we get to surveys, we will see how we can get around this problem.
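The selection rules above can be sketched as a small filter-and-sort in Python. This is a minimal illustration, not a method taken from the IAEA Guide: the `DataSource` class, its fields and the `rank_sources` function are all hypothetical names, and the yes/no flags are a simplification of what is really a matter of judgment.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    accessible: bool        # can the evaluating organization actually use it?
    directly_related: bool  # first rule: tied directly to the measure
    intrusive: bool         # second rule: prefer non-intrusive sources


def rank_sources(sources):
    """Drop unusable sources, then list non-intrusive ones first."""
    usable = [s for s in sources if s.accessible and s.directly_related]
    # False sorts before True, so non-intrusive sources come first.
    return sorted(usable, key=lambda s: s.intrusive)


# Hypothetical candidates for a measure such as "a resolution is adopted".
candidates = [
    DataSource("questionnaire to governments", True, True, True),
    DataSource("confidential national database", False, True, False),
    DataSource("texts of resolutions", True, True, False),
]

ranked_names = [s.name for s in rank_sources(candidates)]
print(ranked_names)
```

The third rule, about being careful whom and what you ask among national officials, is not something a flag can capture; it affects how the questionnaire itself is designed.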
As you can see from the IAEA Guide (which I like, since I largely wrote it), there are only a limited number of collection methods in use. They include content analysis of documents, interviews (either individual or in groups, that is, focus groups), questionnaires (sent out either to a sample of respondents or to all members of a "universe", or received as feedback), and various kinds of regular data series that are reasonably quickly published or accessible (hit counters, household surveys, trade flow statistics, tax and revenue data). Within each of these there are techniques to enable you to reduce cost and deal with potential bias.
We will look at the first of these, content analysis, in the session on 20 February.
[1] Abraham Kaplan, The Conduct of Inquiry: Methodology for Behavioral Science, San Francisco, Chandler, 1964, p. 11.