Basic Requirements for Direct Elicitation Procedures


In contexts where the intended application of causal mapping is to access the relatively enduring features of actors’ perceptions and beliefs, as a basic minimum requirement the procedure(s) employed should exhibit acceptable test-retest reliability and construct validity. In this context, test-retest reliability refers to the degree of consistency in the content and structure of participants’ causal maps assessed on multiple occasions. To the extent that similar maps emerge from one occasion to the next, they are said to possess test-retest reliability. Reliability statistics can be computed in a variety of ways, ranging from basic frequency analyses of map content (e.g., the percentage of variables incorporated in the maps on multiple occasions) to more sophisticated comparisons of structural indices, expressed in the form of a reliability coefficient (e.g., the Pearson product-moment or Spearman rank-order correlation), ranging between zero (no reliability) and unity (perfect reliability). In general, reliability coefficients should exceed 0.70 as a basic minimum indication of acceptable reliability (Nunnally, 1978).
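The test-retest comparison of structural indices described above can be sketched in a few lines of code. This is a minimal pure-Python illustration; the index values (which might represent, say, map density or link/node ratios for eight participants) are invented for demonstration, and in practice a statistical package would also supply significance tests.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Assign ranks (1 = smallest), averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank across the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank-order correlation: Pearson computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical structural index (e.g., map density) for the same
# participants on two elicitation occasions.
time1 = [0.42, 0.55, 0.38, 0.61, 0.47, 0.52, 0.44, 0.58]
time2 = [0.45, 0.53, 0.35, 0.64, 0.49, 0.50, 0.46, 0.60]

r = pearson(time1, time2)
rho = spearman(time1, time2)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
print("Acceptable test-retest reliability" if r > 0.70
      else "Below the 0.70 threshold")
```

With data this consistent across occasions, both coefficients comfortably exceed the 0.70 benchmark.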

Construct validity in this context concerns the extent to which indices of map structure and content correlate with one another in ways that are in line with a priori theoretical predictions. Sound theorizing should enable strong predictions concerning which particular indices will be significantly correlated with one another and in what direction(s). The greater the number of significant positive and negative relationships (and nonsignificant relationships) predicted on the basis of theory, in advance of measurement, the greater the construct validity of the mapping indices. Ultimately, it is also desirable that causal maps should exhibit acceptable levels of criterion-related validity, i.e., indices of map structure and content should correlate significantly, in ways that are theoretically meaningful, with a range of exogenous variables (i.e., variables measured outside the cognitive mapping exercise), including pertinent individual differences and group process and outcome variables.
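The logic of checking observed correlations against a priori predictions can be sketched as follows. Everything here is hypothetical: the index names, the predicted sign pattern, the observed coefficients, and the significance cutoff (which in practice depends on the sample size and chosen alpha level) are all invented for illustration.

```python
# A priori predictions for pairs of map indices:
# '+' = significant positive, '-' = significant negative, '0' = no relationship.
predictions = {
    ("density", "links_per_node"): "+",
    ("density", "n_isolated_nodes"): "-",
    ("density", "map_breadth"): "0",
}

# Observed correlations (invented values for illustration).
observed = {
    ("density", "links_per_node"): 0.62,
    ("density", "n_isolated_nodes"): -0.48,
    ("density", "map_breadth"): 0.10,
}

R_CRIT = 0.44  # assumed critical |r| for significance at the chosen n and alpha

def classify(r, r_crit=R_CRIT):
    """Classify an observed correlation as '+', '-', or '0' (nonsignificant)."""
    if abs(r) < r_crit:
        return "0"
    return "+" if r > 0 else "-"

hits = sum(classify(observed[pair]) == sign
           for pair, sign in predictions.items())
print(f"{hits}/{len(predictions)} a priori predictions confirmed")
```

The more of the predicted pattern (convergent, divergent, and null relationships alike) that is confirmed, the stronger the evidence for construct validity.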

In cases where the intended application is largely practical in nature, for example in the context of interventions designed to facilitate strategy debates among the top management team (TMT) with a view to challenging the assumptions of key decision makers (e.g., Eden & Ackermann, 1998a), arguably it is still the case that the mapping procedures so employed should exhibit acceptable test-retest reliability, albeit over relatively shorter time-periods. If the aim of such interventions is to act as a catalyst for cognitive change, ultimately we need to ensure that the changes resulting from such applications are in fact non-trivial, deeper-level changes concerning actors’ enduring thoughts (cf., Daniels, de Chernatony & Johnson, 1995; Hodgkinson, Maule & Bown, 2004; Stubbart & Ramaprasad).

Basic Requirements for the Construction of Maps from Indirect Sources

As pointed out by Jenkins (1998), there is a lack of consistency in the literature overall regarding how coding issues are dealt with and reported. As a basic minimum, the coding schemes employed should meet the dual requirements of acceptable test-retest and inter-coder reliability. In this context, test-retest reliability means that repeated coding exercises would yield more or less identical results (technically known as code-recode reliability), while inter-coder reliability requires that multiple coders reach acceptable levels of agreement (Miles & Huberman, 1994).

The degree of code-recode reliability ultimately has a bearing on the attainment or otherwise of acceptable levels of inter-coder reliability. Hence, as noted by Huff and Fletcher (1990), both of these forms of reliability are necessary prerequisites for a coding scheme to be deemed technically adequate. It is heartening, therefore, that the majority of researchers utilizing documentary and other indirect sources routinely take steps to ensure that their coding schemes exhibit acceptable inter-coder reliability. Typically, however, this merely takes the form of an analysis of the number of instances in which two or more coders are in basic agreement with one another (i.e., percentage agreement) with regard to the assignment of the various elements of data to each of the predetermined categories within the coding scheme, which parts of the various assertions coded contain the causal concept, and the sign of the causal assertion (for representative examples, see Barr, 1998; Barr & Huff, 1997; Calori et al., 1992, 1994; Jenkins & Johnson, 1997a, 1997b).
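A percentage-agreement calculation of this kind is straightforward to compute. The sketch below uses invented category labels and coding decisions for two hypothetical coders; real coding schemes would, of course, involve many more assertions and categories.

```python
# Hypothetical category assignments by two independent coders for the
# same ten causal assertions (labels invented for illustration).
coder_a = ["cause", "effect", "cause", "sign+", "cause",
           "sign-", "effect", "cause", "sign+", "effect"]
coder_b = ["cause", "effect", "cause", "sign+", "effect",
           "sign-", "effect", "cause", "sign+", "cause"]

# Count the assertions on which both coders made the same assignment.
agreements = sum(a == b for a, b in zip(coder_a, coder_b))
pct = 100 * agreements / len(coder_a)
print(f"Inter-coder agreement: {pct:.0f}%")  # 8 of 10 assignments match
```

Here the two coders agree on 8 of 10 assignments (80 percent), which sits at the upper end of the minimum 70-80 percent region recommended below.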

As discussed in the Appendix, Laukkanen (1994, 1998) has devised a computerized system for the analysis of causal maps derived from documentary sources, including interview transcripts, that seeks to simplify data in the form of “standardized natural language” in order to facilitate subsequent comparative analyses. Similarly, Nelson et al. (2000a) have devised procedures for standardizing the variables elicited from individual participants in order to undertake such comparisons. Laukkanen (1998) argues that some form of validation process should underpin the standardization of data. In this connection he advocates the involvement of experienced research colleagues and other knowledgeable individuals to independently assess the quality of the data coding. While the process of independent data coding can be extremely cumbersome and time consuming, multiple trained assessors can be employed, which alleviates the burden to a certain extent, provided, of course, that the assessors are able to code reliably, as discussed above. In line with the “good practice guidelines” devised by Huff and Fletcher (1990), Laukkanen also suggests feeding back the findings to individual participants, in an attempt to validate the coding process. In keeping with this prescription, Nelson et al. (2000a) went back to their original expert respondents to validate the maps encoded by the research team.

Despite the popularity of participant validation as an approach to trying to safeguard factual and interpretive accuracy, there are some non-trivial problems and drawbacks associated with it, not least the fact that changes can occur very rapidly between what is thought at the time a decision occurs and how those experiences come to be recounted subsequently. A recent study by Hodgkinson et al. (2004) illustrates just how marked the variations can be that emerge as a function of the type of elicitation procedure employed. Two direct elicitation procedures, a freehand approach and the pairwise evaluation of causal relations, were compared systematically. In keeping with their hypotheses, based largely on work conducted by experimental cognitive psychologists in the field of human memory, Hodgkinson and his colleagues found that the pairwise technique yielded significantly richer maps, but participants found the task more difficult, less engaging and less representative than the freehand approach. Hodgkinson et al. attributed these findings to key differences in the nature of the basic human memory mechanisms underpinning the two tasks. When one considers that the causal maps compared in this study were gathered very soon after the point of decision, using direct forms of elicitation procedure, it becomes clear that techniques relying on participant validation of researcher-derived coding schemes are more — not less — likely to introduce further sources of latent error, as participants reconstruct their thoughts not as they actually occurred but very much as they would like them to have been. In the words of J. Sparrow (1998, p. 48):

“The way in which a person recollects an event changes over time, depending on the audience and circumstance as well as any reframing in the light of [...]”
Given the politically sensitive nature of the organizational issues typically investigated using causal mapping techniques, it becomes clear that techniques requiring negotiation of the findings should be used sparingly if the purpose is to capture, as accurately as possible, the belief systems of actors at the moment of decision. Participant validation methods administered distally in time from the moment of decision are limited by virtue of their failure to control for the dynamic capabilities of the human memory system to distort reality, to say nothing of the demand characteristics introduced by the researcher during this subsequent process, however unwittingly (cf., Hodgkinson, 1997b, 2002).

In the final analysis, participant (and expert panel) validation does not go nearly far enough as a basis for ascertaining the validity of causal maps elicited by indirect procedures. As in the case of maps elicited using direct procedures, it is essential that the construct validity of structure and content indices is established and, wherever possible, researchers should attempt to demonstrate the criterion-related validity of maps derived in this way by correlating the various structural and content indices with key individual differences and/or process and/or outcome variables. Unfortunately, however, it has been rare indeed for researchers to take these vital steps.

In sum, when assessed by the psychometric standards outlined above, basic requirements in virtually any area of applied psychology, it is clear that the procedures adopted by many authors of published studies involving causal cognitive mapping fall a long way short of the mark, with little or no attention having been given to reliability and validity issues in the strict statistical sense of these terms. Indeed, several commentators (e.g., Eden & Ackermann, 1998b) are openly hostile to the suggestion that there is a need for greater rigor in this domain. This is understandable, given the many practical difficulties in meeting these requirements, not the least of which is the laboriousness involved, which should not be underestimated, particularly in cases involving large numbers of data sources. Nevertheless, if significant inroads are to be made in the advancement of new and established substantive domains of application, including, but by no means restricted to, the IS and IT fields, it is vital that the standards of scientific rigor advocated in this chapter be adopted as a matter of course.

We have covered much territory in this section. In order to provide a clear sense of direction for the would-be user of causal mapping techniques, a summary of the main psychometric issues that need to be considered when making particular methodological choices, together with our recommended solutions to the problems identified, is presented in Table 3.

Table 3. Minimum acceptable psychometric properties required of cause maps elicited by direct and indirect procedures

Reliability

Direct elicitation procedures:
- Test-retest, i.e., an acceptable degree of temporal consistency in map content and structure when assessed on multiple occasions.

Maps constructed from indirect sources:
- Inter-coder, i.e., an acceptable level of agreement among multiple coders concerning the structure and content of coded maps.
- Test-retest, i.e., more or less identical results over repeated coding exercises.

Overall recommendations:
- All associated reliability coefficients should exceed 0.70.
- Where percentage agreement statistics are used, at minimum agreement should be somewhere in the region of 70-80 percent.

Validity

Direct elicitation procedures:
- Construct validity
- Criterion-related validity

Maps constructed from indirect sources:
- Construct validity
- Criterion-related validity
- Participant validation

Overall recommendations:
- Construct validity: establish the nature and extent of theoretically meaningful correlations among indices of map content and structure, having specified in advance which relationships will be significant (convergent validity) and which will not (divergent validity). Preferably, specify the direction of those relationships expected to be statistically significant, thereby enabling 1-tailed tests.
- Criterion-related validity: establish the degree of correlation of the various structural and content indices with key exogenous individual differences and process and/or outcome variables.
- Participant validation: even when carried out as close as possible to the point in time when the cause maps were elicited or constructed, this should, at best, be regarded as nothing more than a basic first step in the validation process.