Analysis


We have implemented our formalization of the technique of meta-matrix text analysis in a network text analysis tool called AutoMap (Diesner & Carley, 2004). AutoMap is a software application that helps analysts extract, analyze, represent, and compare mental models from texts. The tool performs computer-supported content analysis, map analysis, meta-matrix text analysis, and sub-matrix text analysis. We discuss the latter two types of analysis in this section. The more classic content analysis and map analysis were previously described in Carley and Palmquist (1992) and Carley (1997a).

Steps 1 to 3 in meta-matrix text analysis may involve a thesaurus. In general, a thesaurus is a two-column collection that associates text-level concepts with higher-level concepts (Burkart, 1997; Klein, 1997). The text-level concepts represent the content of a data set, and the higher-level concepts represent the text-level concepts in a generalized way. Thesauri are created by reading a set of texts, using predefined material, and/or deriving pairs of concepts and higher-level concepts from theory (Burkart, 1997; Kelle, 1997; Klein, 1997; Zuell & Alexa, 2001). The terminology of a thesaurus depends on the content and the subject of the data set.

Thesauri play a key role in any AutoMap coding. When performing content analysis or map analysis, AutoMap can use a generalization thesaurus. In this thesaurus, the analyst can reclassify words in relation to other words on the basis of shared meaning, spelling errors, aliases, etc. Further, phrases that refer to a single ideational kernel, such as “Weapons of Mass Destruction,” can be reclassified as a single concept (WMD). When AutoMap pre-processes texts with a generalization thesaurus, idiosyncratic differences in writing style, multi-word concepts, and wording errors can be eliminated. This generalization process facilitates identifying true conceptual similarities and differences across texts. The creation of the generalization thesaurus is step 1, concept identification, in the coding procedure.
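The reclassification described above can be illustrated with a minimal Python sketch. It is not AutoMap's implementation: the function name `apply_generalization`, the thesaurus entries other than the WMD example, and the choice of case-insensitive whole-phrase matching are all assumptions made for illustration.

```python
import re

# Hypothetical generalization thesaurus: text-level phrase -> higher-level concept.
# Keys are lowercase. Only the WMD entry comes from the chapter; the rest are invented.
GENERALIZATION_THESAURUS = {
    "weapons of mass destruction": "WMD",
    "osama bin laden": "bin_Laden",
    "usama bin laden": "bin_Laden",  # alias / spelling variant
}


def apply_generalization(text: str, thesaurus: dict) -> str:
    """Replace every thesaurus phrase found in `text` with its higher-level concept."""
    if not thesaurus:
        return text
    # Put longer phrases first so a multi-word entry wins over a shorter one it contains.
    phrases = sorted(thesaurus, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        flags=re.IGNORECASE,
    )
    # Look up the matched phrase in lowercase, because the keys are lowercase.
    return pattern.sub(lambda m: thesaurus[m.group(1).lower()], text)


if __name__ == "__main__":
    print(apply_generalization(
        "Inspectors searched for Weapons of Mass Destruction.",
        GENERALIZATION_THESAURUS,
    ))
```

After this step, two texts that spell the same idea differently both carry the same concept label, which is what makes later comparisons across texts meaningful.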

When AutoMap is used to perform a meta-matrix text analysis, a second type of thesaurus can also be employed. This second thesaurus, the meta-matrix thesaurus, contains the translation of concepts into the entity classes in the meta-matrix. When texts are processed with a meta-matrix thesaurus, the organizational structures described in the text can be extracted. Since one concept might be indicative of several meta-matrix entity classes, a meta-matrix thesaurus can consist of more than two columns. For example, the concept military falls into two entity classes: Organization and Resource. The specific entity and relation classes used for the meta-matrix approach in this chapter are presented in Table 3.
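A meta-matrix thesaurus with more than two columns can be pictured as a mapping from each concept to a list of entity classes. The following Python sketch is illustrative only: apart from the military example, the entries are invented, and the function name `classify` is hypothetical rather than part of AutoMap.

```python
# Hypothetical meta-matrix thesaurus: concept -> one or more entity classes.
META_MATRIX_THESAURUS = {
    "military": ["Organization", "Resource"],  # example given in the chapter
    "Hamas": ["Organization"],                 # invented entry
    "West_Bank": ["Location"],                 # invented entry
}


def classify(concepts, thesaurus):
    """Return (concept, entity_class) pairs.

    A concept with several entity classes yields one pair per class;
    a concept that is not in the thesaurus yields no pair at all.
    """
    return [(c, cls) for c in concepts for cls in thesaurus.get(c, [])]


if __name__ == "__main__":
    for pair in classify(["military", "attack", "Hamas"], META_MATRIX_THESAURUS):
        print(pair)
```

Storing a list per concept keeps the one-to-many case explicit, so a concept such as military contributes to both the Organization and the Resource sections of the meta-matrix.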

Note that in applying the meta-matrix conceptualization to terrorist groups, we have extended the original conceptualization (see Table 2) by treating Knowledge and Resource as separate entities (Carley & Reminga, 2004) and by adding Location as a primary entity. Further, we generalized people into Agent to reflect the fact that names are often unknown and people are identified by actions, such as “victim killed.” Since this is an extensible ontology, these changes do no harm to the underlying theory. We made this extension because knowledge, resources, and location are meaningfully distinct entities for research on covert networks. By extending the meta-matrix as shown in Table 3, we have completed step 2, entity identification, of the coding procedure.

The analyst can use none, one, or both types of thesauri (generalization and meta-matrix) to analyze texts with AutoMap. In general, the analyst may find it useful to first create a word list, then a generalization thesaurus, then a meta-matrix thesaurus. These thesauri can be built iteratively as new texts are added to the available set, because AutoMap minimizes the cost of coding and recoding texts. The larger the corpus of texts being analyzed, the more time is saved.

When using the meta-matrix thesaurus, AutoMap allows the analyst to associate text-level concepts or higher-order concepts from the generalization thesaurus with one, multiple, or no entity classes, and to add user-defined entity classes. This process of associating concepts with entity classes is step 3, concept classification, in the coding procedure.

When AutoMap applies the meta-matrix thesaurus, it searches the text set for the concepts denoted in the meta-matrix thesaurus and translates matches into the corresponding meta-matrix entity classes as specified by the analyst. When performing meta-matrix text analysis, AutoMap links the meta-matrix entity classes in the texts that were pre-processed with a meta-matrix thesaurus into statements and builds one concept network per text. Each network is cross-coded in terms of the meta-matrix and thus itself forms a meta-matrix. This automated network creation is step 4, perform map analysis, in the coding procedure.
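The following Python sketch illustrates one way classified concepts could be linked into statements. The passage above does not specify how AutoMap decides which concepts to link, so the sliding-window rule, the `window` parameter, the function name `build_network`, and the toy sentence are assumptions made purely for illustration.

```python
from collections import Counter


def build_network(tokens, thesaurus, window=3):
    """Link classified concepts that occur within `window` tokens of each other.

    Returns a Counter keyed by ((concept_a, class_a), (concept_b, class_b)),
    in text order, whose values count how often each statement occurs.
    """
    # Keep only tokens the thesaurus classifies, one node per entity class.
    nodes = [
        (pos, tok, cls)
        for pos, tok in enumerate(tokens)
        for cls in thesaurus.get(tok, [])
    ]
    edges = Counter()
    for idx, (pos_a, tok_a, cls_a) in enumerate(nodes):
        for pos_b, tok_b, cls_b in nodes[idx + 1:]:
            if pos_b - pos_a >= window:
                break  # nodes are ordered by position, so later ones are farther away
            if tok_a == tok_b:
                continue  # no self-links, including one token with two classes
            edges[((tok_a, cls_a), (tok_b, cls_b))] += 1
    return edges


if __name__ == "__main__":
    # Invented thesaurus and sentence, used only to exercise the function.
    toy_thesaurus = {
        "Yassin": ["Agent"],
        "Hamas": ["Organization"],
        "Gaza": ["Location"],
    }
    network = build_network("Yassin founded Hamas in Gaza".split(), toy_thesaurus)
    for statement, count in network.items():
        print(statement, count)
```

Because every node carries its entity class, each edge already records which cell of the meta-matrix it belongs to, for example an Agent-by-Organization link.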

Table 3. Meta-matrix model formalization used in AutoMap: entity classes and relation classes

Meta-matrix entities | Agent | Knowledge | Resources | Tasks/Events | Organizations | Location
--- | --- | --- | --- | --- | --- | ---
Agent | Social network | Knowledge network | Capabilities network | Assignment network | Membership network | Agent location network
Knowledge | | Information network | Training network | Knowledge requirement network | Organizational knowledge network | Knowledge location network
Resources | | | Resource network | Resource requirement network | Organizational capability network | Resource location network
Tasks/Events | | | | Precedence network | Organizational assignment network | Task/Event location network
Organizations | | | | | Interorganizational network | Organizational location network
Location | | | | | | Proximity network

The resulting networks can be analyzed at varying levels during step 5, graph and analyze data. For example, the analyst might be interested in seeing and analyzing the networks of the text-level concepts that represent all or only some of the meta-matrix categories. We implemented this functionality as sub-matrix text analysis. Each cell in Table 3 denotes a sub-matrix. Sub-matrix analysis distills one or several sub-networks from the meta-matrix and presents text-level concepts in the chosen entity classes. This routine enables a more thorough analysis of particular sections of the meta-matrix, such as Agent-by-Agent networks (social networks) or Organization-by-Resource networks (organizational capability networks). When performing sub-matrix text analysis, AutoMap links the concepts representing the meta-matrix entity classes selected by the analyst into networks.
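Selecting a sub-matrix amounts to keeping only the links whose two endpoints fall into the chosen pair of entity classes. The Python sketch below assumes an edge list of ((concept, class), (concept, class)) pairs; that representation, the function name `sub_matrix`, and the sample edges are hypothetical, while the relation class names are taken from Table 3.

```python
# Relation class names for a few cells of Table 3, keyed by the unordered
# pair of entity classes. frozenset(["Agent", "Agent"]) equals frozenset(["Agent"]).
RELATION_CLASSES = {
    frozenset(["Agent"]): "Social network",
    frozenset(["Agent", "Organization"]): "Membership network",
    frozenset(["Organization", "Resource"]): "Organizational capability network",
    frozenset(["Agent", "Location"]): "Agent location network",
}


def sub_matrix(edges, class_a, class_b):
    """Return the edges that connect a `class_a` concept with a `class_b` concept."""
    wanted = frozenset([class_a, class_b])
    return [
        (src, dst)
        for src, dst in edges
        if frozenset([src[1], dst[1]]) == wanted
    ]


if __name__ == "__main__":
    # Invented edges, used only to exercise the function.
    edges = [
        (("Yassin", "Agent"), ("Rantissi", "Agent")),
        (("Yassin", "Agent"), ("Hamas", "Organization")),
        (("Hamas", "Organization"), ("Gaza", "Location")),
    ]
    name = RELATION_CLASSES[frozenset(["Agent", "Agent"])]
    print(name, sub_matrix(edges, "Agent", "Agent"))
```

Keying the lookup by an unordered pair reflects the upper-triangular layout of Table 3, where Agent-by-Organization and Organization-by-Agent refer to the same cell.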

With the implementation of meta-matrix text analysis and sub-matrix text analysis in AutoMap, we hope to contribute to the investigation of the network structure of social and organizational systems that are represented in texts. With these techniques we aim to provide a reasonable extension of the base technology of computer-supported network text analysis and a practical implementation of the meta-matrix model. In the next section we demonstrate how these novel techniques can help analysts to detect the meaning and underlying social structure inherent in textual data in order to answer related research questions.

Illustrative Example of the Application of Network Text Analysis

To demonstrate the meta-matrix approach to NTA, we use a small sample data set of 18 texts. Each text is coded using the proposed approach and the AutoMap software.

Data

This text sample is a subsample drawn from a larger text collection that consists of 191 texts collected at CASOS about six major terrorist groups that operate in the West Bank. These groups are the Al Aksa Martyrs Brigades, Al Fatah, Al Qaeda, Hamas, Hezbollah, and the Islamic Jihad. We gathered the texts from LexisNexis Academic via an exact-matching Boolean keyword search for each of the groups. The media that we searched with LexisNexis were The Economist, The Washington Post, and The New York Times. The time frame of our data set covers articles published from 2000 to 2003. We sorted the retrieved texts by relevance, screened the top-ranked texts, and selected up to three texts per organization and year for our data set. The subsample from this corpus that we work with in this chapter consists of one text per terror group from each medium from 2003 (Table 4). This subsample of 18 texts contains 3,035 unique concepts and 13,141 total concepts. The number of unique concepts counts each concept only once, whereas the number of total concepts also counts repetitions of concepts. The reader should keep in mind that the small size of this data set and the fact that the texts were chosen across groups rather than within groups are likely to lead to more overall concepts and fewer relations among them. For example, a text about Hamas and Yassin is unlikely to mention al Qaeda and bin Laden, whereas it is more likely to mention Rantissi.
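The distinction between unique and total concepts drawn above can be made concrete with a short Python sketch. It assumes that every lowercased, whitespace-separated token counts as one concept, which is a simplification rather than AutoMap's actual concept identification; the function name `concept_counts` and the sample strings are invented.

```python
from collections import Counter


def concept_counts(texts):
    """Return (unique, total) concept counts over a collection of texts.

    Assumption: each lowercased, whitespace-separated token is one concept.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    unique = len(counts)          # each distinct concept counted once
    total = sum(counts.values())  # repetitions counted as well
    return unique, total


if __name__ == "__main__":
    sample = ["Hamas leader met Hamas members", "leader spoke"]
    print(concept_counts(sample))
```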

This text set is a suitable illustrative example because the detection of covert networks such as terrorist groups is one application domain for meta-matrix analysis (Carley, Dombrowski, Tsvetovat, Reminga & Kamneva, 2003). Since texts are a widely used source of information about terrorist groups, a technique for pulling networks classified according to the meta-matrix scheme from this type of data is needed. The results of this sample study are neither a valid indication of these terrorist groups nor a formal validation of the method of meta-matrix text analysis, but they show what information the analyst can gain from this novel technique.