Network Analysis and Health and Risk Messaging
Summary and Keywords
A network is a collection of nodes that are somehow connected to each other. Typically, the pattern of ties among nodes is of central interest to scholars in communication and the allied social sciences, with particular salience for health and risk communication. Network analysis of communication patterns began in the 1940s in work done at the Massachusetts Institute of Technology (MIT) and was popularized in the early 1980s in the communication discipline with the work of Everett Rogers, Larry Kincaid, and others. Networks are measured by using self-report or observational procedures to determine the presence and/or strength of ties among nodes in a given structure. If there is a tie or relation between a pair of nodes, it is said that the two are adjacent to one another or that the two nodes are neighbors, and the neighborhood of a node consists of all nodes adjacent to that node. Several measures describe networks, including measures of position and measures of the entire network. Positional measures consider the position of any given node in relation to others in the network, and centrality is a popular measure to account for one’s level of influence in a network. Density is an overall network measure of the level of activity among network pairs. Finally, network measures allow researchers to compare dependent and independent networks. Network analysis represents one of the more powerful and elegant procedures for measuring small-group, organizational, and international communication patterns among nodes or actors of interest in health and risk communication.
Computer networks, terrorist networks, networks of health care providers, telecommunication networks, networks of Twitter followers, and Facebook friends—it seems that everywhere one looks, one finds networks. Fundamentally, a network consists of a collection of nodes and the connections among them. Organizations can be connected by their business relations, nations can be connected by their imports and exports, nurses are connected by their hospital affiliations, and research articles are connected by their citations. Often the nodes are individuals. Each of us is enmeshed in a multitude of networks, connected to others by cell phone, culture, friendship, disease, email, kinship, social status, Facebook, shared experience, work, love, sex, and numerous other relational ties. Nodes can also be connected by events, such as when node A communicates with node B at time t. Network analysis is a collection of methods for analyzing these relational structures. The emphasis is not on the attributes of the nodes, but rather on the pattern of their connections. Nodes are defined not so much by their individual characteristics as they are by how they are related to others. Focused as it is on the nature of the connections rather than the nature of the nodes, network analysis is a perspective that cuts across the social, behavioral, and natural sciences. It is one of the few truly interdisciplinary paradigms available today.
This article is an introduction to network analysis, a method often used in health and risk communication (Valente, 1995, 2010). First, a brief history of network analysis will be presented, followed by basic concepts and definitions. Then, standard data collection approaches will be discussed. The text finishes with a review of analysis techniques and measures.
Although social networks are as old as humankind, network analysis as a modern branch of study has its roots in the Königsberg bridge problem, which was solved in 1735 by Leonhard Euler. Königsberg was a Prussian city that straddled both banks of the River Pregel and included two islands. These four land masses (the two islands and both riverbanks) were connected by a series of seven bridges. The Königsberg bridge problem was to take a tour of the city, crossing each bridge exactly once. Euler reasoned that the size and physical location of the land masses did not matter—what mattered was which land masses were connected by which bridges. By taking the land masses of Königsberg to be nodes of a network and the bridges to be connections between nodes, Euler turned the bridge problem into a network problem, and he was able to demonstrate that no such tour through the city was possible. (See Alexanderson, 2006, for a summary.)
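Euler's reduction of the city to nodes and connections can be illustrated with a short sketch. The bridge list follows the commonly reproduced layout of the problem, and the land-mass labels are illustrative names that do not come from the text. The test applied is the criterion usually attributed to Euler: in a connected network, a walk that uses every connection exactly once exists only if zero or two nodes have an odd number of connections.

```python
from collections import Counter

# Seven bridges between four land masses, following the commonly reproduced
# layout of the problem. The labels are illustrative, not taken from the text.
bridges = [
    ("north_bank", "island_1"), ("north_bank", "island_1"),
    ("south_bank", "island_1"), ("south_bank", "island_1"),
    ("north_bank", "island_2"), ("south_bank", "island_2"),
    ("island_1", "island_2"),
]

# Degree of a land mass = number of bridge ends that touch it.
degree = Counter()
for a, b in bridges:
    degree[a] += 1
    degree[b] += 1

odd_nodes = [node for node, d in degree.items() if d % 2 == 1]

# Euler's criterion for a connected network: a walk that crosses every
# bridge exactly once requires zero or two nodes of odd degree.
tour_possible = len(odd_nodes) in (0, 2)
print(dict(degree), tour_possible)
```

Under this layout all four land masses have odd degree, which is why no such tour exists.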
Euler’s work was the beginning of the branch of mathematics known as graph theory. The common use of the term graph is what most people remember from high school algebra: a Cartesian coordinate system, along with a collection of points satisfying an equation. The term has a second meaning, however—namely, a collection of nodes, along with the connections between them (though graph theorists use the terms vertices and edges, rather than nodes and connections). In fact, some people use the terms graph and network interchangeably, switching freely between them.
Mathematicians, however, are not social scientists, and as a rule, they are interested in only the mathematical properties of networks. It wasn’t until 1934 that networks entered into the social science literature. In that year, sociologist Jacob Moreno published Who Shall Survive?—a work that includes some of the very first social network depictions (which he called sociograms). He used network methods to model communication across children’s development and to explain a rash of runaways at a girls’ training school.
The next major advance was in the late 1940s, when Alex Bavelas and his students at MIT (Bavelas, 1950; Leavitt, 1951; Shaw, 1954) studied small communication networks. They restricted which communication channels were open to the communicators, as depicted in Figure 1. The four networks shown are called, for obvious reasons, the star, the Y, the chain, and the circle. The MIT studies showed that the pattern of communication ties affects task performance and satisfaction. In particular, groups in the more centralized structures performed better, but were less satisfied. These studies examined the effects of network structures in small-group work. These initial communication network studies spawned literally hundreds of similar studies.
The 1960s saw two major developments. First, mathematician Frank Harary and his colleagues (Harary, Norman, & Cartwright, 1965) published Structural Models: An Introduction to the Theory of Directed Graphs, a book that introduced social scientists to graph theory. After the publication of this book, much terminology from graph theory and many of the results found their way into social scientific research. Second, Stanley Milgram conducted his small-world research (Milgram, 1967; Travers & Milgram, 1969). Milgram selected random people from Boston and Nebraska and asked them to deliver a document to a target person, also in Boston, by forwarding the document to a friend, who would then forward it to another friend, and so forth, until the document reached the target through network contacts. Amazingly, whether across town or across the country, it took approximately six steps for the document to reach the target. This research is the genesis of the now popular phrase “six degrees of separation.”
By the 1970s, network analysis was firmly established in the social sciences. Mark Granovetter (1973) published “On the Strength of Weak Ties,” in which he argued that weak ties provide an important bridging function and can bring information from distant places in a network. Barry Wellman founded the International Network for Social Network Analysis. Lin Freeman launched the journal Social Networks, which is, today, the premier journal in the field. Freeman also published a seminal paper about centrality, a topic to be taken up in a later section of this article.
In 1981, Rogers and Kincaid popularized network analysis in the communication discipline with the publication of Communication Networks: Toward a New Paradigm for Research (Rogers & Kincaid, 1981). In the decades that followed, much network analysis activity has taken place in the discipline, covering topics as diverse as political communication (Park & Thelwall, 2008), computer-mediated communication (Cho & Lee, 2008), the globalization of telecommunications (Lee, Monge, Bar, & Matei, 2007), and health communication (Valente, 2010).
A network is a collection of nodes that are somehow connected to each other. The nodes are sometimes called actors, points, or vertices, and the connections themselves are called links, ties, or edges. (Because network analysis is so interdisciplinary, an unfortunate side effect is that different researchers use different terms for the same things.) For example, a researcher interested in the spread of risky behaviors such as smoking or drug use might study peer pressure by constructing a network of adolescents connected by friendship ties. A hypothetical friendship network is depicted in Figure 2. Nodes are labelled with the names of 10 different individuals, and the ties are represented by the lines connecting pairs of nodes. For example, as can be seen in the figure, Ann and Doug are friends, but Ann and Ed are not. Ann and Ed do, however, have a mutual friend—Carol. Thus, it could be said that Ann and Ed are indirectly related through Carol, and even more indirectly through Bob and Fay.
Usually, it is the pattern of ties—who is connected to whom—that is important and of interest to scholars, not the network’s spatial arrangement on the page. In Figure 2, for example, Ian and Jan could have been placed on the left instead of the right. Ann could be located at the bottom of the page. Fay could have been moved closer to Bob so that the tie between them could be depicted with a shorter line. Think of the nodes as buttons and the ties as rubber bands connecting the buttons. Buttons can be moved and rubber bands stretched, but so long as no tie gets broken and no tie gets added, the resulting network is the same in terms of who is connected to whom. With large networks, the researcher often considers very carefully how to arrange the network so that details are not obscured. But this matter is one of visualization only; the spatial location of the nodes usually has no substantive significance.
If there is a tie between a pair of nodes, then the nodes are said to be adjacent, or to be neighbors, and the neighborhood of a node consists of all nodes adjacent to that node. In Figure 2, Carol’s neighborhood consists of Ann, Bob, Doug, and Ed, whereas Holly’s neighborhood contains only Fay and Greg. The size of a node’s neighborhood is called the degree of the node, so Carol is of degree 4, and Holly is of degree 2. If the nodes are numbered from 1 to N, then one can construct the N × N adjacency matrix by placing a 1 in the (ij)th cell of the matrix if nodes i and j are adjacent, and a 0 if they are not. The row (or column) sums give the degrees of the nodes. (Sometimes there are weights associated with the ties indicating tie strength. If such is the case, then the weights are used rather than 1s and 0s.) When analyzing network data (for example, with computer software), it is usually more convenient to work with the adjacency matrix than with the network itself.
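The adjacency matrix and the degree calculation can be sketched in a few lines of Python. Because Figure 2 is not reproduced here, the edge list is an assumption: a hypothetical set of friendship ties chosen to agree with every relationship the text describes. The function names are likewise illustrative.

```python
# Hypothetical edge list, chosen to agree with the relationships the text
# describes for Figure 2 (the figure itself is not available here).
names = ["Ann", "Bob", "Carol", "Doug", "Ed", "Fay", "Greg", "Holly", "Ian", "Jan"]
edges = [
    ("Ann", "Bob"), ("Ann", "Carol"), ("Ann", "Doug"), ("Bob", "Carol"),
    ("Bob", "Doug"), ("Bob", "Fay"), ("Carol", "Doug"), ("Carol", "Ed"),
    ("Ed", "Fay"), ("Fay", "Greg"), ("Fay", "Holly"), ("Greg", "Holly"),
    ("Ian", "Jan"),
]

index = {name: i for i, name in enumerate(names)}
n = len(names)

# N x N adjacency matrix: cell (i, j) is 1 if i and j are adjacent, else 0.
adjacency = [[0] * n for _ in range(n)]
for a, b in edges:
    adjacency[index[a]][index[b]] = 1
    adjacency[index[b]][index[a]] = 1  # undirected ties are symmetric

def neighborhood(name):
    """Return the names of all nodes adjacent to `name`."""
    row = adjacency[index[name]]
    return [names[j] for j in range(n) if row[j] == 1]

def degree(name):
    """Degree is the row sum of the adjacency matrix."""
    return sum(adjacency[index[name]])

print(neighborhood("Carol"), degree("Carol"))
print(neighborhood("Holly"), degree("Holly"))
```

For a weighted network, the 1s would be replaced by tie strengths, and the row sums would then give total tie strength rather than a count of neighbors.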
Often, it is useful to think of traffic flowing through the network rather than considering discrete pairs—for example, messages flowing through a communication network or a sexually transmitted disease (STD) propagating through a sexual contact network. Although there are several different types of routes through which traffic may flow (see, e.g., Borgatti, 2005), the most basic is the path. A path is a sequence of distinct nodes, each of which is adjacent to the preceding node in the sequence (except for the first node, of course). Note that the nodes must be distinct—that is, nodes may not be repeated. The length of the path is the number of links traversed, or, alternatively, the number of “steps” that it takes to go from the first node to the last. For example, the path Holly, Greg, Fay, Ed, is of length 3. (If there is a path of length 1 between a pair of nodes, it means that the nodes are adjacent.) As an illustration, a patient may see an allergist due to a referral from his or her primary care physician, and the length of the path from the patient to the allergist would be 2.
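The definition of a path can be checked mechanically. The sketch below assumes a hypothetical edge list consistent with the text's description of Figure 2; `is_path` and `path_length` are illustrative names.

```python
# Hypothetical friendship ties consistent with the text's account of Figure 2.
edges = [
    ("Ann", "Bob"), ("Ann", "Carol"), ("Ann", "Doug"), ("Bob", "Carol"),
    ("Bob", "Doug"), ("Bob", "Fay"), ("Carol", "Doug"), ("Carol", "Ed"),
    ("Ed", "Fay"), ("Fay", "Greg"), ("Fay", "Holly"), ("Greg", "Holly"),
    ("Ian", "Jan"),
]

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def is_path(sequence):
    """True if the nodes are distinct and each is adjacent to the one before it."""
    if len(set(sequence)) != len(sequence):
        return False  # a path may not repeat a node
    return all(b in neighbors[a] for a, b in zip(sequence, sequence[1:]))

def path_length(sequence):
    """Number of links traversed, i.e., the number of steps."""
    return len(sequence) - 1

route = ["Holly", "Greg", "Fay", "Ed"]
print(is_path(route), path_length(route))
```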
It is also possible that two nodes are joined by more than one path. Consider Ann and Ed. There are a number of paths from Ann to Ed: Ann, Doug, Carol, Ed; Ann, Carol, Ed; Ann, Doug, Bob, Fay, Ed; and several others. Note that the paths are of varying lengths. Examining all the paths between Ann and Ed, we see that one path (Ann, Carol, Ed) is of length 2, and all the other paths are longer. Such a shortest path is called a geodesic and serves as the basis of network distance. More formally, the distance between any pair of nodes is the length of any geodesic connecting them; hence, the distance between Ann and Ed is 2. The distance between Ann and Fay is also 2, but the distance between Ann and Greg is 3. In other words, the distance between a pair of nodes is the smallest number of steps that it takes to reach one from the other. It could be said that a node is more central (i.e., has a larger closeness centrality) if it has shorter paths to all others in the network. The concept of centrality will be brought up shortly.
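Geodesic distance is typically found with a breadth-first search, which explores the network one step at a time outward from a starting node. The following is a minimal sketch under the assumption of a hypothetical edge list that matches the text's description of Figure 2; the function name `distance` is illustrative.

```python
from collections import deque

# Hypothetical friendship ties consistent with the text's account of Figure 2.
edges = [
    ("Ann", "Bob"), ("Ann", "Carol"), ("Ann", "Doug"), ("Bob", "Carol"),
    ("Bob", "Doug"), ("Bob", "Fay"), ("Carol", "Doug"), ("Carol", "Ed"),
    ("Ed", "Fay"), ("Fay", "Greg"), ("Fay", "Holly"), ("Greg", "Holly"),
    ("Ian", "Jan"),
]

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def distance(source, target):
    """Length of a geodesic from source to target, or None if there is no path."""
    steps = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return steps[node]
        for nxt in neighbors[node]:
            if nxt not in steps:  # first visit is always by a shortest route
                steps[nxt] = steps[node] + 1
                queue.append(nxt)
    return None

for other in ("Ed", "Fay", "Greg"):
    print("Ann to", other, "=", distance("Ann", other))
```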
What can we say about the distance between Ann and Ian? There is no path from Ann to Ian. As far as Ann is concerned, Ian is unreachable (and vice versa), which illustrates an important network property (namely, connectedness). A network is said to be connected if there is a path between every pair of nodes; otherwise, the network is disconnected. A disconnected network is in two or more pieces, each of which is called a component. The friendship network depicted in Figure 2 is disconnected, with two components—the eight-person component of nodes Ann through Holly, and the two-person component consisting of the dyad Ian and Jan. Nodes in one component are unreachable by nodes of any other component.
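Components can be found by repeatedly picking an unvisited node and collecting everything reachable from it. The sketch below again assumes a hypothetical edge list consistent with the text's description of Figure 2; `components` is an illustrative name.

```python
# Hypothetical friendship ties consistent with the text's account of Figure 2.
edges = [
    ("Ann", "Bob"), ("Ann", "Carol"), ("Ann", "Doug"), ("Bob", "Carol"),
    ("Bob", "Doug"), ("Bob", "Fay"), ("Carol", "Doug"), ("Carol", "Ed"),
    ("Ed", "Fay"), ("Fay", "Greg"), ("Fay", "Holly"), ("Greg", "Holly"),
    ("Ian", "Jan"),
]

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def components(neighbors):
    """Split the node set into groups of mutually reachable nodes."""
    remaining = set(neighbors)
    parts = []
    while remaining:
        start = remaining.pop()
        part = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt not in part:
                    part.add(nxt)
                    stack.append(nxt)
        remaining -= part
        parts.append(part)
    return parts

parts = components(neighbors)
is_connected = len(parts) == 1
print(sorted(len(p) for p in parts), is_connected)
```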
The networks considered thus far have all been undirected, which means that the ties between nodes are symmetric. Sometimes the ties are directional, and a network with directional ties is called a directed network. Figure 3 shows the same nodes as in Figure 2, but now some of the ties have been made directional, and this fact is indicated by arrowheads placed on the lines representing ties. Perhaps Figure 3 is not a friendship network, but rather an email network where we place a tie from node i to node j if i has ever sent j an email. In this case, Figure 3 shows that Ann has exchanged emails with Bob, but although Ann has sent email to Doug, Doug has not reciprocated in sending emails to Ann.
In directed networks, because directionality has been introduced, one must distinguish between indegree and outdegree. The indegree of a node is the number of arrows pointing to the node; the outdegree is the number of arrows pointing from the node. Thus, Ian has an outdegree of 1 and an indegree of 0, while Fay has an indegree of 3 and an outdegree of 2. In the adjacency matrix of a directed network, a 1 is placed in the (ij)th cell if there is a tie from i to j, and a 0 is placed otherwise. Defined in this fashion, the row sums of the matrix give the outdegrees, and the column sums give the indegrees.
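The row-sum and column-sum rule can be sketched as follows. Figure 3 is not available here, so the list of directed ties is a hypothetical partial reconstruction: it includes only enough arrows to match the degrees the text reports for Ian and Fay and the email exchanges described for Ann, Bob, and Doug.

```python
# Hypothetical directed ties (sender, receiver), a partial stand-in for Figure 3.
names = ["Ann", "Bob", "Doug", "Ed", "Fay", "Greg", "Holly", "Ian", "Jan"]
arcs = [
    ("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Doug"),
    ("Bob", "Fay"), ("Ed", "Fay"), ("Holly", "Fay"),
    ("Fay", "Ed"), ("Fay", "Greg"), ("Ian", "Jan"),
]

index = {name: i for i, name in enumerate(names)}
n = len(names)

# Cell (i, j) is 1 if there is a tie from i to j; the matrix need not be symmetric.
adjacency = [[0] * n for _ in range(n)]
for sender, receiver in arcs:
    adjacency[index[sender]][index[receiver]] = 1

def outdegree(name):
    """Row sum: number of arrows pointing from the node."""
    return sum(adjacency[index[name]])

def indegree(name):
    """Column sum: number of arrows pointing to the node."""
    return sum(row[index[name]] for row in adjacency)

for person in ("Ian", "Fay", "Ann", "Doug"):
    print(person, "in:", indegree(person), "out:", outdegree(person))
```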
It is also important to mention two-mode networks. A two-mode network is one in which the nodes can be partitioned into two groups that are meaningful to the researcher (for example, men and women, or infected and uninfected patients). Often, a two-mode network has the property that ties occur only between nodes of different groups, not between nodes in the same group. In a heterosexual dating network, for example, ties are between men and women, but never between two men or between two women. Similarly, in a network of sexual disease transmission, there would be ties from the infected to the uninfected, but as the disease is not passed between uninfected people or between infected people, there would be no ties between members of the same group. (In fact, this example network is not only two-mode, it is also directed because an uninfected person cannot infect an infected person.)
Measuring the Network
Networks require different data collection techniques than traditional social science research due to their focus on relations between pairs of nodes or among two or more nodes. The level of analysis dictates how a given social network ought to be measured. The analysis can be at the level of the individual node, at the pair of nodes, for a subgroup of nodes, or for the entire network. Very often, the level of analysis for which network data are specified is referred to as the modeling unit. It may also be the case that after data are collected, analyses will be undertaken at various levels depending on the interests of the investigator. We now turn our attention to data collection methods, followed by a description of how network analyses are conducted.
Data Collection Methods
There are two primary methods of collecting network data, self-report and observation, and sometimes both are used to measure a single network. We will discuss each method individually and then provide examples of network studies that rely on both methods within the same analysis.
The self-report method is the most commonly used means of gathering network data, and often its goal is to learn the relations between nodes when these data are not available through other means. In a network of risky sexual behavior, for example, where a link is placed between two nodes if the people represented by those nodes had engaged in risky sexual contact, it would not be possible to observe these behaviors, and the only practical way of collecting such data is to ask the participants. Researchers typically use questionnaires that require people to indicate the presence or strength of relationships. Network relations can be indicated in a number of ways, including rosters, free recall, or a ratings method.
The roster method provides participants with a list of all others in the network, and each participant indicates the presence or the degree of a relation with each other node. In a smoking cessation group, for example, the researcher might give everyone in the group a membership list and ask them how often they received support from each of the members on the list. Lack of access to the full network roster, or practical constraints (e.g., time and size of network), may preclude use of the roster method of data collection. One advantage of the roster method is that it reduces bias due to incomplete recall.
A second data collection method that uses questionnaires requires members to recall or list relations in a network. In the smoking cessation example, the researcher could have simply asked each member who had provided support, without giving an entire roster to the participants. The advantage of free recall is that it can be used even when the researcher does not know the full membership of the network of interest. The list of nodes provided through free-recall questionnaires might be the only available means of defining the network. Sometimes labelled the ego-centered network approach, the recall method can provide a list of alters for an ego and perhaps reveal the relations among alters in the network. For example, a random sample of communication majors might be surveyed and asked to list up to five other communication majors that they consider leaders or influential in the program. The network, then, amounts to the unique individuals listed in the free-recall survey, and one is more prestigious or central if one is listed more often.
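As a rough illustration of how free-recall responses become network data, the sketch below tallies how often each person is listed. The respondents, the names, and the data layout are all made up for the example.

```python
from collections import Counter

# Hypothetical free-recall responses: each respondent (ego) lists the peers
# (alters) he or she considers influential. All names are invented.
nominations = {
    "Respondent 1": ["Kim", "Lee", "Max"],
    "Respondent 2": ["Kim", "Max"],
    "Respondent 3": ["Kim", "Noa"],
}

# The network consists of the unique individuals who were listed; a person
# listed more often is treated as more prestigious or central.
listed = Counter(alter for alters in nominations.values() for alter in alters)
ranking = listed.most_common()
print(ranking)
```

The tally is simply the indegree of each listed person in a directed network whose ties run from respondent to nominee.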
A less common self-report method of data collection is the use of interviews to gather unit relations. Interview data may be collected face to face or over the telephone, and it may be the case that the level of nuance required to measure or understand a network adequately requires interview methods. We may wish to learn the close network links to a focal node, and questionnaire methods may be too impersonal or inadequate to ascertain how certain alters know the focal node.
Another common method of data collection is to observe interactions or communication between nodes. There are (at least) two ways to observe relations. One is to do so in real time, where the researcher observes interaction and records the presence of a relationship or other features of the interaction. The researcher could sit in her cubicle at work and record over time which pairs or groups of colleagues leave together for lunch in order to determine the friendship network. A second way is to use archival data or records of relations among nodes. For example, cell phone records might indicate who a person’s best friends or closest relations are, based on the number of texts or the minutes of calls between the person and his or her contacts.
There are many occasions where a network study benefits from using multiple data collection methods. For example, one might use archival records to identify the units of a network, and then sample nodes in the network through questionnaire items or interviews. By contrast, one might start with a small sample of nodes and, through snowball sampling, further learn the size and nature of the network. After doing this, one might collect archival data on interactions or communication patterns between or among nodes once the network is established.
A hypothetical example might bring the two primary data collection methods to the fore. Consider a network analysis after a terrorist attack in a given city that seeks to link individuals and social media exposure. After the attack, an investigator may use survey methods and ask participants to list the primary media sources that they used to receive news about the attack. Alternatively, one could use observational methods and track the relationships between certain media websites (e.g., CNN, Fox) and Internet Protocol (IP) addresses to identify the areas of the city or region that use certain media sites as opposed to others.
Analyses and Techniques
Once the network has been defined using the approaches of the previous section, it is time to analyze the network. This section discusses analyses at both the positional and network level.
When one examines the networks of Figure 1, it is clear that not only do the networks differ from one another, but within each network, the nodes themselves sometimes differ from one another. The center node of the star (node 1 in the leftmost network of the figure) is certainly quite different from the other four nodes in that network. Similarly, there seems to be something special about the node at the crook of the Y or the center point of the chain (node 1 in each of these cases as well). The nodes of the circle, on the other hand, are all structurally identical.
The term used to talk about these differences is centrality. Centrality is important because it can influence such things as leadership and satisfaction (Bavelas, 1950; Leavitt, 1951). The center node of the star is clearly more central (by definition) than the other four nodes. Likewise, the node at the crook of the Y has the highest centrality in that network. The nodes of the circle are all equally central. Bavelas (1950) argued for a closeness-based measure of centrality. The center node of the star is maximally close to the other nodes in the network, being only one step away from each. The node at the crook of the Y is similar, being one step away from three of the nodes and two steps away from the fourth.
The problem with a closeness-based interpretation is that the center node of the star possesses other properties that distinguish it as well. For example, it has the largest neighborhood. It is also between every other pair of nodes. If any of the peripheral nodes of the star wanted to pass a message to one of the other peripheral nodes, the message would have to go through the center node. Freeman (1979) recognized these different properties and argued that there were multiple reasons that a node might be considered central. He advocated three different measures of centrality, depending on whether one was interested in communication activity (in the sense of having many communication partners), communication control (in the sense of being between many pairs of nodes), or communication efficiency (being able to disseminate a message quickly throughout the network, which would be facilitated by being close to other nodes in the network). Each of these notions will be discussed in turn.
The first property, activity, is indexed by the degree centrality of a node. The degree centrality is, logically enough, the ordinary degree of a node. A node with high degree has many communication partners, whereas a node with low degree has few. Using the ordinary degree of a node as a measure of activity makes sense when comparing nodes within a single network and when comparing nodes in different, but similarly sized, networks. In networks of drastically different sizes, however, ordinary degree can be biased because in a large network, there are many more opportunities to establish ties. Consequently, many researchers normalize degree centrality by dividing the ordinary degree by N – 1. (For instance, in a communication network with N nodes, there are N – 1 possible communication partners.) Doing so puts the measure on a 0–1 scale—that is, the measure is 1 when a node is linked to every other node in the network, and it is 0 when the node is an isolate, with no neighbors at all.
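A minimal sketch of normalized degree centrality follows, using a five-node star like the one described for Figure 1 (node 1 at the center) and a small made-up network containing an isolate. The data layout and function name are illustrative.

```python
# Five-node star with node 1 at the center; neighbor lists are assumed
# from the text's description of Figure 1.
star = {1: [2, 3, 4, 5], 2: [1], 3: [1], 4: [1], 5: [1]}

# A made-up three-node network in which "c" is an isolate.
with_isolate = {"a": ["b"], "b": ["a"], "c": []}

def normalized_degree(neighbors, node):
    """Ordinary degree divided by N - 1, the number of possible partners."""
    return len(neighbors[node]) / (len(neighbors) - 1)

print(normalized_degree(star, 1))            # center of the star
print(normalized_degree(star, 2))            # a peripheral node
print(normalized_degree(with_isolate, "c"))  # an isolate
```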
The extent to which a node can control traffic through the network is indexed by betweenness centrality. Given two distinct nodes i and j, a third node k is said to be between i and j if k is on a geodesic from i to j. (To be on a geodesic means to be an interior node of the geodesic.) If there is but a single geodesic between i and j, then k can exert control by shutting off the flow of traffic from i to j, forcing a longer, nongeodesic route. If there are multiple geodesics between i and j, and k is on only some of them, then the control is partial—k can limit which geodesics are used but cannot force a nongeodesic path. Betweenness centrality, then, is an appropriately weighted count of the number of geodesics that a node is on.
To compute betweenness centrality, one considers every pair of nodes in the network and the geodesics connecting them. For each pair of nodes, if the node whose centrality is sought is on all of the geodesics, then the count is incremented by 1. If the node is on only some of the geodesics, then the count is incremented by the proportion of geodesics containing the node. For example, in Figure 2, Ed is on a geodesic between Carol and Fay, but there is a second geodesic linking Carol and Fay (namely, Carol, Bob, Fay). For this pair of nodes, then, Ed’s geodesic count would be incremented by 1/2 because he is on half of the geodesics from Carol to Fay. Because of the need to find all geodesics, betweenness centrality is never computed by hand except with the very simplest networks; one almost always uses network analysis software to compute betweenness centrality. As with degree centrality, network size makes a difference, and betweenness centrality can be placed on a 0–1 scale by dividing through by a constant that depends on network size. (For the formula, see Freeman, 1979.)
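The counting procedure can be sketched directly, if inefficiently, for small networks. The code assumes a hypothetical edge list consistent with the text's description of Figure 2, and it returns the unnormalized count; the function names are illustrative, and production software generally uses faster algorithms.

```python
from collections import deque
from itertools import combinations

# Hypothetical friendship ties consistent with the text's account of Figure 2.
edges = [
    ("Ann", "Bob"), ("Ann", "Carol"), ("Ann", "Doug"), ("Bob", "Carol"),
    ("Bob", "Doug"), ("Bob", "Fay"), ("Carol", "Doug"), ("Carol", "Ed"),
    ("Ed", "Fay"), ("Fay", "Greg"), ("Fay", "Holly"), ("Greg", "Holly"),
    ("Ian", "Jan"),
]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def geodesic_counts(source):
    """Return (dist, count): geodesic lengths and number of geodesics from source."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                count[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:  # node precedes nxt on a geodesic
                count[nxt] += count[node]
    return dist, count

def pair_share(i, j, k):
    """Proportion of i-j geodesics that have k as an interior node."""
    dist_i, count_i = geodesic_counts(i)
    dist_k, count_k = geodesic_counts(k)
    if j not in dist_i or k not in dist_i:
        return 0.0  # unreachable nodes contribute nothing
    if dist_i[k] + dist_k[j] != dist_i[j]:
        return 0.0  # k is not on any geodesic from i to j
    return count_i[k] * count_k[j] / count_i[j]

def betweenness(k):
    """Sum the geodesic shares over every pair of other nodes."""
    others = [v for v in neighbors if v != k]
    return sum(pair_share(i, j, k) for i, j in combinations(others, 2))

print(pair_share("Carol", "Fay", "Ed"))
print({name: betweenness(name) for name in neighbors})
```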
The final property, efficiency, is indexed by closeness centrality, which is obtained by summing the distances between the node in question and each of the other nodes in the network. For example, the center node of the star has a closeness centrality of 4, being one step removed from each of the other four nodes. A peripheral node in the star is one step away from the center, but two steps away from each of the three other peripheral nodes, giving a closeness score of 7.
One problem with defining closeness in this manner is that the scores “run the wrong way.” Because it is a sum of distances, the higher the score, the farther (rather than closer) the node and, thus, the lower the centrality. So long as one recognizes this fact, there is no problem. Some researchers, however, prefer to take the reciprocal of the above-defined measure, meaning that the center node of the star would have a closeness centrality of 1/4 = .25, and each of the peripheral nodes would have a closeness centrality of 1/7 = .14. This adjustment makes the scores “run the right way.” As before, one can adjust for network size by dividing through by a constant (see Freeman, 1979). Note, also, that closeness cannot be computed on disconnected networks.
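Both versions of the closeness score can be reproduced for the five-node star. In this sketch the sum of distances is called `farness`, a label chosen here for convenience rather than one used in the text, and the reciprocal is called `closeness`.

```python
from collections import deque

# Five-node star with node 1 at the center, as described for Figure 1.
star = {1: [2, 3, 4, 5], 2: [1], 3: [1], 4: [1], 5: [1]}

def distances_from(neighbors, source):
    """Breadth-first search giving the distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def farness(neighbors, node):
    """Sum of distances to all other nodes (higher means less central)."""
    dist = distances_from(neighbors, node)
    if len(dist) < len(neighbors):
        raise ValueError("closeness cannot be computed on a disconnected network")
    return sum(dist.values())

def closeness(neighbors, node):
    """Reciprocal of the sum of distances, so that scores run the right way."""
    return 1 / farness(neighbors, node)

print(farness(star, 1), closeness(star, 1))
print(farness(star, 2), round(closeness(star, 2), 2))
```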
Apart from these three centrality measures, another popular measure is eigenvector centrality (Bonacich, 1972), so called because it is computed based on the principal eigenvector of the adjacency matrix. Eigenvector centrality is a degree-based measure, but it weights a node’s neighbors according to how central they are. The result is a measure in which a node is of high centrality to the extent that it is adjacent to other nodes of high centrality. Which centrality measure one uses depends on the network under consideration and on theoretical concerns. For example, in a network of risky sexual behavior, eigenvector centrality is the most appropriate measure. One is at risk of contracting an STD not only if one has many sexual contacts, but if one’s partners have many sexual contacts as well. On the other hand, in a social support network, simple degree may suffice.
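One simple way to approximate the principal eigenvector is repeated averaging, sketched below on a made-up five-node network. This is an illustrative approach, not necessarily how any particular software package computes the measure. As a stability choice made here, each node's own score is added at every step, which amounts to iterating on the adjacency matrix plus the identity; for a connected network this leaves the principal eigenvector unchanged.

```python
# Made-up network: A, B, and C form a triangle; D hangs off C; E hangs off D.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

def eigenvector_centrality(neighbors, iterations=200):
    """Approximate eigenvector centrality by power iteration on A + I."""
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # New score = own score plus the scores of all neighbors.
        new = {
            node: scores[node] + sum(scores[nb] for nb in neighbors[node])
            for node in neighbors
        }
        top = max(new.values())
        scores = {node: value / top for node, value in new.items()}  # rescale to max 1
    return scores

scores = eigenvector_centrality(neighbors)
print({node: round(value, 3) for node, value in scores.items()})
```

In this example A and D both have degree 2, yet A scores higher because its neighbors (B and C) are themselves better connected than D's neighbors (C and E).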
The final positional measure that we will discuss here is not a centrality measure, but it does depend on a node’s position in the network. The clustering coefficient measures how connected a node’s neighbors are. If we look at each of the neighbors of a focal node, the clustering coefficient of that node is defined to be the number of links between the node’s neighbors divided by the number of links that the neighborhood could possibly have. A clustering coefficient of 1 means that the node’s neighbors are all linked to each other; a coefficient of 0 means that none of them are. (The formulas will be given in the next section, when network density is discussed.) Many social networks are “clumpy,” and thus exhibit high average clustering. In fact, a network with high clustering and small average path lengths is called a small-world network, in honor of Milgram’s classic research discussed above.
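The clustering coefficient can be computed by counting ties among a node's neighbors. The sketch assumes a hypothetical edge list consistent with the text's description of Figure 2. Returning 0 for nodes with fewer than two neighbors is a convention adopted here, since the ratio is otherwise undefined.

```python
from itertools import combinations

# Hypothetical friendship ties consistent with the text's account of Figure 2.
edges = [
    ("Ann", "Bob"), ("Ann", "Carol"), ("Ann", "Doug"), ("Bob", "Carol"),
    ("Bob", "Doug"), ("Bob", "Fay"), ("Carol", "Doug"), ("Carol", "Ed"),
    ("Ed", "Fay"), ("Fay", "Greg"), ("Fay", "Holly"), ("Greg", "Holly"),
    ("Ian", "Jan"),
]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def clustering_coefficient(node):
    """Links among the node's neighbors divided by the number possible."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no pairs of neighbors to link
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    possible = k * (k - 1) / 2
    return links / possible

for person in ("Carol", "Holly", "Ed"):
    print(person, clustering_coefficient(person))
```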
Overall Network Measures
Rather than the properties of nodes, one might also be interested in properties of the network as a whole. Two easy measures are network size and network density. Size is simply the number of nodes that a network has. Density is related to the number of links. In particular, density is defined as the number of links that a network has, divided by the number of links that it could possibly have. If N is the number of nodes and L is the number of links, then in an undirected network, where there are N(N − 1)/2 possible links, density is given by the formula

$$D = \frac{L}{N(N-1)/2} = \frac{2L}{N(N-1)}.$$
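The formula for a directed network is very similar, except that each pair of nodes can be linked in either direction, so there are N(N − 1) possible links:

$$D = \frac{L}{N(N-1)}.$$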
(If we focus on a specific node and its neighborhood, and we take N to be the number of neighbors and L to be the number of links between pairs of neighbors, then we see that the density of this neighborhood is in fact the clustering coefficient mentioned in the previous section.)
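As a small numeric check of the density definition (links present divided by links possible), the sketch below uses the five-node networks described for Figure 1. The function names are illustrative.

```python
def density_undirected(n_nodes, n_links):
    """Links present divided by the N(N - 1)/2 possible undirected links."""
    return 2 * n_links / (n_nodes * (n_nodes - 1))

def density_directed(n_nodes, n_links):
    """Links present divided by the N(N - 1) possible directed links."""
    return n_links / (n_nodes * (n_nodes - 1))

# The star, Y, and chain each have 5 nodes and 4 links; the circle has 5 links.
print(density_undirected(5, 4))  # star, Y, or chain
print(density_undirected(5, 5))  # circle
print(density_directed(5, 4))    # the same 4 ties treated as one-way arrows
```

Applying `density_undirected` to a single node's neighborhood, with N as the number of neighbors and L as the links among them, gives that node's clustering coefficient.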
One can also get a sense of how “stretched out” a connected network is by examining its diameter, which is the length of the longest geodesic in the network. If the diameter is small, then all nodes are reasonably close to one another; if the diameter is large, then there is at least one pair of nodes that is far apart. The diameters of the star, Y, chain, and circle are, respectively, 2, 3, 4, and 2. Clearly, the Y and the chain are more “stretched out” than the circle and the star.
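The diameters quoted for the four networks can be verified with a breadth-first search from every node. The edge lists are assumptions based on the text's description of Figure 1; apart from node 1, the node numbering is arbitrary.

```python
from collections import deque

# Five-node networks as described for Figure 1 (numbering partly assumed).
networks = {
    "star": [(1, 2), (1, 3), (1, 4), (1, 5)],
    "Y": [(1, 2), (1, 3), (1, 4), (4, 5)],
    "chain": [(4, 2), (2, 1), (1, 3), (3, 5)],
    "circle": [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)],
}

def build_neighbors(edges):
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors

def longest_distance_from(neighbors, source):
    """Distance from source to the node farthest from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

def diameter(edges):
    """Length of the longest geodesic in a connected network."""
    neighbors = build_neighbors(edges)
    return max(longest_distance_from(neighbors, node) for node in neighbors)

diameters = {name: diameter(edges) for name, edges in networks.items()}
print(diameters)
```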
Finally, there is a whole-network analogue of nodal centrality, a property called centralization. A network is highly centralized to the extent that it has one or a few nodes high in centrality, with all the other nodes low in centrality. In Figure 1, the networks are lined up, left to right, from most centralized to least centralized. Taking the star and the circle as two ends of a continuum, it can be seen that the star has one node of high centrality and four nodes of low centrality; in the circle, on the other hand, all the nodes have the same centrality.
Freeman’s (1979) method for computing centralization is as follows. First, choose a measure of centrality and compute the centrality of each node in the network. Second, determine which node has the largest centrality. Third, create a difference score for each node by subtracting the node’s centrality score from the largest centrality score found in the second step. Fourth, sum these differences and normalize to a 0–1 scale by dividing by a constant that depends on the size of the network and the particular centrality measure used (see Freeman, 1979, for the formulas).
Using this procedure, a star (of any size) will have a centralization score of 1, and a network where all centralities are equal (such as the circle) will have a centralization score of 0. If a network has just a few nodes of high centrality, the sum of differences will be large, and the centralization score will tend toward 1; the more nearly equal the centralities are, the lower the sum of differences will be, and the measure will tend toward 0. By way of illustration, the Y and the chain in Figure 1 have degree centralizations of .58 and .17, respectively.
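The four steps of Freeman’s procedure can be sketched for degree centrality as follows. The sketch assumes five-node versions of the four structures and uses the normalizing constant (N − 1)(N − 2), which is the largest possible sum of differences for raw degree in an undirected network and is attained by the star; the function names and node labels are hypothetical.

```python
def from_edges(edges):
    """Build an undirected adjacency mapping from a list of ties."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centralization(adj):
    """Freeman degree centralization for an undirected network with N >= 3 nodes."""
    n = len(adj)
    degrees = [len(nbs) for nbs in adj.values()]   # step 1: centrality of each node
    largest = max(degrees)                         # step 2: largest centrality
    differences = [largest - d for d in degrees]   # step 3: difference scores
    return sum(differences) / ((n - 1) * (n - 2))  # step 4: sum and normalize


# Five-node versions of the four structures (labels are arbitrary).
star = from_edges([(1, 2), (1, 3), (1, 4), (1, 5)])
y_net = from_edges([(1, 2), (1, 3), (1, 4), (4, 5)])
chain = from_edges([(1, 2), (2, 3), (3, 4), (4, 5)])
circle = from_edges([(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])

scores = {
    "star": degree_centralization(star),      # 12 / 12
    "Y": degree_centralization(y_net),        # 7 / 12
    "chain": degree_centralization(chain),    # 2 / 12
    "circle": degree_centralization(circle),  # 0 / 12
}
```

For the Y, the degrees are 3, 2, 1, 1, and 1, so the differences from the largest degree sum to 7, and 7/12 rounds to .58. For the chain, the differences sum to 2, and 2/12 rounds to .17.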
Related to degree centralization is the degree distribution, which is simply a listing of how many nodes have each degree. One can think of a histogram with degree on the horizontal axis and frequency (number of nodes) on the vertical axis. Clearly, a network with high degree centralization will have a skewed degree distribution. Many observed large networks have a degree distribution that follows a power law, with frequency decreasing as a power function of degree. Such a distribution is still skewed, but has a fatter right tail than would be expected by chance. Such a network is called a scale-free network.
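A degree distribution is straightforward to tabulate. The following sketch counts how many nodes have each degree in a five-node star and a five-node chain; the adjacency mappings and the function name are hypothetical.

```python
from collections import Counter


def degree_distribution(adj):
    """Map each observed degree to the number of nodes that have it."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return dict(sorted(counts.items()))


# Hypothetical five-node star (node 1 is the hub) and five-node chain.
star = {1: {2, 3, 4, 5}, 2: {1}, 3: {1}, 4: {1}, 5: {1}}
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}

star_distribution = degree_distribution(star)    # {1: 4, 4: 1}
chain_distribution = degree_distribution(chain)  # {1: 2, 2: 3}
```

The star, which is maximally centralized, has the more skewed distribution: four nodes of degree 1 and a single node of degree 4.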
Very often, a researcher will have two different networks defined on the same set of nodes. For example, one might observe the ties among a set of nodes at Time 1 and Time 2 and ask whether the network has changed over time. Or, perhaps, in a social support network, we might ask whether the emotional support network differs from the financial support network. In such cases, the researcher is interested in whether the two networks are structurally similar or different.
How might such a question be answered? A reasonable first approach would be to take each dyad in each of the networks and record 1 or 0 according to whether or not the members of the dyad are adjacent (or, if the ties are weighted, then the weights would be used instead—whether weighted or not, these values can be taken directly from the adjacency matrix). By matching corresponding dyads from the two networks, one could simply compute a correlation coefficient to see how similar or different the networks are. A high correlation implies a large degree of structural similarity, whereas a low correlation implies dissimilarity.
But what is “high” and what is “low”? Usually one answers this question by computing an F or t statistic to determine whether or not the correlation is significantly different from 0. Such an approach is not appropriate with network data, however, because the observations are not independent, leading to biased significance tests.
An alternative is the Quadratic Assignment Procedure, or QAP (pronounced “kwap”), developed primarily by Hubert and Baker (1978). In this approach, one of the adjacency matrices is treated as fixed. The rows and columns of the other adjacency matrix are randomly permuted, with the same permutation applied to both so that the nodes are relabeled but the structure of the network is preserved, and the correlation is recalculated. If this permutation and recalculation step is done hundreds or thousands of times, a nonparametric sampling distribution of correlations is created, and testing the significance of the original correlation is a simple matter of comparing it to the tail of the simulated sampling distribution. Naturally, it is not feasible to do so by hand, but many network software packages will compute the correlation between networks and provide QAP significance tests.
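The logic of the procedure can be sketched in a few lines of Python. This is an illustrative, one-tailed version that relies only on the standard library, not the implementation found in any particular network package; the function names, the two five-node networks, the number of permutations, and the random seed are all assumptions made for the example.

```python
import random
from math import sqrt


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix for an undirected network."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = 1
    return m


def off_diagonal(m):
    """Dyad values in a fixed order, skipping the diagonal (self-ties)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permute(m, order):
    """Apply the same permutation to the rows and the columns of m."""
    return [[m[i][j] for j in order] for i in order]


def qap_test(a, b, n_perm=2000, seed=0):
    """Observed correlation and a one-tailed permutation p value."""
    rng = random.Random(seed)
    fixed = off_diagonal(a)
    observed = pearson(fixed, off_diagonal(b))
    order = list(range(len(b)))
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        r = pearson(fixed, off_diagonal(permute(b, order)))
        if r >= observed - 1e-12:  # tolerance for floating-point ties
            at_least_as_large += 1
    # Count the observed arrangement itself so that p is never exactly 0.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical networks on the same five nodes at Time 1 and Time 2.
time1 = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
time2 = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)])
r, p = qap_test(time1, time2)
```

Because the rows and columns are permuted together, every permuted matrix has the same structure as the original, so the simulated distribution reflects what the correlation would look like if the two networks were structurally unrelated.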
Modeling Social Networks
Rather than reporting position or network measures, or comparing networks with QAP, a researcher might want to create a network model of the structural features of interest. Such a process is analogous to the standard variable-analytic tradition, where one has a dependent variable and several independent variables, and one estimates a best-fitting model using linear regression. The difference is that the parameter estimates in a network model, rather than indicating which independent variables influence the dependent variable, instead indicate which structural features occur more or less often than expected by chance.
The most popular network modeling approach uses exponential random graph models (ERGMs), and two excellent introductions to ERGMs are Robins, Pattison, Kalish, and Lusher (2007) and Shumate and Palazzolo (2010). In an ERGM, the basic random variable is the tie (either present or absent), and we think of the observed network as one instantiation from a probability distribution of networks. A number of structural features can be tested, such as density (based on the number of links), reciprocity (whether, if there is a tie from A to B, there is also one from B to A), triangles (such as whether triads are transitive or cyclic), and stars (such as in-stars, where multiple nodes point to a single node; or out-stars, where the ties point in the opposite direction). To estimate the parameters, rather than ordinary least squares (which is used in linear regression), an ERGM relies on Markov-chain Monte Carlo maximum likelihood estimation. Starting from an initial set of parameter estimates, a large number of graphs are simulated to obtain a probability distribution of graphs. The parameter estimates are refined, and the process is repeated until it converges to a set of parameter estimates that best fit the observed network. Goodness-of-fit statistics (analogous to an R²) are examined to determine how well the model fits the data. A significant, positive parameter indicates that in the observed network, the structural feature occurs more often than expected by chance, and a significant, negative parameter indicates that it occurs less often; a nonsignificant parameter means that the structural feature occurs at chance levels.
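For readers who want to see the model itself, the general form of an ERGM, as it is commonly written in introductions to the method, is given below. The notation is an assumption made here for illustration: Y is the random network, y is a particular network on the same set of nodes, g_k(y) is the count of the k-th structural feature in y (for example, the number of links, mutual dyads, triangles, or stars), θ_k is the parameter for that feature, and κ(θ) is a normalizing constant obtained by summing over all possible networks y′ on the node set.

```latex
P(Y = y) = \frac{1}{\kappa(\theta)} \exp\Big( \sum_{k} \theta_k \, g_k(y) \Big),
\qquad
\kappa(\theta) = \sum_{y'} \exp\Big( \sum_{k} \theta_k \, g_k(y') \Big)
```

A positive θ_k raises the probability of networks that contain many instances of feature k, which is why a significant, positive estimate is read as the feature occurring more often than expected by chance. Because κ(θ) sums over an enormous number of possible networks, it cannot be computed directly for any but the smallest networks, which is one reason that estimation relies on simulation.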
For many reasons, network analysis is a powerful and predictive method to analyze small-group, organizational, or even international communication-based data. When done appropriately, network analysis can capture large, complicated data structures and summarize their many relationships in a manner that is instructive and illuminating. An important aspect of network analysis is its ability to visualize large, sophisticated network structures in a way that few other social scientific research procedures can do, adding to the scope of understanding in health and risk communication.
Alexanderson, G. L. (2006). Euler and Königsberg’s bridges: A historical view. Bulletin of the American Mathematical Society, 43, 567–573.
Bavelas, A. (1950). Communication patterns in task-oriented groups. Journal of the Acoustical Society of America, 22, 725–730.
Bonacich, P. (1972). Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology, 2, 113–120.
Borgatti, S. P. (2005). Centrality and network flow. Social Networks, 27, 55–71.
Cho, H., & Lee, J.-S. (2008). Collaborative information seeking in intercultural computer-mediated communication groups: Testing the influence of social context using social network analysis. Communication Research, 35, 548–573.
Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1(3), 215–239.
Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Harary, F., Norman, R. Z., & Cartwright, D. (1965). Structural models: An introduction to the theory of directed graphs. New York: Wiley.
Hubert, L. J., & Baker, F. B. (1978). Evaluating the conformity of sociometric measurements. Psychometrika, 43, 31–41.
Leavitt, H. J. (1951). Some effects of certain communication patterns on group performance. Journal of Abnormal and Social Psychology, 46, 38–50.
Lee, S., Monge, P., Bar, F., & Matei, S. A. (2007). The emergence of clusters in the global telecommunications network. Journal of Communication, 57, 414–434.
Milgram, S. (1967). The small world problem. Psychology Today, 2, 60–67.
Moreno, J. L. (1934). Who shall survive? A new approach to the problem of human interrelations. Washington, DC: Nervous and Mental Disease Publishing Co.
Park, H. W., & Thelwall, M. (2008). Developing network indicators for ideological landscapes from the political blogosphere in South Korea. Journal of Computer-Mediated Communication, 13, 856–879.
Robins, G., Pattison, P., Kalish, Y., & Lusher, D. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks, 29, 173–191.
Rogers, E. M., & Kincaid, D. L. (1981). Communication networks: Toward a new paradigm for research. New York: Free Press.
Shaw, M. E. (1954). Group structure and the behavior of individuals in small groups. Journal of Psychology, 38, 139–149.
Shumate, M., & Palazzolo, E. T. (2010). Exponential random graph (p*) models as a method for social network analysis in Communication research. Communication Methods and Measures, 4, 341–371.
Travers, J., & Milgram, S. (1969). An experimental study of the small world problem. Sociometry, 32, 425–443.
Valente, T. W. (1995). Network models of the diffusion of innovations. Cresskill, NJ: Hampton Press.
Valente, T. W. (2010). Social networks and health: Models, methods, and applications. New York: Oxford University Press.
Appendix A: Software for Conducting an ERGM Analysis
Wang, P., Robins, G., & Pattison, P. (2009). PNet: Program for the estimation of exponential random graph (p*) models. Melbourne, Australia: University of Melbourne. Retrieved from http://www.swinburne.edu.au/fbl/research/transformative-innovation/our-research/MelNet-social-network-group/PNet-software/resources/PNetManual.pdf.
Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., & Morris, M. (2003). Statnet: Software tools for the statistical modeling of network data. Retrieved from http://statnetproject.org.