IEEE SENSORS JOURNAL, VOL. 11, NO. 12, DECEMBER 2011

Impact of Topological Characteristics on Consensus Building in Multiagent Systems

Hicham Hatime

Abstract—This paper represents a network of nodes as a graph and investigates the impact of topological characteristics on consensus building. The properties evaluated are the algebraic connectivity ratio, average path, average cluster coefficient, average matching index, modularity, and average participation. Consensus is measured by the average number of state updates that the system requires to gain the global scope of the task at hand. Factor analysis and regression analysis are applied to analyze the impact of the topological characteristics, to understand the dependencies among them, and to reduce the dimensionality of the analysis. A theoretical study is provided to support the results obtained by the statistical method. The insights from the analysis are discussed.

Index Terms—Communication system, consensus building, topology characteristics.

I. INTRODUCTION

IN MANY fields, multiagent systems are often characterized by the interactions that take place among the agents. In biology, for example, the collective behavior of cells and tissues is a direct consequence of the properties of their molecular constituents. In computer networks, the overall performance is measured by each station's throughput, security, protocol scalability, etc. In manufacturing, the quality of a product is evaluated by the functionality, reliability, and maintainability of each component. The performance of such a complex system is affected by the connectivity and placement (topology) of its constituents. Complex collaborative multiagent systems have to address coordination problems as well. Constructing a communication topology for randomly dispersed nodes is necessary for exchanging information.
In a mobile ad hoc network, the initial scope of information available to each node is usually limited to the information collected locally. Increasing this scope requires either collecting more data or requesting it from other nodes. The iterative process of exchanging local information among nodes, followed by inference in order to gain a global perspective of a problem, is referred to as consensus building. There are many factors that influence the consensus building process, communication topology being the most important of all. The relationship between the communication topology and the consensus process, at present, seems to be a black box. Studies aimed at evaluating the impact of a communication topology on the process of consensus building do not exist in the literature as of now. The purpose of this study is to explore this relationship and expose some of its inner structure. Random topologies are generated, and a few topology metrics, namely, the algebraic connectivity ratio, average path, average cluster coefficient, average matching index, modularity, and average participation, are evaluated. The objective is to extract the most important metrics influencing consensus building. In order to investigate the intricate connectivity and coordination, the system has to be translated into an abstract concept that can be mathematically modeled.

Manuscript received June 07, 2011; accepted June 07, 2011. Date of publication June 23, 2011; date of current version November 02, 2011. The associate editor coordinating the review of this paper and approving it for publication was Prof. Evgeny Katz. The author is with the Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260 USA (e-mail: hxhatime@wichita.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSEN.2011.2160393
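As a concrete illustration of the abstract model just mentioned, a network of randomly dispersed nodes can be represented by an adjacency matrix over an undirected graph. A minimal Python sketch (the node count and edge list below are illustrative, not taken from the paper's experiments):

```python
import numpy as np

# Hypothetical 8-node topology given as an undirected edge list.
n = 8
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]

A = np.zeros((n, n), dtype=int)   # adjacency matrix of the undirected graph
for i, j in edges:
    A[i, j] = A[j, i] = 1

degrees = A.sum(axis=1)           # degree of each node
print(degrees)                    # e.g. node 2 has degree 3 here
```

All topology metrics discussed in Section II can be computed from this matrix alone.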
The model is then subjected to a systematic characterization which seeks to uncover its characteristics. These characteristics, in turn, provide insights into the design and organization of the network that affect the consensus building process. Statistical techniques such as regression analysis and factor analysis can then be applied to show the interactions and interdependencies among the topological characteristics, and the impact of these structural properties on consensus building. The most significant properties can be included in the model for consensus building, and the performance of such a model can be analyzed in detail.

II. BACKGROUND

Communication topology in small-world networks (SWNs) has been an attractive area of research. SWNs exhibit the desired characteristics of a multiagent system, such as short path length, high clustering coefficient, and adaptive communication links. In addition, they are simple to model. Graph theory is usually used to model the topology of SWNs. A network is represented by an undirected graph where nodes represent the set of vertices (V) and the communication links are described by the set of edges (E). In this paper, generating a random SWN is accomplished through different stages [1] (see Fig. 1). The first stage utilizes a combination of ART and Maxnet neural network concepts to cluster nodes into separate groups. The second stage is the election of a cluster-head for each cluster to facilitate intracluster communication. The election is performed using minimum dominating set concepts. The third stage consists of establishing intercluster communication through gateways. The selection of a gateway node is evaluated by a willingness function that combines node degree, location relative to the cluster center, available power, and transmission energy. The last stage consists of establishing connections between clusters.

Fig. 1.
Three phases in creating a communication topology among randomly dispersed wireless nodes: (a) creation of clusters and intracluster links, election of cluster-heads, and selection of gateways; (b) connecting node clusters to closest gateways; and (c) establishing intercluster links. (CH = Cluster Head, G = Gateway, N = Node.)

A. Topological Metrics

The objective, here, is the selection of a set of topological metrics that need to be evaluated [6], [8]–[15]. Several structural properties and metrics have been defined in the literature for networks based on intuitive concepts [2]–[5]. Due to the difficulty of enumerating all properties and metrics, it becomes necessary to choose a suitable subset. The challenging question then is to identify the most relevant properties and metrics. In this study, the focus is on introducing metrics that have not been investigated extensively in the literature, while keeping some of the well-investigated metrics (average path and cluster coefficient). Another goal is establishing a relationship between the new set of parameters and other parameters that are commonly used in the existing models in the literature.

B. Average Path

In graph theory, the minimum distance between two vertices u and v is measured by the length of the shortest path between the vertices, i.e., the minimal number of edges that must be traversed. The average path length is defined as the average distance between all pairs of vertices. Computing the shortest path between two vertices is not straightforward. The commonly used algorithm is Floyd–Warshall's [7], which runs in O(n^3) time and returns the shortest paths between all pairs of vertices.

C. Average Cluster Coefficient

The cluster coefficient is another characteristic that relates to the internal structure of a network, more precisely its cohesiveness. It measures the probability that two vertices with a common neighbor are connected.
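The all-pairs average path length of Section II-B can be sketched with a Floyd–Warshall pass over the adjacency matrix (the 4-node path graph below is a toy example, not one of the paper's generated topologies):

```python
import numpy as np

def average_path_length(A):
    """Average shortest-path length via Floyd-Warshall, O(n^3)."""
    n = len(A)
    D = np.where(A > 0, 1.0, np.inf)      # direct links cost 1, others unknown
    np.fill_diagonal(D, 0.0)
    for k in range(n):                    # relax every path through vertex k
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    iu = np.triu_indices(n, k=1)          # count each unordered pair once
    return D[iu].mean()

# Path graph 0-1-2-3: pair distances 1,2,3,1,2,1 -> mean = 10/6
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(average_path_length(A))             # -> 1.666...
```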
Given a vertex i with k_i neighbors, k_i(k_i − 1)/2 possible edges could exist between the neighbors. The clustering coefficient can be defined as the ratio of the actual number of edges E_i over all the possible edges:

C_i = 2E_i / (k_i(k_i − 1))    (1)

The average cluster coefficient is defined as C = (1/n) Σ_i C_i, where n is the number of nodes.

D. Average Matching Index

The matching index is a basic measure for neighborhood analysis. It quantifies the topological overlap between two nodes by measuring their similarity in terms of common shared neighbors. The similarity is given by the following expression:

M_ij = Σ_{k≠i,j} a_ik a_jk / (k_i + k_j − Σ_{k≠i,j} a_ik a_jk)    (2)

where a_ik, a_jk are elements of the adjacency matrix and k_i, k_j are the respective numbers of neighbors of vertices i and j. The average matching index is defined as M = (2/(n(n − 1))) Σ_{i<j} M_ij.

E. Average Participation Coefficient

The participation coefficient is a characteristic that defines the role of each node in a complex network. The role fulfilled by a vertex is an important factor, since it decides the node's connections. A cluster-head or a gateway has more connections than a peripheral node, since they act as facilitators in addition to their default role. This leads to the interpretation that the participation coefficient provides information about the link distribution among different clusters. A higher value indicates that the network has a higher ratio of nodes to clusters, and vice versa. The participation coefficient of vertex i is given by

P_i = 1 − (κ_i / k_i)^2    (3)

where κ_i denotes the degree of i within its cluster and k_i is the total degree of i. The average participation coefficient is defined as P = (1/n) Σ_i P_i.

F. Algebraic Connectivity Ratio

Another commonly used property in measuring the connectivity of a graph is the algebraic connectivity parameter. It is an important factor in consensus building, since it directly affects network synchronization. This parameter also provides information about the robustness of a topology in terms of link and node failure.
The higher the value, the more failure tolerant the network. For a given topology, this parameter is evaluated by the second-smallest eigenvalue λ_2 of the Laplacian matrix L. The ratio of the second-smallest eigenvalue over the last eigenvalue defines the algebraic connectivity ratio:

r = λ_2 / λ_n    (4)

The eigenvalues of L are ordered such that they satisfy 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n.

G. Modularity

The cluster coefficient, as defined previously, measures the bonding between nodes. Modularity, on the other hand, quantifies the strength of clusters and evaluates the quality of the clustering. When "good" clustering occurs, intracluster connections are dense and intercluster links are sparse, whereas when "bad" clustering occurs, nodes form clusters on their own. One definition of modularity is the difference between the fraction of intracluster edges over the total network edges and the fraction of edge extremities that connect two clusters:

Q = Σ_i ( e_ii − (Σ_j e_ij)^2 )    (5)

where e_ii represents the fraction of within-cluster edges of cluster i, and e_ij represents the fraction of edges having one extremity in cluster i and the other in cluster j. Q is proportional to quality and tends to be negative for "bad" clustering.

III. CONSENSUS ALGORITHM

In the previous sections, the steps involved in building the communication network were presented, and a brief description of some structural properties to be extracted from the communication topology was provided. This section presents a description of the distributed implementation of the consensus building algorithm, along with the mathematical formulation of the information flow that will be used as the main metric to evaluate the average state update. Before presenting the consensus building algorithm, we need to define the initial target awareness of each node. At this level, nodes will have partial or no information about the target locations. The objective of each node is to build the list of all targets through the consensus algorithm.
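The algebraic connectivity ratio of (4) reduces to an eigenvalue computation on the Laplacian. A sketch, using a complete graph as an illustrative best case:

```python
import numpy as np

def algebraic_connectivity_ratio(A):
    """lambda_2 / lambda_n of the graph Laplacian L = D - A, cf. (4)."""
    L = np.diag(A.sum(axis=1)) - A
    eig = np.sort(np.linalg.eigvalsh(L))   # 0 = l1 <= l2 <= ... <= ln
    return eig[1] / eig[-1]

# Complete graph K4: Laplacian eigenvalues are 0, 4, 4, 4 -> ratio = 1,
# the maximally robust (most failure-tolerant) case.
A = np.ones((4, 4)) - np.eye(4)
print(algebraic_connectivity_ratio(A))     # -> 1.0
```

Sparser topologies drive the ratio toward 0, reflecting weaker synchronization and lower robustness.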
Since target discovery is not within the scope of this paper, two simple approaches were considered. The first one randomly assigns a target to a node. The advantage of this method is its fairness in target awareness. However, it fails on the proximity criterion (a target should be assigned to the closest node). The second approach uses a self-organizing map principle where the winner (the closest node) takes all. It has the advantage of being close to a real-world scenario. However, its shortcoming is the lack of fairness: many nodes end up with no initial information about targets. To remedy this issue, targets were also randomly positioned to increase the chance of partial knowledge. Having built a target scope for each node, the next step is to exchange these initial scopes in order to build a consensus (final list of targets). The task is performed through an iterative process that is divided into two data flow processes: bottom up and top down (Fig. 2). Bottom up is used by a child to update its parent node. The top down flow is utilized by a parent to update its child. However, due to randomness in the placement of the nodes, there are cases where nodes form clusters on their own. Fig. 3 illustrates an example of isolated clusterheads (ICHs), where CH3 is the root and CH2 is the last descendant. To accommodate this case and reach a consensus among ICHs, the algorithm includes communication data between ICHs. Thus, the flows are as follows.

Fig. 2. Diagram showing the flow of information between different types of nodes.

Fig. 3. Nodes forming clusters on their own, where CH3 is the root.

Bottom up flow:
• Each ICH that has a parent ICH sends its target scope to it.
• Each ICH or cluster head (CH) that has a parent gateway sends its target scope to it.
• Each gateway or regular node sends its target scope to the parent cluster head.
• During this bottom up data forwarding, each node receiving a target scope updates its own scope.
Top down flow:
• Each CH that has a child gateway or regular node sends its target scope to it.
• Each gateway that has a child ICH or CH sends its target scope to it.
• Each ICH that has a child ICH sends its target scope to it.
• Similarly, each node receiving data updates its scope.

In summary, before the "target discovery" process, all the nodes in the network are in a state of zero knowledge, x_i(0) = ∅, where x_i(t) is the target scope of node i at time t. After the initial discovery of targets, the state of some nodes changes to reflect their partial awareness, assuming that there are k targets to discover. At the end of the iterative communication process and exchange of information, the nodes reach the saturation state, where they all have a complete and identical view of the target list. In general, the state of a node changes whenever it receives a state update from either its parent or descendant(s). This can be formulated by the following equation:

x_i(t + 1) = x_i(t) ∪ ⋃_{j : a_ij = 1} x_j(t)    (6)

where n is the number of nodes and a_ij denotes the adjacency of node j to node i in the network topology, which emphasizes that only the parent or descendant(s) are allowed to share their state with node i. However, not every exchange of information between a node and its parent or descendant(s) results in an update of x_i. Sometimes it takes a few iterations for new information to travel from one extremity of the network to the other; meanwhile, nodes in the middle exchange the same state. To incorporate this into (6), let us define an activation function g. Equation (6) then becomes

x_i(t + 1) = x_i(t) ∪ ⋃_{j : a_ij = 1} g(x_j(t))    (7)

At any given time, a node must check whether it has reached the saturation state or not. To accomplish that, it evaluates its state through the following saturation function:

s_i(t) = 1 if |x_i(t)| = k, and 0 otherwise    (8)

where k is a constant (the number of targets). If the returned value is 1, the node's target list is complete. The iterative update process ends when all the nodes reach saturation.
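The iterative update process above can be sketched as a synchronous simulation in which scopes merge by set union and each change counts as one state update; the topology, initial scopes, and counting convention below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],      # 4-node path topology (illustrative)
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
scopes = [{"t0"}, set(), set(), {"t1"}]   # initial partial target awareness

updates, changed = 0, True
while changed:                            # stop when every node saturates
    changed = False
    new = [s | set().union(*(scopes[j] for j in range(4) if A[i, j]))
           for i, s in enumerate(scopes)]
    for i in range(4):
        if new[i] != scopes[i]:
            updates += 1                  # one state update for node i
            changed = True
    scopes = new

avg_updates = updates / 4                 # average state update, cf. (9)
print(avg_updates)                        # -> 1.5 for this topology
```

Denser topologies saturate in fewer iterations, which is exactly the dependence the statistical analysis of Section V quantifies.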
To measure consensus building, the average state change can be defined as the total number of necessary updates performed by each node to reach saturation, over the number of nodes:

ū = (1/n) Σ_{i=1..n} Σ_{t=0..T} u_i(t)    (9)

where n is the number of nodes, u_i(t) is the state update of node i at time t, and T is the saturation time. Practically, in small-world or random networks, a node can verify whether it has reached the saturation state without knowing in advance the number of targets to discover. This can be accomplished by evaluating the network size through its diameter, which increases logarithmically with the number of nodes [16]. Thus, if the number of unchanged messages coming from one direction exceeds log m (with m the number of nodes), it is an indication that saturation has been reached.

IV. METHODOLOGY

In any scientific field, studying the nature of parameters is necessary, but not sufficient. The study should also account for the relationships among the variables. The idea of variables being interrelated is a well-accepted concept that can be mathematically represented. It helps in better understanding the contribution of each parameter, by itself and through interaction, to the overall result. Hence, the evaluation of each parameter is an important factor in deciding whether to account for it or ignore it in the study. Among the commonly used multivariable statistical methods that help scientists analyze the relationships between variables is factor analysis. The aim of factor analysis [17], [18] is to reduce the set of observable variables to a few hypothetical variables called factors. These factors are supposed to contain the essential information in the original variable set and are constructed in a way that reduces the overall complexity of the data by taking advantage of inherent interdependencies. In addition to studying the interrelationships between the variables, the objective of this section is to partition data into mutually exclusive sets with different weights.
The importance assigned to each variable within a set will also help in reducing the dimensionality of the data. The factor analysis method involves the following major steps.
• Collecting data and computing the correlation matrix.
• Extracting the unrotated factors.
• Rotating and interpreting the factors.

A. Data Collection and Correlation Matrix

The selection of the variables to investigate has a critical bearing on the output of the study. The problem becomes more complex when the conducted investigation is an exploratory one. The lack of knowledge about the underlying nature of the variables can lead to the discovery of higher orders of overlap or similarity among the variables, which requires a reassessment of the variables under consideration. As mentioned previously in Section II-A, the objective is to establish a bridge between newly introduced variables and some already-studied parameters. The topological variables we intend to investigate using factor analysis are the following:
• average path;
• average cluster coefficient;
• average matching index;
• average participation coefficient;
• algebraic connectivity ratio;
• modularity.

To perform a study using the factor analysis method, there are some constraints that need to be satisfied. The first constraint deals with the measurement characteristics of the variables. Ideally, factor analysis requires the data to be continuous and normally distributed (bell-shaped). This requirement is important because it has a direct effect on the reliability of the correlation coefficient, which is an indicator of the degree of correlation between variables. In practice, a bell shape is seldom satisfied. Prior to analysis, a preprocessing stage is sometimes required to clean the data of noisy, missing, and outlier values. Since this is an exploratory phase, the preprocessing stage should be limited to normalizing the data to fit a bell-shaped distribution.

Fig. 4.
Histograms and normal distribution plots for the six variables using a sample size of 1000.

The initial test of normality on the raw data revealed that none of the variables followed a normal distribution. The best goodness-of-fit value obtained was for a four-parameter beta distribution. The histograms and distribution fitting of the six normalized variables are presented in Fig. 4. The second constraint deals with the size of the sample. As the size increases, more stability is introduced into the evaluation of the correlation coefficients. Factor analysis recommends a sample size between 500 and 1000. Below that range, spurious variance distortion increases, and above that range no major gains are obtained. After the data preprocessing step, the next crucial point in factor analysis is the evaluation of the correlation coefficients. This step helps in determining the strength of the relationships between the set of measures upon which the final decision will rely. A major concern at this stage is the accuracy of the correlation coefficients. A result is statistically significant if it is unlikely to have occurred by chance. Usually, a statistical test of significance is devised for each coefficient. It computes the probability p of a result at least as large occurring by chance and compares it to the significance level α. The result is said to be statistically significant if p < α. However, a statistically significant result is not always of practical significance. Since the conducted study is an exploratory one, the only option available is to rely on the statistically significant results. To compute the correlation matrix for the six topological metrics, Pearson's correlation coefficient is used. Pearson's technique requires the data to be normally distributed and uses the covariance of the two variables over the product of their standard deviations to compute the elements of the correlation matrix [21].
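The per-coefficient significance screening described above can be sketched with `scipy.stats.pearsonr`, which returns both the coefficient and its p-value; the two synthetic series below stand in for a pair of topology metrics and are not the paper's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                 # stand-in for one topology metric
y = 0.5 * x + rng.normal(size=1000)       # a correlated stand-in metric

r, p = stats.pearsonr(x, y)               # coefficient and two-sided p-value
r_squared = r**2                          # coefficient of determination:
print(round(r_squared, 2))                # share of variance in y explained by x
```

With a sample of 1000 and this much shared structure, the coefficient is statistically significant (p well below 0.05), mirroring the screening applied to the six metrics.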
The coefficients can take on values between −1 and 1. The result is a correlation matrix for each pair of variables, along with a p-value for the test of significance of each coefficient. To interpret the correlation coefficients, statisticians recommend squaring them. The squared coefficient is called the coefficient of determination. It provides information about the contribution of each variable to the total variance of another variable. For example, if the correlation between X and Y is 0.5, then 25% of the variance in Y can be explained by the linear relationship between X and Y. The other 75% remains unexplained.

B. Factor Extraction

The general goal of factor analysis is to define and extract clusters of highly interrelated variables from the correlation matrix [19]. These clusters are called factors and can be written as linear combinations of the variables. Different techniques for factor extraction have been proposed. Each has its possibilities, limitations, and advocates [20]. The techniques include principal component, maximum likelihood, alpha, image, centroid, minimum residual, etc. The most widely used is principal component. It has the characteristic of being suitable for exploratory factor analysis. However, some assessments and evaluations of these methods suggest that the best method to use when data are normally distributed is maximum likelihood, because it evaluates a wide range of goodness-of-fit indexes for the model [21]. Choosing an extraction technique is only the first step in factor extraction. The next important step is to decide on the number of factors to retain. The main objective is not to over-extract or under-extract, but there is a general consensus among researchers that it is better to extract too many factors than too few. At this stage, different criteria have been proposed to stop the extraction process [19]. The most commonly used are the eigenvalues, which are the sums of the squared factor weights.
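The eigenvalue-based retention rule can be sketched directly on a correlation matrix: keep as many factors as there are eigenvalues above one (the correlation matrix below is synthetic, chosen so that two variables share a factor and the third does not):

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.1],    # illustrative 3-variable correlation matrix
              [0.6, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending eigenvalues
n_factors = int((eigvals > 1.0).sum())           # retain eigenvalues above one
print(n_factors)                                 # -> 1 for this matrix
```

A scree plot of `eigvals` would show the same discontinuity the scree test looks for.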
The cutoff point is an eigenvalue above one. Another criterion, based on discontinuity, is the scree test [21]. It involves locating the point where the sudden drop of eigenvalues occurs on a scree plot. The points above it are generally the factors to retain. A third criterion relies on the desired percentage of variation to be explained by the factors. Extraction in this case is subjective and relies on the user's requirements on when to stop extracting. In this investigation, the idea is to experiment with both principal components and maximum likelihood. The objective is to compare their results in order to choose the best method for the analysis. Maximum likelihood has the characteristic of providing the correct number of factors to extract, mainly due to the definition of the degrees of freedom for the chi-square test, df = ((p − k)^2 − (p + k))/2, where p denotes the number of variables and k is the number of factors to extract. Knowing the number of factors, we will then use the method that is more accurate.

C. Factor Rotation and Interpretation

The output of the factor extraction phase is a factor matrix that contains the weights of each variable on the extracted factors, also called loadings. These coefficients are unrotated [20]. They lack meaningful interpretation, although they account for maximum variability. The objective of factor rotation is to preserve the amount of variability while enhancing interpretability. The improvement is obtained by rotating the axes, thus resulting in a clear separation of variable clusters on each factor. In other words, each cluster of variables will be associated with one factor, which simplifies the interpretation. There are two methods for factor rotation [19], [21]. Each offers a variety of choices. The orthogonal method, in which the factors are kept at a 90° angle to each other, employs varimax, equimax, and orthomax as rotation options.
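As a simpler orthogonal illustration of what rotation does (the paper itself uses the oblique promax option), a standard SVD-based varimax sketch; the loading matrix is invented for illustration:

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x k loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)                      # accumulated rotation matrix
    var_old = 0.0
    for _ in range(n_iter):
        Lr = loadings @ R
        # One varimax step, expressed through an SVD (standard recipe).
        u, s, vt = np.linalg.svd(
            loadings.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < var_old * (1 + tol):
            break                      # criterion stopped improving
        var_old = s.sum()
    return loadings @ R

L = np.array([[0.8, 0.3], [0.7, 0.4], [0.2, 0.9], [0.3, 0.8]])
Lr = varimax(L)
# Rotation sharpens the loading pattern while preserving each variable's
# communality (the row sum of squared loadings).
```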
The oblique method, on the other hand, departs from the right-angle rule and offers promax, oblimin, and quartimin as options. Orthogonal rotation produces factors that are uncorrelated and easy to interpret. The oblique method takes into consideration the possible existence of correlation between the extracted factors. Forcing orthogonality on correlated factors results in a loss of information. For exploratory data, it is recommended to implement the oblique method, since both methods yield the same results on uncorrelated factors. To rotate the factors in this study, the oblique method with the "promax" option was implemented.

D. Multiple Regression Analysis

Multiple regression analysis is a statistical tool aimed at uncovering and understanding the relationship between a variable to be explained, called the dependent variable, and explanatory variables, called independent variables [22]. Multiple regression analysis tests whether the independent variables contribute to the variance in the dependent variable or not. It is used in many fields to measure the magnitude of the effect, if present, and to forecast its value. Multiple regression analysis involves many steps, among them the characterization of the relationship between the dependent variable and the set of predictors by a mathematical representation. The model can adopt a linear or a nonlinear form, depending on the nature of the relationship. It is crucial to determine this nature before proceeding with the analysis itself. Another step that can substantially bias the results is the choice of the predictors. Failure to select the appropriate independent variables can lead to the collinearity effect [23]. This happens when two or more independent variables contain much of the same information, making it difficult for regression analysis to determine which one is more important.
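The core regression quantities used later in Section V can be sketched with ordinary least squares; the two predictors and their true coefficients below are synthetic, not the paper's structural variables:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                       # two synthetic predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=500)

Xd = np.column_stack([np.ones(500), X])             # design matrix + intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)       # fitted coefficients
resid = y - Xd @ beta
r2 = 1 - resid.var() / y.var()                      # R-square: variance explained
print(round(r2, 3))
```

The same residuals feed the normality check (cf. Fig. 5), and comparing each predictor's standardized weight mirrors the beta-weight interpretation in Table VI.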
Another concern, at this step, is the precision of the measurements: including irrelevant variables tends to reduce the precision of the results. Multiple regression analysis offers a set of tools that help in the interpretation of the results. The most important are the following.
• The R-square represents the portion of variance in the dependent variable explained by all the predictors when grouped together.
• The ANOVA (analysis of variance) table, which can be thought of as a significance value for the whole model, confirms the statistical significance of the R-square value.
• The coefficients table indicates the weight of each variable in the equation and informs about the severity of collinearity.

TABLE I(a) SUMMARIZES THE PERCENTAGE CONTRIBUTION OF EACH VARIABLE TO THE TOTAL VARIANCE OF A VARIABLE. THE SHADED AREAS ARE THE CONTRIBUTIONS THAT FAILED THE STATISTICAL TEST OF SIGNIFICANCE. THE LAST COLUMN INDICATES THE COMBINED CONTRIBUTIONS OF ALL VARIABLES TO THE VARIANCE OF ONE VARIABLE.

TABLE I(b) SHOWS THE PERCENTAGE CONTRIBUTION OF EACH VARIABLE TO THE TOTAL VARIANCE OF A VARIABLE.

V. STATISTICAL RESULTS

A series of experiments was conducted in a MATLAB environment, where scores of communication topologies were generated using eight nodes randomly dispersed in a small geographic area. The choice of eight nodes was dictated by the desire to obtain a balance between the number of single-node clusters and multiple-node clusters. The number of targets to be assigned to the nodes was also fixed to eight, in the hope that in the best case the initial scope of each node would be one target. The coordinates of the targets, however, remained unchanged, thus allowing one parameter (topology) to change while target positions are fixed. Data collection was performed in two phases. First, the six structural metrics were evaluated for each topology generated.
The same topology was then used, in the second phase, as a communication medium for the nodes to update their target scopes, and the average state update was evaluated. The data were then fed to the Statistical Package for the Social Sciences (SPSS) for factor analysis and regression modeling. After normalizing the data, the next step in the methodology described in Section IV-A is the evaluation of the correlation matrix for the six metrics. Using the coding listed below, Table I(a) summarizes the contribution percentage of each variable to the overall variance of a topology metric. The actual values are presented in Table I(b). The cases that failed the statistical test of significance are represented by the shaded areas. Since the matrix is symmetric, only the upper diagonal is presented.
• Low (L): ;
• Medium (M): ;
• High (H): .

TABLE II SHOWS THE LOADINGS FOR EACH FACTOR EXTRACTED, THE COMMUNALITY VALUES, AND THE VARIANCE ACCOUNTED FOR. Note. LOADINGS AND COMMUNALITIES (C) < 0.5 ARE OMITTED.

From Table I(b), all variables exhibit little influence over each other's variance, suggesting the existence of a weak correlation between the variables. However, the last column suggests that the variability in one variable can be explained by the combined variance in all the other metrics. Table I(a) also reveals that the average cluster coefficient failed the test of significance, suggesting that its contribution is minimal if not nonexistent. Combining the factor extraction and rotation stages, maximum likelihood and principal components analysis, with "promax" rotation, were conducted to assess the underlying structure of the six variables. Several assumptions were tested beforehand to verify the existence of a factor analytic solution. The Kaiser–Meyer–Olkin and Bartlett's tests were performed to measure sampling adequacy and sphericity [23].
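Bartlett's test of sphericity, just mentioned, checks the null hypothesis that the correlation matrix is the identity. A sketch on synthetic correlated data (the KMO test is omitted here):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 says the correlation matrix is identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    return stat, chi2.sf(stat, p * (p - 1) // 2)   # statistic and p-value

rng = np.random.default_rng(2)
z = rng.normal(size=(1000, 1))
X = z + 0.5 * rng.normal(size=(1000, 3))   # three correlated synthetic metrics

stat, pval = bartlett_sphericity(X)
print(pval < 0.05)                          # correlated data rejects sphericity
```

A significance value far below 0.05, as reported for the six metrics, means the data are correlated enough to factor-analyze.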
The first test reveals whether the distribution of data is acceptable for conducting a factor analysis. The level attributed to the data was on the border between middling and mediocre. The second test measures whether the data are normally distributed and correlated enough to provide a reasonable basis for factor analysis. This condition was also met, since the significance value for the test was far below the required 0.05 value. When experimenting with factor extraction, maximum likelihood did not allow the extraction of more than two factors. The two extracted factors were interpreted as representing two constructs: intracluster structure (factor 1) and intercluster structure (factor 2). After rotation, the first factor accounted for 35% of the variance and the second factor for 24%. Table II displays the variables and factor loadings for the rotated factors, along with their communalities (C), which represent the amount of variance in each variable accounted for by the factors. A high value indicates that the extracted factors are a good representation of the variables. Values less than 0.5 were omitted to improve clarity. A few observations can be made from Table II. The communalities evaluated by principal components indicate that the extracted factors represent the original variables well. Both methods reveal that the average cluster coefficient is not a determinant variable in variance changes. Another observation, in which the two methods differ, is related to the average path variable. Principal components report it as a variable to be considered, with a low loading, whereas maximum likelihood suggests ignoring it. In a second phase, a multiple regression assessment was conducted in order to determine the best linear combination of the five structural variables for predicting the consensus variable (average state update).

TABLE III SIGNIFICANCE TEST OF LINEARITY VERIFIED FOR ALL THE INDEPENDENT VARIABLES (p < 0.05).

Fig. 5.
Residual scatter-plot indicating that the errors are normally distributed and no pattern is generated. Prior to the assessment, there are some assumptions that need to be considered. Linear regression requires the relationship between each of the independent variables and the dependent variable to be linear. There are different ways to evaluate linearity. The graphical method uses scatter-plots to visualize the relationship. It is the commonly recommended approach, although difficult to interpret when the sample size is large. Another technique, the statistical method, relies on the test of significance of the linearity hypothesis. The relationship is linear if the correlation coefficient between the independent variables and the dependent variable is statistically significant. This condition was met for all five considered variables. Table III shows that the tests are below the 0.05 level of significance. Another assumption that was also checked is that the residuals (predicted minus observed values) are normally distributed. The scatter-plot in Fig. 5 shows that the dots are scattered and no patterns are created. The ANOVA analysis of the regression model, in Table IV, showed that the combination of variables significantly predicts the average update. The model summary in Table V exhibited a multiple correlation coefficient R equal to 0.71 and an adjusted R-square of 0.5, meaning that 50% of the variance in the average state update variable can be predicted by the algebraic connectivity ratio, average matching index, modularity, and average participation combined. According to Cohen's effect size, this is a considerable effect [24]. The beta weights represented in Table VI suggested that the intracluster structure contributed more to the prediction of the average update. The tolerances in the same table indicate a low collinearity. This is due to the "orthogonality" of the two extracted factors. TABLE IV THE REGRESSION MODEL SIGNIFICANTLY PREDICTS THE CONSENSUS BUILDING VARIABLE TABLE V THE ADJUSTED R SQUARE INDICATES A FAIRLY GOOD MODEL, EXPLAINING 50% OF THE VARIANCE IN THE CONSENSUS BUILDING VARIABLE TABLE VI INTRACLUSTER PARAMETERS ARE MORE IMPORTANT; THE COLLINEARITY EFFECT IS MINIMAL SINCE THE TOLERANCE IS > 0.5; THE MODEL CAN BE REPRESENTED BY THE EQUATION: AVERAGE STATE UPDATE = 0.251 × INTRACLUSTER + 0.153 × INTERCLUSTER + 1.817. VI. THEORETICAL ANALYSIS So far, the statistical approach has revealed that intra- and intercluster links contribute to some degree to the consensus and synchronization among a set of nodes. The data collected for the OCTOPUS [1] communication network emphasized the importance of intracluster links over intercluster links in the synchronization process. The subjectivity of these results is a concern: collecting different data could lead to different results. The question is how to prove mathematically the validity of the statistical results and uncover any other condition(s) on the role that intra- and intercluster links play in the consensus. In this section, we analyze the importance of intra- and intercluster links in the consensus building for the OCTOPUS topology. We proceed by defining a ratio for intracluster, intercluster, and overall network synchronization. We then investigate the impact of intra- and intercluster links on these ratios. The consensus process is a dynamic system by nature. It involves each node's state updates. Some of its features, in the underlying OCTOPUS lattice, are the time and state variables. Modeling the dynamics of OCTOPUS consensus involves defining the so-called coupled map lattice, which represents the dynamics of a node with respect to the state of other nodes at a given time.
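Before formalizing the update rule, a toy coupled map lattice helps fix ideas. The sketch below uses a generic CML form with a tanh mapping function and uniform weights on a four-node ring; these choices are illustrative assumptions, not the OCTOPUS model or its saturation function:

```python
import numpy as np

def cml_step(x, A, eps, f):
    """One coupled-map-lattice update:
    x_i(t+1) = (1 - eps) * f(x_i(t)) + eps * sum_j A_ij f(x_j(t)) / deg_i.
    (A generic CML form; the paper's model uses cluster-dependent weights.)"""
    fx = f(x)
    deg = A.sum(axis=1)
    return (1 - eps) * fx + eps * (A @ fx) / deg

# Tiny example: 4 nodes on a ring, tanh as a saturating mapping function.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
x = np.array([0.1, 0.9, 0.4, 0.6])
for _ in range(200):
    x = cml_step(x, A, eps=0.5, f=np.tanh)

# With a contracting map and strong coupling, the node states synchronize:
spread = x.max() - x.min()  # shrinks toward 0
```

Here each iteration mixes a node's own mapped state with the mapped states of its neighbors, which is the mechanism the coupling equation below makes precise.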
The general form of the coupling equation is [25]

x_i(t + 1) = (1 − ε) f(x_i(t)) + ε Σ_j a_ij w_ij f(x_j(t))   (10)

where x_i(t) is the state of node i at time t, ε is the coupling strength, w_ij is the weight of the edge between nodes i and j, A = [a_ij] is the adjacency matrix (a_ij = 1 if (i, j) ∈ E), and f is a mapping function. In this analysis, we focus on deriving a general form of the coupling equation for intra- and intercluster convergence and define a new condition for synchronization. As in [1], we define a graph G = (V, E), where V is the set of vertices or nodes and E is the set of edges or links. The set of nodes is divided into separate clusters, V = C_1 ∪ C_2 ∪ … ∪ C_m, where C_k is a cluster and C_k ∩ C_l = ∅ for k ≠ l. In a cluster, we define N_i as the adjacent neighbors of node i. Equation (10) then becomes (11), and within a cluster the coupling equation is (12). We also define a weighting term that restricts the dynamics of the updates to the network organization. For a given cluster, it informs about the links between the cluster head (dominant node) and its neighbor(s) (dominated node(s)). In a cluster, this term depends only on the cluster's structure. It can take many forms to represent the cluster lattice. One solution is to define it as the ratio of a node's degree (intralinks only) over the highest node degree in any cluster. Equation (12) then becomes (13). Similarly, we define a coupling equation for nodes that play a role in intercluster communication. In OCTOPUS, these nodes are the gateways and the cluster heads. The ratio comprises the node degree, in terms of the number of intercluster links, over the total number of intercluster links. Equation (13) then becomes (14), where the sum is taken over a set comprising one cluster head and one gateway within a cluster. To define the Laplacian of the system, we set the weights as above for intra- and intercluster connections, respectively, and to zero otherwise. The Laplacian representing the cluster connections (intra) will have one row and one column of identical values. One known characteristic of a Laplacian matrix is that the sum of each row is equal to zero.
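The row-sum and nonnegative-eigenvalue properties of the Laplacian are easy to verify numerically. The sketch below builds L = D − W from an arbitrary small weighted adjacency matrix; the weights are placeholders, not the cluster-dependent ratios defined above:

```python
import numpy as np

# Weighted adjacency for a small graph (symmetric, zero diagonal);
# the weight values are arbitrary placeholders for illustration.
W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.5, 0.5, 0.0, 0.2],
              [0.0, 0.0, 0.2, 0.0]])

# Graph Laplacian: L = D - W, with D the diagonal matrix of weighted degrees.
L = np.diag(W.sum(axis=1)) - W

# Each row of a Laplacian sums to zero by construction...
assert np.allclose(L.sum(axis=1), 0.0)

# ...and for a symmetric Laplacian the eigenvalues are real and nonnegative,
# with the smallest equal to zero: 0 = lambda_1 <= lambda_2 <= ... <= lambda_n.
eigvals = np.sort(np.linalg.eigvalsh(L))
```

Setting the diagonal entries to the (negated) off-diagonal row sums is exactly what enforces the zero row-sum property used in the derivation that follows.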
For the row-sum property to hold, we set the diagonal entries accordingly, and (13) and (14) become (15) and (16), respectively. The purpose is to derive a convergence ratio and boundary condition(s) for (15) and (16). They will help in investigating the impact of intra- and intercluster links on the overall convergence and synchronization process. This can be accomplished by deriving a general form for the two equations and evaluating the impact of a small perturbation on the compacted equation. When a system approaches the synchronous state s(t), all the nodes exhibit similar state behavior, x_i(t) ≈ s(t), which is independent of the states of the neighboring nodes. Denoting the compacted coupling matrix by L, (15) and (16) take the general form (17), where n is the number of nodes. Since the sum of each row in the Laplacian matrix is equal to zero, diagonalizing L leads to a set of nonnegative eigenvalues λ_k, k = 1, …, n. This set can be sorted as 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n. Normalizing L using the corresponding eigenvectors results in (18). At this stage, a small perturbation of the state is dictated by the equation in [26], [27]. Along the kth eigenvector of (18), the perturbation is given by (19). One solution for (19) is given in (20) [27]. A system of more than two nodes reaches local asymptotic stability if condition (21) holds [28]–[30]. Fig. 6. Evolution of a node's saturation state for different mapping functions. Fig. 7. Lyapunov exponent values for different numbers of nodes and different mapping functions. In a dynamical system with an evolution equation, the second term of the addition is identified as the Lyapunov exponent [29]. The more negative the exponent, the greater the stability of the system (22). Equation (21) then becomes (23), which leads to the following solution: for any communication network, the smaller the ratio, the better the convergence and synchronization. During simulation, the saturation function given in (8) was selected as the mapping function. Fig.
6 indicates that all nodes will reach the saturation state. As the exchange of target lists occurs between nodes, each node updates its target scope. The updates stop when all targets are visible to a node. Fig. 7 shows that, using 4000 iterations, increasing the number of nodes generates a more negative Lyapunov exponent. A more negative value is a sign of greater system stability. Fig. 7 also reveals that the mapping function used is conducive to a stability point for the communication system. The theoretical ratio (synch-ratio) obtained from (23) defines a lower bound for the communication system stability region and informs about the shape and behavior of the synchronization curve. The system will synchronize and stabilize as long as its calculated ratio stays above the theoretical bound and follows its shape. In general, by saturating the area with a higher number of nodes, the number of clusters tends to get smaller. More precisely, increasing the number of nodes leads to an increase, followed by a decrease, of the number of clusters [1]. This is illustrated in Fig. 8, where 20 nodes form one cluster. Moreover, the clustering technique is designed to favor dense intracluster links over sparse intercluster links when the number of nodes increases. Fig. 9 shows that the ratio of intracluster links to intercluster links increases with an increase in the number of nodes. This result forms the basis for the study of the importance of intra- and intercluster links in the convergence process. Fig. 8. Formation of one cluster due to a higher number of nodes. Fig. 9. Number of nodes in the network versus density of intracluster and intercluster links. Fig. 10. Network and intracluster synchronization ratio when the number of nodes increases. A series of communication topologies was generated by increasing the number of nodes.
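The Lyapunov-exponent criterion behind Fig. 7 can be illustrated numerically by averaging log |f′(x)| along an orbit of a one-dimensional map. The sketch below uses the logistic map in a stable regime as a stand-in mapping function; this is an assumption for illustration, since the paper's saturation function (8) is not reproduced here:

```python
import math

def lyapunov_exponent(f, df, x0, n_iter=4000, n_transient=100):
    """Estimate the Lyapunov exponent of a 1-D map as the orbit average of
    log|f'(x)|; a negative value indicates locally stable dynamics."""
    x = x0
    for _ in range(n_transient):      # discard the transient
        x = f(x)
    acc = 0.0
    for _ in range(n_iter):
        acc += math.log(abs(df(x)))
        x = f(x)
    return acc / n_iter

# Logistic map with r = 2.5: the orbit settles at the fixed point x* = 0.6,
# where f'(x*) = 2.5 * (1 - 2 * 0.6) = -0.5, so the exponent is log(0.5) < 0.
r = 2.5
lam = lyapunov_exponent(lambda x: r * x * (1 - x),
                        lambda x: r * (1 - 2 * x),
                        x0=0.3)
# lam ≈ log(0.5) ≈ -0.693, i.e., a stable (synchronizable) regime.
```

The same averaging over 4000 iterations mirrors the procedure reported for Fig. 7: the more negative the estimate, the stronger the contraction toward the stable state.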
For each topology, a number of measures were evaluated: the ratio of intracluster links over the total number of links, the intracluster synchronization ratio (15), the intercluster synchronization ratio (16), and the overall system synchronization ratio. In Fig. 10, the calculated topology and intracluster synchronization ratios are above the theoretical boundary and follow its trend. This confirms that the generated topologies will converge to a stable point. As the number of nodes increases, the system synchronization is dictated by the intracluster synchronization: the intracluster convergence curve merges with the overall system synchronization curve. In contrast to intracluster synchronization, Fig. 11 shows that smaller intercluster synchronization ratios are obtained for a smaller number of intercluster links. In other words, better synchronization is obtained when fewer intercluster links are present. Fig. 11. System and intercluster synchronization ratio when the number of nodes increases. Fig. 12. Variation of the topology synchronization rate when the intra- and intercluster links vary. Fig. 13. Variation of the topology synchronization rate when the intra- and intercluster synchronization rates vary. A better visualization of the impact of intra- and intercluster links on the convergence of a topology is given in Fig. 12. Better system synchronization (a low value) is obtained when the ratio of intralinks is higher; poor system synchronization (a high value) is obtained when the ratio of interlinks is higher. Similarly, in Fig. 13, better system synchronization is obtained when intracluster synchronization dominates intercluster synchronization. Low system synchronization ratios are obtained for low intracluster synchronization ratios. VII. CONCLUSION AND DISCUSSION This paper investigated the effect of a few topological structural properties on the consensus building among nodes.
Six metrics, namely, the algebraic connectivity ratio, average path, average cluster coefficient, average matching index, modularity, and average participation, were investigated using the factor analysis technique. Only the algebraic connectivity ratio, average matching index, modularity, and average participation proved to have a major impact on the consensus building. The regression analysis showed that up to 50% of the consensus variability could be explained by these four parameters. It also showed that the consensus building could be predicted and that the metrics describing the intracluster connections have more weight in the model. This role was confirmed by the theoretical analysis: better system synchronization is conditioned by good intracluster synchronization. This indicates that a good communication topology has more intracluster connections and fewer intercluster connections. It also suggests that achieving better synchronization requires topology creation techniques that focus on generating dense clusters. In this study, only six parameters were investigated, resulting in a linear model and 50% explained variance. This result could be improved if more structural and nonstructural metrics were incorporated. Furthermore, a model describing the consensus building for any given topology with a random number of nodes could be devised. REFERENCES [1] H. Hatime, K. Namuduri, and J. M. Watkins, "OCTOPUS: An on-demand communication topology updating strategy for mobile sensor networks," IEEE Sensors J., vol. 11, no. 4, pp. 1004–1012, Apr. 2011. [2] R. Albert and A.-L. Barabási, "Statistical mechanics of complex networks," Rev. Modern Phys., vol. 74, no. 1, pp. 47–97, Jan. 2002. [3] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, "Network topology generators: Degree-based vs. structural," in Proc. ACM SIGCOMM, 2002, pp. 147–150. [4] S. Boccaletti, V. Latora, Y. Moreno, M.
Chavez, and D.-U. Hwang, "Complex networks: Structure and dynamics," Phys. Rep., vol. 424, pp. 175–308, Feb. 2006. [5] M. E. J. Newman, "The structure and function of complex networks," SIAM Rev., vol. 45, pp. 167–256, 2003. [6] B. H. Junker and F. Schreiber, Analysis of Biological Networks. New York: Wiley, Mar. 2008. [7] J. Bang-Jensen and G. Gutin, Digraphs: Theory, Algorithms and Applications. Berlin, Germany: Springer-Verlag, 2001. [8] A. Li and S. Horvath, "Network neighborhood analysis with the multi-node topological overlap measure," Bioinformatics, Nov. 2006. [9] R. Guimerà and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, pp. 895–900, Feb. 2005. [10] A. Krishnan, A. Giuliani, J. P. Zbilut, and M. Tomita, "Implications from a network-based topological analysis of ubiquitin unfolding simulations," PLoS ONE, vol. 3, no. 5, p. e2149, May 2008. [11] S. Kar and J. M. F. Moura, "Topology for global average consensus," in Proc. 40th Asilomar Conf. Signals, Syst. Comput., Oct. 2006, pp. 276–280. [12] A. Jamakovic and S. Uhlig, "On the relationship between the algebraic connectivity and graph's robustness to node and link failures," in Proc. 3rd EuroNGI Conf. Next Generation Internet Networks, May 2007, pp. 96–102. [13] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Phys. Rev. E, vol. 69, no. 2, pp. 1–15, 2004. [14] E. Ziv, M. Middendorf, and C. H. Wiggins, "Information-theoretic approach to network modularity," Phys. Rev. E, vol. 71, no. 4, p. 046117, Apr. 2005. [15] E. Ben-Naim, H. Frauenfelder, and Z. Toroczkai, Complex Networks. New York: Springer, Sep. 2004. [16] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley, "Classes of small-world networks," Proc. Nat. Acad. Sci., vol. 97, no. 21, pp. 11149–11152, Jul. 2000. [17] R. Reyment and K. G. Joreskog, Applied Factor Analysis in the Natural Sciences. Cambridge, U.K.: Cambridge Univ. Press, 1993. [18] A. L.
Comrey and H. B. Lee, A First Course in Factor Analysis. Mahwah, NJ: Lawrence Erlbaum, 1992. [19] D. F. Polit and C. T. Beck, Nursing Research: Principles and Methods. Baltimore, MD: Williams & Wilkins, Jun. 2000. [20] R. L. Gorsuch, Factor Analysis. Mahwah, NJ: Lawrence Erlbaum, Nov. 1983. [21] A. B. Costello and J. W. Osborne, "Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis," Practical Assessment, Res. Evaluation, vol. 10, no. 7, pp. 1–9, Jul. 2005. [22] D. L. Rubinfeld, "Reference guide on multiple regression," in Reference Manual on Scientific Evidence, 2000. [23] N. L. Leech, K. C. Barrett, and G. A. Morgan, SPSS for Intermediate Statistics: Use and Interpretation. Mahwah, NJ: Lawrence Erlbaum, Jul. 2007. [24] J. Cohen, Statistical Power Analysis for the Behavioral Sciences. Mahwah, NJ: Lawrence Erlbaum, 1988. [25] K. Kaneko, Theory and Applications of Coupled Map Lattices. New York: Wiley, 1993. [26] T. Kato, Perturbation Theory for Linear Operators. New York: Springer-Verlag, 1976. [27] F. M. Atay, T. Biyikoglu, and J. Jost, "Network synchronization: Spectral versus statistical properties," Physica D, vol. 224, no. 1–2, pp. 35–41, 2006. [28] E. Ott, Chaos in Dynamical Systems. Cambridge, U.K.: Cambridge Univ. Press, 1993. [29] H. Nagashima, Introduction to Chaos. Philadelphia, PA: Institute of Physics, 1999. [30] K. Kaneko and I. Tsuda, Complex Systems: Chaos and Beyond. New York: Springer, 2001. Hicham Hatime received the B.S. degree in computer science from the High-Tech School, Rabat, Morocco, in 1995, and the M.S. degrees in computer science and in electrical and computer engineering from Wichita State University, Wichita, KS, in 1998 and 2001, respectively. Currently, he is working towards the Ph.D. degree in electrical engineering and computer science at Wichita State University.
He was with Morocco’s Real Estate Department from 1994 to 1995 and with Prestige Informatique, Morocco, from 1995 to 1996. From 1998 to 2007, he was with LSI Logic working on SCSI, fiber channels, and iSCSI protocols. His areas of research include mobile sensors networks and robotics.