IEEE SENSORS JOURNAL, VOL. 11, NO. 12, DECEMBER 2011

Impact of Topological Characteristics on Consensus Building in Multiagent Systems

Hicham Hatime

Abstract—This paper represents a network of nodes as a graph and investigates the impact of topological characteristics on consensus building. The properties evaluated are the algebraic connectivity ratio, average path, average cluster coefficient, average matching index, modularity, and average participation. Consensus is measured by the average number of state updates that the system requires to gain the global scope of the task at hand. Factor analysis and regression analysis are applied to analyze the impact of the topological characteristics, to understand the dependencies among them, and to reduce the dimensionality of the analysis. A theoretical study is provided to support the results obtained by the statistical method. The insights from the analysis are discussed.

Index Terms—Communication system, consensus building, topology characteristics.

I. INTRODUCTION

IN MANY fields, multiagent systems are often characterized by the interactions that take place among the agents. In biology, for example, the collective behavior of cells and tissues is a direct consequence of the properties of their molecular constituents. In computer networks, the overall performance is measured by each station's throughput, security, protocol scalability, etc. In manufacturing, the quality of a product is evaluated by the functionality, reliability, and maintainability of each component. The performance of such a complex system is affected by the connectivity and placement (topology) of its constituents. Complex collaborative multiagent systems have to address coordination problems as well. Constructing a communication topology for randomly dispersed nodes is necessary for exchanging information.
In a mobile ad hoc network, the initial scope of information available to each node is usually limited to the information collected locally. Increasing this scope requires either collecting more data or requesting it from other nodes. The iterative process of exchanging local information among nodes, followed by inference in order to gain a global perspective of a problem, is referred to as consensus building. There are many factors that influence the consensus building process, communication topology being the most important of all. The relationship between the communication topology and the consensus process, at present, seems to be a black box. Studies aimed at evaluating the impact of a communication topology on the process of consensus building do not exist in the literature as of now. The purpose of this study is to explore this relationship and expose some of its inner structure. Random topologies are generated, and a few topology metrics, namely, the algebraic connectivity ratio, average path, average cluster coefficient, average matching index, modularity, and average participation, are evaluated. The objective is to extract the most important metrics influencing consensus building. In order to investigate the intricate connectivity and coordination, the system has to be translated into an abstract concept that can be mathematically modeled.

Manuscript received June 07, 2011; accepted June 07, 2011. Date of publication June 23, 2011; date of current version November 02, 2011. The associate editor coordinating the review of this paper and approving it for publication was Prof. Evgeny Katz. The author is with the Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260 USA (e-mail: hxhatime@wichita.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSEN.2011.2160393
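As a concrete illustration of the abstract model just mentioned, a network of randomly dispersed nodes can be represented by an adjacency matrix over an undirected graph. A minimal Python sketch (the node count and edge list below are illustrative, not taken from the paper's experiments):

```python
import numpy as np

# Hypothetical 8-node topology given as an undirected edge list.
n = 8
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]

A = np.zeros((n, n), dtype=int)   # adjacency matrix of the undirected graph
for i, j in edges:
    A[i, j] = A[j, i] = 1

degrees = A.sum(axis=1)           # degree of each node
print(degrees)                    # e.g. node 2 has degree 3 here
```

All topology metrics discussed in Section II can be computed from this matrix alone.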
The model is then subjected to a systematic characterization which seeks to uncover its characteristics. These characteristics, in turn, provide insights into the design and organization of the network that affect the consensus building process. Statistical techniques such as regression analysis and factor analysis can then be applied to show the interactions and interdependencies among the topological characteristics, and the impact of these structural properties on consensus building. The most significant properties can be included in the model for consensus building, and the performance of such a model can be analyzed in detail.

II. BACKGROUND

Communication topology in small-world networks (SWNs) has been an attractive area of research. SWNs exhibit the desired characteristics of a multiagent system, such as short path length, high clustering coefficient, and adaptive communication links. In addition, they are simple to model. Graph theory is usually used to model the topology of SWNs. A network is represented by an undirected graph where nodes represent the set of vertices (V) and the communication links are described by the set of edges (E). In this paper, generating a random SWN is accomplished through different stages [1] (see Fig. 1). The first stage utilizes a combination of ART and Maxnet neural network concepts to cluster nodes into separate groups. The second stage is the election of a cluster-head for each cluster to facilitate intracluster communication. The election is performed using minimum dominating set concepts. The third stage consists of establishing intercluster communication through gateways. The selection of a gateway node is evaluated by a willingness function that combines node degree, location relative to the cluster center, available power, and transmission energy. The last stage consists of establishing connections between clusters.

Fig. 1.
Three phases in creating a communication topology among randomly dispersed wireless nodes: (a) creation of clusters and intracluster links, election of cluster-heads, and selection of gateways; (b) connecting node clusters to closest gateways; and (c) establishing intercluster links. (CH = Cluster Head, G = Gateway, N = Node.)

A. Topological Metrics

The objective, here, is the selection of a set of topological metrics that need to be evaluated [6], [8]–[15]. Several structural properties and metrics have been defined in the literature for networks based on intuitive concepts [2]–[5]. Due to the difficulty of enumerating all properties and metrics, it becomes necessary to choose a suitable subset. The challenging question then is to identify the most relevant properties and metrics. In this study, the focus is on introducing metrics that have not been investigated extensively in the literature, while keeping some of the well-investigated metrics (average path and cluster coefficient). Another goal is establishing a relationship between the new set of parameters and other parameters that are commonly used in the existing models in the literature.

B. Average Path

In graph theory, the minimum distance between two vertices u and v is measured by the length of the shortest path between the vertices, i.e., the minimal number of edges that must be traversed. The average path length is defined as the average distance between all pairs of vertices. Computing the shortest path between two vertices is not straightforward. The commonly used algorithm is Floyd–Warshall's [7], which runs in O(n^3) time and returns the shortest paths between all pairs of vertices.

C. Average Cluster Coefficient

The cluster coefficient is another characteristic that relates to the internal structure of a network, more precisely its cohesiveness. It measures the probability that two vertices with a common neighbor are connected.
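The all-pairs average path length of Section II-B can be sketched with a Floyd–Warshall pass over the adjacency matrix (the 4-node path graph below is a toy example, not one of the paper's generated topologies):

```python
import numpy as np

def average_path_length(A):
    """Average shortest-path length via Floyd-Warshall, O(n^3)."""
    n = len(A)
    D = np.where(A > 0, 1.0, np.inf)      # direct links cost 1, others unknown
    np.fill_diagonal(D, 0.0)
    for k in range(n):                    # relax every path through vertex k
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    iu = np.triu_indices(n, k=1)          # count each unordered pair once
    return D[iu].mean()

# Path graph 0-1-2-3: pair distances 1,2,3,1,2,1 -> mean = 10/6
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(average_path_length(A))             # -> 1.666...
```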
Given a vertex i with k_i neighbors, k_i(k_i − 1)/2 possible edges could exist between the neighbors. The clustering coefficient can be defined as the ratio of the actual number of edges E_i over all the possible edges:

C_i = 2E_i / (k_i(k_i − 1))    (1)

The average cluster coefficient is defined as C = (1/n) Σ_i C_i, where n is the number of nodes.

D. Average Matching Index

The matching index is a basic measure for neighborhood analysis. It quantifies the topological overlap between two nodes by measuring their similarity in terms of common shared neighbors. The similarity is given by the following expression:

M_ij = Σ_{k≠i,j} a_ik a_jk / (k_i + k_j − Σ_{k≠i,j} a_ik a_jk)    (2)

where a_ik, a_jk are elements of the adjacency matrix and k_i, k_j are the respective numbers of neighbors of vertices i and j. The average matching index is defined as M = (2/(n(n − 1))) Σ_{i<j} M_ij.

E. Average Participation Coefficient

The participation coefficient is a characteristic that defines the role of each node in a complex network. The role fulfilled by a vertex is an important factor, since it decides the node's connections. A cluster-head or a gateway has more connections than a peripheral node, since they act as facilitators in addition to their default role. This leads to the interpretation that the participation coefficient provides information about the link distribution among different clusters. A higher value indicates that the network has a higher ratio of nodes to clusters, and vice versa. The participation coefficient of vertex i is given by

P_i = 1 − (κ_i / k_i)^2    (3)

where κ_i denotes the degree of i within its cluster and k_i is the total degree of i. The average participation coefficient is defined as P = (1/n) Σ_i P_i.

F. Algebraic Connectivity Ratio

Another commonly used property in measuring the connectivity of a graph is the algebraic connectivity parameter. It is an important factor in consensus building, since it directly affects network synchronization. This parameter also provides information about the robustness of a topology in terms of link and node failure.
The higher the value, the more failure tolerant the network. For a given topology, this parameter is evaluated by the second-smallest eigenvalue λ_2 of the Laplacian matrix L. The ratio of the second-smallest eigenvalue over the last eigenvalue defines the algebraic connectivity ratio:

r = λ_2 / λ_n    (4)

The eigenvalues of L are ordered such that they satisfy 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n.

G. Modularity

The cluster coefficient, as defined previously, measures the bonding between nodes. Modularity, on the other hand, quantifies the strength of clusters and evaluates the quality of the clustering. When "good" clustering occurs, intracluster connections are dense and intercluster links are sparse, whereas when "bad" clustering occurs, nodes form clusters on their own. One definition of modularity is the difference between the fraction of intracluster edges over the total network edges and the fraction of edge extremities that connect two clusters:

Q = Σ_i ( e_ii − (Σ_j e_ij)^2 )    (5)

where e_ii represents the fraction of within-cluster edges of cluster i, and e_ij represents the fraction of edges having one extremity in cluster i and the other in cluster j. Q is proportional to quality and tends to be negative for "bad" clustering.

III. CONSENSUS ALGORITHM

In the previous sections, the steps involved in building the communication network were presented, and a brief description of some structural properties to be extracted from the communication topology was provided. This section presents a description of the distributed implementation of the consensus building algorithm, along with the mathematical formulation of the information flow that will be used as the main metric to evaluate the average state update. Before presenting the consensus building algorithm, we need to define the initial target awareness of each node. At this level, nodes will have partial or no information about the target locations. The objective of each node is to build the list of all targets through the consensus algorithm.
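The algebraic connectivity ratio of (4) reduces to an eigenvalue computation on the Laplacian. A sketch, using a complete graph as an illustrative best case:

```python
import numpy as np

def algebraic_connectivity_ratio(A):
    """lambda_2 / lambda_n of the graph Laplacian L = D - A, cf. (4)."""
    L = np.diag(A.sum(axis=1)) - A
    eig = np.sort(np.linalg.eigvalsh(L))   # 0 = l1 <= l2 <= ... <= ln
    return eig[1] / eig[-1]

# Complete graph K4: Laplacian eigenvalues are 0, 4, 4, 4 -> ratio = 1,
# the maximally robust (most failure-tolerant) case.
A = np.ones((4, 4)) - np.eye(4)
print(algebraic_connectivity_ratio(A))     # -> 1.0
```

Sparser topologies drive the ratio toward 0, reflecting weaker synchronization and lower robustness.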
Since target discovery is not within the scope of this paper, two simple approaches were considered. The first one randomly assigns a target to a node. The advantage of this method is its fairness in target awareness. However, it fails on the proximity criterion (a target should be assigned to the closest node). The second approach uses a self-organizing map principle where the winner (the closest node) takes all. It has the advantage of being close to a real-world scenario. However, its shortcoming is the lack of fairness: many nodes end up with no initial information about targets. To remedy this issue, targets were also randomly positioned to increase the chance of partial knowledge. Having built a target scope for each node, the next step is to exchange these initial scopes in order to build a consensus (final list of targets). The task is performed through an iterative process that is divided into two data flow processes: bottom up and top down (Fig. 2). Bottom up is used by a child to update its parent node. The top down flow is utilized by a parent to update its child. However, due to randomness in the placement of the nodes, there are cases where nodes form clusters on their own. Fig. 3 illustrates an example of isolated clusterheads (ICHs), where CH3 is the root and CH2 is the last descendant. To accommodate this case and reach a consensus among ICHs, the algorithm includes communication data between ICHs. Thus, the flows are as follows.

Fig. 2. Diagram showing the flow of information between different types of nodes.

Fig. 3. Nodes forming clusters on their own, where CH3 is the root.

Bottom up flow:
• Each ICH that has a parent ICH sends its target scope to it.
• Each ICH or cluster head (CH) that has a parent gateway sends its target scope to it.
• Each gateway or regular node sends its target scope to the parent cluster head.
• During this bottom up data forwarding, each node receiving a target scope updates its own scope.
Top down flow:
• Each CH that has a child gateway or regular node sends its target scope to it.
• Each gateway that has a child ICH or CH sends its target scope to it.
• Each ICH that has a child ICH sends its target scope to it.
• Similarly, each node receiving data updates its scope.

In summary, before the "target discovery" process, all the nodes in the network are in a state of zero knowledge, x_i(0) = ∅, where x_i(t) is the target scope of node i at time t. After the initial discovery of targets, the state of some nodes changes to reflect their partial awareness, assuming that there are k targets to discover. At the end of the iterative communication process and exchange of information, the nodes reach the saturation state, where they all have a complete and identical view of the target list. In general, the state of a node changes whenever it receives a state update from either its parent or descendant(s). This can be formulated by the following equation:

x_i(t + 1) = x_i(t) ∪ ⋃_{j : a_ij = 1} x_j(t)    (6)

where n is the number of nodes and a_ij denotes the adjacency of node j to node i in the network topology, which emphasizes that only the parent or descendant(s) are allowed to share their state with node i. However, not every exchange of information between a node and its parent or descendant(s) results in an update of x_i. Sometimes it takes a few iterations for new information to travel from one extremity of the network to the other; meanwhile, nodes in the middle exchange the same state. To incorporate this into (6), let us define an activation function g. Equation (6) then becomes

x_i(t + 1) = x_i(t) ∪ ⋃_{j : a_ij = 1} g(x_j(t))    (7)

At any given time, a node must check whether it has reached the saturation state or not. To accomplish that, it evaluates its state through the following saturation function:

s_i(t) = 1 if |x_i(t)| = k, and 0 otherwise    (8)

where k is a constant (the number of targets). If the returned value is 1, the node's target list is complete. The iterative update process ends when all the nodes reach saturation.
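The iterative update process above can be sketched as a synchronous simulation in which scopes merge by set union and each change counts as one state update; the topology, initial scopes, and counting convention below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],      # 4-node path topology (illustrative)
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
scopes = [{"t0"}, set(), set(), {"t1"}]   # initial partial target awareness

updates, changed = 0, True
while changed:                            # stop when every node saturates
    changed = False
    new = [s | set().union(*(scopes[j] for j in range(4) if A[i, j]))
           for i, s in enumerate(scopes)]
    for i in range(4):
        if new[i] != scopes[i]:
            updates += 1                  # one state update for node i
            changed = True
    scopes = new

avg_updates = updates / 4                 # average state update, cf. (9)
print(avg_updates)                        # -> 1.5 for this topology
```

Denser topologies saturate in fewer iterations, which is exactly the dependence the statistical analysis of Section V quantifies.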
To measure consensus building, the average state change can be defined as the total number of necessary updates performed by each node to reach saturation, over the number of nodes:

ū = (1/n) Σ_{i=1..n} Σ_{t=0..T} u_i(t)    (9)

where n is the number of nodes, u_i(t) is the state update of node i at time t, and T is the saturation time. Practically, in small-world or random networks, a node can verify whether it has reached the saturation state without knowing in advance the number of targets to discover. This can be accomplished by evaluating the network size through its diameter, which increases logarithmically with the number of nodes [16]. Thus, if the number of unchanged messages coming from one direction exceeds log m (with m the number of nodes), it is an indication that saturation has been reached.

IV. METHODOLOGY

In any scientific field, studying the nature of parameters is necessary, but not sufficient. The study should also account for the relationships among the variables. The idea of variables being interrelated is a well-accepted concept that can be mathematically represented. It helps in better understanding the contribution of each parameter, by itself and through interaction, to the overall result. Hence, the evaluation of each parameter is an important factor in deciding whether to account for it or ignore it in the study. Among the commonly used multivariable statistical methods that help scientists analyze the relationships between variables is factor analysis. The aim of factor analysis [17], [18] is to reduce the set of observable variables to a few hypothetical variables called factors. These factors are supposed to contain the essential information in the original variable set and are constructed in a way that reduces the overall complexity of the data by taking advantage of inherent interdependencies. In addition to studying the interrelationships between the variables, the objective of this section is to partition data into mutually exclusive sets with different weights.
The importance assigned to each variable within a set will also help in reducing the dimensionality of the data. The factor analysis method involves the following major steps.
• Collecting data and computing the correlation matrix.
• Extracting the unrotated factors.
• Rotating and interpreting the factors.

A. Data Collection and Correlation Matrix

The selection of the variables to investigate has a critical bearing on the output of the study. The problem becomes more complex when the conducted investigation is an exploratory one. The lack of knowledge about the underlying nature of the variables can lead to the discovery of higher orders of overlap or similarity among the variables, which requires a reassessment of the variables under consideration. As mentioned previously in Section II-A, the objective is to establish a bridge between newly introduced variables and some already-studied parameters. The topological variables we intend to investigate using factor analysis are the following:
• average path;
• average cluster coefficient;
• average matching index;
• average participation coefficient;
• algebraic connectivity ratio;
• modularity.

To perform a study using the factor analysis method, there are some constraints that need to be satisfied. The first constraint deals with the measurement characteristics of the variables. Ideally, factor analysis requires the data to be continuous and normally distributed (bell-shaped). This requirement is important because it has a direct effect on the reliability of the correlation coefficient, which is an indicator of the degree of correlation between variables. In practice, a bell shape is seldom satisfied. Prior to analysis, a preprocessing stage is sometimes required to clean the data of noisy, missing, and outlier values. Since this is an exploratory phase, the preprocessing stage should be limited to normalizing the data to fit a bell-shaped distribution.

Fig. 4.
Histograms and normal distribution plots for the six variables using a sample size of 1000.

The initial test of normality on the raw data revealed that none of the variables followed a normal distribution. The best goodness-of-fit value obtained was for a four-parameter beta distribution. The histograms and distribution fitting of the six normalized variables are presented in Fig. 4. The second constraint deals with the size of the sample. As the size increases, more stability is introduced into the evaluation of the correlation coefficients. Factor analysis recommends a sample size between 500 and 1000. Below that range, spurious variance distortion increases, and above that range no major gains are obtained. After the data preprocessing step, the next crucial point in factor analysis is the evaluation of the correlation coefficients. This step helps in determining the strength of the relationships between the set of measures upon which the final decision will rely. A major concern at this stage is the accuracy of the correlation coefficients. A result is statistically significant if it is unlikely to have occurred by chance. Usually, a statistical test of significance is devised for each coefficient. It computes the probability p of a result at least as large occurring by chance and compares it to the significance level α. The result is said to be statistically significant if p < α. However, a statistically significant result is not always of practical significance. Since the conducted study is an exploratory one, the only option available is to rely on the statistically significant results. To compute the correlation matrix for the six topological metrics, Pearson's correlation coefficient is used. Pearson's technique requires the data to be normally distributed and uses the covariance of the two variables over the product of their standard deviations to compute the elements of the correlation matrix [21].
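The per-coefficient significance screening described above can be sketched with `scipy.stats.pearsonr`, which returns both the coefficient and its p-value; the two synthetic series below stand in for a pair of topology metrics and are not the paper's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                 # stand-in for one topology metric
y = 0.5 * x + rng.normal(size=1000)       # a correlated stand-in metric

r, p = stats.pearsonr(x, y)               # coefficient and two-sided p-value
r_squared = r**2                          # coefficient of determination:
print(round(r_squared, 2))                # share of variance in y explained by x
```

With a sample of 1000 and this much shared structure, the coefficient is statistically significant (p well below 0.05), mirroring the screening applied to the six metrics.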
The coefficients can take on values between −1 and 1. The result is a correlation matrix for each pair of variables, along with a p-value for the test of significance of each coefficient. To interpret the correlation coefficients, statisticians recommend squaring them. The squared coefficient is called the coefficient of determination. It provides information about the contribution of each variable to the total variance of another variable. For example, if the correlation between X and Y is 0.5, then 25% of the variance in Y can be explained by the linear relationship between X and Y. The other 75% remains unexplained.

B. Factor Extraction

The general goal of factor analysis is to define and extract clusters of highly interrelated variables from the correlation matrix [19]. These clusters are called factors and can be written as linear combinations of the variables. Different techniques for factor extraction have been proposed. Each has its possibilities, limitations, and advocates [20]. The techniques include principal component, maximum likelihood, alpha, image, centroid, minimum residual, etc. The most widely used is principal component. It has the characteristic of being suitable for exploratory factor analysis. However, some assessments and evaluations of these methods suggest that the best method to use when data are normally distributed is maximum likelihood, because it evaluates a wide range of goodness-of-fit indexes for the model [21]. Choosing an extraction technique is only the first step in factor extraction. The next important step is to decide on the number of factors to retain. The main objective is not to over-extract or under-extract, but there is a general consensus among researchers that it is better to extract too many factors than too few. At this stage, different criteria have been proposed to stop the extraction process [19]. The most commonly used are the eigenvalues, which are the sums of the squared factor weights.
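The eigenvalue-based retention rule can be sketched directly on a correlation matrix: keep as many factors as there are eigenvalues above one (the correlation matrix below is synthetic, chosen so that two variables share a factor and the third does not):

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.1],    # illustrative 3-variable correlation matrix
              [0.6, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending eigenvalues
n_factors = int((eigvals > 1.0).sum())           # retain eigenvalues above one
print(n_factors)                                 # -> 1 for this matrix
```

A scree plot of `eigvals` would show the same discontinuity the scree test looks for.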
The cutoff point is an eigenvalue above one. Another criterion, based on discontinuity, is the scree test [21]. It involves locating the point where the sudden drop of eigenvalues occurs on a scree plot. The points above it are generally the factors to retain. A third criterion relies on the desired percentage of variation to be explained by the factors. Extraction in this case is subjective and relies on the user's requirements on when to stop extracting. In this investigation, the idea is to experiment with both principal components and maximum likelihood. The objective is to compare their results in order to choose the best method for the analysis. Maximum likelihood has the characteristic of providing the correct number of factors to extract, mainly due to the definition of the degrees of freedom for the chi-square test, df = ((p − k)^2 − (p + k))/2, where p denotes the number of variables and k is the number of factors to extract. Knowing the number of factors, we will then use the method that is more accurate.

C. Factor Rotation and Interpretation

The output of the factor extraction phase is a factor matrix that contains the weights of each variable on the extracted factors, also called loadings. These coefficients are unrotated [20]. They lack meaningful interpretation, although they account for maximum variability. The objective of factor rotation is to preserve the amount of variability while enhancing interpretability. The improvement is obtained by rotating the axes, thus resulting in a clear separation of variable clusters on each factor. In other words, each cluster of variables will be associated with one factor, which simplifies the interpretation. There are two methods for factor rotation [19], [21]. Each offers a variety of choices. The orthogonal method, in which the factors are kept at a 90° angle to each other, employs varimax, equimax, and orthomax as rotation options.
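As a simpler orthogonal illustration of what rotation does (the paper itself uses the oblique promax option), a standard SVD-based varimax sketch; the loading matrix is invented for illustration:

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x k loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)                      # accumulated rotation matrix
    var_old = 0.0
    for _ in range(n_iter):
        Lr = loadings @ R
        # One varimax step, expressed through an SVD (standard recipe).
        u, s, vt = np.linalg.svd(
            loadings.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < var_old * (1 + tol):
            break                      # criterion stopped improving
        var_old = s.sum()
    return loadings @ R

L = np.array([[0.8, 0.3], [0.7, 0.4], [0.2, 0.9], [0.3, 0.8]])
Lr = varimax(L)
# Rotation sharpens the loading pattern while preserving each variable's
# communality (the row sum of squared loadings).
```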
The oblique method, on the other hand, departs from the right-angle rule and offers promax, oblimin, and quartimin as options. Orthogonal rotation produces factors that are uncorrelated and easy to interpret. The oblique method takes into consideration the possible existence of correlation between the extracted factors. Forcing orthogonality on correlated factors results in a loss of information. For exploratory data, it is recommended to implement the oblique method, since both methods yield the same results on uncorrelated factors. To rotate the factors in this study, the oblique method with the "promax" option was implemented.

D. Multiple Regression Analysis

Multiple regression analysis is a statistical tool aimed at uncovering and understanding the relationship between a variable to be explained, called the dependent variable, and explanatory variables, called independent variables [22]. Multiple regression analysis tests whether the independent variables contribute to the variance in the dependent variable or not. It is used in many fields to measure the magnitude of the effect, if present, and to forecast its value. Multiple regression analysis involves many steps, among them the characterization of the relationship between the dependent variable and the set of predictors by a mathematical representation. The model can adopt a linear or a nonlinear form, depending on the nature of the relationship. It is crucial to determine this nature before proceeding with the analysis itself. Another step that can substantially bias the results is the choice of the predictors. Failure to select the appropriate independent variables can lead to the collinearity effect [23]. This happens when two or more independent variables contain much of the same information, making it difficult for regression analysis to determine which one is more important.
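The core regression quantities used later in Section V can be sketched with ordinary least squares; the two predictors and their true coefficients below are synthetic, not the paper's structural variables:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                       # two synthetic predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=500)

Xd = np.column_stack([np.ones(500), X])             # design matrix + intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)       # fitted coefficients
resid = y - Xd @ beta
r2 = 1 - resid.var() / y.var()                      # R-square: variance explained
print(round(r2, 3))
```

The same residuals feed the normality check (cf. Fig. 5), and comparing each predictor's standardized weight mirrors the beta-weight interpretation in Table VI.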
Another concern, at this step, is the precision of the measurements: including irrelevant variables tends to reduce the precision of the results. Multiple regression analysis offers a set of tools that help in the interpretation of the results. The most important are the following.
• The R-square represents the portion of variance in the dependent variable explained by all the predictors when grouped together.
• The ANOVA (analysis of variance) table, which can be thought of as a significance value for the whole model, confirms the statistical significance of the R-square value.
• The coefficients table indicates the weight of each variable in the equation and informs about the severity of collinearity.

TABLE I(a) SUMMARIZES THE PERCENTAGE CONTRIBUTION OF EACH VARIABLE TO THE TOTAL VARIANCE OF A VARIABLE. THE SHADED AREAS ARE THE CONTRIBUTIONS THAT FAILED THE STATISTICAL TEST OF SIGNIFICANCE. THE LAST COLUMN INDICATES THE COMBINED CONTRIBUTIONS OF ALL VARIABLES TO THE VARIANCE OF ONE VARIABLE.

TABLE I(b) SHOWS THE PERCENTAGE CONTRIBUTION OF EACH VARIABLE TO THE TOTAL VARIANCE OF A VARIABLE.

V. STATISTICAL RESULTS

A series of experiments was conducted in a MATLAB environment, where scores of communication topologies were generated using eight nodes randomly dispersed in a small geographic area. The choice of eight nodes was dictated by the desire to obtain a balance between the number of single-node clusters and multiple-node clusters. The number of targets to be assigned to the nodes was also fixed to eight, in the hope that in the best case the initial scope of each node would be one target. The coordinates of the targets, however, remained unchanged, thus allowing one parameter (topology) to change while target positions are fixed. Data collection was performed in two phases. First, the six structural metrics were evaluated for each topology generated.
The same topology was then used, in the second phase, as a communication medium for the nodes to update their target scopes, and the average state update was evaluated. The data were then fed to the Statistical Package for the Social Sciences (SPSS) for factor analysis and regression modeling. After normalizing the data, the next step in the methodology described in Section IV-A is the evaluation of the correlation matrix for the six metrics. Using the coding listed below, Table I(a) summarizes the contribution percentage of each variable to the overall variance of a topology metric. The actual values are presented in Table I(b). The cases that failed the statistical test of significance are represented by the shaded areas. Since the matrix is symmetric, only the upper diagonal is presented.
• Low (L): ;
• Medium (M): ;
• High (H): .

TABLE II SHOWS THE LOADINGS FOR EACH FACTOR EXTRACTED, THE COMMUNALITY VALUES, AND THE VARIANCE ACCOUNTED FOR. Note. LOADINGS AND COMMUNALITIES (C) < 0.5 ARE OMITTED.

From Table I(b), all variables exhibit little influence over each other's variance, suggesting the existence of a weak correlation between the variables. However, the last column suggests that the variability in one variable can be explained by the combined variance in all the other metrics. Table I(a) also reveals that the average cluster coefficient failed the test of significance, suggesting that its contribution is minimal if not nonexistent. Combining the factor extraction and rotation stages, maximum likelihood and principal components analysis, with "promax" rotation, were conducted to assess the underlying structure of the six variables. Several assumptions were tested beforehand to verify the existence of a factor analytic solution. The Kaiser–Meyer–Olkin and Bartlett's tests were performed to measure sampling adequacy and sphericity [23].
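Bartlett's test of sphericity, just mentioned, checks the null hypothesis that the correlation matrix is the identity. A sketch on synthetic correlated data (the KMO test is omitted here):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 says the correlation matrix is identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    return stat, chi2.sf(stat, p * (p - 1) // 2)   # statistic and p-value

rng = np.random.default_rng(2)
z = rng.normal(size=(1000, 1))
X = z + 0.5 * rng.normal(size=(1000, 3))   # three correlated synthetic metrics

stat, pval = bartlett_sphericity(X)
print(pval < 0.05)                          # correlated data rejects sphericity
```

A significance value far below 0.05, as reported for the six metrics, means the data are correlated enough to factor-analyze.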
The first test reveals whether the distribution of data is acceptable for conducting a factor analysis. The level attributed to the data was on the border between middling and mediocre. The second test measures whether the data are normally distributed and correlated enough to provide a reasonable basis for factor analysis. This condition was also met, since the significance value for the test was far below the required 0.05 value. When experimenting with factor extraction, maximum likelihood did not allow the extraction of more than two factors. The two extracted factors were interpreted as representing two constructs: intracluster structure (factor 1) and intercluster structure (factor 2). After rotation, the first factor accounted for 35% of the variance and the second factor for 24%. Table II displays the variables and factor loadings for the rotated factors, along with their communalities (C), which represent the amount of variance in each variable accounted for by the factors. A high value indicates that the extracted factors are a good representation of the variables. Values less than 0.5 were omitted to improve clarity. A few observations can be made from Table II. The communalities evaluated by principal components indicate that the extracted factors represent the original variables well. Both methods reveal that the average cluster coefficient is not a determinant variable in variance changes. Another observation, in which the two methods differ, is related to the average path variable. Principal components report it as a variable to be considered, with a low loading, whereas maximum likelihood suggests ignoring it. In a second phase, a multiple regression assessment was conducted in order to determine the best linear combination of the five structural variables for predicting the consensus variable (average state update).

TABLE III SIGNIFICANCE TEST OF LINEARITY VERIFIED FOR ALL THE INDEPENDENT VARIABLES (p < 0.05).

Fig. 5.
Residual scatter-plot indicating that the errors are normally distributed and no pattern is generated. Prior to the assessment, there are some assumptions that need to be considered. Linear regression requires the relationship between each of the independent variables and the dependent variable to be linear. There are different ways to evaluate linearity. The graphical method uses scatter-plots to visualize the relationship. It is the commonly recommended approach, although difficult to interpret when the sample size is large. Another technique, the statistical method, relies on the test of significance of the linearity hypothesis. The relationship is linear if the correlation coefficient between the independent variables and the dependent variable is statistically significant. This condition was met for all five considered variables. Table III shows that the tests are below the 0.05 level of significance. Another assumption that was also checked is that the residuals (predicted minus observed values) are normally distributed. The scatter-plot in Fig. 5 shows that the dots are scattered and no patterns are created. The ANOVA analysis of the regression model, in Table IV, showed that the combination of variables significantly predicts the average update. The model summary in Table V exhibited a multiple correlation coefficient R equal to 0.71 and an adjusted R-square of 0.5, meaning that 50% of the variance in the average state update variable can be predicted by the algebraic connectivity ratio, average matching index, modularity, and average participation combined. According to Cohen's effect size, this is a considerable effect [24]. The beta weights represented in Table VI suggested that the intracluster structure contributed more to the prediction of the average update. The tolerances in the same table indicate a low collinearity. This is due to the "orthogonality" of the two extracted factors. TABLE IV THE REGRESSION MODEL SIGNIFICANTLY PREDICTS THE CONSENSUS BUILDING VARIABLE TABLE V THE ADJUSTED R SQUARE INDICATES A FAIRLY GOOD MODEL, EXPLAINING 50% OF THE VARIANCE IN THE CONSENSUS BUILDING VARIABLE TABLE VI INTRACLUSTER PARAMETERS ARE MORE IMPORTANT; THE COLLINEARITY EFFECT IS MINIMAL SINCE THE TOLERANCE IS > 0.5; THE MODEL CAN BE REPRESENTED BY THE EQUATION: AVERAGE STATE UPDATE = 0.251 × INTRACLUSTER + 0.153 × INTERCLUSTER + 1.817. VI. THEORETICAL ANALYSIS So far, the statistical approach has revealed that intra- and intercluster links contribute to some degree to the consensus and synchronization among a set of nodes. The data collected for the OCTOPUS [1] communication network emphasized the importance of intracluster links over intercluster links in the synchronization process. The subjectivity of these results is a concern: collecting different data could lead to different results. The question is how to prove mathematically the validity of the statistical results and uncover any other condition(s) on the role that intra- and intercluster links play in the consensus. In this section, we analyze the importance of intra- and intercluster links in the consensus building for the OCTOPUS topology. We proceed by defining a ratio for intracluster, intercluster, and overall network synchronization. We then investigate the impact of intra- and intercluster links on these ratios. The consensus process is a dynamic system by nature. It involves each node's state updates. Some of its features, in the underlying OCTOPUS lattice, are the time and state variables. Modeling the dynamics of OCTOPUS consensus involves defining the so-called coupled map lattice, which represents the dynamics of a node with respect to the state of other nodes at a given time.
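Before formalizing the update rule, a toy coupled map lattice helps fix ideas. The sketch below uses a generic CML form with a tanh mapping function and uniform weights on a four-node ring; these choices are illustrative assumptions, not the OCTOPUS model or its saturation function:

```python
import numpy as np

def cml_step(x, A, eps, f):
    """One coupled-map-lattice update:
    x_i(t+1) = (1 - eps) * f(x_i(t)) + eps * sum_j A_ij f(x_j(t)) / deg_i.
    (A generic CML form; the paper's model uses cluster-dependent weights.)"""
    fx = f(x)
    deg = A.sum(axis=1)
    return (1 - eps) * fx + eps * (A @ fx) / deg

# Tiny example: 4 nodes on a ring, tanh as a saturating mapping function.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
x = np.array([0.1, 0.9, 0.4, 0.6])
for _ in range(200):
    x = cml_step(x, A, eps=0.5, f=np.tanh)

# With a contracting map and strong coupling, the node states synchronize:
spread = x.max() - x.min()  # shrinks toward 0
```

Here each iteration mixes a node's own mapped state with the mapped states of its neighbors, which is the mechanism the coupling equation below makes precise.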
The general form of the coupling equation is [25]

x_i(t + 1) = (1 − ε) f(x_i(t)) + ε Σ_j a_ij w_ij f(x_j(t))   (10)

where x_i(t) is the state of node i at time t, ε is the coupling strength, w_ij is the weight of the edge between nodes i and j, A = [a_ij] is the adjacency matrix (a_ij = 1 if (i, j) ∈ E), and f is a mapping function. In this analysis, we focus on deriving a general form of the coupling equation for intra- and intercluster convergence and define a new condition for synchronization. As in [1], we define a graph G = (V, E), where V is the set of vertices or nodes and E is the set of edges or links. The set of nodes is divided into separate clusters, V = C_1 ∪ C_2 ∪ … ∪ C_m, where C_k is a cluster and C_k ∩ C_l = ∅ for k ≠ l. In a cluster, we define N_i as the adjacent neighbors of node i. Equation (10) then becomes (11), and within a cluster the coupling equation is (12). We also define a weighting term that restricts the dynamics of the updates to the network organization. For a given cluster, it informs about the links between the cluster head (dominant node) and its neighbor(s) (dominated node(s)). In a cluster, this term depends only on the cluster's structure. It can take many forms to represent the cluster lattice. One solution is to define it as the ratio of a node's degree (intralinks only) over the highest node degree in any cluster. Equation (12) then becomes (13). Similarly, we define a coupling equation for nodes that play a role in intercluster communication. In OCTOPUS, these nodes are the gateways and the cluster heads. The ratio comprises the node degree, in terms of the number of intercluster links, over the total number of intercluster links. Equation (13) then becomes (14), where the sum is taken over a set comprising one cluster head and one gateway within a cluster. To define the Laplacian of the system, we set the weights as above for intra- and intercluster connections, respectively, and to zero otherwise. The Laplacian representing the cluster connections (intra) will have one row and one column of identical values. One known characteristic of a Laplacian matrix is that the sum of each row is equal to zero.
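The row-sum and nonnegative-eigenvalue properties of the Laplacian are easy to verify numerically. The sketch below builds L = D − W from an arbitrary small weighted adjacency matrix; the weights are placeholders, not the cluster-dependent ratios defined above:

```python
import numpy as np

# Weighted adjacency for a small graph (symmetric, zero diagonal);
# the weight values are arbitrary placeholders for illustration.
W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.5, 0.5, 0.0, 0.2],
              [0.0, 0.0, 0.2, 0.0]])

# Graph Laplacian: L = D - W, with D the diagonal matrix of weighted degrees.
L = np.diag(W.sum(axis=1)) - W

# Each row of a Laplacian sums to zero by construction...
assert np.allclose(L.sum(axis=1), 0.0)

# ...and for a symmetric Laplacian the eigenvalues are real and nonnegative,
# with the smallest equal to zero: 0 = lambda_1 <= lambda_2 <= ... <= lambda_n.
eigvals = np.sort(np.linalg.eigvalsh(L))
```

Setting the diagonal entries to the (negated) off-diagonal row sums is exactly what enforces the zero row-sum property used in the derivation that follows.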
For the row-sum property to hold, we set the diagonal entries accordingly, and (13) and (14) become (15) and (16), respectively. The purpose is to derive a convergence ratio and boundary condition(s) for (15) and (16). They will help in investigating the impact of intra- and intercluster links on the overall convergence and synchronization process. This can be accomplished by deriving a general form for the two equations and evaluating the impact of a small perturbation on the compacted equation. When a system approaches the synchronous state s(t), all the nodes exhibit similar state behavior, x_i(t) ≈ s(t), which is independent of the states of the neighboring nodes. Denoting the compacted coupling matrix by L, (15) and (16) take the general form (17), where n is the number of nodes. Since the sum of each row in the Laplacian matrix is equal to zero, diagonalizing L leads to a set of nonnegative eigenvalues λ_k, k = 1, …, n. This set can be sorted as 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n. Normalizing L using the corresponding eigenvectors results in (18). At this stage, a small perturbation of the state is dictated by the equation in [26], [27]. Along the kth eigenvector of (18), the perturbation is given by (19). One solution for (19) is given in (20) [27]. A system of more than two nodes reaches local asymptotic stability if condition (21) holds [28]–[30]. Fig. 6. Evolution of a node's saturation state for different mapping functions. Fig. 7. Lyapunov exponent values for different numbers of nodes and different mapping functions. In a dynamical system with an evolution equation, the second term of the addition is identified as the Lyapunov exponent [29]. The more negative the exponent, the greater the stability of the system (22). Equation (21) then becomes (23), which leads to the following solution: for any communication network, the smaller the ratio, the better the convergence and synchronization. During simulation, the saturation function given in (8) was selected as the mapping function. Fig.
6 indicates that all nodes will reach the saturation state. As the exchange of target lists occurs between nodes, each node updates its target scope. The updates stop when all targets are visible to a node. Fig. 7 shows that, using 4000 iterations, increasing the number of nodes generates a more negative Lyapunov exponent. A more negative value is a sign of greater system stability. Fig. 7 also reveals that the mapping function used is conducive to a stability point for the communication system. The theoretical ratio (synch-ratio) obtained from (23) defines a lower bound for the communication system stability region and informs about the shape and behavior of the synchronization curve. The system will synchronize and stabilize as long as its calculated ratio stays above the theoretical bound and follows its shape. In general, by saturating the area with a higher number of nodes, the number of clusters tends to get smaller. More precisely, increasing the number of nodes leads to an increase, followed by a decrease, of the number of clusters [1]. This is illustrated in Fig. 8, where 20 nodes form one cluster. Moreover, the clustering technique is designed to favor dense intracluster links over sparse intercluster links when the number of nodes increases. Fig. 9 shows that the ratio of intracluster links to intercluster links increases with an increase in the number of nodes. This result forms the basis for the study of the importance of intra- and intercluster links in the convergence process. Fig. 8. Formation of one cluster due to a higher number of nodes. Fig. 9. Number of nodes in the network versus density of intracluster and intercluster links. Fig. 10. Network and intracluster synchronization ratio when the number of nodes increases. A series of communication topologies was generated by increasing the number of nodes.
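The Lyapunov-exponent criterion behind Fig. 7 can be illustrated numerically by averaging log |f′(x)| along an orbit of a one-dimensional map. The sketch below uses the logistic map in a stable regime as a stand-in mapping function; this is an assumption for illustration, since the paper's saturation function (8) is not reproduced here:

```python
import math

def lyapunov_exponent(f, df, x0, n_iter=4000, n_transient=100):
    """Estimate the Lyapunov exponent of a 1-D map as the orbit average of
    log|f'(x)|; a negative value indicates locally stable dynamics."""
    x = x0
    for _ in range(n_transient):      # discard the transient
        x = f(x)
    acc = 0.0
    for _ in range(n_iter):
        acc += math.log(abs(df(x)))
        x = f(x)
    return acc / n_iter

# Logistic map with r = 2.5: the orbit settles at the fixed point x* = 0.6,
# where f'(x*) = 2.5 * (1 - 2 * 0.6) = -0.5, so the exponent is log(0.5) < 0.
r = 2.5
lam = lyapunov_exponent(lambda x: r * x * (1 - x),
                        lambda x: r * (1 - 2 * x),
                        x0=0.3)
# lam ≈ log(0.5) ≈ -0.693, i.e., a stable (synchronizable) regime.
```

The same averaging over 4000 iterations mirrors the procedure reported for Fig. 7: the more negative the estimate, the stronger the contraction toward the stable state.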
For each topology, a number of measures were evaluated: the ratio of intracluster links over the total number of links, the intracluster synchronization ratio (15), the intercluster synchronization ratio (16), and the overall system synchronization ratio. In Fig. 10, the calculated topology and intracluster synchronization ratios are above the theoretical boundary and follow its trend. This confirms that the generated topologies will converge to a stable point. As the number of nodes increases, the system synchronization is dictated by the intracluster synchronization: the intracluster convergence curve merges with the overall system synchronization curve. In contrast to intracluster synchronization, Fig. 11 shows that smaller intercluster synchronization ratios are obtained for a smaller number of intercluster links. In other words, better synchronization is obtained when fewer intercluster links are present. Fig. 11. System and intercluster synchronization ratio when the number of nodes increases. Fig. 12. Variation of the topology synchronization rate when the intra- and intercluster links vary. Fig. 13. Variation of the topology synchronization rate when the intra- and intercluster synchronization rates vary. A better visualization of the impact of intra- and intercluster links on the convergence of a topology is given in Fig. 12. Better system synchronization (a low value) is obtained when the ratio of intralinks is higher; poor system synchronization (a high value) is obtained when the ratio of interlinks is higher. Similarly, in Fig. 13, better system synchronization is obtained when intracluster synchronization dominates intercluster synchronization. Low system synchronization ratios are obtained for low intracluster synchronization ratios. VII. CONCLUSION AND DISCUSSION This paper investigated the effect of a few topological structural properties on the consensus building among nodes.
Six metrics, namely, the algebraic connectivity ratio, average path, average cluster coefficient, average matching index, modularity, and average participation, were investigated using the factor analysis technique. Only the algebraic connectivity ratio, average matching index, modularity, and average participation proved to have a major impact on the consensus building. The regression analysis showed that up to 50% of the consensus variability could be explained by these four parameters. It also showed that the consensus building could be predicted and that the metrics describing the intracluster connections have more weight in the model. This role was confirmed by the theoretical analysis: better system synchronization is conditioned by good intracluster synchronization. This indicates that a good communication topology has more intracluster connections and fewer intercluster connections. It also suggests that achieving better synchronization requires topology creation techniques that focus on generating dense clusters. In this study, only six parameters were investigated, resulting in a linear model and 50% explained variance. This result could be improved if more structural and nonstructural metrics were incorporated. Furthermore, a model describing the consensus building for any given topology with a random number of nodes could be devised. REFERENCES [1] H. Hatime, K. Namuduri, and J. M. Watkins, "OCTOPUS: An on-demand communication topology updating strategy for mobile sensor networks," IEEE Sensors J., vol. 11, no. 4, pp. 1004–1012, Apr. 2011. [2] R. Albert and A.-L. Barabási, "Statistical mechanics of complex networks," Rev. Modern Phys., vol. 74, no. 1, pp. 47–97, Jan. 2002. [3] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, "Network topology generators: Degree-based vs. structural," in Proc. ACM SIGCOMM, 2002, pp. 147–150. [4] S. Boccaletti, V. Latora, Y. Moreno, M.
Chavez, and D.-U. Hwang, "Complex networks: Structure and dynamics," Phys. Rep., vol. 424, pp. 175–308, Feb. 2006. [5] M. E. J. Newman, "The structure and function of complex networks," SIAM Rev., vol. 45, pp. 167–256, 2003. [6] B. H. Junker and F. Schreiber, Analysis of Biological Networks. New York: Wiley, Mar. 2008. [7] J. Bang-Jensen and G. Gutin, Digraphs: Theory, Algorithms and Applications. Berlin, Germany: Springer-Verlag, 2001. [8] A. Li and S. Horvath, "Network neighborhood analysis with the multi-node topological overlap measure," Bioinformatics, Nov. 2006. [9] R. Guimerà and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, pp. 895–900, Feb. 2005. [10] A. Krishnan, A. Giuliani, J. P. Zbilut, and M. Tomita, "Implications from a network-based topological analysis of ubiquitin unfolding simulations," PLoS ONE, vol. 3, no. 5, p. e2149, May 2008. [11] S. Kar and J. M. F. Moura, "Topology for global average consensus," in Proc. 40th Asilomar Conf. Signals, Syst. Comput., Oct. 2006, pp. 276–280. [12] A. Jamakovic and S. Uhlig, "On the relationship between the algebraic connectivity and graph's robustness to node and link failures," in Proc. 3rd EuroNGI Conf. Next Generation Internet Networks, May 2007, pp. 96–102. [13] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Phys. Rev. E, vol. 69, no. 2, pp. 1–15, 2004. [14] E. Ziv, M. Middendorf, and C. H. Wiggins, "Information-theoretic approach to network modularity," Phys. Rev. E, vol. 71, no. 4, p. 046117, Apr. 2005. [15] E. Ben-Naim, H. Frauenfelder, and Z. Toroczkai, Complex Networks. New York: Springer, Sep. 2004. [16] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley, "Classes of small-world networks," Proc. Nat. Acad. Sci., vol. 97, no. 21, pp. 11149–11152, Jul. 2000. [17] R. Reyment and K. G. Joreskog, Applied Factor Analysis in the Natural Sciences. Cambridge, U.K.: Cambridge Univ. Press, 1993. [18] A. L.
Comrey and H. B. Lee, A First Course in Factor Analysis. Mahwah, NJ: Lawrence Erlbaum, 1992. [19] D. F. Polit and C. T. Beck, Nursing Research: Principles and Methods. Baltimore, MD: Williams & Wilkins, Jun. 2000. [20] R. L. Gorsuch, Factor Analysis. Mahwah, NJ: Lawrence Erlbaum, Nov. 1983. [21] A. B. Costello and J. W. Osborne, "Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis," Practical Assessment, Res. Evaluation, vol. 10, no. 7, pp. 1–9, Jul. 2005. [22] D. L. Rubinfeld, "Reference guide on multiple regression," in Reference Manual on Scientific Evidence, 2000. [23] N. L. Leech, K. C. Barrett, and G. A. Morgan, SPSS for Intermediate Statistics: Use and Interpretation. Mahwah, NJ: Lawrence Erlbaum, Jul. 2007. [24] J. Cohen, Statistical Power Analysis for the Behavioral Sciences. Mahwah, NJ: Lawrence Erlbaum, 1988. [25] K. Kaneko, Theory and Applications of Coupled Map Lattices. New York: Wiley, 1993. [26] T. Kato, Perturbation Theory for Linear Operators. New York: Springer-Verlag, 1976. [27] F. M. Atay, T. Biyikoglu, and J. Jost, "Network synchronization: Spectral versus statistical properties," Physica D, vol. 224, no. 1–2, pp. 35–41, 2006. [28] E. Ott, Chaos in Dynamical Systems. Cambridge, U.K.: Cambridge Univ. Press, 1993. [29] H. Nagashima, Introduction to Chaos. Philadelphia, PA: Institute of Physics, 1999. [30] K. Kaneko and I. Tsuda, Complex Systems: Chaos and Beyond. New York: Springer, 2001. Hicham Hatime received the B.S. degree in computer science from the High-Tech School, Rabat, Morocco, in 1995, and the M.S. degrees in computer science and in electrical and computer engineering from Wichita State University, Wichita, KS, in 1998 and 2001, respectively. Currently, he is working towards the Ph.D. degree in electrical engineering and computer science at Wichita State University.
He was with Morocco’s Real Estate Department from 1994 to 1995 and with Prestige Informatique, Morocco, from 1995 to 1996. From 1998 to 2007, he was with LSI Logic working on SCSI, fiber channels, and iSCSI protocols. His areas of research include mobile sensors networks and robotics.