Партиционисање графа — разлика између измена

С Википедије, слободне енциклопедије
Садржај обрисан Садржај додат
Нова страница: У математици, проблем партиционисанја графа је дефинисан на основу података предсављених у…
 
Нема описа измене
Ред 1: Ред 1:
{{ПРЕВОД}}
У математици, проблем партиционисанја графа је дефинисан на основу података предсављених у облику графа G = (V,E), где V представља гране графа а Е чворове, тако да је могуће партиционисање G на мање компоненте са специфичним особинама. На пример, к-точлано партиционисање дели гране на к мањих делова. Добро партиционисање графа је оно када је број чворова који се налазе између одвојених компоненти,мали. Партиционисање хомогеног графа, је врста партиционисања графа,које се састоји од проблема дељенја графа на компоненте, тако да су компоненте исте величине и да постоји повезаност између компоненти. Подела(партиционисање) графова има важну улогу у научном рачунању, подели различитих фаза у VLSI дизајну стујних кола и подели послова у више-процесорском систему.
In mathematics, the '''graph partition''' problem is defined on data represented in the form of a [[graph (mathematics)|graph]] ''G'' = (''V'',''E''), with ''V'' vertices and ''E'' edges, such that it is possible to [[partition of a set|partition]] ''G'' into smaller components with specific properties. For instance, a ''k''-way partition divides the vertex set into ''k'' smaller components. A good partition is defined as one in which the number of edges running between separated components is small. Uniform graph partition is a type of graph partitioning problem that consists of dividing a graph into components, such that the components are of about the same size and there are few connections between the components. Important applications of graph partitioning include scientific computing, partitioning various stages of a [[VLSI]] design circuit and task scheduling in multi-processor systems.<ref name="balgraph">
У скорије време, проблем поделе графа је добило на значењу због његове улоге у груписању и откривању клика (групе људи ) у друштвеним, паталошким и биолошким мрежама.
{{cite journal
| author = Andreev, Konstantin and Räcke, Harald,
| title = Balanced Graph Partitioning
| journal = Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
| year = 2004
| location = Barcelona, Spain
| pages = 120–124
| doi = 10.1145/1007912.1007931
| isbn = 1-58113-840-7
}}</ref> Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see {{harvtxt|Buluc|Meyerhenke|Safro|Sanders|2013}}.<ref name="recentadvances">{{citation
| last1 = Buluc | first1 = Aydin
| last2 = Meyerhenke | first2 = Henning
| last3 = Safro | first3 = Ilya
| last4 = Sanders | first4 = Peter | author4-link = Peter Sanders (computer scientist)
| last5 = Schulz | first5 = Christian
| arxiv = 1311.3144
| title = Recent Advances in Graph Partitioning
| year = 2013
| mode = cs1
| ref = harv}}</ref>

== Problem complexity ==

Typically, graph partition problems fall under the category of [[NP-hard]] problems. Solutions to these problems are generally derived using heuristics and approximation algorithms.<ref name=baltrees>
{{cite journal
| author = Feldmann, Andreas Emil and Foschini, Luca
| title = Balanced Partitions of Trees and Applications
| journal = Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science
| year = 2012
| location = Paris, France
| pages = 100–111
}}</ref> However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor.<ref name=balgraph /> Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist,<ref name="fastpartition">
{{cite journal
| author = Feldmann, Andreas Emil
| title = Fast Balanced Partitioning is Hard, Even on Grids and Trees
| journal = Proceedings of the 37th International Symposium on Mathematical Foundations of Computer Science
| year = 2012
| location = Bratislava, Slovakia
}}</ref> unless [[P=NP]]. Grids are a particularly interesting case since they model the graphs resulting from [[Finite element method|Finite Element Model (FEM)]] simulations. When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs.<ref name="fastpartition" />

==Problem==
Consider a graph ''G'' = (''V'', ''E''), where ''V'' denotes the set of ''n'' vertices and ''E'' the set of edges. For a (''k'',''v'') balanced partition problem, the objective is to partition ''G'' into ''k'' components of at most size ''v''&middot;(''n''/''k''), while minimizing the capacity of the edges between separate components.<ref name=balgraph /> Also, given ''G'' and an integer ''k'' > 1, partition ''V'' into ''k'' parts (subsets) ''V''<sub>1</sub>, ''V''<sub>2</sub>, ..., ''V<sub>k</sub>'' such that the parts are disjoint and have equal size, and the number of edges with endpoints in different parts is minimized. Such partition problems have been discussed in literature as bicriteria-approximation or resource augmentation approaches. A common extension is to [[hypergraph]]s, where an edge can connect more than two vertices. A hyperedge is not cut if all vertices are in one partition, and cut exactly once otherwise, no matter how many vertices are on each side. This usage is common in [[electronic design automation]].

===Analysis===
For a specific (''k'', 1&nbsp;+&nbsp;''ε'') balanced partition problem, we seek to find a minimum cost partition of ''G'' into ''k'' components with each component containing maximum of (1&nbsp;+&nbsp;''ε'')&middot;(''n''/''k'') nodes. We compare the cost of this approximation algorithm to the cost of a (''k'',1) cut, wherein each of the ''k'' components must have exactly the same size of (''n''/''k'') nodes each, thus being a more restricted problem. Thus,

: <math>\max_i |V_i| \le (1+\varepsilon) \left\lceil\frac{|V|}{k}\right\rceil.</math>

We already know that (2,1) cut is the minimum bisection problem and it is NP complete.<ref name="minbisect">
{{cite book
| author = Garey, Michael R. and Johnson, David S.
| title = Computers and intractability: A guide to the theory of NP-completeness
| year = 1979
| isbn = 0-7167-1044-7
| publisher = W. H. Freeman & Co.
}}</ref> Next we assess a 3-partition problem wherein ''n''&nbsp;=&nbsp;3''k'', which is also bounded in polynomial time.<ref name=balgraph /> Now, if we assume that we have an finite approximation algorithm for (''k'',&nbsp;1)-balanced partition, then, either the 3-partition instance can be solved using the balanced (''k'',1) partition in ''G'' or it cannot be solved. If the 3-partition instance can be solved, then (''k'',&nbsp;1)-balanced partitioning problem in ''G'' can be solved without cutting any edge. Otherwise if the 3-partition instance cannot be solved, the optimum (''k'', 1)-balanced partitioning in ''G'' will cut at least one edge. An approximation algorithm with finite approximation factor has to differentiate between these two cases. Hence, it can solve the 3-partition problem which is a contradiction under the assumption that ''P''&nbsp;=&nbsp;''NP''. Thus, it is evident that (''k'',1)-balanced partitioning problem has no polynomial time approximation algorithm with finite approximation factor unless ''P''&nbsp;=&nbsp;''NP''.<ref name=balgraph />

The [[planar separator theorem]] states that any ''n''-vertex [[planar graph]] can be partitioned into roughly equal parts by the removal of O(√''n'') vertices. This is not a partition in the sense described above, because the partition set consists of vertices rather than edges. However, the same result also implies that every planar graph of bounded degree has a balanced cut with O(√''n'') edges.

==Graph partition methods==
Since graph partitioning is a hard problem, practical solutions are based on heuristics. There are two broad categories of methods, local and global. Well known local methods are the [[Kernighan&ndash;Lin algorithm]], and [[Fiduccia-Mattheyses algorithm]]s, which were the first effective 2-way cuts by local search strategies. Their major drawback is the arbitrary initial partitioning of the vertex set, which can affect the final solution quality. Global approaches rely on properties of the entire graph and do not rely on an arbitrary initial partition. The most common example is spectral partitioning, where a partition is derived from the spectrum of the adjacency matrix.

==Multi-level methods==
A multi-level graph partitioning algorithm works by applying one or more stages. Each stage reduces the size of
the graph by collapsing vertices and edges, partitions the smaller graph, then maps back and refines this partition of the original graph.<ref>
{{cite conference
|title=A multilevel algorithm for partitioning graphs
|author=Hendrickson, B. and Leland, R.
|conference =Proceedings of the 1995 ACM/IEEE conference on Supercomputing
|pages=28
|year=1995
|publisher=ACM
}}</ref> A wide variety of partitioning and refinement methods can be applied within the overall multi-level scheme. In many cases, this approach can give both fast execution times and very high quality results.
One widely used example of such an approach is [[METIS]],<ref name=":0">{{cite journal
|title=A fast and high quality multilevel scheme for partitioning irregular graphs
|author=Karypis, G. and Kumar, V.
|journal=SIAM Journal on Scientific Computing
|volume=20
|issue=1
|pages=359
|year=1999
|doi=10.1137/S1064827595287997
}}</ref> a graph partitioner, and hMETIS, the corresponding partitioner for hypergraphs.<ref name="hmetis">
{{cite conference
|title=Multilevel hypergraph partitioning: application in VLSI domain
|author=Karypis, G. and Aggarwal, R. and Kumar, V. and Shekhar, S.
|conference=Proceedings of the 34th annual Design Automation Conference
|pages=526–529
|year=1997
}}</ref>

==Spectral partitioning and spectral bisection==
Given a graph with [[adjacency matrix]] ''A'', where an entry ''A<sub>ij</sub>'' implies an edge between node ''i'' and ''j'', and [[degree matrix]] ''D'', which is a diagonal matrix, where each diagonal entry of a row ''i'', ''d<sub>ii</sub>'', represents the node degree of node ''i''. The [[Laplacian matrix|Laplacian of the matrix]] ''L'' is defined as ''L''&nbsp;=&nbsp;''D''&nbsp;&minus;&nbsp;''A''. Now, a ratio-cut partition for graph ''G''&nbsp;=&nbsp;(''V'',&nbsp;''E'') is defined as a partition of ''V'' into disjoint ''U'', and ''W'', such that cost of cut(''U'',''W'')/(|''U''|&middot;|''W''|) is minimized.

In such a scenario, the second smallest [[Eigenvalues and eigenvectors|eigenvalue]] (''λ'') of ''L'', yields a lower bound on the optimal cost (''c'') of ratio-cut partition with ''c''&nbsp;≥&nbsp;''λ''/''n''. The eigenvector (''V'') corresponding to ''λ'', called the Fiedler vector, bisects the graph into only two communities based on the sign of the corresponding vector entry. Division into a larger number of communities is usually achieved by repeated bisection, but this does not always give satisfactory results. The examples in Figures 1,2 illustrate the spectral bisection approach.

[[File:Bisected network.jpg|thumb|Figure 1: The graph ''G''&nbsp;=&nbsp;(5,4) is analysed for spectral bisection. The linear combination of the smallest two eigenvectors leads to [1 1 1 1 1]' having an eigen value&nbsp;=&nbsp;0.]]

[[File:Connected graph..jpg|thumb|Figure 2: The graph ''G''&nbsp;=&nbsp;(5,5) illustrates that the Fiedler vector in red bisects the graph into two communities, one with vertices {1,2,3} with positive entries in the vector space, and the other community has vertices {4,5} with negative vector space entries.]]

Minimum cut partitioning however fails when the number of communities to be partitioned, or the partition sizes are unknown. For instance, optimizing the cut size for free group sizes puts all vertices in the same community. Additionally, cut size may be the wrong thing to minimize since a good division is not just one with small number of edges between communities. This motivated the use of [[Modularity (networks)|Modularity]] (Q) <ref name="npnas">
{{cite journal
| author = Newman, M. E. J.
| year = 2006
| title = Modularity and community structure in networks
| journal = PNAS
| volume = 103
| issue = 23
| pages = 8577–8696
| doi = 10.1073/pnas.0601602103
| pmid=16723398
| pmc=1482622
}}</ref> as a metric to optimize a balanced graph partition. The example in Figure 3 illustrates 2 instances of the same graph such that in ''(a)'' modularity (Q) is the partitioning metric and in ''(b)'', ratio-cut is the partitioning metric. However, it is well-known that Q suffers a resolution limit, producing unreliable results when dealing with small communities. In this context, [[Surprise (networks)|Surprise]]<ref name = "Surprise">
{{cite journal
| author = Rodrigo Aldecoa and Ignacio Marín
| year = 2011
| title = Deciphering network community structure by Surprise
| journal = PLoS ONE
| volume = 6
| issue = 9
| pages = e24195
| doi = 10.1371/journal.pone.0024195
| pmid = 21909420
| url = http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0024195
| pmc=3164713
}}
</ref> has been proposed as an alternative approach for evaluating the quality of a partition.

[[File:Graph comparison.jpg|thumb|Figure 3: Weighted graph ''G'' may be partitioned to maximize ''Q'' in (a) or to minimize the ratio-cut in (b). We see that (a) is a better balanced partition, thus motivating the importance of modularity in graph partitioning problems.]]

==Other graph partition methods==
Spin models have been used for clustering of multivariate data wherein similarities are translated into coupling strengths.<ref name="potts">
{{cite journal
| title = Statistical mechanics of community detection
|date=Jul 2006
| doi = 10.1103/PhysRevE.74.016110
| author = Reichardt, Jörg and Bornholdt, Stefan
| issue = 1
| journal = Phys. Rev. E
| pages = 016110
| volume = 74
}}</ref> The properties of ground state spin configuration can be directly interpreted as communities. Thus, a graph is partitioned to minimize the Hamiltonian of the partitioned graph. The [[Hamiltonian mechanics#Mathematical formalism|Hamiltonian]] (H) is derived by assigning the following partition rewards and penalties.
* Reward internal edges between nodes of same group (same spin)
* Penalize missing edges in same group
* Penalize existing edges between different groups
* Reward non-links between different groups.

Additionally, Kernel PCA based Spectral clustering takes a form of least squares Support Vector Machine framework, and hence it becomes possible to project the data entries to a kernel induced feature space that has maximal variance, thus implying a high separation between the projected communities <ref name="pami">
{{cite journal
| author = Carlos Alzate and Johan A.K. Suykens
| title = Multiway Spectral Clustering with Out-of-Sample Extensions through Weighted Kernel PCA
| journal = IEEE Transactions on Pattern Analysis and Machine Intelligence
| volume = 32
| issn = 0162-8828
| year = 2010
| pages = 335–347
| doi=10.1109/TPAMI.2008.292
| publisher = IEEE Computer Society
| issue = 2
| pmid = 20075462
}}</ref>

Some methods express graph partitioning as a multi-criteria optimization problem which can be solved using local methods expressed in a game theoretic framework where each node makes a decision on the partition it chooses.<ref>Kurve, Griffin, Kesidis (2011) "A graph partitioning game for distributed simulation of networks" Proceedings of the 2011 International Workshop on Modeling, Analysis, and Control of Complex Networks: 9 -16</ref>

== Software tools ==

Chaco,<ref name="chaco">
{{cite journal
| author = Bruce Hendrickson
| title = Chaco: Software for Partitioning Graphs
}}</ref> due to Hendrickson and Leland, implements the multilevel approach outlined above and basic local search algorithms.
Moreover, they implement spectral partitioning techniques.

[[METIS]]<ref name=":0"/> is a graph partitioning family by Karypis and Kumar. Among this family, kMetis aims at greater partitioning speed, hMetis,<ref name="hmetis"/> applies to hypergraphs and aims at partition quality, and ParMetis<ref name=":0"/> is a parallel implementation of the Metis graph partitioning algorithm.

PaToH<ref name="patoh">{{cite conference
| author = Ü. Catalyürek and C. Aykanat
| title = PaToH: Partitioning Tool for Hypergraphs
| year=2011
}}</ref> is another hypergraph partitioner.

Scotch<ref name="scotch">{{cite journal
| author = C. Chevalier and F. Pellegrini
| title = PT-Scotch: A Tool for Efficient Parallel Graph Ordering
| journal = Parallel Computing
| volume = 34
| issue = 6
| pages = 318–331
| year=2008
| doi=10.1016/j.parco.2007.12.001
}}</ref> is graph partitioning framework by Pellegrini. It uses recursive multilevel bisection and includes sequential as well as parallel partitioning techniques.

Jostle<ref name="jostle">{{cite journal
| author = C. Walshaw and M. Cross
| title = Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm
| journal = Journal on Scientific Computing
| volume = 22
| issue = 1
| pages = 63–80
| year=2000
| doi=10.1137/s1064827598337373
}}</ref> is a sequential and parallel graph partitioning solver developed by Chris Walshaw.
The commercialized version of this partitioner is known as NetWorks.

Party<ref name="party">{{cite journal
| author = R. Diekmann, R. Preis, F. Schlimbach and C. Walshaw
| title = Shape-optimized Mesh Partitioning and Load Balancing for Parallel Adaptive FEM
| journal = Parallel Computing
| volume = 26
| issue = 12
| pages = 1555–1581
| year=2000
| doi=10.1016/s0167-8191(00)00043-0
}}</ref> implements the Bubble/shape-optimized framework and the Helpful Sets algorithm.

The software packages DibaP<ref name="dibap">{{cite journal
| author = H. Meyerhenke and B. Monien and T. Sauerwald
| title = A New Diffusion-Based Multilevel Algorithm for Computing Graph Partitions
| journal = Journal of Parallel Computing and Distributed Computing
| volume = 69
| issue = 9
| pages = 750–761
| year=2008
| doi=10.1016/j.jpdc.2009.04.005
}}</ref> and its MPI-parallel variant PDibaP<ref name="pdibap">{{cite conference
| author = H. Meyerhenke
| title = Shape Optimizing Load Balancing for MPI-Parallel Adaptive Numerical Simulations.
| conference = 10th DIMACS Implementation Challenge on Graph Partitioning and Graph Clustering
| pages = 67–82
| year=2013
}}</ref> by Meyerhenke implement the Bubble framework using diffusion; DibaP also uses AMG-based techniques for coarsening and solving linear systems arising in the diffusive approach.

Sanders and Schulz released a graph partitioning package KaHIP<ref name="kahip">{{cite conference
| author = [[Peter Sanders (computer scientist)|P. Sanders]] and C. Schulz
| title = Engineering Multilevel Graph Partitioning Algorithms
| conference = Proceeings of the 19th European Symposium on Algorithms (ESA)
| volume = 6942
| pages = 469–480
| year=2011
}}</ref> (Karlsruhe High Quality Partitioning) that implements for example flow-based methods, more-localized local searches and several parallel and sequential meta-heuristics.

The tools Parkway<ref name="parkway">{{cite journal
| author = A. Trifunovic and W. J. Knottenbelt
| title = Parallel Multilevel Algorithms for Hypergraph Partitioning
| journal = Journal of Parallel and Distributed Computing
| volume = 68
| issue = 5
| pages = 563–581
| year=2008
| doi=10.1016/j.jpdc.2007.11.002
}}</ref> by Trifunovic and
Knottenbelt as well as Zoltan<ref name="zoltan">{{cite conference
| author = K. Devine and E. Boman and R. Heaphy and R. Bisseling and Ü. Catalyurek
| title = Parallel Hypergraph Partitioning for Scientific Computing
| conference = Proceedings of the 20th International Conference on Parallel and Distributed Processing
| pages = 124-124
| year=2006
}}</ref> by Devine et al. focus on hypergraph
partitioning.

'''List of free open-source frameworks:'''
{| class="wikitable"
|-
!Name
!License
!Brief info
|-
|Chaco||GPL|| software package implementing spectral techniques and the multilevel approach
|-
|DiBaP||*|| graph partitioning based on multilevel techniques, algebraic multigrid as well as graph based diffusion
|-
|Jostle||*|| multilevel partitioning techniques and diffusive load-balancing, sequential and parallel
|-
|KaHIP||GPL|| several parallel and sequential meta-heuristics, guarantees the balance constraint
|-
|kMetis||Apache 2.0||graph partitioning package based on multilevel techniques and k-way local search
|-
|Mondriaan||LGPL|| matrix partitioner to partition rectangular sparse matrices
|-
|PaToH||BSD|| multilevel hypergraph partitioning
|-
|Parkway||*|| parallel multilevel hypergraph partitioning
|-
|Scotch||CeCILL-C|| implements multilevel recursive bisection as well as diffusion techniques, sequential and parallel
|-
|Zoltan||BSD|| hypergraph partitioning
|}

==References==
<references/>

==External links==
* Chamberlain, Bradford L. (1998). [http://masters.donntu.edu.ua/2006/fvti/krasnokutskaya/library/generals.pdf "Graph Partitioning Algorithms for Distributing Workloads of Parallel Computations"]

== Bibliography ==
* {{cite book|title=Graph Partitioning: Optimisation and Applications|author=Charles-Edmond Bichot, Patrick Siarry|year=2011|publisher=ISTE – Wiley|isbn=978-1848212336|pages=384|url=http://cebichot.netne.net/graph_partitioning_book/}}
* {{cite book|last=Feldmann|first=Andreas Emil|title=Balanced Partitioning of Grids and Related Graphs: A Theoretical Study of Data Distribution in Parallel Finite Element Model Simulations|year=2012|publisher=Cuvillier Verlag|location=Goettingen, Germany|isbn=978-3954041251|pages=218|url=http://www.pw.ethz.ch/people/research_group/andemil/personal/thesis.pdf}} An exhaustive analysis of the problem from a theoretical point of view.
* {{cite journal |title=An efficient heuristic procedure for partitioning graphs |url=http://www.ece.wisc.edu/~adavoodi/teaching/756-old/papers/kl.pdf |author=BW Kernighan, S Lin |journal=Bell System Technical Journal |year=1970}} One of the early fundamental works in the field. However, performance is O(n<sup>2</sup>), so it is no longer commonly used.
* {{cite conference |title=A Linear-Time Heuristic for Improving Network Partitions |url=http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1585498 |author=CM Fiduccia, RM Mattheyses |conference=Design Automation Conference |year=1982}} A later variant that is linear time, very commonly used, both by itself and as part of multilevel partitioning, see below.
* {{cite journal |title=A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs |url=http://glaros.dtc.umn.edu/gkhome/node/107 |author= G Karypis, V Kumar |journal=Siam Journal on Scientific Computing |year=1999}} Multi-level partitioning is the current state of the art. This paper also has good explanations of many other methods, and comparisons of the various methods on a wide variety of problems.
* {{cite journal |title=Multilevel hypergraph partitioning: applications in VLSI domain |url=http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=748202 |author= Karypis, G., Aggarwal, R., Kumar, V., and Shekhar, S. |journal=IEEE Transactions on Very Large Scale Integration (VLSI) Systems |date=March 1999 |volume=7 |issue=1 |pages=pp. 69–79 |doi=10.1109/92.748202}} Graph partitioning (and in particular, hypergraph partitioning) has many applications to IC design.
* {{cite journal |title=Optimization by Simulated Annealing |url=http://www.sciencemag.org/cgi/content/abstract/220/4598/671 |author=S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi |journal=Science |date=13 May 1983 |issue=4598 |pages=671–680 |doi= 10.1126/science.220.4598.671 |volume=220 |pmid=17813860}} Simulated annealing can be used as well.
* {{cite journal |title=New spectral methods for ratio cut partitioning and clustering |url=http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=159993 |author=Hagen, L. and Kahng, A.B. |journal=IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |date=September 1992 |volume=11,&nbsp; |issue=9 |pages= 1074–1085 |doi=10.1109/43.159993}}. There is a whole class of ''spectral partitioning'' methods, which use the Eigenvectors of the Laplacian of the connectivity graph. You can see [http://www.stanford.edu/~dgleich/demos/matlab/spectral/spectral.html a demo of this], using Matlab.

[[Category:NP-complete problems]]
[[Category:Computational problems in graph theory]]

Верзија на датум 24. мај 2015. у 16:56

Шаблон:ПРЕВОД In mathematics, the graph partition problem is defined on data represented in the form of a graph G = (V,E), with V vertices and E edges, such that it is possible to partition G into smaller components with specific properties. For instance, a k-way partition divides the vertex set into k smaller components. A good partition is defined as one in which the number of edges running between separated components is small. Uniform graph partition is a type of graph partitioning problem that consists of dividing a graph into components, such that the components are of about the same size and there are few connections between the components. Important applications of graph partitioning include scientific computing, partitioning various stages of a VLSI design circuit and task scheduling in multi-processor systems.[1] Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see Buluc et al. (2013).[2]

Problem complexity

Typically, graph partition problems fall under the category of NP-hard problems. Solutions to these problems are generally derived using heuristics and approximation algorithms.[3] However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor.[1] Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist,[4] unless P=NP. Grids are a particularly interesting case since they model the graphs resulting from Finite Element Model (FEM) simulations. When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs.[4]

Problem

Consider a graph G = (V, E), where V denotes the set of n vertices and E the set of edges. For a (k,v) balanced partition problem, the objective is to partition G into k components of at most size v·(n/k), while minimizing the capacity of the edges between separate components.[1] Also, given G and an integer k > 1, partition V into k parts (subsets) V1, V2, ..., Vk such that the parts are disjoint and have equal size, and the number of edges with endpoints in different parts is minimized. Such partition problems have been discussed in literature as bicriteria-approximation or resource augmentation approaches. A common extension is to hypergraphs, where an edge can connect more than two vertices. A hyperedge is not cut if all vertices are in one partition, and cut exactly once otherwise, no matter how many vertices are on each side. This usage is common in electronic design automation.

Analysis

For a specific (k, 1 + ε) balanced partition problem, we seek to find a minimum cost partition of G into k components with each component containing maximum of (1 + ε)·(n/k) nodes. We compare the cost of this approximation algorithm to the cost of a (k,1) cut, wherein each of the k components must have exactly the same size of (n/k) nodes each, thus being a more restricted problem. Thus,

We already know that (2,1) cut is the minimum bisection problem and it is NP complete.[5] Next we assess a 3-partition problem wherein n = 3k, which is also bounded in polynomial time.[1] Now, if we assume that we have an finite approximation algorithm for (k, 1)-balanced partition, then, either the 3-partition instance can be solved using the balanced (k,1) partition in G or it cannot be solved. If the 3-partition instance can be solved, then (k, 1)-balanced partitioning problem in G can be solved without cutting any edge. Otherwise if the 3-partition instance cannot be solved, the optimum (k, 1)-balanced partitioning in G will cut at least one edge. An approximation algorithm with finite approximation factor has to differentiate between these two cases. Hence, it can solve the 3-partition problem which is a contradiction under the assumption that P = NP. Thus, it is evident that (k,1)-balanced partitioning problem has no polynomial time approximation algorithm with finite approximation factor unless P = NP.[1]

The planar separator theorem states that any n-vertex planar graph can be partitioned into roughly equal parts by the removal of O(√n) vertices. This is not a partition in the sense described above, because the partition set consists of vertices rather than edges. However, the same result also implies that every planar graph of bounded degree has a balanced cut with O(√n) edges.

Graph partition methods

Since graph partitioning is a hard problem, practical solutions are based on heuristics. There are two broad categories of methods, local and global. Well known local methods are the Kernighan–Lin algorithm, and Fiduccia-Mattheyses algorithms, which were the first effective 2-way cuts by local search strategies. Their major drawback is the arbitrary initial partitioning of the vertex set, which can affect the final solution quality. Global approaches rely on properties of the entire graph and do not rely on an arbitrary initial partition. The most common example is spectral partitioning, where a partition is derived from the spectrum of the adjacency matrix.

Multi-level methods

A multi-level graph partitioning algorithm works by applying one or more stages. Each stage reduces the size of the graph by collapsing vertices and edges, partitions the smaller graph, then maps back and refines this partition of the original graph.[6] A wide variety of partitioning and refinement methods can be applied within the overall multi-level scheme. In many cases, this approach can give both fast execution times and very high quality results. One widely used example of such an approach is METIS,[7] a graph partitioner, and hMETIS, the corresponding partitioner for hypergraphs.[8]

Spectral partitioning and spectral bisection

Given a graph with adjacency matrix A, where an entry Aij implies an edge between node i and j, and degree matrix D, which is a diagonal matrix, where each diagonal entry of a row i, dii, represents the node degree of node i. The Laplacian of the matrix L is defined as L = D − A. Now, a ratio-cut partition for graph G = (VE) is defined as a partition of V into disjoint U, and W, such that cost of cut(U,W)/(|U|·|W|) is minimized.

In such a scenario, the second smallest eigenvalue (λ) of L, yields a lower bound on the optimal cost (c) of ratio-cut partition with c ≥ λ/n. The eigenvector (V) corresponding to λ, called the Fiedler vector, bisects the graph into only two communities based on the sign of the corresponding vector entry. Division into a larger number of communities is usually achieved by repeated bisection, but this does not always give satisfactory results. The examples in Figures 1,2 illustrate the spectral bisection approach.

Figure 1: The graph G = (5,4) is analysed for spectral bisection. The linear combination of the smallest two eigenvectors leads to [1 1 1 1 1]' having an eigen value = 0.
Figure 2: The graph G = (5,5) illustrates that the Fiedler vector in red bisects the graph into two communities, one with vertices {1,2,3} with positive entries in the vector space, and the other community has vertices {4,5} with negative vector space entries.

Minimum cut partitioning however fails when the number of communities to be partitioned, or the partition sizes are unknown. For instance, optimizing the cut size for free group sizes puts all vertices in the same community. Additionally, cut size may be the wrong thing to minimize since a good division is not just one with small number of edges between communities. This motivated the use of Modularity (Q) [9] as a metric to optimize a balanced graph partition. The example in Figure 3 illustrates 2 instances of the same graph such that in (a) modularity (Q) is the partitioning metric and in (b), ratio-cut is the partitioning metric. However, it is well-known that Q suffers a resolution limit, producing unreliable results when dealing with small communities. In this context, Surprise[10] has been proposed as an alternative approach for evaluating the quality of a partition.

Figure 3: Weighted graph G may be partitioned to maximize Q in (a) or to minimize the ratio-cut in (b). We see that (a) is a better balanced partition, thus motivating the importance of modularity in graph partitioning problems.

Other graph partition methods

Spin models have been used for clustering of multivariate data wherein similarities are translated into coupling strengths.[11] The properties of ground state spin configuration can be directly interpreted as communities. Thus, a graph is partitioned to minimize the Hamiltonian of the partitioned graph. The Hamiltonian (H) is derived by assigning the following partition rewards and penalties.

  • Reward internal edges between nodes of same group (same spin)
  • Penalize missing edges in same group
  • Penalize existing edges between different groups
  • Reward non-links between different groups.

Additionally, Kernel PCA based Spectral clustering takes a form of least squares Support Vector Machine framework, and hence it becomes possible to project the data entries to a kernel induced feature space that has maximal variance, thus implying a high separation between the projected communities [12]

Some methods express graph partitioning as a multi-criteria optimization problem which can be solved using local methods expressed in a game theoretic framework where each node makes a decision on the partition it chooses.[13]

Software tools

Chaco,[14] due to Hendrickson and Leland, implements the multilevel approach outlined above and basic local search algorithms. Moreover, they implement spectral partitioning techniques.

METIS[7] is a graph partitioning family by Karypis and Kumar. Among this family, kMetis aims at greater partitioning speed, hMetis,[8] applies to hypergraphs and aims at partition quality, and ParMetis[7] is a parallel implementation of the Metis graph partitioning algorithm.

PaToH[15] is another hypergraph partitioner.

Scotch[16] is graph partitioning framework by Pellegrini. It uses recursive multilevel bisection and includes sequential as well as parallel partitioning techniques.

Jostle[17] is a sequential and parallel graph partitioning solver developed by Chris Walshaw. The commercialized version of this partitioner is known as NetWorks.

Party[18] implements the Bubble/shape-optimized framework and the Helpful Sets algorithm.

The software packages DibaP[19] and its MPI-parallel variant PDibaP[20] by Meyerhenke implement the Bubble framework using diffusion; DibaP also uses AMG-based techniques for coarsening and solving linear systems arising in the diffusive approach.

Sanders and Schulz released a graph partitioning package KaHIP[21] (Karlsruhe High Quality Partitioning) that implements for example flow-based methods, more-localized local searches and several parallel and sequential meta-heuristics.

The tools Parkway[22] by Trifunovic and Knottenbelt as well as Zoltan[23] by Devine et al. focus on hypergraph partitioning.

List of free open-source frameworks:

Name License Brief info
Chaco GPL software package implementing spectral techniques and the multilevel approach
DiBaP * graph partitioning based on multilevel techniques, algebraic multigrid as well as graph based diffusion
Jostle * multilevel partitioning techniques and diffusive load-balancing, sequential and parallel
KaHIP GPL several parallel and sequential meta-heuristics, guarantees the balance constraint
kMetis Apache 2.0 graph partitioning package based on multilevel techniques and k-way local search
Mondriaan LGPL matrix partitioner to partition rectangular sparse matrices
PaToH BSD multilevel hypergraph partitioning
Parkway * parallel multilevel hypergraph partitioning
Scotch CeCILL-C implements multilevel recursive bisection as well as diffusion techniques, sequential and parallel
Zoltan BSD hypergraph partitioning

References

  1. ^ а б в г д Andreev, Konstantin and Räcke, Harald, (2004). „Balanced Graph Partitioning”. Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures. Barcelona, Spain: 120—124. ISBN 1-58113-840-7. doi:10.1145/1007912.1007931. 
  2. ^ Buluc, Aydin; Meyerhenke, Henning; Safro, Ilya; Sanders, Peter; Schulz, Christian (2013). Recent Advances in Graph Partitioning. arXiv:1311.3144Слободан приступ. 
  3. ^ Feldmann, Andreas Emil and Foschini, Luca (2012). „Balanced Partitions of Trees and Applications”. Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science. Paris, France: 100—111. 
  4. ^ а б Feldmann, Andreas Emil (2012). „Fast Balanced Partitioning is Hard, Even on Grids and Trees”. Proceedings of the 37th International Symposium on Mathematical Foundations of Computer Science. Bratislava, Slovakia. 
  5. ^ Garey, Michael R. and Johnson, David S. (1979). Computers and intractability: A guide to the theory of NP-completeness. W. H. Freeman & Co. ISBN 0-7167-1044-7. 
  6. ^ Hendrickson, B. and Leland, R. (1995). A multilevel algorithm for partitioning graphs. Proceedings of the 1995 ACM/IEEE conference on Supercomputing. ACM. стр. 28. 
  7. ^ а б в Karypis, G. and Kumar, V. (1999). „A fast and high quality multilevel scheme for partitioning irregular graphs”. SIAM Journal on Scientific Computing. 20 (1): 359. doi:10.1137/S1064827595287997. 
  8. ^ а б Karypis, G. and Aggarwal, R. and Kumar, V. and Shekhar, S. (1997). Multilevel hypergraph partitioning: application in VLSI domain. Proceedings of the 34th annual Design Automation Conference. стр. 526—529. 
  9. ^ Newman, M. E. J. (2006). „Modularity and community structure in networks”. PNAS. 103 (23): 8577—8696. PMC 1482622Слободан приступ. PMID 16723398. doi:10.1073/pnas.0601602103. 
  10. ^ Rodrigo Aldecoa and Ignacio Marín (2011). „Deciphering network community structure by Surprise”. PLoS ONE. 6 (9): e24195. PMC 3164713Слободан приступ. PMID 21909420. doi:10.1371/journal.pone.0024195. 
  11. ^ Reichardt, Jörg and Bornholdt, Stefan (јул 2006). „Statistical mechanics of community detection”. Phys. Rev. E. 74 (1): 016110. doi:10.1103/PhysRevE.74.016110. 
  12. ^ Carlos Alzate and Johan A.K. Suykens (2010). „Multiway Spectral Clustering with Out-of-Sample Extensions through Weighted Kernel PCA”. IEEE Transactions on Pattern Analysis and Machine Intelligence. IEEE Computer Society. 32 (2): 335—347. ISSN 0162-8828. PMID 20075462. doi:10.1109/TPAMI.2008.292. 
  13. ^ Kurve, Griffin, Kesidis (2011) "A graph partitioning game for distributed simulation of networks" Proceedings of the 2011 International Workshop on Modeling, Analysis, and Control of Complex Networks: 9 -16
  14. ^ Bruce Hendrickson. „Chaco: Software for Partitioning Graphs”. 
  15. ^ Ü. Catalyürek and C. Aykanat (2011). PaToH: Partitioning Tool for Hypergraphs. 
  16. ^ C. Chevalier and F. Pellegrini (2008). „PT-Scotch: A Tool for Efficient Parallel Graph Ordering”. Parallel Computing. 34 (6): 318—331. doi:10.1016/j.parco.2007.12.001. 
  17. ^ C. Walshaw and M. Cross (2000). „Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm”. Journal on Scientific Computing. 22 (1): 63—80. doi:10.1137/s1064827598337373. 
  18. ^ R. Diekmann, R. Preis, F. Schlimbach and C. Walshaw (2000). „Shape-optimized Mesh Partitioning and Load Balancing for Parallel Adaptive FEM”. Parallel Computing. 26 (12): 1555—1581. doi:10.1016/s0167-8191(00)00043-0. 
  19. ^ H. Meyerhenke and B. Monien and T. Sauerwald (2008). „A New Diffusion-Based Multilevel Algorithm for Computing Graph Partitions”. Journal of Parallel Computing and Distributed Computing. 69 (9): 750—761. doi:10.1016/j.jpdc.2009.04.005. 
  20. ^ H. Meyerhenke (2013). Shape Optimizing Load Balancing for MPI-Parallel Adaptive Numerical Simulations. 10th DIMACS Implementation Challenge on Graph Partitioning and Graph Clustering. стр. 67—82. 
  21. ^ P. Sanders and C. Schulz (2011). Engineering Multilevel Graph Partitioning Algorithms. Proceeings of the 19th European Symposium on Algorithms (ESA). 6942. стр. 469—480. 
  22. ^ A. Trifunovic and W. J. Knottenbelt (2008). „Parallel Multilevel Algorithms for Hypergraph Partitioning”. Journal of Parallel and Distributed Computing. 68 (5): 563—581. doi:10.1016/j.jpdc.2007.11.002. 
  23. ^ K. Devine and E. Boman and R. Heaphy and R. Bisseling and Ü. Catalyurek (2006). Parallel Hypergraph Partitioning for Scientific Computing. Proceedings of the 20th International Conference on Parallel and Distributed Processing. стр. 124—124. 

External links

Bibliography