Friday 11 September 2015

Finding the maximum flow in a network using Python

To find a maximum flow in a network, we can use a maximum flow algorithm, using the Python networkx library.

Applications of the maximum flow algorithm
Example applications of this are:
(i) finding how much gas can flow through a given pipeline network (eg. in a day),
(ii) finding how many vehicles can pass through a town (in a given period of time).

Bioinformatics examples include:
(i) identifying putative disease genes, by finding the maximum 'information' flow in a network (Chen et al 2011) that takes into account phenotype similarities and protein-protein interactions. The vertices in the network are diseases and genes; and the edges are phenotype similarities between pairs of diseases, and protein-protein interactions between pairs of genes, and all known associations between diseases and genes. For each disease node, it finds the gene node that has the maximum 'information' flow to that disease node.
(ii) partitioning a protein sequence into putative protein domains (Yu et al (2000). The vertices in the network are residues of a protein, and the edges are residue-residue contacts. The protein is split into two domains by finding the 'maximum flow' along edges, where the flow capacities of edges are set in a way that means that residues (nodes) in the same domain have greater flow, based on training data (as far as I understand). To split the protein into multiple domains, the algorithm can be run again and again.

The maximum flow algorithm in Python
At the moment, networkx finds the maximum flow in a network using the highest-label variant of the preflow-push algorithm. As input, we need to know the network structure (nodes and directed edges) and also know flow capacities (maximum flow values) for each edge. We also need to specify a source (start) node for the flow, and a sink (end) node for the flow.

The max-flow min-cut theorem says that the maximum flow must be equal to the value of a minimum cut, ie. to the minimum possible value for a set of a cut, that is, for a set of edges that, when removed from the network, causes the situation that no flow can pass from the source node to the sink node.

Download and install the latest version of the library from the download page (v1.10 as I speak).  (Thanks Martin Aslett for this!) (Note to self: you can install Python packages using 'pip install' or setup.py).

Enter the network data
First we need to enter our network data:
import networkx 
print(networkx.__version__) # check the versionnn
    1.10 
DG=networkx.DiGraph() # make a directed graph (digraph)
DG.add_nodes_from(["S","A","B","C","D","E","T"]) # add nodes
DG.add_edges_from([("S","A"),("S","B"),("S","C"),("A","B"),("A","D"),("B","D"),("B","C"),
    ("B","E"),("C","E"),("D","T"),("E","T")])  
# specify the capacity values for the edges:
networkx.set_edge_attributes(DG, 'capacity', {('S','A'): 5.0, ('S','B'): 6.0, ('S','C'): 8.0, 
    ('A','B'): 4.0, ('A','D'): 10.0, ('B','D'): 3.0, ('B','C'): 2.0, ('B','E'): 11.0, 
    ('C','E'): 6.0, ('D','T'): 9.0, ('E','T'): 4.0})

Plot our  network (just for fun!)
For this I've used the pygraphviz module, as I found it makes nicer pictures of digraphs than networkx:
import pygraphviz
G=pygraphviz.AGraph(strict=False,directed=True)
nodelist = ['A', 'B', 'C', 'D', 'E', 'S', 'T']
G.add_nodes_from(nodelist)
G.add_edge('S','A',label='5')
G.add_edge('S','B',label='6')
G.add_edge('S','C',label='8')
G.add_edge('A','B',label='4')
G.add_edge('A','D',label='10')
G.add_edge('B','D',label='3')
G.add_edge('B','E',label='11')
G.add_edge('B','C',label='2')
G.add_edge('C','E',label='6')
G.add_edge('D','T',label='9')
G.add_edge('E','T',label='4')
G.layout()
G.draw('file.png')














Notes:
- sometimes G.layout(prog='dot') gives a nicer picture.
- to make coloured nodes using pygraphviz:
  G.node_attr['style'] = 'filled'
  n = G.get_node('T')
  n.attr['fillcolor'] = 'red'
  n = G.get_node('D')
  n.attr['fillcolor'] = 'red'

Find the maximum flow
Now find the maximum flow, specifying node "S" as the source node for the flow, node "T" as the sink node for the flow,
networkx.maximum_flow_value(DG, s="S", t="T")
 12.0
networkx.maximum_flow(DG, s="S", t="T")
(12.0, {'S': {'C': 4.0, 'B': 3.0, 'A': 5.0}, 'C': {'E': 4.0}, 'B': {'C': 0, 'E': 0, 'D': 3.0}, 
    'A': {'B': 0, 'D': 5.0}, 'D': {'T': 8.0}, 'E': {'T': 4.0}, 'T': {}})
networkx.minimum_cut(DG, s="S", t="T")
(12.0, ({'C', 'B', 'S', 'E'}, {'A', 'T', 'D'}))
networkx.minimum_cut_value(DG, s="S", t="T")
12.0


This tells us that the value of a minimum cut (and therefore of the maximum flow) is 12.0. The minimum cut given separates nodes A, T, and D from the other nodes, so includes edges SA, AB, BD, and ET. The capacity of this cut is 5 (from SA) + 3 (from BD) + 4 (from ET) = 12.0. So the maximum flow is 12.0!

(Note that the arc AB is not included in this calculation since it is directed from a vertex in the set {A, D, T} to a vertex in the set {S, B, C, E}.)

Example 2 of finding the maximum flow
import networkx
DG=networkx.DiGraph() # make a directed graph (digraph)
DG.add_nodes_from(["S","A","B","C","D","T"]) # add nodes
DG.add_edges_from([("S","A"),("S","B"),("S","C"),("A","B"),("A","C"),("B","D"),("C","D"),("C","T"),("D","T")])
# specify the capacity values for the edges:
networkx.set_edge_attributes(DG, 'capacity', {('S','A'): 2.0, ('S','B'): 4.0, ('S', 'C'): 5.0, ('A', 'B'): 3.0, ('A', 'C'): 3.0, ('B', 'D'): 3.0, ('C', 'D'): 2.0, ('C', 'T'): 4.0, ('D', 'T'): 6.0})
networkx.maximum_flow_value(DG, s="S", t="T")
9.0
networkx.maximum_flow(DG, s="S", t="T")
(9.0, {'S': {'C': 4.0, 'B': 3.0, 'A': 2.0}, 'C': {'D': 2.0, 'T': 4.0}, 'B': {'D': 3.0}, 'A': {'C': 2.0, 'B': 0}, 'D': {'T': 5.0}, 'T': {}})
networkx.minimum_cut(DG, s="S", t="T")
(9.0, ({'C', 'B', 'A', 'S'}, {'T', 'D'}))
networkx.minimum_cut_value(DG, s="S", t="T")
9.0

No comments: