ANUPhysicsSS2008ABM Networks PajekTutorial
From COSNet
|
Summer school Menu |
Welcome to the Pajek Tutorial!
Aims for this tutorial are that students will be able to:
- Upload their own pajek format network file;
- Visualise the network in a number of ways;
- Run common graph measures over their network; and
- Generate new networks (e.g. random-graphs) for comparision.
Contents |
Uploading
Pajek can accept a range of information about your network. The usual components of a network are mandatory, such as vertices (nodes) and edges or arcs (links between nodes). Notice, that pajek rightly treats edges and arcs differently. Refer to the picture below to see the difference:
Example file:
*Vertices 4 1 "Harry" 2 "Sally" 3 "Jim" 4 "Jenny" *Arcs 1 3 2 3 4 *Edges 1 2
Notes:
- Vertices cannot be indexed 0 (start with 1);
- Whitespace are <space> characters (not tabs);
- The first line includes the number of vertices for pajek to expect (in this case, '4');
- List the edges by specifying the two nodes (any order is fine as these are undirected);
- To specify arcs you have two options:
- If you are using the *Arcs keyword, you can specify one arc per line, with three numbers (separated by spaces): the starting node, ending node, and arc weight.
- If you are using the *Arcslist keyword, on each line you specify the starting node, followed (separated by spaces) the list of destination nodes (arc weight is default of 1).
- Note that Pajek won't draw arcs and edges that are from a node to itself, but they are still there in the data structure.
- pajek doesn't support Unix text files, lines need to be ended with <carriage return><line feed> (if working on a PC don't worry about this);
- The file above is just the basic information about the network (vertices and connections), other information such as each vertex's rank, cluster and/or other information for each vertex can be added as separate files (below);
- Each file can be read into pajek and then saved as a Project (File > Pajek Project File > Save)
Clustering Data (File > Cluster > Read) (note: first line must be empty line)
family-gender.cls *Vertices 4 1 2 1 2
Permutation Data (File > Permutation > Read)
family-rank.per
*Vertices 4 1 1 2 2
Data on each Vertex of any kind ( File > Vector > Read)
family-age.vec *Vertices 4 45 44 18 17
Examples of translating a network to a file
Undirected network
Pajek .net file:
*vertices 5 1 "1" 2 "2" 3 "3" 4 "4" 5 "5" *edges 1 2 1 3 2 3 2 4 2 5 3 5 4 5
Directed network
*vertices 4 1 "1" 2 "2" 3 "3" 4 "4" *arcslist 1 2 4 2 3 4 3 1 4 1 2
Do
- For the network at the beginning of the tutorial (vertices: {a,...,l}), make a text file called myfirstgraph.net to represent the network;
- Be careful to distinguish between Edges and Arcs!
- Read this file into pajek using File > Network > Read.
Visualising Networks
There are a variety of ways to visualise a network. Infact, this is an area of active current research (consider the difference between the layout of the Subway network versus an electical diagram versus a social network versus a gene expression network versus ...). Each network has its own 'natural' way of laying out. Sometimes these are connected strongly to physical interpretations (e.g. an international flight network overlaid on the world map), but very many networks do not have a physical interpretation (e.g. a sexual activity network) -- rather, they are just networks: objects which indicate the relationships between elements (not their location).
Do
- With your network loaded, use Draw > Draw (or just hit Ctrl+G);
- You will see a simple representation, by default, the nodes are arranged on a circle -- try changing this representation, or Layout, by using the different options in Layout > ...
- If the network looks strange to you, try using a different starting position (Layout > Energy > Starting Positions > ...) (see sub-section below on Layouts algorithms);
- Now, try a common layout: Layout > Energy > Fruchterman Reingold > 3D;
- Since your screen is in 2D, and the graph is now laid out in 3D, we need to alter the perspective to 'see' it .. do this as follows: Spin > Spin Around (accept 360 Degrees)
- This likely was too quick for you .. in which case, use Spin > Step in degrees > ... and set this value to something very small (e.g. 0.05), now use Spin > Spin Around again.
On Force-Directed Layouts
'Energy' refers to a class of algorithms called 'Force-directed' (or 'Spring-directed') .. they conceive of the network as a collection of balls, connected by springs of a certain length. The algorithms work along these lines:
Positions <-- Generate Random Position for all Vertices (X,Y)
Energy <-- CalculateTotalForceInSprings
while Energy > Energy-Threshold do {
Displacements <-- CalculateResultanForceOnAllVertices
Positions <-- Positions + Displacements
Energy <-- CalculateTotalForceInSprings
} end while
For this reason, the initial Position is quite important. It can be the case the this algorithm, like so many inter-connected springs, might actually not be able to untangle them all into a 'good' layout. Hence, the option is given in pajek to vary the starting positions.
Exporting the network layout to another program
To get an image file of your current layout, use the Export > 2D > ... menu. Choose a file format that you wish -- e.g. EPS/PS (mainly used for importing into LaTeX, or Bitmap for a web or MS Word application).1
Generating a Comparison Network
It is often useful to compare a given network to one that is like it in some ways, but different in others. For instance, the work of Watts and Strogatz's influential Nature paper (see Refs in parent page) uses comparisons between a given graph and its Random Graph equivalent (same number of nodes and average degree).
We will do the same.
Do
- First, we'll generate a Random Erdos-Renyi network. Net > Random Network > Erdos-Renyi > Undirected > General. Specify a number of Vertices, and the Average Degree (e.g. 50, and 4) .. (obviously, for a comparison net you will want to use the same number of vertices and average degree as your network of interest).
- The network will appear in your 'Networks' row on the home area of pajek.
- Let's visualise it. You can do that as before.
- Now, since this is probably our first large network, we'll do a bit more with our visualisations:
- Back in the Home area, with the new ER Random network highlighted, we'll create a Partition (of the vertices) by the property 'Degree' (number of incident edges) .. Net > Partitions > Degree > Output .. new rows should have been added in the Home area.
- Highlight the original ER Random network (Networks) and use Draw > Draw Partition (or Ctrl+P). This should bring up a coloured visualisation.
- Make the vertices bigger by using Options > Size > of Vertices (e.g. 10)
- Now let's get a better idea of the Partitions .. try Layout > Circular > using Partition .. this will place similar vertices (by their degree/partition) together in space
Common Network Measures
We'll continue working with the large ER Random network we made earlier. A Random network by itself, even visualised with one of the pretty force-directed placement algorithms, is still just a random network -- it is hard to get much out of it. (Mark Newmann calls visualisations of such complicated huge networks, Ridiculograms.)
Now we'll generate some outputs from the network. This process is going to get a bit confusing unless you pay close attention to the way that pajek organises and operates on data.
Notice that there are several fundamental types of data for pajek:
- Networks: Have vertices and edges/arcs (are indexed by their generation number)
- Partitions: Vertex information of a particular kind -- they are actually class information, such that all the vertices are part of a particular class (e.g. 'Male' could be 'Partition=1', 'Female' could be 'Partition=2'), thus, the Partition (class) information often relates to actual vertex information (Vectors)
- Vectors: Information on each vertex -- e.g. 'Male' could be represented by '1' and 'Female' by '2' .. but likewise, each vertex can have all kinds of unique data attached to it
- Permutations: These are produced by functions applying to the whole network that create a new network based on the original network information. E.g. a permutation could be a new network produced which represents the hierarchical relationships between the vertices.
- And .. a place for input data (see above)
- Cluster
- Hierarchy
Do
- Degree distribution: Now, what we wish to do is assign each vertex a number which accords to the number of input, output or either input/output edges that are incident at the given vertex. Hence, we will be aiming to produce a Vector output of (normalised) degrees. The way this is done, is actually to do a Partition of the vertices into their degree numbers (we kill two birds with one operation):
- Net > Partitions > Degree > All .. notice that this produces two new lines in Partition and Vectors;
- Double-click on the text window of Vectors (see pic at right), to bring up a simple text file view of the data -- notice that each vertex now has a number associated with it: this is the normalised degree of each vertex;
- Double-click on the text window of Partitions (see pic at right), this file shows for each vertex, based on its normalised degree number, which class (or 'Partition') of vertex it falls into. Find the vertex with the highest partition number (for me this is vertex 36). Go back to the 'Vector' information for normalised degree and look at that vertex's normalised degree .. for me, this is 0.183... which corresponds to a degree of 9 (0.183... x (N-1)).
- NB: to make a degree distribution, you will have to export the normalised degree vector into another application (e.g. MATLAB, Excel, R, SPSS) and run a histogram function over the data. Infact, porting to R and SPSS is integrated into pajek for this purpose: Tools > R > Send to R > Current Vector
- Clustering Coefficient: This time, we just generate the information using Net > Vector > Clustering Coefficients > CC1;
- Betweeness Centrality: As above, Net > Vector > Centrality > Betweeness
- Diameter: (Longest shortest path between two vertices) Net > Paths between 2 vertices > Diameter
- Hierarchical Decomposition: (useful for finding connections between nodes based on structure alone .. i.e. in an agnostic fashion) Net > Hierarchical Decomposition > Clustering* > Run. NB: this produces a 2D specific hierarchical network output which is different to pajek's normal 'Draw' output, hence it is sent directly to an external .EPS file (a dendrogram) -- hence, to view the file, you will need to either print it directly (it is an encapsulated post-script file), or convert it using (e.g.) GIMP or some such tool.
There are very many measures that you can use -- the above are just a starter to get you going. Exploration in pajek is rewarding!
Refs
- Pajek homepage (downloading, updates etc.)
- Pajek manual
- Web tutorials:
- Tutorial by Vladimir Batagelj given at the NICTA workshop (Sydney, Australia)
- Primer tutorial at Indiana
- Software:
- Other tidbits:




