This post explores a way of using networks of results to form averages, and from them predictions for missing links in the network.
If a team A plays a team B then a statistic for their relative strength (RS), r, can be formed by dividing A's score by B's score: r=A/B.
If a team plays itself then we would expect its relative strength to be r=1.
In a round-robin competition, where every team plays every other team, a network of RSs exists. Where loops exist it is possible to calculate the score where team A plays team A. For example, in the Tri-Nations competition, where 3 teams play, suppose Australia plays New Zealand with r=2, and NZ then plays South Africa with r=2. This means that Australia is twice as good as NZ, who are twice as good as SA, so we would expect SA to be 1/4 as good as Australia (assuming it is the same Australia we started with). It is this assumption which is flawed, as teams are never the same and so can never play themselves! This is a kind of SRH where a relationship with oneself once again proves impossible. However, holding the assumption that loops multiply to 1, and that such consistent networks reflect the underlying dynamics of competitions, the following calculations can be made.
In a competition of N teams there are N-1 degrees of freedom in a consistent network. So with N=4 teams playing there are 3 unique r values. Imagine the network (a square with diagonals drawn in). Once we know 3 of the sides (a,b,c) the 4th side is simply d=1/(a*b*c). One diagonal is e=1/(a*b) and the other diagonal is f=1/(b*c). Care must be taken to treat these as vectors where direction from A->B always means A/B.
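As a minimal sketch of the square-with-diagonals example, the Python below builds all six arcs from three seeds and checks that every loop multiplies to 1. The team labels T1..T4, the seed values and the direction chosen for each arc are assumptions for illustration only; the text above does not fix them.

```python
import math

# Hypothetical seed values for three sides of the square T1 -> T2 -> T3 -> T4.
a, b, c = 2.0, 0.5, 3.0   # a = T1/T2, b = T2/T3, c = T3/T4

arcs = {
    ("T1", "T2"): a,
    ("T2", "T3"): b,
    ("T3", "T4"): c,
    ("T4", "T1"): 1 / (a * b * c),  # fourth side d closes the outer loop
    ("T3", "T1"): 1 / (a * b),      # diagonal e closes the loop T1 -> T2 -> T3 -> T1
    ("T4", "T2"): 1 / (b * c),      # diagonal f closes the loop T2 -> T3 -> T4 -> T2
}

def rs(x, y):
    """Relative strength x/y; walking against an arrow means dividing."""
    return arcs[(x, y)] if (x, y) in arcs else 1 / arcs[(y, x)]

def loop_product(*teams):
    """Multiply the relative strengths around a closed walk through the teams."""
    closed = list(teams) + [teams[0]]
    return math.prod(rs(x, y) for x, y in zip(closed, closed[1:]))

loops = [
    ("T1", "T2", "T3"),
    ("T2", "T3", "T4"),
    ("T1", "T3", "T4"),
    ("T1", "T2", "T3", "T4"),
]
products = [loop_product(*loop) for loop in loops]
print(products)  # every entry should be 1 up to floating-point rounding
```

Only a, b and c are free here; the other three arcs follow from them, which is the N-1=3 degrees of freedom described above.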
Once real values are known for any of these arcs the following expression can be used to find the “nearest” consistent network.
If each arc of the network is treated as a dimension then the space of all possible networks between N teams is represented by an N(N-1)/2 dimensional space. However only certain points in this space are consistent networks. The consistent networks obey the loop constraints as discussed so the distance from the real network to a consistent network is given by D. Numerical minimisation of D offers the easiest way to discover the nearest consistent network, that is the consistent network that gives the smallest overall change to the data values.
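The minimisation can be sketched for the smallest case, a three-team loop. This is only an illustration: it assumes D is the plain Euclidean distance between the observed and consistent arc values, it uses a simple compass (pattern) search rather than whatever minimiser was actually used, and the observed values are hypothetical. The function names are made up for this example.

```python
import math

def consistent_arcs(log_seeds):
    """Loop A -> B -> C -> A: seeds a = A/B and b = B/C; the closing arc C/A is 1/(a*b)."""
    a, b = (math.exp(x) for x in log_seeds)
    return [a, b, 1.0 / (a * b)]

def distance(log_seeds, observed):
    """Assumed form of D: Euclidean distance between consistent and observed arcs."""
    fitted = consistent_arcs(log_seeds)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(fitted, observed)))

def nearest_consistent(observed, step=0.5, tol=1e-10):
    """Compass search over the log-seeds: try +/- step on each seed, halve the step when stuck."""
    x = [math.log(observed[0]), math.log(observed[1])]  # start from the observed seeds
    best = distance(x, observed)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                d = distance(trial, observed)
                if d < best:
                    x, best, improved = trial, d, True
        if not improved:
            step /= 2
    return consistent_arcs(x), best

# Hypothetical inconsistent data: A/B = 2, B/C = 2, but C/A = 0.5 instead of 0.25.
observed = [2.0, 2.0, 0.5]
fitted, D = nearest_consistent(observed)
print(fitted, D)
```

Searching over the logs of the seeds keeps every ratio positive, and the loop constraint holds by construction because the closing arc is always computed from the seeds.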
UPDATE (3/10/2011)
However, this method treats reducing a score from 1 to 0.5 as a smaller change than increasing one from 1 to 2, when they are actually the same proportional change (a factor of 2). Instead it makes sense to calculate the difference between log values, D, then take Exp(magnitude of D) as the Euclidean distance. This forms each part of the above expression.
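A small Python illustration of the asymmetry (the helper names are made up for this example): in raw values halving looks like a smaller change than doubling, while in log values the two have the same magnitude.

```python
import math

def raw_change(old, new):
    """Size of a change measured on the raw relative score."""
    return abs(new - old)

def log_change(old, new):
    """Size of a change measured on the log of the relative score."""
    return abs(math.log(new) - math.log(old))

halving = (raw_change(1, 0.5), log_change(1, 0.5))
doubling = (raw_change(1, 2), log_change(1, 2))
print(halving, doubling)  # raw sizes differ; log sizes are both ln 2
```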
Recap on 2011 6-Nations championship
N=6 teams play, with a network of size 6x5/2=15 and N-1=5 degrees of freedom (size of the system). From the final championship scores the following relative scores r were calculated (original actual scores before minimising in brackets):
Eng/Ita=4.08 (4.54)
Ita/Ire=0.35 (0.85)
Ire/Fra=1.44 (0.88)
Fra/Sco=0.99 (1.62)
Sco/Wal=1.04 (0.25)
Wal/Eng=0.47 (0.73)
Sco/Eng=0.50 (0.73)
Fra/Eng=0.49 (0.53)
Ire/Eng=0.70 (3.00)
Wal/Ita=1.94 (1.50)
Sco/Ita=2.02 (2.63)
Fra/Ita=1.99 (0.95)
Wal/Ire=0.68 (1.46)
Sco/Ire=0.71 (0.86)
Wal/Fra=0.97 (0.32)
D=3.15
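As a check on the figures above, the sketch below assumes D is the plain Euclidean distance between the minimised and original values over all 15 arcs. The expression itself is not reproduced in this text, so that form is an assumption, but with the rounded values listed above it gives a D close to the quoted 3.15.

```python
import math

# (minimised, original) pairs copied from the list above, in the same order.
pairs = [
    (4.08, 4.54), (0.35, 0.85), (1.44, 0.88), (0.99, 1.62), (1.04, 0.25),
    (0.47, 0.73), (0.50, 0.73), (0.49, 0.53), (0.70, 3.00), (1.94, 1.50),
    (2.02, 2.63), (1.99, 0.95), (0.68, 1.46), (0.71, 0.86), (0.97, 0.32),
]

# Assumed form of D: square root of the summed squared differences.
D = math.sqrt(sum((fit - raw) ** 2 for fit, raw in pairs))
print(round(D, 2))
```

The single largest contribution comes from Ire/Eng, which moves from 3.00 to 0.70.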
Tri-Nations 2011
N=3 with 3x2/2=3 arcs and 2 degrees of freedom.
This year only 2 rounds of matches were played, with the following relative scores and geometric average.
A/SA = 1.65
SA/NZ = 0.51
NZ/A = 1.19
For geometrically averaged r, D=0.33!
Geometric because if a team does 4x another on one occasion and 1/4 on the next occasion they are on average the same as that other team!
The teams very much didn’t play as well as themselves! Also, the limited size of the network means it doesn’t hold much information. Both rounds are consistent within themselves (small D), but contradict each other. Since the scores represent how many times more one team scored than the other, it is perhaps better to use the geometric mean for averaging them.
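The 4x and 1/4 example can be checked in a few lines of Python. The helper name is made up for this sketch; it computes the geometric mean as the exponential of the average log.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive ratios: exp of the mean of their logs."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

two_games = [4.0, 0.25]                        # 4x on one occasion, 1/4 on the next
arithmetic = sum(two_games) / len(two_games)   # 2.125, which wrongly suggests a stronger team
geometric = geometric_mean(two_games)          # 1 up to rounding: the teams are level
print(arithmetic, geometric)
```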
Rugby World Cup 2011
The first week is done and South Africa/Wales (r=1.06), Australia/Ireland (r=0.4!!) and Australia/Italy (r=5.33) have all been played, each pitting a 6 Nations team against a Southern Hemisphere team. These three games link the networks above and enable a consistent network to be calculated for the top 9 teams.
N=9, full network arcs=36 (not all games will be played), d.f.=8
There are 6 more arcs to add to the 6 Nations network
Aus/SA = (1.65)
SA/NZ = (0.51)
NZ/Aus = (1.19)
SA/Wal = (1.06)
Aus/Ire = (0.40)
Aus/Ita = (5.33)
Numerically minimising the new expression gives the following network seeds from which the rest of the consistent network can be calculated:
D=3.30632
a -> 3.95592, b -> 0.218121, c -> 1.97058, d -> 1.09539, e -> 1.02128, p -> 5.10258, r -> 1.17188, t -> 1.33486
Eng/Ita=3.96 (4.54) [a]
Ita/Ire=0.22 (0.85) [b]
Ire/Fra=1.97 (0.88) [c]
Fra/Sco=1.10 (1.62) [d]
Sco/Wal=1.02 (0.25) [e]
Wal/Eng=0.53 (0.73) [1/(a b c d e)]
Sco/Eng=0.54 (0.73) [1/(a b c d)]
Fra/Eng=0.59 (0.53) [1/(a b c)]
Ire/Eng=1.16 (3.00) [1/(a b)]
Wal/Ita=2.08 (1.50) [1/(b c d e)]
Sco/Ita=2.12 (2.63) [1/(b c d)]
Fra/Ita=2.33 (0.95) [1/(b c)]
Wal/Ire=0.45 (1.46) [1/(c d e)]
Sco/Ire=0.46 (0.86) [1/(c d)]
Wal/Fra=0.89 (0.32) [1/(d e)]
Aus/SA = 1.84 (1.65) [p b c d e/t]
SA/NZ = 0.46 (0.51) [t/(b c d e p r)]
NZ/Aus = 1.17 (1.19) [r]
SA/Wal = 1.33 (1.06) [t]
Aus/Ire = 1.11 (0.40) [p b]
Aus/Ita = 5.10 (5.33) [p]
Using this consistent network we can walk between teams, multiplying with the arrows and dividing against the arrows, to find the relative strengths of games not played. This provides a ranking of teams, given here as England's strength relative to each opponent (E = England, I = Ireland):
EvNZ=0.67
EvA=0.78
EvI=0.86
EvSA=1.41
EvF=1.69
EvW=1.89
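The walk described above can be sketched in Python using the eight seed values quoted earlier. This is an illustrative implementation, not necessarily how the figures above were produced: it gives each team a log strength by walking the seed arcs outward from England, after which any pairing is a simple difference. Function names are made up for the example, and results can differ from the list above in the second decimal place because that list appears to be built from rounded values.

```python
import math
from collections import deque

# Seed arcs (X, Y) -> X/Y, taken from the values a, b, c, d, e, p, r, t above.
SEEDS = {
    ("Eng", "Ita"): 3.95592, ("Ita", "Ire"): 0.218121, ("Ire", "Fra"): 1.97058,
    ("Fra", "Sco"): 1.09539, ("Sco", "Wal"): 1.02128, ("Aus", "Ita"): 5.10258,
    ("NZ", "Aus"): 1.17188, ("SA", "Wal"): 1.33486,
}

def log_strengths(seeds, root="Eng"):
    """Breadth-first walk over the seed arcs, accumulating log strength relative to root."""
    adj = {}
    for (x, y), r in seeds.items():
        adj.setdefault(x, []).append((y, -math.log(r)))  # with the arrow: log y = log x - log r
        adj.setdefault(y, []).append((x, math.log(r)))   # against the arrow: log x = log y + log r
    s = {root: 0.0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, delta in adj[u]:
            if v not in s:
                s[v] = s[u] + delta
                queue.append(v)
    return s

def ratio(s, x, y):
    """Relative strength x/y implied by the consistent network."""
    return math.exp(s[x] - s[y])

strengths = log_strengths(SEEDS)
ranking = sorted(strengths, key=strengths.get, reverse=True)
for team in ranking:
    print(team, round(ratio(strengths, "Eng", team), 2))
```

Because the seeds form a tree with no loops, there is exactly one route between any two teams, so the order in which the walk visits them does not affect the result.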
6 Nations further study
The values of the 6 Nations network were evaluated after each game in two ways: (1) the nearest consistent network (NCN) was calculated each time in one step from the raw data network; (2) the NCN was calculated from the last network plus the new dimension of data, i.e. in a series of incremental steps. Version two allows for teams to gradually improve or weaken during the tournament, while version one assumes that games at the start are as important as those at the end.
Results from 1 step calculation.
The first 5 games do not complete any loops so offer no network with which to work. Team strengths relative to England are given on a log scale and games mark the x-axis. All teams strengthen against England as the tournament progresses, in particular Ireland, who storm ahead in the game against England.
Version 2 – Incremental
Essentially the same results, but the changes in teams (i.e. the walk through the network space) are smoother. Scotland in particular should be happy with the way their team improved the most during the tournament, though perhaps only because they started so weakly. France and Ireland take the last two games very seriously.
UPDATE 3/10/2011
Following the decision to work in log space, so that a 1/4x change has the same magnitude as a 4x change, the 6 Nations 1 step calculation yields:
D=8.13701 – the large D value is because % increases above 1 are as valuable as % decreases below 1 in this method.
{a -> 1.04727, b -> 0.745217, c -> 1.32598, d -> 1.43158, e -> 0.76705}
So relative to England the teams would score:
Eng/Wal = 0.955
Eng/Ire = 1.098
Eng/Ita = 1.432
Eng/Fra = 0.939
Eng/Sco = 1.245
Using this method there is very little between the teams! England is stronger than Ireland, Italy and Scotland, but Wales and France are the threats.
A General Note on Networks
If one walks a network making sure that no “loops” are formed, then after N-1 arcs (where N is the number of nodes) the “seed” arcs will have been laid, from which all other arcs can be represented by an alternative route through the seed network. In other words the number of arcs A = N + R – 1 (R = the number of “regions”). However regions are topological in N dimensional space so that no arcs ever cross. I believe this is a recognised formula … will check.
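The counting can be tried out with a short Python sketch. It walks the arcs of a full 6 Nations network in an arbitrary order, keeping an arc as a seed if it joins two groups of teams that were not yet connected, and otherwise counting it as closing a loop. The function name and the use of a union-find structure are choices made for this illustration; "loop-closing arcs" is used here as a stand-in for R.

```python
from itertools import combinations

def seeds_and_loops(nodes, arcs):
    """Split arcs into seeds (no loop formed) and arcs that close a loop."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative of n's group, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    seeds, loop_closers = [], []
    for u, v in arcs:
        ru, rv = find(u), find(v)
        if ru == rv:
            loop_closers.append((u, v))   # both ends already connected: this arc closes a loop
        else:
            parent[ru] = rv               # join the two groups: this arc is a seed
            seeds.append((u, v))
    return seeds, loop_closers

teams = ["Eng", "Fra", "Ire", "Ita", "Sco", "Wal"]
all_arcs = list(combinations(teams, 2))            # 6x5/2 = 15 arcs
seeds, loop_closers = seeds_and_loops(teams, all_arcs)
print(len(all_arcs), len(seeds), len(loop_closers))
```

For the full six-team network this gives 15 arcs, 5 seeds and 10 loop-closing arcs, which agrees with A = N + R – 1 when R is read as the number of independent loops.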
ToDo
It has also not been fully tested whether the choice of which arcs to make seeds (i.e. degrees of freedom, d.f.) and which to make dependent on the network leads to different minima. Easy to test: just try a few variations of network.