Yet another example of how graphs can be used as models to solve real problems comes from the world of project management. I tend to cringe at anything that involves management; as a former IBMer, I've dealt with my share of paper-pushing pinheaded project managers. But used well, this demonstrates using graphs as a model for a valuable planning tool, and a really good example of how to apply math to real-world problems.
Project managers often define something called PERT charts for planning and scheduling a project and its milestones. A PERT chart is nothing but a labeled, directed graph. I'm going to talk about the simplest form of PERT, which considers only time, but not resources. More advanced versions of PERT exist that also include things like required resources, equipment, etc.
PERT stands for "Project Evaluation and Review Technique" - very typical bureaucratic doubletalk for something very simple. A PERT chart is a graph containing vertices (called events), and weighted, directed edges (called activities). The "event" represents the completion of a particular set of related tasks and all of its prerequisites - a task in complete when all of its incoming activities are completed; the edges represent individual tasks which must be performed, and are labeled with a weight that indicates how long the task takes to complete.
Each activity actually has multiple associated weights: the optimistic, pessimistic, normal, and expected times:
- Optimistic time (O): the minimum possible time it takes to complete the task under ideal conditions.
- Pessimistic time (P): the maximum time that it takes to complete the task under non-catastrophic conditions.
- Normal time (N): the amount of time taken to perform the task under normal conditions.
- Expected Time (T_{E}): the best estimate of the time to complete the task on average. This is computed as a weighted average of the other three times. The exact weighting depends on the history of related tasks (i.e., how often the worst case happens relative to the normal case). The standard default when you have no information is to weight normal time 4 times higher than the optimistic or pessimistic times - so T_{E}=(O + 4N + P)/6.
In a PERT graph, given a task (vertex) A, the tasks that have direct edges to A (and therefore must be completed before A) are called the predecessors of A. The start time of A is entirely dependent on the start times of its predecessors, and the weights of the edges from those predecessors. If A is a predecessor of B, then we can also say that B is a successor of A.
There are a bunch of values you can compute from the PERT graph which are useful in planning and scheduling, and which are easy to compute using standard graph algorithms:
- Start(A): the time at which you expect task A to start - that is, the expected completion time of all of task A's prerequisites.
- Lag(A): the earliest time at which any of the successors of A can start - that is, min({Start(B) : B is a successor of A}).
- Slack(A,t): if t is the time at which A actually starts, Slack(A,t) = Start(A)_{E}-t. That is, the slack is a measure of how far ahead or behind the expected schedule you are.
- Lead(A): the time by which A must be completed in order to not delay the execution of any of its successors.
- Buffer(A): the amount of time by which A can be delayed without delaying the start time of its successors; Min({Start(B) | B is a successor of A}-Time(B,A))-Start(A).
- CritPath: the longest path from start to finish through the entire PERT chart. The sum of the weights along the critical path is the expected time to completion of the entire task.
Those values are all important: they tell you how long things will take, what depends on what. But more important, analyzing the graph, you can see where adding resources to reduce the time to complete some activity will help, and where it won't. If you can speed up some activity that is not on the critical path, it won't help. If you can speed something up by spending more money/resources, the PERT chart and the critical path can tell you just how much you'll benefit from doing it, and how much benefit you can achieve before you start wasting resources (that is, when you've reduced something enough that you've changed the critical path).
For example, look at the graph below, which shows a set of tasks and activities, weighted with the expected time for each activity.
First, let's work out the expected start times of all the tasks, as well as the buffer
of a couple of examples.
- Start(A) = 2, Buffer(A) = 0.
- Start(B)=max(Start(A)+2,Start(C)+3) = 6, Buffer(B)=min{(Start(D)-(B,D)), (Start(E)-(B,E))}-Start(B) = min{(13-7),(8-2)}-6 = 6-6 = 0.
- Start(C)= 3
- Start(D) = max(Start(A)+6,Start(B)+7) = max(8,13) = 13
- Start(E) = Start(B)+2 = 8, Buffer(E) = Start(I)-(E,I)-Start(E) = 24-4-13 = 7.
- Start(F) = Start(C) + 5 = 8
- Start(G) = Start(D) + 2 = 15
- Start(H) = Start(G) + 5 = 20, Buffer(H)=Start(Finish)-(H,Finish)-Start(H) = 30 - 3 - 20 = 7.
- Start(I) = max(Start(D)+11,Start(E)+4,Start(F)+3) = max(24,12,11) = 24
- Finish=max(Start(H)+3,Start(I)+6) = max(23,30) = 30
The completion time for this project is 30, which comes from the critical path (C,B,D,I,Finish).
Suppose we wanted to make that time shorter. Suppose we know we can reduce both Start→C and C→B to cost 1 each. Should we do that? If we first reduce Start→C to 1, we nick two off of the time to completion - great. But then if we cut C→B to one, we've reduced the cost of C→B by 2, but the time to completion doesn't change: if we do that reduction, then the critical path changes - instead of starting with (C,B), it will start with (A,B), and Start(A) will still be 4.
From looking at the buffers, we can see that, for example, we can delay the start of E by up to 7 units. E can start at time 8, but we can delay its start time to as late as time 15 without affecting the completion time. Similarly, we can see that we can also delay H by 7 without affecting completion.
We can also see from looking at lags and comparing them to path lengths, that while we can start task H at time 20, we can delay it up to 7 clicks. In fact, we can add delays up to 7 anywhere along the path (D,G,H,Finish), because it's off the critical path, and will not change the critical path or the time to completion unless 8 clicks are added.
I've always liked that (O+4N+P)/6 distributional guess-- it is an assumption of a triangular distribution defined by (O,N,P) and is the centroid along X of the two back-to-back triangles: ((O+2N)/3 +(2N+P)/3)/2.
In your explanation, you don't really make use of the Optimistic, Pessimistic, or Normal times, and it looks like you are wokring it out entirely deterministically based on the Expected times. I'm OK with that, but the O,P, and N times/weights seem complicating in this explanation, and might be better classed with the more complicated handling of resources and equipment attributes and their constraints.
@DaveX #2
Yes, Mark, what about a Monte Carlo simulation?
My experience with using these charts is that all the nice theory breaks down when you add in the inherent non-linearity introduced by the amount of time you spend maintaining the goddamn charts.
@csrster: The theory part is pretty basic, it is the skill in estimating how long it takes to do tasks that sucks. As long as you recognize the inherent G.I.G.O. nature of the planning process, and tha tthe charts will certainly need revision, they do pretty good at helping people communicate who is/will be holding up who.