About Ingeneue:

Background: motivations for developing a gene network simulation program

Several years ago we began work on a model of the segment polarity gene network, which has been very well characterized in fruit flies and appears to be conserved in its basic architecture and function throughout the insects and probably much further. Two aspects of this network are of interest in the context of network modeling. First, despite the apparent conservation of the segment polarity genes and their function among the insects, the upstream segment specification process differs radically among the insects. There is every reason to believe that the genes providing the initial inputs to the segment polarity network are completely different in long-germ insects like flies and short-germ insects like grasshoppers. This implies that the segment polarity network is a developmental module unto itself, capable of accepting any of several suitable inputs and then self-organizing its stable pattern of expression. Second, the segment polarity network, or pieces thereof, is "re-deployed" in many different contexts during fruit fly development (and indeed during many stages of vertebrate development as well). For instance, the segment polarity genes are intimately involved in maintenance of the anterior-posterior compartment boundary in fly imaginal discs. They are also involved in the morphogenesis of the eye rudiment, where, instead of establishing a stable regime of patterned gene expression, they are expressed in a traveling wave that moves across the eye rudiment, triggering differentiation of photoreceptor groups as it goes. Most dramatically, at least some of these genes are involved in the specification of the eyespot in butterflies; since the eyespot is an evolutionary novelty, it seems inescapable that this represents an instance of re-deployment of a gene network.

The basic questions we sought to address were: what makes this gene network modular? What properties make it re-usable? How does the evolutionary process adapt the same network of genes for different pattern formation uses? An important background question for us has been to ask whether biologists have enough information about the segment polarity network to explain any of the developmental phenomena that it is involved in. We therefore set out to make our model of the segment polarity network, and we took what we call a reconstitution approach. A biochemist attempting to reconstitute a biochemical pathway in vitro would add, one by one, purified components to a reaction system until she achieved the desired process. So too, we add known facts to a computer simulation of a chemical reaction system until it replicates in simulo some aspect of real, biological behavior. We emphasize that the basic questions and approach are general; instead of the segment polarity network we might just as well be talking about the MAP kinase signal transduction cascade, the cell cycle oscillator, phage lambda, or the citric acid cycle. The approach of simulating a chemical reaction network as a dynamical system is applicable to all such cases.

We therefore began work on a custom gene network simulation library (Ingeneue) designed to do away with all these drawbacks. We chose the programming language Java to develop this package. Java provides several advantages over these more traditional scientific programming languages. First, Java code can be compiled on one kind of computer and run without modification on another, saving a substantial amount of programmer effort. Second, unlike C++, Java was designed from the ground up as an object-oriented language, and Java lacks most, if not all, of the common pitfalls that make C++ a difficult language. Third, Java allows run-time modification of the running code and dynamic loading of code modules (objects). This means the simulation can be arbitrarily changed while the program is running, rather than having to stop and re-link the program to accomplish each change, thus removing a major hurdle for non-technical users of a simulation library. Finally, Java was designed with networking and distributed processing in mind. While we have yet to make full use of this important feature, we expect to do so in the future because the intense computational demand of larger models will make it difficult to accomplish much on a single computer.

Design and Implementation of Ingeneue

Ingeneue is presently a working prototype that we use as a research tool. The core of the package is a set of objects that work together. "Cells" are compartmental containers for "Nodes" (mRNAs, proteins, and the like - reactants, basically). Each Node type has its own group of Affectors; an Affector object usually constitutes a single additive term in the differential equation that specifies the rate of change in a Node's concentration in each cell (or on each face of the cell in the case of membrane-bound Nodes) over time. We have an ever-growing library of Affectors (for example, the most commonly used Affector is the one which encapsulates the term for first-order decay) and it is a trivial programming task to write new Affector types. Nodes are grouped into a Network object, and each Cell in a user-specified grid contains its own copy of the entire Network. The user specifies initial concentrations, generally from a library of InitialCondition objects that make describing a pattern straightforward, and other parameters and conditions.

A ModelRunner object conducts the simulation by marching a numerical integrator along timesteps; each Node uses its list of Affectors to compute its derivative as required by the particular integration scheme in use. A complex StoppingCondition object, assembled from a library of simpler StoppingConditions, generally monitors the course of the integration. StoppingConditions encapsulate some pattern element or dynamical property (reaching a stable equilibrium, for instance) and a recipe both for assigning a score for how well the present state matches the desired pattern and for stopping the integration run (either because the model is hopelessly far from desired behavior or because it has achieved it so well that further computation is superfluous). By default, Nodes presently use a standard-issue embedded Runge-Kutta scheme with adaptive step sizing to conduct the integration. While not fancy, this scheme is a workhorse integration method that in our experience seems quite robust. We have recently developed a customized integration strategy that exploits stereotyped features of the equations used by Ingeneue. This strategy is based on classical predictor-corrector methods but can sometimes achieve much higher efficiency, especially for complex models.
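
The relationship between Nodes, Affectors, and the integrator can be illustrated with a minimal Java sketch. This is not Ingeneue's source code: every class and method name below is hypothetical, the "network" is a single Node whose Affectors depend only on its own concentration, the integrator is a fixed-step classical fourth-order Runge-Kutta step rather than the adaptive embedded scheme described above, and the stopping rule is a crude stand-in for a StoppingCondition that detects a stable equilibrium.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; a sketch of the idea, not Ingeneue's API.
interface Affector {
    // Contribution of this term to d[Node]/dt at the given concentration.
    double rate(double concentration);
}

// First-order decay term: -k * [Node].
final class FirstOrderDecay implements Affector {
    private final double k;
    FirstOrderDecay(double k) { this.k = k; }
    public double rate(double concentration) { return -k * concentration; }
}

// Constant synthesis term, included only to give the example a nonzero steady state.
final class ConstantSynthesis implements Affector {
    private final double s;
    ConstantSynthesis(double s) { this.s = s; }
    public double rate(double concentration) { return s; }
}

final class Node {
    private final List<Affector> affectors = new ArrayList<>();
    double concentration;

    Node(double initial) { this.concentration = initial; }

    Node add(Affector a) { affectors.add(a); return this; }

    // The derivative is the sum of the additive terms supplied by the Affectors.
    double derivative(double c) {
        double sum = 0.0;
        for (Affector a : affectors) sum += a.rate(c);
        return sum;
    }

    // One fixed-size classical fourth-order Runge-Kutta step.
    void step(double h) {
        double k1 = derivative(concentration);
        double k2 = derivative(concentration + 0.5 * h * k1);
        double k3 = derivative(concentration + 0.5 * h * k2);
        double k4 = derivative(concentration + h * k3);
        concentration += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
    }
}

final class AffectorSketch {
    // Integrate until |d[Node]/dt| < tol (a crude "stable equilibrium" rule)
    // or until maxSteps steps have been taken; returns the final concentration.
    static double run(double synthesis, double decay, double initial,
                      double h, int maxSteps, double tol) {
        Node node = new Node(initial)
                .add(new ConstantSynthesis(synthesis))
                .add(new FirstOrderDecay(decay));
        for (int i = 0; i < maxSteps; i++) {
            if (Math.abs(node.derivative(node.concentration)) < tol) break;
            node.step(h);
        }
        return node.concentration;
    }

    public static void main(String[] args) {
        // With synthesis 2.0 and decay constant 0.5 the steady state is 2.0 / 0.5.
        System.out.println(AffectorSketch.run(2.0, 0.5, 0.0, 0.1, 10000, 1e-9));
    }
}
```

In this sketch, adding a new kind of term to the equation means writing one more small class that implements the hypothetical `Affector` interface, which is the sense in which new Affector types are cheap to add.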

To summarize, a library of InitialCondition objects makes it easy to specify a starting state, a library of Affectors makes it easy to specify the differential equations, and a library of StoppingConditions makes it easy to specify the desired behavior. This core is dressed with various utility objects that handle tasks like the parsing of a text file describing the model, the display of model state on screen, and so on. This is sufficient if the user knows all the parameters (rate constants, half-lives, etc.) necessary to constrain the model, as is the case for certain well-studied biochemical processes such as the life cycle of lambda phage. However, this is generally not the case. More often, none of the parameters governing the biochemistry of epigenetic interactions has been measured. This is true of most well-studied developmental mechanisms, including all those we have modeled. The lack of measured parameters requires a rephrasing of the basic questions involved. Instead of asking whether the known facts account for a particular phenomenon, one can ask only whether there exists any set of biologically plausible parameters that allows the known facts to account for real behavior. Of course, even if the answer to that question is yes, that does not in any way imply that the real values of those parameters correspond to the values found.

Because of this problem the Ingeneue core is wrapped in a layer of Iterator objects. Iterators encapsulate some algorithm for searching the space of possible parameter values for sets of parameter choices that lead to desired behavior. The simplest possible scheme, of course, is random choice, and there is indeed an Iterator that implements a random search. However, the parameter space is often very large; the segment polarity model we have been working with includes, in its simplest form, roughly fifty free parameters, each of which could realistically range over three or four orders of magnitude. To get a feel for how difficult this space would be to search exhaustively, we once estimated that it would take 10 quadrillion years with our lab's existing computational capacity to sample the parameter space merely for high, medium, and low values of each parameter! This is obviously a problem that mere brute force and more computers cannot hope to overcome. Thus, most of the Iterator objects we have implemented encapsulate some sort of directed search strategy, including non-linear optimizers, population-selection schemes, and so on. We have included both well-established algorithms and custom-tailored approaches of our own.
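
To make the scale of the search concrete: three levels for each of fifty parameters gives 3^50 combinations, about 7 x 10^23. The Java sketch below computes that count and shows what a plain random search might look like. It is an illustration under stated assumptions, not Ingeneue's Iterator code: the class and method names are hypothetical, every parameter is assumed to share one range and to be sampled log-uniformly (so that each order of magnitude is equally likely), and the score function is supplied by the caller, with lower scores taken to mean a better match.

```java
import java.math.BigInteger;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical names; a sketch of random parameter search, not Ingeneue's Iterator API.
final class RandomSearchSketch {

    // Number of grid points when each of nParams parameters takes one of `levels` values.
    static BigInteger gridSize(int levels, int nParams) {
        return BigInteger.valueOf(levels).pow(nParams);
    }

    // Draw one value log-uniformly from [lo, hi); requires 0 < lo < hi.
    static double logUniform(Random rng, double lo, double hi) {
        return lo * Math.pow(hi / lo, rng.nextDouble());
    }

    // Sample `trials` random parameter sets and return the one with the lowest score.
    static double[] search(ToDoubleFunction<double[]> score, int nParams,
                           double lo, double hi, int trials, long seed) {
        Random rng = new Random(seed);
        double[] best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int t = 0; t < trials; t++) {
            double[] candidate = new double[nParams];
            for (int i = 0; i < nParams; i++) {
                candidate[i] = logUniform(rng, lo, hi);
            }
            double s = score.applyAsDouble(candidate);
            if (s < bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Size of the high/medium/low grid for fifty parameters.
        System.out.println(gridSize(3, 50));
    }
}
```

In a real run the score would come from integrating the model with the candidate parameters and evaluating the stopping conditions; here it is left as an arbitrary function so that the search loop itself stays visible.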