Thursday, 6 July 2017

What we learn from the learning rate


Cells need to sense their environment in order to survive. For example, some cells measure the concentration of food or the presence of signalling molecules. We are interested in studying the physical limits to sensing with limited resources, to understand the challenges faced by cells and to design synthetic sensors.

We have recently published a paper (arxiv version) where we explore the interpretation of a metric called the learning rate, which has been used to measure the quality of a sensor (New J. Phys. 16 103024, Phys. Rev. E 93 022116). Our motivation is that in this field a number of metrics have been applied to make statements about the quality of sensing, or about limits to sensory performance (a metric is a number you can calculate from the properties of the sensor that, ideally, tells you how good the sensor is). A limit of particular interest, for example, is the energy required for sensing. However, it is not always clear how these metrics should be interpreted. We want to find out what the learning rate means: if one sensor has a higher learning rate than another, what does that tell you?

The learning rate is defined as the rate at which changes in the sensor increase the information the sensor has about the signal. Here, the information the sensor has about the signal is the amount by which your uncertainty about the state of the signal is reduced by knowing the state of the sensor (this is known as the mutual information). From this definition it seems plausible that the learning rate could be a measure of sensing quality, but it is not obvious. Our approach is a test to destruction: challenge the learning rate in a variety of circumstances, and try to understand how it behaves and why.

To do this we need a framework to model a general signal and sensor system. The signal hops between discrete states and the sensor also hops between discrete states in a way that follows the signal. A simple example is a cell using a surface receptor to detect the concentration of a molecule in its environment.


The figure shows such a system. The circles represent the states and the arrows represent transitions between them. The signal is the concentration of a molecule in the cell’s environment. It can be in two states, high or low, where high is double the concentration of low. The sensor is a single cell-surface receptor, which can be either unbound or bound to a molecule. The joint system can therefore be in four different states. The concentration jumps between its states at rates that don’t depend on the state of the sensor. The receptor becomes unbound at a constant rate and becomes bound at a rate proportional to the molecule concentration.

We calculated the learning rate for several systems, including the one above, and compared it to the mutual information between the signal and the sensor. We found that in the simplest case, shown in the figure, the learning rate essentially reports the correlation between the sensor and the signal, and so it shows you the same thing as the mutual information. In more complicated systems, however, the learning rate and the mutual information show qualitatively different behaviour. This is because the learning rate actually reflects the rate at which the sensor must change in response to the signal, which is not, in general, equivalent to the strength of the correlations between the signal and the sensor. We therefore do not think that the learning rate is useful as a general metric for the quality of a sensor.
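For readers who like to see things concretely, here is a minimal sketch of this kind of calculation for the receptor system above. The rate values and variable names are illustrative assumptions rather than numbers from the paper, and the final lines implement the standard bipartite-network expression for the learning rate as we read it:

```python
# A minimal sketch (not the paper's code) of the four-state model above:
# a signal that jumps between "low" and "high" ligand concentration
# (high = 2x low) and a receptor that is unbound or bound. All rate
# values are illustrative assumptions.
import numpy as np

k_flip = 1.0                          # signal switching rate (both directions)
k_off = 1.0                           # unbinding rate, independent of signal
k_on = {"low": 0.5, "high": 1.0}      # binding rate proportional to concentration

signals, sensors = ["low", "high"], ["u", "b"]
states = [(s, r) for s in signals for r in sensors]

# Generator matrix Q, with Q[i, j] the rate of jumping from state i to j.
Q = np.zeros((4, 4))
for i, (s, r) in enumerate(states):
    for j, (s2, r2) in enumerate(states):
        if s != s2 and r == r2:
            Q[i, j] = k_flip                      # signal flips, sensor fixed
        elif s == s2 and (r, r2) == ("u", "b"):
            Q[i, j] = k_on[s]                     # binding
        elif s == s2 and (r, r2) == ("b", "u"):
            Q[i, j] = k_off                       # unbinding
    Q[i, i] = -Q[i].sum()                         # generator rows sum to zero

# Stationary distribution: the null eigenvector of Q transposed, normalised.
evals, evecs = np.linalg.eig(Q.T)
p = np.real(evecs[:, np.argmin(np.abs(evals))])
p /= p.sum()

# Mutual information between signal and sensor in the steady state (bits).
pj = p.reshape(2, 2)                              # rows: signal, cols: sensor
ps, pr = pj.sum(axis=1), pj.sum(axis=0)
I = sum(pj[a, b] * np.log2(pj[a, b] / (ps[a] * pr[b]))
        for a in range(2) for b in range(2))

# Learning rate: sum over sensor transitions of the steady-state flux times
# the change in log p(signal | sensor) caused by the transition (nats/time).
pc = pj / pr                                      # p(signal | sensor)
l = 0.0
for i, (s, r) in enumerate(states):
    for j, (s2, r2) in enumerate(states):
        if s == s2 and r != r2:                   # sensor transitions only
            a = signals.index(s)
            l += p[i] * Q[i, j] * np.log(pc[a, sensors.index(r2)]
                                         / pc[a, sensors.index(r)])

print(f"mutual information: {I:.4f} bits, learning rate: {l:.4f} nats/time")
```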

Tuesday, 4 July 2017

Becoming more certain about uncertainty in molecular systems

By Jenny Poulton

Due to the unpredictability of motion at the microscopic scale, molecular processes have randomness associated with them, exhibiting what we call thermodynamic fluctuations. A group in Germany led by Barato and Seifert have written a series of papers, beginning with "Thermodynamic uncertainty relation for biomolecular processes" (preprint here), exploring how uncertainty in the number of reaction steps taken by a molecular process is related to how much energy the system is continuously consuming.

To be more precise, Barato and Seifert consider the number of times a system completes a cycle in a given time window. A good example of this kind of setup is the rotary motor F0F1-ATPsynthase (below, image taken from Wikipedia).
This motor is used to create the chemical fuel source of the cell (ATP) from its components (ADP and inorganic phosphate P). In order to drive this process, a current of hydrogen ions flows through the top half of the motor, causing it to systematically rotate in one direction with respect to the bottom half. This rotation is physically linked to the reaction ADP + P -> ATP, and so ATP is created. This one-directional rotational motion only arises because the current of hydrogen ions continuously supplies more energy (more technically, free energy) to the system than is needed to create the ATP. We say that the current of ions drives the system.

In general, small driven systems have a bias towards stepping forward, but there is still a non-zero probability of stepping backwards due to thermodynamic fluctuations. We also cannot predict exactly how long the system will take to complete each step of the cycle, so the time taken per step is variable. Thus the number of cycles completed in a given time is uncertain. It is, however, possible to define an average µ of the net number of cycles completed in a time window, and a variance σ², which is a mathematical measure of the typical deviation from the average due to fluctuations. The Fano factor F = σ²/µ gives a measure of the relative importance of the random fluctuations about the average.

In the paper "Thermodynamic uncertainty relation for biomolecular processes", Barato and Seifert relate the energy consumption and the Fano factor via F ≤ 2kT /E. Here E is the energy consumed per cycle, T is the temperature and k is Boltzmann’s constant. This expression means that the Fano factor is at least as big as the quantity 2kT /E. Thus a cycle which uses a certain amount of fuel E has an upper limit to its precision, and there is an evident trade-off between the amount of energy dissipated per cycle and the Fano factor.

In the original paper, the authors only prove their relation for very simple processes. However, it has since been generalised in this paper (preprint here). The result is actually based on very deep statements about the types of fluctuating processes that are possible in physical systems. One of the challenges now is to take this fundamental insight and apply it to gain a better understanding of practical systems. Fortunately, the F0F1-ATPsynthase rotary motor is not the only example of an interesting biological system that undergoes driven cycles; the cell contains a huge variety of molecular motors that can also be understood in this way (preprint here). Molecular timekeepers that are vital to the cellular life cycle also depend on driven cycles. Understanding the trade-offs between unwanted variability and energy consumption will be vital in engineering such systems.

Tuesday, 11 April 2017

Two papers on the fundamental principles of biomolecular copying

Single cells, which are essentially bags of chemicals, can achieve remarkable feats of information processing. Humans have designed computers to perform similar tasks in our everyday world. The question of whether it is possible to emulate cells and use molecular systems to perform complex computational tasks in parallel, at an extremely small scale and with very low power consumption, is one that has intrigued many scientists.

In collaboration with the ten Wolde group from AMOLF Amsterdam, we have just published two articles in Physical Review X and Physical Review Letters that get to the heart of this question. 

The readout molecules (orange) act as copies of the binding
state of the receptors (purple), through catalytic
phosphorylation/dephosphorylation reactions.

In the first, “Thermodynamics of computational copying in biochemical systems”, we show that a simple molecular process occurring inside living cells - a phosphorylation/dephosphorylation cycle - can copy the state of one protein (for example, whether or not a food molecule is bound to it) into the chemical modification state of another protein (phosphorylated or not). This copy process can be rigorously related to those performed by conventional computers.
We thus demonstrated that living cells can perform the basic computational operation of copying a single bit of information. Moreover, our analysis revealed that these biochemical computations can occur rapidly and at low power consumption. The article shows precisely how natural systems relate to, and differ from, traditional computing architectures, and provides a blueprint for building naturally-inspired synthetic copying systems that approach the lower limits of power consumption.
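To give a sense of the scale of "low power", here is a back-of-envelope benchmark using the textbook Landauer limit; the numbers are our own rough illustration rather than results from the paper:

```python
# Back-of-envelope numbers (ours, not from the paper): kT*ln(2) is the
# textbook Landauer limit - the minimum free energy needed to erase one
# bit - and sets the natural scale for the cost of bit-level operations.
import math

k_B = 1.380649e-23      # Boltzmann's constant, J/K
T = 310.0               # roughly body temperature, K
landauer = k_B * T * math.log(2)
print(f"kT ln2 at 310 K: {landauer:.3e} J per bit")

# For comparison, ATP hydrolysis releases roughly 20 kT of free energy
# (an approximate, widely quoted figure), so a single phosphorylation
# event pays many times the Landauer cost of the bit it copies.
atp_kT = 20.0
print(f"ATP hydrolysis: ~{atp_kT:.0f} kT = {atp_kT * k_B * T:.3e} J")
```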
The production of a persistent copy from a template.
The separation in the final state is essential.
A more complex natural copy operation is the production of polymer copies from polymer templates, as discussed in this previous post. Such processes are necessary for DNA replication, and also for the production of proteins from DNA templates via intermediate RNA molecules. For cells to function, the data in the original DNA sequence of bases must be faithfully reproduced - each copy therefore involves copying many bits of data. 

In the second article, "Fundamental costs in the production and destruction of persistent polymer copies", we consider such processes. We point out that these polymer copies must be persistent to be functional. In other words, the end result is two physically separate polymers: it would be useless to produce proteins that couldn't detach from their nucleic acid templates. As a result, the underlying principles are very different from the superficially similar process of self-assembly, in which molecules aggregate together according to specific interactions to form a well-defined structure. 

In particular, we show that the need to produce persistent copies implies that more accurate copies necessarily have a higher minimal production cost (in terms of resources consumed) than sloppier copies. This result, which is not true if the copies do not need to physically separate from their templates, sets a bound on the function of minimal self-replicating systems.

Additionally, the results suggest that polymer copying processes that occur without external intervention (autonomously) must occur far from equilibrium. Being far from equilibrium means that processes are highly irreversible - taking a forwards step is much more likely than taking a backwards step. This finding draws a sharp distinction with self-assembling systems, which typically assemble most accurately when close to equilibrium. This difference may explain why recent years have seen enormous growth in the successful design of self-assembling molecular systems, while autonomous synthetic systems that produce persistent copies through chemical means have yet to be constructed.
Taken together, these papers set out a theoretical framework on which to base the design of synthetic molecular systems that achieve computational processes such as copying and information transmission. The challenge now is to develop experimental systems that exploit these ideas.

Monday, 3 April 2017

Working with the City of London School on an exciting iGEM project

Today I met with a group of school students (aged 16-18) from the City of London School, who will be working on a project for iGEM this year. iGEM is an international competition in which school, undergrad and postgrad teams design, model and build complex systems by engineering cells. Last year, Imperial won the overall prize, as discussed in this post by Ismael.

Without giving too much away, the students will be working on a system based on a newly-developed molecular device, the toehold switch. Toehold switches are RNA molecules that contain the information required to produce proteins. This information is hidden via interactions within the RNA, which cause it to fold up into a shape that prevents the sequence from being accessed. If, however, a second strand of RNA with the right sequence is present, the structure can be opened up and protein production is possible.

This idea has been around for some time, but toehold switches are particularly useful because they decouple the input, output and internal operation of the switch better than previous designs. This is the principle of modularity that underlies the work of many of my colleagues here at Imperial, and it allows for the systematic engineering of molecular systems. This modularity is key to the proposed project.

I've been giving the students advice on how to model the operation of a toehold switch, so that they can explore the design space before getting into the lab.
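To give a flavour of what such a model looks like, here is a minimal sketch of toehold-switch kinetics; the species names and rate constants are illustrative assumptions, not measured values:

```python
# A minimal sketch of toehold-switch dynamics (illustrative parameters):
# trigger RNA T binds the closed switch S to form an open complex C,
# which produces protein P; protein also degrades/dilutes away.
import numpy as np
from scipy.integrate import solve_ivp

k_bind = 1e4      # /M/s, trigger-switch binding (assumed)
k_unbind = 1e-3   # /s, complex dissociation (assumed)
k_exp = 0.05      # /s, protein production from the open complex (assumed)
k_deg = 1e-3      # /s, protein degradation/dilution (assumed)

def rhs(t, y):
    T, S, C, P = y
    bind = k_bind * T * S - k_unbind * C       # net rate of complex formation
    return [-bind, -bind, bind, k_exp * C - k_deg * P]

y0 = [50e-9, 50e-9, 0.0, 0.0]   # 50 nM trigger and switch; no complex/protein
sol = solve_ivp(rhs, (0, 3600), y0, max_step=10.0)
print(f"protein after 1 h: {sol.y[3, -1]:.3e} M")
```

Even a crude model like this lets you ask design questions, such as how leaky expression or weaker trigger binding changes the switch's response time.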


Wednesday, 11 January 2017

A simple biomolecular machine for exploiting information



Biological systems at many scales exploit information to extract energy from their environment. In chemotaxis, single-celled organisms use the location of food molecules to navigate their way to more food; humans use the fact that food is typically found in the cafeteria. Although the general idea is clear, the fundamental physical connection between information and energy is not yet well-understood. In particular, whilst energy is inherently physical, information appears to be an abstract concept, and relating the two consistently is challenging. To overcome this problem, we have designed two microscopic machines that can be assembled out of naturally-occurring biological molecules and exploit information in the environment to charge a chemical battery. The work has just been published as an Editor's selection in Physical Review Letters: http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.118.028101

The basic idea behind the machines is simple, and makes use of pre-existing biology. We employ an enzyme that can take a small phosphate group (one phosphorus and several oxygen atoms bound together) from one molecule and attach it to another – a process known as phosphorylation. Phosphorylation is the principal signalling mechanism within a cell, as enzymes called kinases use phosphorylation to activate other proteins. In addition to signalling, phosphates are one of the cell’s main stores of energy; chains of phosphate bonds in ATP (the cell’s fuel molecule) act as batteries. By ‘recharging’ ADP to ATP through phosphorylation, we store energy in a useful form; this is effectively what mitochondria do via a long series of biochemical reactions.





Fig. 1: The ATP molecule (top) and the ADP molecule (bottom). Adenosine (the "A") is the group of atoms on the right of the pictures; the phosphates (the "P") are the basic units that form the chains on the left. In ADP (adenosine diphosphate) there are two phosphates in the chain; in ATP (adenosine triphosphate) there are three.


The machines we consider have three main components: the enzyme, the ‘food’ molecule that acts as a source of phosphates to charge ATP, and an activator for the enzyme, all of which are sitting in a solution of ATP and its dephosphorylated form ADP. Food molecules can either be charged (i.e. have a phosphate attached) or uncharged (without phosphate). When the enzyme is bound to an activator, it allows transfer of a phosphate from a charged food molecule to an ADP, resulting in an uncharged food molecule and ATP. The reverse reaction is also possible.

In order to systematically store energy in ATP, we want to activate the enzyme when a charged food molecule is nearby. This is possible if we have an excess of charged food molecules, or if charged food molecules are usually located near activators. In the second case, we're making use of information: the presence of an activator is informative about the likely presence of a charged food molecule. This is a very simple analogue of the way that cells and humans use information, as outlined above. Indeed, mathematically, the 'mutual information' between the food and activator molecules is simply how well the presence of an activator indicates the presence of a charged food molecule. This mutual information acts as an additional power supply that we can use to charge our ATP batteries. We analyse the behaviour of our machines in environments containing information, and find that they can indeed exploit this information, or expend chemical energy in order to generate more information. By using well-known and simple components in our device, we are able to dispel much of the confusion over the connection between abstract information and physical energy.
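As a toy illustration of the last point (with made-up probabilities, not numbers from the paper), the mutual information between the activator and the food molecule can be computed directly from their joint distribution:

```python
# A toy calculation with invented probabilities: the mutual information
# between the activator and the charged food molecule. Each entry is the
# joint probability of (activator present?, food charged?).
import numpy as np

p = np.array([[0.4, 0.1],     # activator absent:  [food uncharged, food charged]
              [0.1, 0.4]])    # activator present: [food uncharged, food charged]

px = p.sum(axis=1)            # marginal over the activator
py = p.sum(axis=0)            # marginal over the food molecule
I_bits = sum(p[i, j] * np.log2(p[i, j] / (px[i] * py[j]))
             for i in range(2) for j in range(2))
print(f"mutual information: {I_bits:.3f} bits")

# One bit of correlation is worth at most kT*ln(2) of extractable work;
# this is the sense in which information can act as a power supply.
print(f"worth up to {I_bits * np.log(2):.3f} kT per molecule pair")
```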

A nice feature of our designs is that they are completely free-running, or autonomous. Like living systems, they can operate without any external manipulation, happily converting between chemical energy and information on their own. There’s still a lot to do on this subject; we have only analysed the simplest kind of information structure possible, and have yet to look at more complex spatial or temporal correlations. In addition, our system doesn’t learn, but relies on ‘hard-coded’ knowledge about the relation between food and activators. It would be very interesting to see how machines that can learn and harness more complex correlation structures would behave.

Authored by Tom McGrath

Wednesday, 16 November 2016

Congratulatory post: Hail to the Imperial 2016 iGEM team!
By Ismael Mullor-Ruiz

With a bit of delay, we as a team would like to join in congratulating our colleagues and collaborators from the Imperial 2016 iGEM team, who triumphed at the iGEM 2016 Giant Jamboree at MIT.

For those who aren’t familiar with it, iGEM (an acronym for “International Genetically Engineered Machine”) is the world’s largest synthetic biology contest. It was started 12 years ago at MIT as a summer side-project in which undergrad teams designed synthetic gene circuits never seen before in nature, built them and tested each of the parts. Many of these parts have subsequently pushed forward the field of synthetic biology. Even though it began as an undergrad-level competition with only a handful of teams involved, the competition has grown to include not only undergrad teams, but also postgrad teams, high-school teams and even companies. More than 200 teams from all around the globe took part in the latest edition.

Traditionally, synthetic biology involves tinkering with a single cell type (e.g. E. coli) so that it performs some useful function – perhaps producing an industrially or medically useful molecule. This tinkering involves altering the molecular circuitry of the cell by adding new instructions (in the form of DNA) that result in the cell producing new proteins/RNA that perform the new functions. The focus of this year’s project from the Imperial team was on engineering synthetic microbial ecosystems of multiple cell types (known as “cocultures”) rather than a single organism, since more complex capabilities can be derived from multiple cell types working together.

So they began by characterizing the growing conditions of six different “chassis” organisms and creating a database called ALICE. The challenge here is that the different organisms have different growing conditions, so maintaining a steady proportion of each is hard to achieve; typically, one of the populations ends up taking over in any given set of conditions. Thus, in order to allow self-tuning of the growth of the cocultures, they designed a system consisting of three biochemical modules:

1) A module that allows communication between the populations through a “quorum sensing” mechanism. Population densities of each species are communicated via chemical messengers that are produced within the cells, released and diffuse through the coculture.  Each cell type produces a unique messenger, and the overall concentration of this messenger indicates the proportion of those cells in the coculture.

2) A comparison module that enables a cell to compare the concentration of each chemical messenger. The chemical messengers were designed to trigger the production of short RNA strands in each cell; RNA strands triggered by different messengers bind to and neutralize each other. If there is an excess of the cell’s own species in the coculture, some of the RNA triggered by its own chemical messenger will not be neutralized, and can go on to influence cell behaviour.

3) An effector module. The RNA triggered in response to an excess of the cell’s own species is called “STAR”. It can bind to something known as a riboswitch (see figure below); when STAR is present, the cell produces a protein that suppresses its own growth. Cells therefore respond to an excess of their own population by reducing their growth rate, allowing others to catch up. Using a riboswitch for growth control has several advantages: it is easy to design, easy to port to other cell types, and imposes a reduced burden on the cell compared to other mechanisms. A minimal sketch of the overall feedback loop is given below the figure.



Figure 1: Action of STAR in opening the hairpin of a riboswitch. Without STAR, the riboswitch interferes with the production of certain genes; STAR stops this interference so that the genes are produced.
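To make the logic of the three modules concrete, here is a deliberately simplified caricature of the feedback loop (entirely our own construction, with invented parameters): two populations grow logistically, and each suppresses its own growth when it detects an excess of itself:

```python
# A caricature of the coculture control idea (invented parameters): each
# population slows its own growth when its share of the coculture exceeds
# one half, mimicking the quorum-sensing/STAR/riboswitch feedback and
# letting the minority population catch up.
import numpy as np
from scipy.integrate import solve_ivp

r1, r2 = 1.0, 1.4          # intrinsic growth rates (deliberately unequal)
K = 1.0                    # shared carrying capacity
s = 5.0                    # strength of the self-suppression feedback

def rhs(t, y):
    n1, n2 = y
    total = n1 + n2
    # Suppression factor kicks in only when a species' share exceeds 0.5.
    f1 = 1.0 / (1.0 + s * max(n1 / total - 0.5, 0.0))
    f2 = 1.0 / (1.0 + s * max(n2 / total - 0.5, 0.0))
    return [r1 * f1 * n1 * (1 - total / K),
            r2 * f2 * n2 * (1 - total / K)]

sol = solve_ivp(rhs, (0, 50), [0.01, 0.01], max_step=0.1)
n1, n2 = sol.y[:, -1]
print(f"final proportions: {n1/(n1+n2):.2f} / {n2/(n1+n2):.2f}")
```

With the feedback switched off (s = 0), the faster-growing strain dominates; with it on, the final proportions stay far closer to balance. The real system is of course much richer than this sketch.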

As a demonstration of the concept, the students implemented this control system in differently coloured strains of bacteria in order to create different pigments (analogous to the Pantone colour standard) through the coculture and combination of the strains. The approach is very generic, however, and as the team mention on their wiki, the possibilities of cocultures go way beyond this!

If you want to know more about the project, you can check out the team’s wiki:


Thursday, 6 October 2016

Replication, Replication, Replication I

This post and the one below it are linked. Here, I discuss a topic that interests us as a group, and below I look at some recent related papers. This post should make reasonable sense in isolation, the second perhaps less so.

Replication is at the heart of biology; whole organisms, cells and molecules all produce copies of themselves. Understanding natural self-replicating systems, and designing our own artificial analogues, is an obvious goal for scientists - many of whom share dreams of explaining the origin of life, or creating new, synthetic living systems.

Molecular-level replication is a natural place to start, since it is (in principle) the simplest, and also a necessary component of larger-scale self-replicating systems. The most obvious example in nature is the copying of DNA; prior to cell division, a single copy of the entire sequence of base pairs in the genome must be produced. But the processes of transcription (in which the information in a DNA sequence is copied into an RNA sequence) and translation (in which the information in an RNA sequence is copied into a protein sequence) are closely related to replication. The information initially present in the DNA sequence is simply written out in a new medium, like printing off a copy of an electronic document. This process is illustrated in the figure above (which I stole from here). This figure nicely emphasises the polymer sequences (shown as letters) that are being copied into a new medium (note: three RNA bases get copied into one amino acid in a protein: AUG into M, for example). An absolutely fundamental feature of both replication and copying processes is that the copy, once produced, is physically separated from the template from which it was produced. This is important: otherwise the copies couldn't fulfil their function, and more copies could not be made from the same template.

This single fact - that useful copies must separate from their template yet retain the copied information - makes the whole engineering challenge far harder. It's (reasonably) straightforward to design a complex (bio)chemical system that assembles on top of a template, guided by that template. All you need are sufficiently selective attractive interactions between the copy components and the template. But if you then want to separate your copy from the template, these very same attractive interactions work against you, holding the copy in place - and more accurate copies hold on to the template more tightly. My collaborators and I formalise this idea, and explore some of the other consequences of needing to separate copies from templates, in this recent paper.
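A toy equilibrium estimate makes this tension concrete (our own illustration, not the formal argument of the paper). Suppose each correct copy-template contact is worth an energy eps, in units of kT, relative to an incorrect one:

```python
# A toy equilibrium estimate of the accuracy-vs-separation tension: a
# selective contact energy eps (in kT) per correct monomer improves the
# accuracy of each copied site, but the finished copy must break all L
# contacts to detach, which becomes exponentially unlikely in L.
import numpy as np

L = 20                         # copy length in monomers (arbitrary choice)
for eps in [1.0, 2.0, 4.0]:
    per_site = 1.0 / (1.0 + np.exp(-eps))    # accuracy, two-monomer alphabet
    whole_copy = per_site ** L               # probability every site is right
    detach_weight = np.exp(-eps * L)         # relative weight of detached state
    print(f"eps = {eps:.0f} kT: per-site accuracy {per_site:.3f}, "
          f"whole-copy accuracy {whole_copy:.3f}, "
          f"relative detachment weight {detach_weight:.1e}")
```

Making eps larger drives the whole-copy accuracy towards one, but the equilibrium weight of the detached state collapses at the same time - exactly the trap described above.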

Largely because of this problem, no-one has yet constructed a purely chemically driven, artificial system that produces copies of long polymers, as nature does. Instead, it has proved necessary to perform external operations such as successively heating and cooling the system. Copies can then grow on the template at low temperature, and fall off at high temperature, allowing a new copy to be made when the system is cooled down again. This is exactly what is done in PCR (the polymerase chain reaction), an incredibly important technique for amplifying small amounts of DNA in areas ranging from forensics to medicine.

As a group, we're very interested in how copying/replication can be achieved without this external intervention. Two recent papers, discussed in the blog entry below, highlight the questions at hand.