Friday, 15 June 2018

Multidimensional biochemical information processing of dynamical patterns


Biological cells can send signals to each other by emitting signalling molecules. Cells can receive the signals by measuring the concentration of the signalling molecules using surface receptors. If the concentration does not change in time then this is just a single number, which does not transmit much information. Compare this with sending electrical signals down wires: if the voltage is maintained at a constant value, then very little information is sent. Instead, the voltage is changed very quickly and the information is encoded in the time-varying pattern; for example, you can transmit a series of bits by turning the voltage on and off. In a similar way, cells can transmit messages in the dynamical concentration of signalling molecules.

A recent paper ‘Multidimensional biochemical information processing of dynamical patterns’ by Yoshihiko Hasegawa from the University of Tokyo examines how a simple model of cell sensing can extract the most information from a time varying concentration. 

The model assumes that the signal the cell sees is a combination of two separate signals encoded in the same molecule: one component with fast variation and one with slow variation (see figure). The challenge for the cell is to work out the strength of each individual component from the combined signal that it sees. The better it does, the more information it has. This challenge is complicated by the fact that the cell's readout is not 100% accurate. 

The signal is made by summing a fast signal and a slow signal
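To make the decoding challenge concrete, here is a minimal Python sketch, not the model from the paper: a slow and a fast Ornstein-Uhlenbeck process are summed, and the "cell" only ever sees that sum corrupted by readout noise. The timescales and noise amplitude are arbitrary assumptions.

```python
import numpy as np

def ou_trajectory(tau, sigma, dt, n_steps, rng):
    """Ornstein-Uhlenbeck process with relaxation time tau and stationary
    standard deviation sigma (Euler-Maruyama discretisation)."""
    x = np.zeros(n_steps)
    for i in range(1, n_steps):
        noise = sigma * np.sqrt(2.0 * dt / tau) * rng.standard_normal()
        x[i] = x[i - 1] - (x[i - 1] / tau) * dt + noise
    return x

rng = np.random.default_rng(0)
dt, n_steps = 0.01, 5000
slow = ou_trajectory(tau=10.0, sigma=1.0, dt=dt, n_steps=n_steps, rng=rng)  # slowly varying component
fast = ou_trajectory(tau=0.1, sigma=1.0, dt=dt, n_steps=n_steps, rng=rng)   # rapidly varying component
readout_noise = 0.3 * rng.standard_normal(n_steps)                          # imperfect cellular readout
observed = slow + fast + readout_noise  # the combined signal the cell must decode
```

The cell's task, in this toy picture, is to recover the current values of `slow` and `fast` given only `observed`.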

The author assumes that the cell has two decoding channels (cell-surface receptors with an associated downstream network) by which to process the signal. They find that when the cell's readout is accurate, the information is maximised when each decoder independently detects one of the two components. By contrast, when the cell's readout is inaccurate, it is better to have two identical decoders that simply give independent estimates of the overall signal strength, without trying to tease out the individual components.


The author then shows how to construct chemical reaction networks that implement the optimal decoders. They find that a possible network for the decoder is a large cascade with additional feedback and feed-forward loops, a motif that has also been found in real biological signalling networks. They also show that the number of molecular species in these cascades can be reduced while keeping the response function approximately correct.

 

Monday, 4 June 2018

Probabilistic switching circuits in DNA.

As seen in previous entries in this blog, DNA reaction networks are a substrate that enables the creation of computational systems. These reaction networks have been a research topic for quite a long time, and they have their own set of advantages and operational challenges. One current challenge is that, to date, biomolecular computing architectures have been conceived as digital and deterministic, while actual molecular signals are analog in concentration and involve stochastic (random) reaction events. These two signal characteristics, when controlled in space and time, allow for the emergence of complex phenomena (such as patterning or morphogenesis). Integrating analog and stochastic architectures is therefore expected to expand the capabilities of DNA computing.

Qian’s group presents a first step towards the integration of stochastic/analog behaviour into DNA circuits. The implementation uses irreversible strand displacement reactions in which a given input can bind to two different gate molecules, with probabilities that depend on the gate concentrations. Reaction with one gate or the other leads to two different outputs. This architecture, implemented in switches and signal splitters, works stochastically at the single-molecule level. But when observed in bulk, the stochastic elements in the circuit allow the transformation of a digital signal of n bits into an analog signal able to take 2^n values.
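The routing idea can be sketched in a few lines of Python (a Monte Carlo cartoon, not the authors' strand-displacement implementation): each input molecule is irreversibly consumed by one of two gates, with a probability set by the relative gate concentrations.

```python
import random

def probabilistic_split(n_inputs, conc_gate_a, conc_gate_b, seed=1):
    """Route each input strand irreversibly to gate A or gate B with a
    probability set by the relative gate concentrations (well-mixed sketch)."""
    rng = random.Random(seed)
    p_a = conc_gate_a / (conc_gate_a + conc_gate_b)
    to_a = sum(1 for _ in range(n_inputs) if rng.random() < p_a)
    return to_a, n_inputs - to_a

# A single input molecule picks one output at random (stochastic behaviour)...
print(probabilistic_split(1, conc_gate_a=1.0, conc_gate_b=1.0))
# ...but a bulk population of inputs yields a well-defined analog output ratio.
print(probabilistic_split(100000, conc_gate_a=1.0, conc_gate_b=3.0))  # roughly 25% routed to A
```

At the single-molecule level each routing event is a coin flip, but the bulk ratio of the two outputs is an analog quantity set by the gate concentrations, which is the digital-to-analog conversion described above.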




Figure 1. Example of a circuit with 3 stochastic elements (2 splitters and 1 gate), and the different analog signal values that it can output.

The authors also show that this architecture allows the implementation of feedback loops that adjust the intensity of the output signal. Despite some limitations (in larger circuits, fine-tuning the probabilities becomes harder, requiring correction of the initial input signal), the proposed probabilistic circuits are a bridge that allows the interconnection of digital and analog computational circuits.

References:
Wilhelm, D., Bruck, J., & Qian, L. (2018). Probabilistic switching circuits in DNA. Proceedings of the National Academy of Sciences, 201715926.

Thursday, 24 May 2018

Paper Summary: Thermodynamics with continuous information flow

Thermodynamics with continuous information flow (2014), J. M. Horowitz and M. Esposito

This paper discusses a way of quantifying flows of information in the same way as flows of energy. Consider two random processes which hop between discrete states. Two independent systems of this type can be coupled together so that the state of the first system, x, affects the transition propensities of the second system, y, and vice-versa. At any given instant, either x or y could change, but crucially not both. This property defines the system as bipartite. The diagram below (taken from the paper) illustrates the creation of a bipartite system from x and y.


An example of a system of this type is a ligand that binds to a receptor, with the ligand-bound receptor catalysing the creation of an activated complex. It is not possible for a ligand to bind to a receptor and for an activated complex to be created in a single step: these events have to happen sequentially. However, whether or not a ligand is bound to the receptor does affect the rate at which activated complexes are formed.
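A toy Gillespie-style simulation of this receptor example is sketched below; the species and rates are illustrative inventions, not taken from the paper. The activation propensity for y depends on the current value of x, but no single reaction event ever changes both variables, which is exactly the bipartite property.

```python
import math, random

rng = random.Random(0)

# State: x = 1 if the ligand is bound, 0 otherwise; y = number of activated complexes.
x, y, t = 0, 0, 0.0
k_bind, k_unbind = 1.0, 0.5                       # x transitions (illustrative rates)
k_act_bound, k_act_free, k_deg = 2.0, 0.1, 0.3    # y transitions; activation rate depends on x

for _ in range(10_000):
    rates = {
        "bind":     k_bind if x == 0 else 0.0,
        "unbind":   k_unbind if x == 1 else 0.0,
        "activate": k_act_bound if x == 1 else k_act_free,  # coupling: x sets y's propensity
        "degrade":  k_deg * y,
    }
    total = sum(rates.values())
    t += -math.log(1.0 - rng.random()) / total    # exponentially distributed waiting time
    threshold, acc = rng.random() * total, 0.0
    for event, rate in rates.items():             # pick one event in proportion to its rate
        acc += rate
        if threshold < acc:
            break
    # Bipartite property: each event updates x or y, never both at once.
    if event == "bind":
        x = 1
    elif event == "unbind":
        x = 0
    elif event == "activate":
        y += 1
    else:
        y -= 1
```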


If a system is bipartite, all properties of the system can be decomposed into an x-subsystem component and a y-subsystem component, separating out the contributions from transitions of the two subsystems. There are many useful properties of such a system, and for a rigorous thermodynamic analysis a key one is the entropy. The second law of thermodynamics states that the entropy of the universe must increase with time. However, this is only for the universe as a whole. On a more microscopic level it is possible for some parts of the system to decrease their entropy, as long as other parts make up the shortfall. Traditionally we break the change in entropy down into the change in entropy of the system itself and the change in entropy of the environment due to the system. If we further decompose these into separate contributions from our two subsystems, an extra term appears for each, quantifying the information flow around the system.



To the x component a term is added which takes into account how much information x learns about y during a single step. Similarly, to the y component a term is added which takes into account how much information y learns about x during a step. Splitting the entropy in this way gives a sense of how the information moves around the system; it flows from x->y or from y->x. Incorporating the information term allows us to apply the second law individually to each subsystem, rather than only at the level of the whole system. It also allows us to understand how a decrease in entropy in one subsystem can be caused by information flowing from one subsystem to the other, in a manner that is consistent with the second law. 
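Schematically, the decomposition described above can be written as follows (a sketch of the structure, loosely following the paper's bipartite decomposition rather than reproducing its exact notation):

```latex
% Total entropy production of the bipartite system splits into two non-negative parts:
\dot{S}_{\mathrm{tot}} = \dot{S}^{X}_{\mathrm{tot}} + \dot{S}^{Y}_{\mathrm{tot}},
\qquad \dot{S}^{X}_{\mathrm{tot}} \geq 0, \quad \dot{S}^{Y}_{\mathrm{tot}} \geq 0,
% where each part contains the usual system and environment terms plus an information flow:
\dot{S}^{X}_{\mathrm{tot}} = \frac{\mathrm{d}S^{X}}{\mathrm{d}t} + \dot{S}^{X}_{\mathrm{env}} - \dot{I}^{X},
\qquad
\dot{S}^{Y}_{\mathrm{tot}} = \frac{\mathrm{d}S^{Y}}{\mathrm{d}t} + \dot{S}^{Y}_{\mathrm{env}} - \dot{I}^{Y}.
% Here \dot{I}^{X} is the rate at which x-transitions change the mutual information I(X;Y):
% \dot{I}^{X} > 0 means x is "learning" about y, and \dot{I}^{X} + \dot{I}^{Y} = \mathrm{d}I/\mathrm{d}t.
```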

Tuesday, 30 January 2018

Recent Developments in DNA Origami

Javier Cabello – PhD student

Using both technique and imagination, origami allows the production of many shapes from only a folded sheet of paper. After the birth of DNA nanotechnology, the potential of this folding art inspired Paul Rothemund to develop a technique that allows the creation of complex nanoscale objects out of DNA. A long single DNA strand is a flexible material that can be folded by hundreds of smaller strands ("staples"). These staples bind to specific non-consecutive regions of the larger strand (the "template") by Watson-Crick base-pairing, folding the template into a DNA structure with a particular shape.
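As a cartoon of the staple principle (the sequences below are invented for illustration, not from any real origami design), a staple whose two halves are the reverse complements of two distant template regions will, on binding both, pull those regions together:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Watson-Crick reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

# Toy template: two regions of interest separated by a long spacer.
region_1 = "ATGCGTACCGGA"
region_2 = "CCATTGCAGTCA"
template = region_1 + "T" * 50 + region_2

# A staple whose two halves are complementary to the two distant regions:
# binding both at once folds the template back on itself.
# (Real origami staples also respect crossover geometry, which is ignored here.)
staple = reverse_complement(region_2) + reverse_complement(region_1)
print(staple)
```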

DNA origami structures have a wide variety of potential uses that greatly differ from the original biological role of the molecule. These range from drug delivery to the production of electric circuits, with the biocompatible nature of DNA suggesting application in living organisms. However, the structures created are still limited by the size of the template, which becomes harder to fold when longer. Another problem with scaling up the technology is the cost of synthesising large numbers of staple strands.

Last November, four publications in Nature focused on overcoming the limitations of DNA origami. Three of the publications proposed new approaches for the assembly of bigger DNA structures. All three involved combining blocks of different sizes and complexity to form bigger structures.

Gigadalton-scale shape-programmable DNA assemblies
The first approach, suggested by Wagenbauer, Sigl and Dietz, involves making origami structures that can then self-polymerise. The resultant structures depend on the geometry of the individual origami building blocks. This approach, heavily inspired by viral capsid assembly, was used to form DNA rings from V-shaped DNA components. These rings were then assembled to form DNA tubes. The method was generalised to produce different figures, such as dodecahedra, by varying the V-block angles and adding blocks with more than two binding sites.

Fractal assembly of micrometre-scale DNA origami arrays with arbitrary patterns
Tikhomirov, Petersen and Qian’s work revolves around the design of a robust algorithm to produce the ordered assembly of complex patterns in 2D. In this work, individual origami tiles were first folded in isolation. Tiles were then combined in groups of four to produce larger “super-tiles”, and four of these super-tiles were in turn combined to produce even bigger tiles. To achieve the necessary control over this process, the authors used three basic design rules, which they incorporated into a computational tool that they named the “FracTile Compiler”. This compiler allows the systematic design of complex, puzzle-like patterns. The resultant array can show an arbitrary picture, displayed by unique modifications to the surface of each tile, as the authors demonstrate with “La Gioconda” (ed.: the Mona Lisa for English readers) among others (see fig. 1).
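This is not the FracTile Compiler, but a toy illustration of the recursive grouping it orchestrates: each assembly round glues 2 x 2 blocks of equally sized square super-tiles into the next level, so k rounds turn 4^k individual tiles into one array.

```python
def assemble(tiles):
    """One round of fractal assembly: glue consecutive groups of four
    equally sized square super-tiles into the next-level super-tile."""
    assert len(tiles) % 4 == 0
    next_level = []
    for i in range(0, len(tiles), 4):
        a, b, c, d = tiles[i:i + 4]
        # Stack the four blocks as [[a, b], [c, d]] to form a bigger square.
        top = [row_a + row_b for row_a, row_b in zip(a, b)]
        bottom = [row_c + row_d for row_c, row_d in zip(c, d)]
        next_level.append(top + bottom)
    return next_level

# Start from 16 labelled 1x1 "tiles" and assemble twice: 16 -> 4 -> 1 super-tile.
tiles = [[[label]] for label in range(16)]
for _ in range(2):
    tiles = assemble(tiles)

print(len(tiles), len(tiles[0]), len(tiles[0][0]))  # 1 super-tile of 4 x 4 original tiles
```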

Fig. 1 Complex shapes created using the method of Tikhomirov et al., reproduced with permission from Springer Nature: Fractal assembly of micrometre-scale DNA origami arrays with arbitrary patterns, G Tikhomirov, P Petersen, L Qian, Nature 552 (7683), 67.


Programmable self-assembly of three-dimensional nanostructures from 10,000 unique components
In this case, Ong et al. improved a previously published variant of DNA origami: DNA bricks. Instead of using large DNA strands as in previous approaches, the authors designed structures built from smaller building blocks of only 56 nucleotides each. Each of these bricks can bind to three others, forming a dense “Lego”-like structure. By using 10,000 different sequences for these DNA bricks, the authors achieved the production of cubes of more than 1 GDa (3,000 times the mass of the largest proteins). These cubes were proposed to serve as a canvas for producing different structures by simply removing bricks from the design, like chiselling a rock. The approach even allows the production of negative structures by carving cavities inside the DNA cube. To help with the design task, the authors created the software “NanoBricks” and proved its utility by producing different patterns, such as a helix or a teddy bear inside the DNA cube.
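As a toy picture of the "canvas" idea (the dimensions and the carved shape are arbitrary choices, nothing from the paper), one can represent the brick cube as a boolean voxel array and simply delete bricks to sculpt a cavity:

```python
import numpy as np

# A 20 x 20 x 20 canvas of "bricks": True = brick present.
n = 20
canvas = np.ones((n, n, n), dtype=bool)

# Carve a spherical cavity in the middle by removing bricks,
# in the spirit of chiselling a shape out of the full cube.
centre, radius = (n - 1) / 2, 6
z, y, x = np.indices(canvas.shape)
cavity = (x - centre) ** 2 + (y - centre) ** 2 + (z - centre) ** 2 < radius ** 2
canvas[cavity] = False

print(canvas.sum(), "of", canvas.size, "bricks kept")
```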

Biotechnological mass production of DNA origami
Despite these significant improvements in the production of larger DNA origami structures, the possible applications are still hindered by the cost of synthesising the component strands. However, new methods inspired by the current biotechnology industry are being developed to mass-produce DNA origami designs.

Praetorius et al. have demonstrated the possibility of biotechnological mass production of a DNA origami design. Their approach involves encoding all the components necessary for the folding of an origami (i.e. the template and staples) in a single genetic sequence, and incorporating it into bacteria. Of course, this isn’t all. Between the different components they added DNA sequences with enzymatic activity: DNAzymes. The whole DNA strand is then produced in a culture of bacteria, and the DNAzymes, in the presence of zinc, cut themselves out of the sequence, separating it into the desired components. The result is that the staples and template, now separate, can assemble themselves to form the intended DNA origami structure.
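A highly simplified sketch of the single-strand idea is given below (the cleavage marker is a placeholder, not a real DNAzyme sequence): the template and staples are concatenated with self-excising elements between them, and cutting at those elements releases the separate components.

```python
# Hypothetical cleavage marker standing in for a zinc-dependent DNAzyme site.
CLEAVAGE_SITE = "NNNNN"

def cleave(single_strand):
    """Split one long strand into its components at each cleavage marker,
    mimicking DNAzymes cutting themselves out of the sequence."""
    return [segment for segment in single_strand.split(CLEAVAGE_SITE) if segment]

# Toy template and staple sequences, joined into one long strand.
template = "ATGC" * 10
staples = ["TTAACCGG", "GGCCAATT", "ACGTACGT"]
single_strand = CLEAVAGE_SITE.join([template] + staples)

components = cleave(single_strand)
print(len(components))  # 4: the template plus three staples, now free to assemble
```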

This new approach, if scaled up, would drastically reduce the production costs of DNA origami designs. However, the production of the initial genetic material would still require expensive synthesis of the desired sequences for each new design. To make this solution even more affordable, the DNA origami produced should be a very general or standardised design that can be customised with minor changes. Who knows if the answer lies in any of the previous papers? My humble guess is that DNA bricks are good candidates.

Read more:
- Rothemund, P. (2006). Folding DNA to create nanoscale shapes and patterns. Nature, 440(7082), pp.297-302. Link: https://www.nature.com/articles/nature04586
- Wagenbauer, K., Sigl, C. and Dietz, H. (2017). Gigadalton-scale shape-programmable DNA assemblies. Nature, 552(7683), pp.78-83. Link: https://www.nature.com/articles/nature24651
- Tikhomirov, G., Petersen, P. and Qian, L. (2017). Fractal assembly of micrometre-scale DNA origami arrays with arbitrary patterns. Nature, 552(7683), pp.67-71. Link: https://www.nature.com/articles/nature24655
- Ong, L., Hanikel, N., Yaghi, O., Grun, C., Strauss, M., Bron, P., Lai-Kee-Him, J., Schueder, F., Wang, B., Wang, P., Kishi, J., Myhrvold, C., Zhu, A., Jungmann, R., Bellot, G., Ke, Y. and Yin, P. (2017). Programmable self-assembly of three-dimensional nanostructures from 10,000 unique components. Nature, 552(7683), pp.72-77. Link: https://www.nature.com/articles/nature24648
- Praetorius, F., Kick, B., Behler, K., Honemann, M., Weuster-Botz, D. and Dietz, H. (2017). Biotechnological mass production of DNA origami. Nature, 552(7683), pp.84-87. Link: https://www.nature.com/articles/nature24650

Tuesday, 2 January 2018

Why biomolecular computing could pose a cybersecurity threat, why it does not, and why it is still an interesting question.

By Ismael Mullor Ruiz

Not much is needed to construct a computer. We are not referring to the hardware here, but to the basic operations that the computer must be able to implement. Every available computer configuration is, to some extent, a physical approximation to the device envisioned by Alan Turing in his seminal 1936 paper [1]: the Turing machine. Turing imagined an infinitely long paper tape containing a series of sites in which two different states could be encoded (the minimum unit of information, what would later be called a bit). In addition, he considered a “head” that could move along the tape performing different operations, such as reading the information encoded, copying it and writing over the tape (the basic algorithmic operations), following defined rules. Such a device could, in principle, perform all possible computational tasks (it is “Turing complete”).
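For readers who like to see the idea in code, here is a minimal Turing-machine simulator in Python; the rule table is an arbitrary toy example (unary "add one"), not anything from Turing's paper.

```python
from collections import defaultdict

def run_turing_machine(rules, initial_tape, state="start", max_steps=100):
    """Minimal Turing machine: rules maps (state, symbol) ->
    (symbol_to_write, head_move, next_state); blank cells read as 0."""
    tape = defaultdict(int, enumerate(initial_tape))  # unbounded tape
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += move
    return [tape[i] for i in range(min(tape), max(tape) + 1)]

# Example rule table (illustrative): walk right over a block of 1s and
# append one more 1 at its end -- unary "add one".
rules = {
    ("start", 1): (1, +1, "start"),  # keep moving right over the 1s
    ("start", 0): (1, +1, "halt"),   # first blank: write a 1 and stop
}
print(run_turing_machine(rules, [1, 1, 1]))  # [1, 1, 1, 1]
```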

As said before, this setup is only a theoretical model, but in order to get a physical approximation that allows us to have a computer, we only need a physical system that can emulate this behaviour. The requirements are actually quite loose and allow the implementation of Turing complete machines in a wide array of phenomena, including the well-known electronic devices being used to read this post, but also other non-conventional configurations like biomolecular computing, membrane computing or optical computing.

It must be noted that for any reader with deep knowledge of molecular biology, the resemblance between the theoretical Turing machine and life’s basic molecular machinery is striking. For example, consider the presence of an information coding substrate – nucleic acids in biology - on which different machines – such as enzymes, ribozymes or some oligonucleotides - can operate. These natural devices can read the stored information, copy it into different formats, and overwrite or remove information given some rules.


Similarities between theoretical and biomolecular Turing machines. Adapted from Shapiro and Benenson (2006) [2].

Despite these similarities, it took almost 30 years from the publication of the central dogma of molecular biology, which outlines how the information in DNA is used by the cell, and almost 20 years from the birth of genetic engineering, to actually exploit biomolecules for computation. The birth of biomolecular computing came in 1994 with the foundational paper by Leonard Adleman, in which he used DNA to solve a classic computational problem characterized by its difficulty: the Hamiltonian path problem [3]. In this problem, a network of n nodes is defined, with the particularity that only certain transitions between the nodes are allowed. If there exists a path through the network that visits every node exactly once, that path is said to be Hamiltonian. In the years since, the field has grown beyond the architecture designed and developed by Adleman, giving birth to a range of underlying biomolecular methods for computation [4].



(Left): Directed graph of the Hamiltonian path problem from Adleman's seminal paper [3]. (Right): DNA species implementing the nodes and transitions of the graph. Adapted from Parker (2003) [5].
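For comparison with the molecular approach, here is a brute-force Hamiltonian path search in Python; the 5-node edge list is illustrative, not the exact graph from Adleman's experiment.

```python
from itertools import permutations

def hamiltonian_paths(n_nodes, edges, start, end):
    """Return every ordering of the n nodes that starts at `start`, ends at
    `end`, and only uses allowed directed edges, i.e. the Hamiltonian paths."""
    allowed = set(edges)
    paths = []
    for middle in permutations(set(range(n_nodes)) - {start, end}):
        path = (start, *middle, end)
        if all((a, b) in allowed for a, b in zip(path, path[1:])):
            paths.append(path)
    return paths

# Illustrative 5-node directed graph (not Adleman's original 7-node instance).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3), (2, 4)]
print(hamiltonian_paths(5, edges, start=0, end=4))  # [(0, 1, 2, 3, 4)]
```

Checking every ordering like this takes time that grows factorially with the number of nodes, which is exactly the cost that Adleman's massively parallel molecular approach was designed to attack.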

These methods, often based on DNA, have an extraordinarily high density of information, dwarfing all current optical/magnetic devices, as well as very low energy consumption compared to conventional computers and a capacity to perform astronomical numbers of operations in parallel. This means that while in conventional computing architectures all operations are executed sequentially, one at a time, in a biomolecular computer every single molecule in solution is executing operations and making attempts to find a solution to the given problem at the same time.

These architectural features make DNA computers potentially efficient for crunching the solution of a set of computational challenges known as “NP problems”. For NP problems, the time needed to find a solution (with the best known algorithms) grows exponentially with the size of the problem, so they can become impractical to solve with a conventional computer. The difficulty of solving NP problems is at the core of the security encryption protocols that are used, for example, in digital transactions or cryptocurrency mining. It is not hard to see that the existence of a tool that might allow those problems to be solved easily would pose a serious security breach. But despite the fact that DNA computers have been designed to break cryptography protocols [6,7], they are not currently a serious threat due to a range of practical physical issues:

-The first problem comes from the fact that the frequently cited information density calculations assume an essentially solid density of DNA – but such conditions are impractical for computation with any biomolecular architecture.

-Another issue with the use of massively parallel molecular computation is that only a small number of “correct” solutions may be produced in a test tube with astronomical quantities of unwanted molecules. Recovering and reading this output is an enormous challenge as it requires specifically picking these needles from the haystack using molecular interactions only.

-Related to the previous issue is the question of how to make the output readable and useful. The molecular interactions used to separate and read the output are never completely specific. This results in an imperfect separation of the pools of molecules, as well as a significant chance of reading an incorrect output as correct. This misreading problem becomes more prevalent as the differences between correct and incorrect outputs become smaller.

-Another substantial issue is that most DNA-computing architectures are not reprogrammable: once the program has been run, it cannot be run again. Thus one must design new DNA strands and order them from a supplier for each new instance of a problem, making the process of iterating an algorithm over a new set of inputs a time- and resource-consuming one. Some recent work on out-of-equilibrium chemical reaction networks aims to introduce reprogrammability [8], but it is still very far from the versatility found in any personal computer.

-The last issue that makes biomolecular computers impractical for breaking cryptography protocols is that, although algorithmic operations are easy to implement in biomolecular systems, translating those algorithmic operations into arithmetic operations is hard in itself (as illustrated by Qian and Winfree's paper encoding an algorithm for calculating the square root of an integer using a 130-strand reaction network [9]).


Despite all these issues, valuable lessons can be taken. The analogy with the theoretical Turing machine underlies every conceivable computer to some extent or another, but the physical characteristics of the substrate in which the Turing machine is implemented deeply affect the efficiency of any program the machine is given. As we have seen, analogies between life’s molecular machinery and a Turing machine can be drawn and are significant. But the similarities between life’s molecular machinery and neural-network computing are even more relevant [10,11]. Life does indeed perform certain computational tasks, and it is incredibly efficient at doing so when compared to man-made computers. These tasks include inference, learning and signal processing, and these remarkable capabilities are observed at all evolutionary levels. It is not hard to conclude that a deeper understanding of the underlying biophysics of genuine biomolecular computation would yield insights that improve the design and capabilities of programmable biomolecular systems, which by their very nature can directly interact with living systems and have applications in fields like synthetic biology or biomedicine.

References
1. Turing, A. M. (1937). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(1), 230-265.

2. Shapiro, E., & Benenson, Y. (2006). Bringing DNA computers to life. Scientific American, 294(5), 44-51.

3. Adleman, L. M. (1994). Molecular computation of solutions to combinatorial problems. Science, 266(5187), 1021-1024.

4. Benenson, Y. (2012). Biomolecular computing systems: principles, progress and potential. Nature Reviews Genetics, 13(7), 455-468.

5. Parker, J. (2003). Computing with DNA. EMBO Reports, 4(1), 7-10.

6. Boneh, D., Dunworth, C., & Lipton, R. J. (1995). Breaking DES using a molecular computer. DNA Based Computers, 27, 37-66.

7. Chang, W. L., Guo, M., & Ho, M. H. (2005). Fast parallel molecular algorithms for DNA-based computation: factoring integers. IEEE Transactions on Nanobioscience, 4(2), 149-163.

8. Del Rosso, N. V., Hews, S., Spector, L., & Derr, N. D. (2017). A Molecular Circuit Regenerator to Implement Iterative Strand Displacement Operations. Angewandte Chemie International Edition, 129(16), 4514-4517.

9. Qian, L., & Winfree, E. (2011). Scaling up digital circuit computation with DNA strand displacement cascades. Science, 332(6034), 1196-1201.

10. Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376(6538), 307-312.

11. Kim, J., Hopfield, J., & Winfree, E. (2005). Neural network computation by in vitro transcriptional circuits. In Advances in Neural Information Processing Systems (pp. 681-688).


Friday, 15 September 2017

We're famous! (ish) Tom's been answering questions on the "Naked Scientists" podcast

The Naked Scientists are based at Cambridge University's Institute of Continuing Education (ICE). In their own words, they are a team of scientists, doctors and communicators whose passion is to help the general public to understand and engage with the worlds of science, technology and medicine.

They recently contacted me to provide an answer to the following question:

"Water is solid below 0 degrees C, a liquid from 0 to 100 degrees C, and a gas above 100 degrees C. Then why does washing and the roads dry when the temperature isn't 100 degrees C?"

I think the answer's actually quite subtle, for such a deceptively simple-sounding question. My recorded answer can be downloaded from here. Full transcript below!

Interviewer: I asked Thomas Ouldridge from Imperial College London to hang Norm’s question out to dry
Tom: It is true that pure water will be a gas (called water vapour) only above 100 degrees Celsius, but temperature isn’t the only factor at play here. 
The surrounding pressure also impacts when a substance like water can be a gas – the higher the pressure, the higher the temperature required to be a gas. You've probably all heard of LPG - this stands for liquefied petroleum gas. The chemicals in LPG would normally be a gas, but by keeping LPG in high-pressure canisters it can be stored as a liquid.
So gases, like [insert humorous reference here], often can't handle pressure. But why is this? 
To exist as a gas, water molecules have to be widely spaced out. High enough pressure will simply squeeze them back into their more compact liquid form. 
The more you heat the water, the more energy you give to the molecules and the harder they push back against their surroundings. Above 100 degrees Celsius, but not below, water molecules can push back hard enough against the pressure of the atmosphere for pure water to stay as a gas.
Interviewer: Well, personally I tend to hang my washing inside the earth’s atmosphere, and below boiling point, and it still dries. So what’s going on?
Tom: Well, let’s consider a puddle, for example. You might think it should stay as a liquid because the temperature is below 100 degrees, and the atmosphere is pushing down on it. However, 
we don't only have water molecules involved - the system isn't pure. The air above the surface of the puddle contains many other molecules, such as nitrogen and oxygen. 
These extra molecules can actually help to push back against the surrounding atmosphere, effectively lowering the pressure that must be supported by the water molecules if they form a gas. It’s like many people helping to lift a weight rather than just one. 
In fact, there’s so much more nitrogen and oxygen that they take almost all of the burden of the atmospheric pressure. 
This is important: any water molecules that have enough energy to escape from the puddle don’t face the full might of the atmospheric pressure – so they don’t immediately condense back into liquid. 
This is why some water vapour can survive in the atmosphere, thanks to the hard work of the other gases, and we can explain why evaporation happens and puddles (or clothes) dry under normal conditions.
Only a certain amount of water vapour can be supported by the other gases though, which is why things don’t evaporate immediately, and why movement of air is important if you want things to dry faster. 
If you want to see this in action for yourself, lick your wrist and blow on it. It dries almost immediately compared to if you don’t blow.  So that’s evaporation covered, but how is that different from boiling? 
When water reaches 100 degrees at atmospheric pressure, it has so much energy that the vapour no longer needs the help of other gases to be stable. In our analogy, the water vapour is now like a weightlifter that is strong enough to support the “weight” of the atmosphere on its own.
At this point, bubbles of vapour can form within the liquid itself, converting liquid to gas much faster than slow evaporation from the liquid surface.