Tuesday, 30 January 2018

Recent Developments in DNA Origami

Javier Cabello – PhD student

Using both technique and imagination, origami allows the production of many shapes from just a folded sheet of paper. After the birth of DNA nanotechnology, the potential of this folding art inspired Paul Rothemund to develop a technique that allows the creation of complex nanoscale objects out of DNA. A long single DNA strand is a flexible material that can be folded by hundreds of smaller strands (“staples”). These staples bind to specific non-consecutive regions of the larger strand (the “template”) by Watson-Crick base-pairing, folding the template into a DNA structure with a particular shape.
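To make this concrete, here is a minimal Python sketch (illustrative only, not taken from any real origami design) of how a single staple can pin together two non-consecutive regions of the template by Watson-Crick base-pairing.

def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

# Hypothetical short template; real origami scaffolds are thousands of nucleotides long.
template = ("ATGCGTACGTTAGCCTAGGCTTAACGGATCC"
            "GGTACCTTGGCAGTACGATCGATTACGGCA")

region_a = template[5:15]    # first binding site on the template
region_b = template[40:50]   # second, non-consecutive binding site

# The staple is complementary to both regions, so hybridisation pulls them together.
staple = reverse_complement(region_b) + reverse_complement(region_a)

print(staple)
print(reverse_complement(staple[len(staple) // 2:]) == region_a)   # True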

DNA origami structures have a wide variety of potential uses that greatly differ from the original biological role of the molecule. These range from drug delivery to the production of electric circuits, with the biocompatible nature of DNA suggesting applications in living organisms. However, the structures created are still limited by the size of the template, which becomes harder to fold as it gets longer. Another problem with scaling up the technology is the cost of synthesising large numbers of staple strands.

Last November, four publications in Nature focused on overcoming the limitations of DNA origami. Three of the publications proposed new approaches for the assembly of bigger DNA structures. All three involved combining blocks of different sizes and complexity to form bigger structures.

Gigadalton-scale shape-programmable DNA assemblies
The first approach, suggested by Wagenbauer, Sigl and Dietz, involves making origami structures that can then self-polymerise. The resultant structures depend on the geometry of the individual origami building blocks. This approach, heavily inspired by viral capsid assembly, was used to form DNA rings from V-shaped DNA components. Afterwards, these rings were assembled to form DNA tubes. The method was generalised to produce different shapes, such as dodecahedra, by varying V-block angles and adding blocks with more than two binding sites.
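A rough way to picture how block geometry programs the assembly: if each V-shaped block contributes a fixed opening angle, a closed ring needs those angles to sum to 360 degrees. The Python sketch below is my own simplification of that idea, not the authors' design procedure.

def blocks_per_ring(theta_degrees):
    """Number of identical V-blocks needed to close an unstrained planar ring."""
    if 360 % theta_degrees:
        raise ValueError("angle must divide 360 for an unstrained ring")
    return 360 // theta_degrees

# Smaller opening angles give larger rings, so the block geometry sets the ring size.
for theta in (30, 45, 60, 90):
    print(f"theta = {theta:3d} degrees -> ring of {blocks_per_ring(theta)} blocks")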

Fractal assembly of micrometre-scale DNA origami arrays with arbitrary patterns
Tikhomirov, Petersen and Qian’s work revolves around the design of a robust algorithm to produce the ordered assembly of complex patterns in 2D. In this work, individual origami tiles were first folded in isolation. Tiles were then combined in groups of four to produce larger “super-tiles”. These super-tiles were in turn combined in groups of four to produce even bigger tiles. To achieve the necessary control over this process, the authors used three basic design rules, which they incorporated into a computational tool that they named the “FracTile Compiler”. This compiler allows the systematic design of complex, puzzle-like patterns. The resultant array can display an arbitrary picture, encoded by unique modifications to the surface of each tile, as the authors demonstrate with “La Gioconda” (ed.: the Mona Lisa for English readers) among others (see fig. 1).
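The hierarchy is easy to quantify: each assembly stage combines four tiles into one, so the edge of the array doubles and the tile count quadruples per stage. The sketch below illustrates this scaling; the tile size and number of stages are illustrative assumptions, not figures from the paper.

def array_after(stages, tile_edge_nm=100):
    """Edge length (in tiles and nm) of the square array after a number of stages."""
    tiles_per_edge = 2 ** stages          # each stage doubles the edge length
    total_tiles = tiles_per_edge ** 2     # and multiplies the tile count by four
    return tiles_per_edge, total_tiles, tiles_per_edge * tile_edge_nm

for stage in range(4):
    edge, total, size_nm = array_after(stage)
    print(f"stage {stage}: {edge}x{edge} array, {total} tiles, ~{size_nm} nm across")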

Fig. 1 Complex shapes created using the method of Tikhomirov et al., reproduced with permission from Springer Nature, Fractal assembly of micrometre-scale DNA origami arrays with arbitrary patterns, G Tikhomirov, P Petersen, L Qian, Nature 552 (7683), 67.

Programmable self-assembly of three-dimensional nanostructures from 10,000 unique components
In this case, Ong et al. improved on a previously published variant of DNA origami: DNA bricks. Instead of folding a large DNA strand as in the previous approaches, the authors designed structures built from smaller building blocks of only 56 nucleotides each. Each of these bricks can bind to three others, forming a dense “Lego”-like structure. By using 10,000 different sequences for these DNA bricks, the authors achieved the production of cubes of more than 1 GDa (3,000 times the mass of the largest proteins). These cubes were proposed to serve as a canvas for producing different structures by simply removing bricks from the design, like chiselling a rock. This approach even allows the production of negative structures by carving cavities inside the DNA cube. To help in the design task, the authors created the software “NanoBricks” and proved its utility by producing different patterns, such as a helix or a teddy bear inside the DNA cube.
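Conceptually, the brick canvas behaves like a 3D array of voxels, and “chiselling” a shape amounts to deleting entries. The toy Python sketch below captures only that idea; the actual sequence design is what the NanoBricks software handles.

import numpy as np

N = 20                                   # bricks per edge of the cubic canvas
canvas = np.ones((N, N, N), dtype=bool)  # start from the full cube

# Carve a cylindrical cavity through the middle of the cube (a "negative" shape).
x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
cavity = (x - N / 2) ** 2 + (y - N / 2) ** 2 < (N / 4) ** 2
canvas[cavity, :] = False                # remove the bricks inside the cavity

print(f"bricks remaining: {canvas.sum()} of {N ** 3}")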

Biotechnological mass production of DNA origami
Despite these significant improvements in the production of larger DNA origami structures, their possible applications are still hindered by the monetary cost of synthesising the component strands. However, new methods inspired by the current biotechnology industry are being developed to mass produce DNA origami designs.

Praetorius et al. have demonstrated the possibility of biotechnological mass production of a DNA origami design. Their approach involves encoding all the components necessary for the folding of an origami (i.e. template and staples) in a single genetic sequence, and incorporating it into bacteria. Of course, this isn’t all. Between the different components they added DNA sequences with enzymatic activity: DNAzymes. The whole DNA strand is then produced in a culture of bacteria, and the DNAzymes, in the presence of zinc, cut themselves out of the sequence, separating it into the desired components. The result is that the staples and template, now separate, can assemble themselves into the intended DNA origami structure.
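The logic of the self-excision step can be sketched as a simple string-splitting operation: cut-site cassettes flank each component, and cleaving at those cassettes releases the template and staples as separate strands. In the Python sketch below, the marker string is a stand-in, not the real DNAzyme sequence.

CUT = "ZZZZZ"   # placeholder for a zinc-dependent DNAzyme cassette

def excise(single_strand):
    """Split one long strand into its components at the DNAzyme cassettes."""
    return [piece for piece in single_strand.split(CUT) if piece]

long_strand = CUT.join(["TEMPLATE" * 3, "STAPLE1", "STAPLE2", "STAPLE3"])
print(excise(long_strand))   # ['TEMPLATETEMPLATETEMPLATE', 'STAPLE1', 'STAPLE2', 'STAPLE3']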

This new approach, if scaled up, would drastically reduce the production costs of DNA origami designs. However, producing the initial genetic material would still require an expensive synthesis of the desired sequences for each new design. To make this solution even more affordable, the DNA origami produced should be a general or standardised design that can be customised with minor changes. Who knows if the answer lies in any of the previous papers? My humble guess is that DNA bricks are good candidates.

Read more:
- Rothemund, P. (2006). Folding DNA to create nanoscale shapes and patterns. Nature, 440(7082), pp.297-302. Link: https://www.nature.com/articles/nature04586
- Wagenbauer, K., Sigl, C. and Dietz, H. (2017). Gigadalton-scale shape-programmable DNA assemblies. Nature, 552(7683), pp.78-83. Link: https://www.nature.com/articles/nature24651
- Tikhomirov, G., Petersen, P. and Qian, L. (2017). Fractal assembly of micrometre-scale DNA origami arrays with arbitrary patterns. Nature, 552(7683), pp.67-71. Link: https://www.nature.com/articles/nature24655
- Ong, L., Hanikel, N., Yaghi, O., Grun, C., Strauss, M., Bron, P., Lai-Kee-Him, J., Schueder, F., Wang, B., Wang, P., Kishi, J., Myhrvold, C., Zhu, A., Jungmann, R., Bellot, G., Ke, Y. and Yin, P. (2017). Programmable self-assembly of three-dimensional nanostructures from 10,000 unique components. Nature, 552(7683), pp.72-77. Link: https://www.nature.com/articles/nature24648
- Praetorius, F., Kick, B., Behler, K., Honemann, M., Weuster-Botz, D. and Dietz, H. (2017). Biotechnological mass production of DNA origami. Nature, 552(7683), pp.84-87. Link: https://www.nature.com/articles/nature24650

Tuesday, 2 January 2018

Why biomolecular computing could pose a cybersecurity threat, why it does not, and why it is still an interesting question.

By Ismael Mullor Ruiz

Not much is needed to construct a computer. We are not referring to the hardware here, but to the basic operations that the computer must be able to implement. All available computer configurations are, to some extent, physical approximations of the configuration envisioned by Alan Turing in his seminal 1936 paper [1]: the Turing machine. Turing imagined an infinitely long paper tape containing a series of sites on which two different states could be encoded (the minimum unit of information, what would later be called a bit). In addition, he considered a “head” that could move along the tape performing different operations, such as reading the information encoded, copying it and writing over the tape (the basic algorithmic operations), following defined rules. Such a device could, in principle, perform all possible computational tasks (it is “Turing complete”).
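For readers who prefer code to prose, here is a minimal Python sketch of a Turing machine: a transition table mapping (state, symbol) to (new symbol, head move, new state). The toy program simply flips every bit on the tape and halts; it is meant only to illustrate the read/write/move primitives, nothing more.

from collections import defaultdict

def run(program, tape, state="start", max_steps=1000):
    tape = defaultdict(lambda: "_", enumerate(tape))   # "_" is the blank symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        new_symbol, move, state = program[(state, tape[head])]   # read and look up rule
        tape[head] = new_symbol                                  # write
        head += {"R": 1, "L": -1}[move]                          # move the head
    return "".join(tape[i] for i in sorted(tape) if tape[i] != "_")

flip_bits = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run(flip_bits, "010011"))   # -> 101100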

As said before, this setup is only a theoretical model, but to obtain a physical approximation that gives us a computer, we only need a physical system that can emulate this behaviour. The requirements are actually quite loose and allow the implementation of Turing-complete machines in a wide array of phenomena, including the well-known electronic devices being used to read this post, but also other non-conventional configurations like biomolecular computing, membrane computing or optical computing.

It must be noted that, for any reader with deep knowledge of molecular biology, the resemblance between the theoretical Turing machine and life’s basic molecular machinery is striking. For example, consider the presence of an information-coding substrate (nucleic acids in biology) on which different machines (such as enzymes, ribozymes or some oligonucleotides) can operate. These natural devices can read the stored information, copy it into different formats, and overwrite or remove information according to certain rules.

Similarities between theoretical and biomolecular Turing machines. Adapted from Shapiro and Benenson (2006) [2].

Despite these similarities, it took almost 30 years from the publication of the central dogma of molecular biology, which outlines how the information in DNA is used by the cell, and almost 20 years from the birth of genetic engineering, to actually exploit biomolecules for computation. The birth of biomolecular computing came in 1994 with the foundational paper by Leonard Adleman, in which he used DNA to solve a classic computational problem characterised by its difficulty: the Hamiltonian path problem [3]. In this problem, a network of n nodes is defined with the particularity that only certain transitions between the nodes are allowed. If there exists a path through the network that visits every node exactly once, this path is said to be Hamiltonian. In the years since, the field has grown beyond the architecture designed and developed by Adleman, giving birth to a range of underlying biomolecular methods for computation [4].
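In silico, Adleman's generate-and-filter strategy looks something like the Python sketch below: enumerate candidate orderings of the nodes (in the test tube, these were random ligation products of node and edge strands) and keep only those whose consecutive pairs are allowed edges. The small graph here is illustrative, not the seven-node graph of the original paper.

from itertools import permutations

nodes = range(4)
edges = {(0, 1), (1, 2), (2, 3), (0, 2), (3, 1)}   # allowed directed transitions

def hamiltonian_paths(nodes, edges):
    for candidate in permutations(nodes):                                # "generate"
        if all(pair in edges for pair in zip(candidate, candidate[1:])):
            yield candidate                                              # "filter"

print(list(hamiltonian_paths(nodes, edges)))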

(Left): Directed graph of the Hamiltonian path problem from Adleman's seminal paper [3]. (Right): DNA implementation of the nodes and transitions of the graph. Adapted from Parker (2003) [5].

These methods, often based on DNA, have an extraordinarily high density of information, dwarfing all current optical and magnetic devices, as well as a very low energy consumption compared to conventional computers and a capacity to perform astronomical numbers of operations in parallel. This means that while in conventional computing architectures all operations are executed sequentially, one at a time, in a biomolecular computer every single molecule in solution is executing operations and attempting to find a solution to the given problem at the same time.

These architectural features make DNA computers potentially efficient at crunching through a set of computational challenges known as “NP problems”. For NP problems, the time needed to find a solution grows exponentially with the size of the problem, which can therefore become impractical to solve with a conventional computer. The difficulty of solving NP problems is at the core of the security encryption protocols used, for example, in digital transactions or cryptocurrency mining. It is not hard to see that a tool allowing those problems to be solved easily would pose a serious security threat. But although DNA computers have been designed to break cryptography protocols [6,7], they are not currently a serious threat due to a range of practical physical issues:

- The first problem comes from the fact that the frequently cited information-density calculations assume an essentially solid density of DNA, but such conditions are impractical for computation with any biomolecular architecture.

- Another issue with massively parallel molecular computation is that only a small number of “correct” solution molecules may be produced in a test tube containing astronomical quantities of unwanted molecules. Recovering and reading this output is an enormous challenge, as it requires picking these needles out of the haystack using molecular interactions alone.

- Related to the previous issue is the question of how to make the output readable and useful. The molecular interactions used to separate and read the output are never completely specific. This results in an imperfect separation of the pools of molecules, as well as a significant chance of reading an incorrect output as correct. This misreading problem becomes more prevalent as the differences between correct and incorrect outputs become smaller.

- Another substantial issue is that most DNA-computing architectures are not reprogrammable: once the program has been run, it cannot be run again. One must therefore design and order new DNA strands from a supplier for each new instance of a problem, making the process of iterating an algorithm over new sets of inputs a time- and resource-consuming one. Some recent work on out-of-equilibrium chemical reaction networks aims to introduce reprogrammability [8], but it is still very far from the versatility of any personal computer.

- The last issue that makes biomolecular computers impractical for breaking cryptography protocols is that, although algorithmic operations are easy to implement in biomolecular systems, translating those algorithmic operations into arithmetic ones is hard in itself (as demonstrated by Qian and Winfree, whose circuit for computing the square root of an integer required a reaction network of 130 strands [9]).

Despite all these issues, valuable lessons can be taken. The analogy with the theoretical Turing machine underlies every conceivable computer to some extent, but the physical characteristics of the substrate in which the Turing machine is implemented deeply affect the efficiency of any program the machine is given. As we have seen, analogies between life’s molecular machinery and a Turing machine can be drawn and are significant. But the similarities between life’s molecular machinery and network computing are even more relevant [10,11]. Life does indeed perform certain computational tasks, and it is incredibly efficient at doing so when compared to man-made computers. These tasks include inference, learning and signal processing, and these remarkable capabilities are observed at all evolutionary levels. It is not hard to conclude that a deeper understanding of the underlying biophysics of genuine biomolecular computation would yield insights that improve the design and capabilities of programmable biomolecular systems, which by their very nature can directly interact with living systems and have applications in fields like synthetic biology or biomedicine.

1. Turing, A. M. (1937). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London mathematical society, 2(1), 230-265.

2. Shapiro, E., & Benenson, Y. (2006). Bringing DNA computers to life. Scientific American, 294(5), 44-51.

3. Adleman, L. M. (1994). Molecular computation of solutions to combinatorial problems. Science, 266(5187), 1021-1024.

4. Benenson, Y. (2012). Biomolecular computing systems: principles, progress and potential. Nature Reviews Genetics, 13(7), 455-468.

5. Parker, J. (2003). Computing with DNA. EMBO reports, 4(1), 7-10.

6. Boneh, D., Dunworth, C., & Lipton, R. J. (1995). Breaking DES using a molecular computer. DNA Based Computers, 27, 37-66.

7. Chang, W. L., Guo, M., & Ho, M. H. (2005). Fast parallel molecular algorithms for DNA-based computation: factoring integers. IEEE Transactions on Nanobioscience, 4(2), 149-163.

8. Del Rosso, N. V., Hews, S., Spector, L., & Derr, N. D. (2017). A Molecular Circuit Regenerator to Implement Iterative Strand Displacement Operations. Angewandte Chemie International Edition, 129(16), 4514-4517.

9. Qian, L., & Winfree, E. (2011). Scaling up digital circuit computation with DNA strand displacement cascades. Science, 332(6034), 1196-1201.

10. Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376(6538), 307-312.

11. Kim, J., Hopfield, J., & Winfree, E. (2005). Neural network computation by in vitro transcriptional circuits. In Advances in neural information processing systems (pp. 681-688).

Friday, 15 September 2017

We're famous! (ish) Tom's been answering questions on the "Naked Scientists" podcast

The Naked Scientists are based at Cambridge University's Institute of Continuing Education (ICE). In their own words, they are a team of scientists, doctors and communicators whose passion is to help the general public to understand and engage with the worlds of science, technology and medicine.

They recently contacted me to provide an answer to the following question:

"Water is solid below 0 degrees C, a liquid from 0 to 100 degrees C, and a gas above 100 degrees C. Then why does washing and the roads dry when the temperature isn't 100 degrees C?"

I think the answer's actually quite subtle, for such a deceptively simple-sounding question. My recorded answer can be downloaded from here. Full transcript below!

Interviewer: I asked Thomas Ouldridge from Imperial College London to hang Norm’s question out to dry
Tom: It is true that pure water will be a gas (called water vapour) only above 100 degrees Celsius, but temperature isn’t the only factor at play here. 
The surrounding pressure also affects when a substance like water can be a gas – the higher the pressure, the higher the temperature required to be a gas. You've probably all heard of LPG - this stands for liquefied petroleum gas. The chemicals in LPG would normally be a gas, but by keeping LPG in high-pressure canisters it can be stored as a liquid.
So gases, like [insert humorous reference here], often can't handle pressure. But why is this? 
To exist as a gas, water molecules have to be widely spaced out. High enough pressure will simply squeeze them back into their more compact liquid form. 
The more you heat the water, the more energy you give to the molecules and the harder they push back against their surroundings. Above 100 degrees Celsius, but not below, water molecules can push back hard enough against the pressure of the atmosphere for pure water to stay as a gas.
Interviewer: Well, personally I tend to hang my washing inside the earth’s atmosphere, and below boiling point, and it still dries. So what’s going on?
Tom: Well, let’s consider a puddle, for example. You might think it should stay as a liquid because the temperature is below 100 degrees, and the atmosphere is pushing down on it. However, 
we don't only have water molecules involved - the system isn't pure. The air above the surface of the puddle contains many other molecules, such as nitrogen and oxygen. 
These extra molecules can actually help to push back against the surrounding atmosphere, effectively lowering the pressure that must be supported by the water molecules if they form a gas. It’s like many people helping to lift a weight rather than just one. 
In fact, there’s so much more nitrogen and oxygen that they take almost all of the burden of the atmospheric pressure. 
This is important: any water molecules that have enough energy to escape from the puddle don’t face the full might of the atmospheric pressure – so they aren’t immediately squeezed back into liquid. 
This is why some water vapour can survive in the atmosphere, thanks to the hard work of the other gases, and we can explain why evaporation happens and puddles (or clothes) dry under normal conditions.
Only a certain amount of water vapour can be supported by the other gases though, which is why things don’t evaporate immediately, and why movement of air is important if you want things to dry faster. 
If you want to see this in action for yourself, lick your wrist and blow on it. It dries almost immediately compared to if you don’t blow. So that’s evaporation covered, but how is that different from boiling? 
When water reaches 100 degrees at atmospheric pressure, it has so much energy that the vapour no longer needs the help of other gases to be stable. In our analogy, the water vapour is now like a weightlifter that is strong enough to support the “weight” of the atmosphere on its own.
At this point, bubbles of vapour can form within the liquid itself, converting liquid to gas much faster than slow evaporation from the liquid surface.
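For the numerically inclined, here is a rough back-of-the-envelope version of the “sharing the weight” analogy in Python, using an approximate textbook value for the saturation vapour pressure of water at room temperature:

p_atmosphere_kpa = 101.3      # typical atmospheric pressure
p_water_sat_kpa = 2.3         # approximate saturation vapour pressure of water at 20 C

# Water vapour only ever needs to supply this small fraction of the atmospheric push;
# nitrogen and oxygen carry the rest, which is why puddles can evaporate well below 100 C.
fraction = p_water_sat_kpa / p_atmosphere_kpa
print(f"water vapour supplies at most ~{100 * fraction:.1f}% of atmospheric pressure at 20 C")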

Thursday, 6 July 2017

What we learn from the learning rate

Cells need to sense their environment in order to survive. For example, some cells measure the concentration of food or the presence of signalling molecules. We are interested in studying the physical limits to sensing with limited resources, to understand the challenges faced by cells and to design synthetic sensors.

We have recently published a paper (arxiv version) where we explore the interpretation of a metric called the learning rate, which has been used to measure the quality of a sensor (New J. Phys. 16 103024, Phys. Rev. E 93 022116). Our motivation is that in this field a number of metrics (a metric is a number you can calculate from the properties of the sensor that, ideally, tells you how good the sensor is) have been applied to make statements about the quality of sensing, or about limits to sensory performance. For example, a limit of particular interest is the energy required for sensing. However, it is not always clear how to interpret these metrics. We want to find out what the learning rate means. If one sensor has a higher learning rate than another, what does that tell you?

The learning rate is defined as the rate at which changes in the sensor increase the information the sensor has about the signal. The information the sensor has about the signal is how much your uncertainty about the state of the signal is reduced by knowing the state of the sensor (this is known as the mutual information). From this definition, it seems plausible that the learning rate could be a measure of sensing quality, but it is not clear. Our approach is a test to destruction: challenge the learning rate in a variety of circumstances, and try to understand how it behaves and why.

To do this we need a framework to model a general signal and sensor system. The signal hops between discrete states and the sensor also hops between discrete states in a way that follows the signal. A simple example is a cell using a surface receptor to detect the concentration of a molecule in its environment.

The figure shows such a system. The circles represent the states and the arrows represent transitions between the states. The signal is the concentration of a molecule in the cell’s environment. It can be in two states, high or low, where high is double the concentration of low. The sensor is a single cell-surface receptor, which can be either unbound or bound to a molecule. Therefore, the joint system can be in four different states. The concentration jumps between its states with rates that don’t depend on the state of the sensor. The receptor becomes unbound at a constant rate and is bound at a rate proportional to the molecule concentration.
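For readers who want to play with this kind of model, the Python sketch below runs a simple Gillespie simulation of the joint signal/receptor system and estimates the mutual information from the time spent in each joint state. The rate values are arbitrary illustrative choices, not parameters taken from our paper.

import numpy as np

rng = np.random.default_rng(1)

k_switch = 1.0            # rate at which the signal flips between low and high
conc = {0: 1.0, 1: 2.0}   # "high" is double the concentration of "low"
k_on, k_off = 2.0, 1.0    # binding rate per unit concentration, and unbinding rate

def simulate(t_max=10_000.0):
    t, signal, bound = 0.0, 0, 0
    dwell = np.zeros((2, 2))              # time spent in each (signal, bound) state
    while t < t_max:
        rates = [k_switch,                                  # signal flips
                 k_off if bound else k_on * conc[signal]]   # receptor binds or unbinds
        total = sum(rates)
        dt = rng.exponential(1.0 / total)
        dwell[signal, bound] += dt
        t += dt
        if rng.random() < rates[0] / total:
            signal ^= 1
        else:
            bound ^= 1
    return dwell / dwell.sum()            # empirical joint distribution

p = simulate()
p_signal, p_bound = p.sum(axis=1), p.sum(axis=0)
mutual_information = sum(p[i, j] * np.log2(p[i, j] / (p_signal[i] * p_bound[j]))
                         for i in range(2) for j in range(2))
print(p)
print(f"mutual information ~ {mutual_information:.3f} bits")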

We calculated the learning rate for several systems, including the one above, and compared it to the mutual information between the signal and the sensor. We found that in the simplest case, shown in the figure, the learning rate essentially reports the correlation between the sensor and the signal and so it is showing you the same thing as the mutual information. In more complicated systems the learning rate and mutual information show qualitatively different behaviour. This is because the learning rate actually reflects the rate at which the sensor must change in response to the signal, which is not, in general, equivalent to the strength of correlations between the signal and sensor. Therefore, we do not think that the learning rate is useful as a general metric for the quality of a sensor.

Tuesday, 4 July 2017

Becoming more certain about uncertainty in molecular systems

By Jenny Poulton

Due to the unpredictability of motion at the microscopic scale, molecular processes have randomness associated with them, exhibiting what we call thermodynamic fluctuations. A group in Germany led by Barato and Seifert has written a series of papers, beginning with "Thermodynamic uncertainty relation for biomolecular processes" (preprint here), exploring how uncertainty in the number of reaction steps taken by a molecular process is related to the degree to which the system is constantly consuming energy.

To be more precise, Barato and Seifert consider the number of times a system completes a cycle in a given time window. A good example of this kind of setup is the rotary motor F0F1-ATPsynthase (below, image taken from Wikipedia).
This motor is used to create the chemical fuel source of the cell (ATP) from its components (ADP and inorganic phosphate P). In order to drive this process, a current of hydrogen ions flows through the top half of the motor, causing it to systematically rotate in one direction with respect to the bottom half. This rotation is physically linked to the reaction ADP + P -> ATP, and so ATP is created. This one-directional rotational motion only arises because the current of hydrogen ions continuously supplies more energy (more technically, free energy) to the system than is needed to create the ATP. We say that the current of ions drives the system.

In general, small driven systems have a bias towards stepping forward, but there is still a non-zero probability of stepping backwards due to thermodynamic fluctuations. We also cannot predict exactly how long the system will take to complete each step of the cycle, and so the time taken per step is variable. Thus the number of cycles completed in a given time is uncertain. It is, however, possible to define an average µ of the net number of cycles completed in a given time window, and a variance σ², which is a mathematical measure of the typical deviation from the average due to fluctuations. The Fano factor F = σ²/µ gives a measure of the relative importance of the random fluctuations about the average.

In the paper "Thermodynamic uncertainty relation for biomolecular processes", Barato and Seifert relate the energy consumption and the Fano factor via F ≥ 2kT/E. Here E is the energy consumed per cycle, T is the temperature and k is Boltzmann’s constant. This expression means that the Fano factor is at least as big as the quantity 2kT/E. Thus a cycle which uses a certain amount of fuel E has an upper limit to its precision, and there is an evident trade-off between the amount of energy dissipated per cycle and the Fano factor.
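A quick numerical experiment makes this tangible: simulate the simplest possible driven cycle, a biased random walk in which each forward step dissipates E = kT ln(k_plus/k_minus), and compare the measured Fano factor with 2kT/E. The Python sketch below uses arbitrary illustrative rates; the Fano factor should always come out at least as large as the bound.

import numpy as np

rng = np.random.default_rng(0)
k_plus, k_minus = 3.0, 1.0        # forward and backward stepping rates
t_obs, n_runs = 50.0, 2000        # observation window and number of repeats

def net_cycles():
    """Net number of forward cycles completed in one observation window."""
    t, n = 0.0, 0
    while True:
        t += rng.exponential(1.0 / (k_plus + k_minus))
        if t > t_obs:
            return n
        n += 1 if rng.random() < k_plus / (k_plus + k_minus) else -1

samples = np.array([net_cycles() for _ in range(n_runs)])
fano = samples.var() / samples.mean()
energy_per_cycle_in_kT = np.log(k_plus / k_minus)   # E/kT from the ratio of rates
print(f"Fano factor ~ {fano:.2f}, bound 2kT/E = {2 / energy_per_cycle_in_kT:.2f}")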

In the original paper, the authors only prove their relation for very simple processes. However, it has since been generalised in this paper (preprint here). The result is actually based on very deep statements about the types of fluctuating processes that are possible in physical systems. One of the challenges now is to take this fundamental insight and apply it to gain a better understanding of practical systems. Fortunately, the F0F1-ATPsynthase rotary motor is not the only example of an interesting biological system that undergoes driven cycles; the cell contains a huge variety of molecular motors that can also be understood in this way (preprint here). Molecular timekeepers that are vital to the cellular life cycle also depend on driven cycles. Understanding the trade-offs between unwanted variability and energy consumption will be vital in engineering such systems.

Tuesday, 11 April 2017

Two papers on the fundamental principles of biomolecular copying

Single cells, which are essentially bags of chemicals, can achieve remarkable feats of information processing. Humans have designed computers to perform similar tasks in our everyday world. The question of whether it is possible to emulate cells and use molecular systems to perform complex computational tasks in parallel, at an extremely small scale and consuming a low amount of power, is one that has intrigued many scientists.

In collaboration with the ten Wolde group from AMOLF Amsterdam, we have just published two articles in Physical Review X and Physical Review Letters that get to the heart of this question. 

The readout molecules (orange) act as copies of the binding state of the receptors (purple), through catalytic phosphorylation/dephosphorylation reactions.

In the first, “Thermodynamics of computational copying in biochemical systems”, we show that a simple molecular process occurring inside living cells (a phosphorylation/dephosphorylation cycle) is able to copy the state of one protein (for example, whether a food molecule is bound to it or not) into the chemical modification state of another protein (phosphorylated or not). This copy process can be rigorously related to those performed by conventional computers.
We thus demonstrated that living cells can perform the basic computational operation of copying a single bit of information. Moreover, our analysis revealed that these biochemical computations can occur rapidly and at a low power consumption. The article shows precisely how natural systems relate to and differ from traditional computing architectures, and provides a blueprint for building naturally-inspired synthetic copying systems that approach the lower limits of power consumption.
The production of a persistent copy from a template. The separation in the final state is essential.
A more complex natural copy operation is the production of polymer copies from polymer templates, as discussed in this previous post. Such processes are necessary for DNA replication, and also for the production of proteins from DNA templates via intermediate RNA molecules. For cells to function, the data in the original DNA sequence of bases must be faithfully reproduced - each copy therefore involves copying many bits of data. 

In the second article, "Fundamental costs in the production and destruction of persistent polymer copies", we consider such processes. We point out that these polymer copies must be persistent to be functional. In other words, the end result is two physically separate polymers: it would be useless to produce proteins that couldn't detach from their nucleic acid templates. As a result, the underlying principles are very different from the superficially similar process of self-assembly, in which molecules aggregate together according to specific interactions to form a well-defined structure. 

In particular, we show that the need to produce persistent copies implies that more accurate copies necessarily have a higher minimal production cost (in terms of resources consumed) than sloppier copies. This result, which is not true if the copies do not need to physically separate from their templates, sets a bound on the function of minimal self-replicating systems.

Additionally, the results suggest that polymer copying processes that occur without external intervention (autonomously) must occur far from equilibrium. Being far from equilibrium means that processes are highly irreversible - taking a forwards step is much more likely than taking a backwards step. This finding draws a sharp distinction with self-assembling systems, which typically assemble most accurately when close to equilibrium. This difference may explain why recent years have seen enormous growth in the successful design of self-assembling molecular systems, but autonomous synthetic systems that produce persistent copies through chemical means have yet to be constructed.
Taken together, these papers set a theoretical background on which to base the design of synthetic molecular systems that achieve computational processes such as copying and information transmission. The next challenge is now to develop experimental systems that exploit these ideas.