diff --git a/archive/friston/The_Free_Energy_Principle_Friston_2010.md b/archive/friston/The_Free_Energy_Principle_Friston_2010.md new file mode 100644 index 00000000..13001bd6 --- /dev/null +++ b/archive/friston/The_Free_Energy_Principle_Friston_2010.md @@ -0,0 +1,1333 @@ + REVIEWS + The free-energy principle: + a unified brain theory? + Karl Friston + Abstract | A free-energy principle has been proposed recently that accounts for action, + perception and learning. This Review looks at some key brain theories in the biological (for + example, neural Darwinism) and physical (for example, information theory and optimal + control theory) sciences from the free-energy perspective. Crucially, one key theme runs + through each of these theories — optimization. Furthermore, if we look closely at what is + optimized, the same quantity keeps emerging, namely value (expected reward, expected + utility) or its complement, surprise (prediction error, expected cost). This is the quantity that + is optimized under the free-energy principle, which suggests that several global brain + theories might be unified within a free-energy framework. + Free energy Despite the wealth of empirical data in neuroscience, Motivation: resisting a tendency to disorder. The + An information theory measure there are relatively few global theories about how the defining characteristic of biological systems is that + that bounds or limits (by being brain works. A recently proposed free-energy principle they maintain their states and form in the face of a + greater than) the surprise on 3–6 + for adaptive systems tries to provide a unified account constantly changing environment . From the point + sampling some data, given a of action, perception and learning. Although this prin- of view of the brain, the environment includes both + generative model. ciple has been portrayed as a unified brain theory1, its the external and the internal milieu. 
This maintenance + Homeostasis capacity to unify different perspectives on brain function of order is seen at many levels and distinguishes bio- + The process whereby an open has yet to be established. This Review attempts to place logical from other self-organizing systems; indeed, the + or closed system regulates its some key theories within the free-energy framework, in physiology of biological systems can be reduced almost + internal environment to the hope of identifying common themes. I first review entirely to their homeostasis7. More precisely, the rep- + maintain its states within the free-energy principle and then deconstruct several ertoire of physiological and sensory states in which an + bounds. global brain theories to show how they all speak to the organism can be is limited, and these states define the + Entropy same underlying idea. organism’s phenotype. Mathematically, this means that + The average surprise of the probability of these (interoceptive and exterocep- + outcomes sampled from a The free-energy principle tive) sensory states must have low entropy; in other + probability distribution or The free-energy principle (BOX 1) says that any self- words, there is a high probability that a system will + density. A density with low + entropy means that, on organizing system that is at equilibrium with its environ- be in any of a small number of states, and a low prob- + average, the outcome is ment must minimize its free energy2. The principle is ability that it will be in the remaining states. Entropy + relatively predictable. Entropy essentially a mathematical formulation of how adaptive is also the average self information or ‘surprise’8 + is therefore a measure of systems (that is, biological agents, like animals or brains) (more formally, it is the negative log-probability of an + uncertainty. + resist a natural tendency to disorder3–6. What follows is outcome). 
Here, ‘a fish out of water’ would be in a sur- + a non-mathematical treatment of the motivation and prising state (both emotionally and mathematically). + implications of the principle. We will see that although the A fish that frequently forsook water would have high + motivation is quite straightforward, the implications are entropy. Note that both surprise and entropy depend + The Wellcome Trust Centre complicated and diverse. This diversity allows the prin- on the agent: what is surprising for one agent (for + for Neuroimaging, ciple to account for many aspects of brain structure and example, being out of water) may not be surprising + University College London, function and lends it the potential to unify different per- for another. Biological agents must therefore mini- + Queen Square, London, + WC1N 3BG, UK. spectives on how the brain works. In subsequent sections, mize the long-term average of surprise to ensure that + e‑mail: I discuss how the principle can be applied to neuronal their sensory entropy remains low. In other words, + k.friston@fil.ion.ucl.ac.uk systems as viewed from these perspectives. This Review biological systems somehow manage to violate the + doi:10.1038/nrn2787 starts in a rather abstract and technical way but then tries fluctuation theorem, which generalizes the second law + Published online + 13 January 2010 9 + to unpack the basic idea in more familiar terms. of thermodynamics . + NATuRE REvIEWs | NeuroscieNce voluME 11 | FEBRuARy 2010 | 127 + © 2010 Macmillan Publishers Limited. All rights reserved + REVIEWS + + Box 1 | The free-energy principle + Part a of the figure shows the dependencies among the a + quantities that define free energy. These include the Environment Agent + internal states of the brain μ(t) and quantities describing its + exchange with the environment: sensory signals (and their Sensations + T ~ ~ ~ + motion) ˜s(t) = [s,s′,s″…] plus action a(t). 
The environment s = g(x, ϑ) + z + is described by equations of motion, which specify the + trajectory of its hidden states. The causes ϑ ⊃ {x˜ , θ, γ } of External states Internal states + sensory input comprise hidden states x˜ (t), parameters θ ~ ~ ~ ~ + ˙ + x = f(x, μ = arg min F(s, + and precisions γcontrolling the amplitude of the random a, ϑ) + w μ) + + fluctuations z˜ (t) and w˜ (t). Internal brain states and action + minimize free energy F(s˜ ,μ), which is a function of sensory Action or control signals + input and a probabilistic representation q(ϑ|μ) of its causes. ~ + a = arg min F(s, + This representation is called the recognition density and is μ) + encoded by internal states μ. + The free energy depends on two probability densities: b + the recognition density q(ϑ|μ) and one that generates Free-energy bound on surprise + sensory samples and their causes, p(s˜ ,ϑ|m). The latter ~ + F = − + + represents a probabilistic generative model (denoted by q q + m), the form of which is entailed by the agent or brain. Action minimizes prediction errors + Part b of the figure provides alternative expressions for the F = D(q(ϑ ~ + | μ) || p(ϑ)) − q + free energy to show what its minimization entails: action a = arg max Accuracy + can reduce free energy only by increasing accuracy (that is, + selectively sampling data that are predicted). Conversely, Perception optimizes predictions + optimizing brain states makes the representation an ~ ~ + Surprise approximate conditional density on the causes of sensory F = D(q(ϑ | μ) || p(ϑ | s)) − ln p(s | m) + (Surprisal or self information.) input. This enables action to avoid surprising sensory μ = arg max Divergence + The negative log-probability of encounters. A more formal description is provided below. + an outcome. 
An improbable optimizing the sufficient statistics (representations) + outcome (for example, water Nature Reviews | Neuroscience + Optimizing the recognition density makes it a posterior or conditional density on the causes of sensory data: this can be + flowing uphill) is therefore seen by expressing the free energy as surprise –In p(s˜ ,| m) plus a Kullback-Leibler divergence between the recognition and + surprising. conditional densities (encoded by the ‘internal states’ in the figure). Because this difference is always positive, minimizing + Fluctuation theorem free energy makes the recognition density an approximate posterior probability. This means the agent implicitly infers or + (A term from statistical represents the causes of its sensory samples in a Bayes-optimal fashion. At the same time, the free energy becomes a tight + mechanics.) Deals with the bound on surprise, which is minimized through action. + probability that the entropy optimizing action + of a system that is far from the Acting on the environment by minimizing free energy enforces a sampling of sensory data that is consistent with the + thermodynamic equilibrium current representation. This can be seen with a second rearrangement of the free energy as a mixture of accuracy and + will increase or decrease over complexity. Crucially, action can only affect accuracy (encoded by the ‘external states’ in the figure). This means that + a given amount of time. It the brain will reconfigure its sensory epithelia to sample inputs that are predicted by the recognition density — in other + states that the probability of + the entropy decreasing words, to minimize prediction error. + becomes exponentially smaller + with time. 
+ Attractor In short, the long-term (distal) imperative — of main- Crucially, free energy can be evaluated because it is a + A set to which a dynamical taining states within physiological bounds — translates function of two things to which the agent has access: its + system evolves after a long into a short-term (proximal) avoidance of surprise. sensory states and a recognition density that is encoded + enough time. Points that surprise here relates not just to the current state, which by its internal states (for example, neuronal activity + get close to the attractor cannot be changed, but also to movement from one state and connection strengths). The recognition density is a + remain close, even under to another, which can change. This motion can be com- probabilistic representation of what caused a particular + small perturbations. + plicated and itinerant (wandering) provided that it revis- sensation. + Kullback-Leibler divergence its a small set of states, called a global random attractor10, This (variational) free-energy construct was + (Or information divergence, that are compatible with survival (for example, driving a introduced into statistical physics to convert difficult + information gain or cross car within a small margin of error). It is this motion that probability-density integration problems into eas- + entropy.) A non-commutative + 11 + measure of the non-negative the free-energy principle optimizes. ier optimization problems . It is an information + difference between two so far, all we have said is that biological agents must theoretic quantity (like surprise), as opposed to a + probability distributions. avoid surprises to ensure that their states remain within thermo dynamic quantity. variational free energy has + Recognition density physiological bounds (see supplementary information s1 been exploited in machine learning and statistics to + 12–14 + (Or ‘approximating conditional (box) for a more formal argument). 
But how do they solve many inference and learning problems . In this + density’.) An approximate do this? A system cannot know whether its sensations setting, surprise is called the (negative) model evidence. + probability distribution of the are surprising and could not avoid them even if it did This means that minimizing surprise is the same as + causes of data (for example, know. This is where free energy comes in: free energy is maximizing the sensory evidence for an agent’s exist- + sensory input). It is the product an upper bound on surprise, which means that if agents ence, if we regard the agent as a model of its world. In + of inference or inverting a + generative model. minimize free energy, they implicitly minimize surprise. the present context, free energy provides the answer to + 128 | FEBRuARy 2010 | voluME 11 www.nature.com/reviews/neuro + © 2010 Macmillan Publishers Limited. All rights reserved + REVIEWS + a fundamental question: how do self-organizing adap- In summary, the free energy rests on a model of how + tive systems avoid surprising states? They can do this by sensory data are generated and on a recognition density + minimizing their free energy. so what does this involve? on the model’s parameters (that is, sensory causes). Free + energy can be reduced only by changing the recognition + Implications: action and perception. Agents can density to change conditional expectations about what is + suppress free energy by changing the two things it depends sampled or by changing sensory samples (that is, sensory + Generative model on: they can change sensory input by acting on the world input) so that they conform to expectations. In what fol- + A probabilistic model (joint or they can change their recognition density by chang- lows, I consider these implications in light of some key + density) of the dependencies ing their internal states. This distinction maps nicely theories about the brain. 
+ between causes and onto action and perception (BOX 1). one can see what this + consequences (data), from means in more detail by considering three mathematically The Bayesian brain hypothesis + which samples can be + generated. It is usually 17 + equivalent formulations of free energy (see supplementary The Bayesian brain hypothesis uses Bayesian probability + specified in terms of the information s2 (box) for a mathematical treatment). theory to formulate perception as a constructive process + likelihood of data, given their The first formulation expresses free energy as energy based on internal or generative models. The underlying + causes (parameters of a model) 18–22 + and priors on the causes. minus entropy. This formulation is important for three idea is that the brain has a model of the world that + 23–28 + reasons. First, it connects the concept of free energy as it tries to optimize using sensory inputs . This idea is + Conditional density 20 + used in information theory with concepts used in sta- related to analysis by synthesis and epistemological autom- + (Or posterior density.) The 19 + probability distribution of tistical thermodynamics. second, it shows that the free ata . In this view, the brain is an inference machine that + 18,22,25 + causes or model parameters, energy can be evaluated by an agent because the energy actively predicts and explains its sensations . Central + given some data; that is, a is the surprise about the joint occurrence of sensations to this hypothesis is a probabilistic model that can gener- + probabilistic mapping from and their perceived causes, whereas the entropy is sim- ate predictions, against which sensory samples are tested + observed data to causes. ply that of the agent’s own recognition density. Third, it to update beliefs about their causes. 
This generative + Prior shows that free energy rests on a generative model of the model is decomposed into a likelihood (the probability of + The probability distribution or world, which is expressed in terms of the probability of a sensory data, given their causes) and a prior (the a priori + density of the causes of data sensation and its causes occurring together. This means probability of those causes). Perception then becomes the + that encodes beliefs about that an agent must have an implicit generative model of process of inverting the likelihood model (mapping from + those causes before observing how causes conspire to produce sensory data. It is this causes to sensations) to access the posterior probability of + the data. model that defines both the nature of the agent and the the causes, given sensory data (mapping from sensations + Bayesian surprise quality of the free-energy bound on surprise. to causes). This inversion is the same as minimizing the + A measure of salience based The second formulation expresses free energy as difference between the recognition and posterior densi- + on the Kullback-Leibler surprise plus a divergence term. The (perceptual) diver- ties to suppress free energy. Indeed, the free-energy for- + divergence between the gence is just the difference between the recognition den- mulation was developed to finesse the difficult problem + recognition density (which sity and the conditional density (or posterior density) of the of exact inference by converting it into an easier optimi- + encodes posterior beliefs) and + the prior density. It causes of a sensation, given the sensory signals. This con- zation problem11–14. This has furnished some powerful + measures the information that ditional density represents the best possible guess about approximation techniques for model identification and + can be recognized in the data. the true causes. 
The difference between the two densities comparison (for example, variational Bayes or ensemble + Bayesian brain hypothesis is always non-negative and free energy is therefore an learning29). There are many interesting issues that attend + The idea that the brain uses upper bound on surprise. Thus, minimizing free energy the Bayesian brain hypothesis, which can be illuminated + internal probabilistic by changing the recognition density (without changing by the free-energy principle; we will focus on two. + (generative) models to update sensory data) reduces the perceptual divergence, so that The first is the form of the generative model and + posterior beliefs, using sensory the recognition density becomes the conditional density how it manifests in the brain. one criticism of Bayesian + information, in an and the free energy becomes surprise. treatments is that they ignore the question of how prior + (approximately) Bayes-optimal + fashion. The third formulation expresses free energy as com- beliefs, which are necessary for inference, are formed27. + Analysis by synthesis plexity minus accuracy, using terms from the model However, this criticism dissolves with hierarchical + Any strategy (in speech coding) comparison literature. Complexity is the difference generative models, in which the priors themselves are + 26,28 + in which the parameters of a between the recognition density and the prior density optimized . In hierarchical models, causes in one + 15 + signal coder are evaluated by on causes; it is also known as Bayesian surprise and is the level generate subordinate causes in a lower level; sen- + decoding (synthesizing) the difference between the prior density — which encodes sory data per se are generated at the lowest level (BOX 2). + signal and comparing it with beliefs about the state of the world before sensory data are Minimizing the free energy effectively optimizes empiri- + the original input signal. 
assimilated — and posterior beliefs, which are encoded cal priors (that is, the probability of causes at one level, + Epistemological automata by the recognition density. Accuracy is simply the sur- given those in the level above). Crucially, because empir- + Possibly the first theory for why prise about sensations that are expected under the recog- ical priors are linked hierarchically, they are informed + top-down influences (mediated nition density. This formulation shows that minimizing by sensory data, enabling the brain to optimize its prior + by backward connections in free energy by changing sensory data (without changing expectations online. This optimization makes every level + the brain) might be important the recognition density) must increase the accuracy of in the hierarchy accountable to the others, furnishing an + in perception and cognition. an agent’s predictions. In short, the agent will selectively internally consistent representation of sensory causes at + Empirical prior sample the sensory inputs that it expects. This is known multiple levels of description. Not only do hierarchical + A prior induced by hierarchical 16 + models; empirical priors as active inference . An intuitive example of this process models have a key role in statistics (for example, ran- + (when it is raised into consciousness) would be feeling dom effects and parametric empirical Bayes models30,31), + provide constraints on the our way in darkness: we anticipate what we might touch they may also be used by the brain, given the hierarchical + recognition density in the usual + 32–34 + way but depend on the data. next and then try to confirm those expectations. arrangement of cortical sensory areas . + NATuRE REvIEWs | NeuroscieNce voluME 11 | FEBRuARy 2010 | 129 + © 2010 Macmillan Publishers Limited. 
All rights reserved + REVIEWS + The second issue is the form of the recognition den- + Box 2 | Hierarchical message passing in the brain sity that is encoded by physical attributes of the brain, + (i) (i) (i) (i)((i – 1) i) such as synaptic activity, efficacy and gain. In general, + ξ = Π ε =Π (μ – g(μ )) + v v v v v any density is encoded by its sufficient statistics (for exam- + (i) (i) (i) (i)((i ) i) + ξ = Π ε =Π (Dμ – f(μ )) + x x x x x ple, the mean and variance of a Gaussian form). The way + the brain encodes these statistics places important con- + (3) straints on the sorts of schemes that underlie recognition: + Sensory (1) ξ + v + input ξv (2) (2) they range from free-form schemes (for example, particle + ξ + v ξ + x Backward: + (1) 26 35–38 + filtering and probabilistic population codes ), + Forward: ξx predictions + prediction which use a vast number of sufficient statistics, to sim- + error (2) pler forms, which make stronger assumptions about + μ + ~ v the shape of the recognition density, so that it can be + s(t) (1) + μ (2) + v + μ encoded with a small number of sufficient statistics. The + x + (1) + μ simplest assumed form is Gaussian, which requires only + x + the conditional mean or expectation — this is known + 39 + Lower cortical areas Higher cortical areas as the Laplace assumption , under which the free energy + (i) (i)((i) (i) i + 1) + T + ˙μ = Dμ − (∂ ε ) ξξ− is just the difference between the model’s predictions + Synaptic plasticity v v v v Synaptic gain + T (i) (i)((i) i) T + T and the sensations or representations that are predicted. + �μ = −∂ ε ξ�μ = ½tr(∂Π(ξξ − Π(μ))) + θ θ ˙μ = Dμ − (∂ ε ) ξ γ γ γ + ij ij x x x i i Minimizing free energy then corresponds to explaining + The figure details a neuronal architecture that optimizes the conditional expectations of away prediction errors. This is known as predictive coding + causes in hierarchical models of sensory input. 
It shows the putative cells of origin of forward and has become a popular framework for understand- + Nature Reviews | Neuroscience + driving connections that convey prediction error (grey arrows) from a lower area (for ing neuronal message passing among different levels of + example, the lateral geniculate nucleus) to a higher area (for example, V1), and nonlinear 40 + cortical hierarchies . In this scheme, prediction error + backward connections (black arrows) that construct predictions41. These predictions try to units compare conditional expectations with top-down + explain away prediction error in lower levels. In this scheme, the sources of forward and predictions to elaborate a prediction error. This predic- + backward connections are superficial and deep pyramidal cells (upper and lower triangles), tion error is passed forward to drive the units in the + respectively, where state units are black and error units are grey. The equations represent a level above that encode conditional expectations which + gradient descent on free energy using the generative model below. The two upper equations optimize top-down predictions to explain away (reduce) + describe the formation of prediction error encoded by error units, and the two lower + equations represent recognition dynamics, using a gradient descent on free energy. prediction error in the level below. Here, explaining + Generative models in the brain away just means countering excitatory bottom-up + To evaluate free energy one needs a generative model of how the sensorium is caused. inputs to a prediction error neuron with inhibitory syn- + Such models p(s˜ ,ϑ) = p(s˜ | ϑ) p(ϑ) combine the likelihood p(s˜ | ϑ) of getting some data given aptic inputs that are driven by top-down predictions + their causes and the prior beliefs about these causes, p(ϑ). The brain has to explain (see BOX 2 and REFS 41,42 for detailed discussion). 
The + complicated dynamics on continuous states with hierarchical or deep causal structure reciprocal exchange of bottom-up prediction errors and + and may use models with the following form top-down predictions proceeds until prediction error + is minimized at all levels and conditional expectations + + + + + +Ks +K +K +K +K + U I +Z X  θ  \ X  I +Z X  θ  \ are optimized. This scheme has been invoked to explain + …… + · · + + +K +K +K +K +K + + + + + + Z  H +Z X  θ  Y 40,43 + Z  H +Z X  θ  Y many features of early visual responses and provides + (i) (i) a plausible account of repetition suppression and mis- + Here, g and f are continuous nonlinear functions of (hidden and causal) states, with 44 + Nature Reviews | Neuroscience match responses in electrophysiology . FIGURE 1 pro- + (i) (i) (i) + parameters θ . The random fluctuations z(t) and w(t) play the part of observation + (i) vides an example of perceptual categorization that uses + noise at the sensory level and state noise at higher levels. Causal states v(t) link this scheme. + hierarchical levels, where the output of one level provides input to the next. Hidden + (i) Message passing of this sort is consistent with func- + states x(t) link dynamics over time and endow the model with memory. + Gaussian assumptions about the random fluctuations specify the likelihood 45 + tional asymmetries in real cortical hierarchies , where + and Gaussian assumptions about state noise furnish empirical priors in terms of forward connections (which convey prediction errors) + predicted motion. These assumptions are encoded by their precision (or inverse are driving and backwards connections (which model + (i) + variance), П (γ), which are functions of precision parameters γ. the nonlinear generation of sensory input) have both + recognition dynamics and prediction error driving and modulatory characteristics46. 
This asym- + If we assume that neuronal activity encodes the conditional expectation of states, then metrical message passing is also a characteristic feature + recognition can be formulated as a gradient descent on free energy. Under Gaussian of adaptive resonance theory47,48, which has formal simi- + assumptions, these recognition dynamics can be expressed compactly in terms larities to predictive coding. + (i) (i) (i) + of precision-weighted prediction errors ξ = П (ε) on the causal states and motion of In summary, the theme underlying the Bayesian brain + hidden states. The ensuing equations (see the figure) suggest two neuronal populations and predictive coding is that the brain is an inference + that exchange messages: causal or hidden-state units encoding expected states and engine that is trying to optimize probabilistic representa- + error units encoding prediction error. Under hierarchical models, error units receive tions of what caused its sensory input. This optimization + messages from the state units in the same level and the level above, whereas state units + are driven by error units in the same level and the level below. These provide bottom-up can be finessed using a (variational free-energy) bound + (i) + messages that drive conditional expectations μ towards better predictions, which on surprise. In short, the free-energy principle entails + (i) (i) + explain away prediction error. These top-down predictions correspond to g(μ ) and f(μ ). the Bayesian brain hypothesis and can be implemented + This scheme suggests that the only connections that link levels are forward connections by the many schemes considered in this field. Almost + conveying prediction error to state units and reciprocal backward connections that invariably, these involve some form of message passing + mediate predictions. See REFS 42,130 for details. Figure is modified from REF. 42. or belief propagation among brain areas or units. 
This + 130 | FEBRuARy 2010 | voluME 11 www.nature.com/reviews/neuro + © 2010 Macmillan Publishers Limited. All rights reserved + REVIEWS + a Perceptual inference allows us to connect the free-energy principle to another + Vocal centre Syrinx Sonogram principled approach to sensory processing, namely + information theory. + The principle of efficient coding + The principle of efficient coding suggests that the brain + optimizes the mutual information (that is, the mutual + predictability) between the sensorium and its internal + v representation, under constraints on the efficiency of + v = 1 + v + 2 those representations. This line of thinking was articu- + 49 + lated by Barlow in terms of a redundancy reduction + principle (or principle of efficient coding) and formal- + 50 + ized later in terms of the infomax principle . It has been + 18x − 18x applied in machine learning51, leading to methods + 2 1 + ˙x = f(x, v) = v x − 2x x − x 52 + 1 1 3 1 2 like independent component analysis , and in neuro- + 2xx − v x + 1 2 2 3 biology, contributing to an understanding of the nature + 53–56 + of neuronal responses . This principle is extremely + b Perceptual categorization effective in predicting the empirical characteristics of + Song a Song b Song c 53 + 5,000 classical receptive fields and provides a principled + explanation for sparse coding55 and the segregation of + 57 + processing streams in visual hierarchies . It has been + 4,000 extended to cover dynamics and motion trajectories58,59 + and even used to infer the metabolic constraints on neu- + 3,000 60 + equency (Hz) ronal processing . + Fr At its simplest, the infomax principle says that + 2,000 neuronal activity should encode sensory information in + 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 an efficient and parsimonious fashion. It considers the + Time (s) + c mapping between one set of variables (sensory states) + 50 3.5 and another (variables representing those states). 
At first glance, this seems to preclude a probabilistic representation, because this would involve mapping between sensory states and a probability density. However, the infomax principle can be applied to the sufficient statistics of a recognition density. In this context, the infomax principle becomes a special case of the free-energy principle, which arises when we ignore uncertainty in probabilistic representations (and when there is no action; see Supplementary information S3 (box) for mathematical details). This is easy to see by noting that sensory signals are generated by causes. This means that it is sufficient to represent the causes to predict these signals. More formally, the infomax principle can be understood in terms of the decomposition of free energy into complexity and accuracy: mutual information is optimized when conditional expectations maximize accuracy (or minimize prediction error), and efficiency is assured by minimizing complexity. This ensures that no excessive parameters are applied in the generative model and leads to a parsimonious representation of sensory data that conforms to prior constraints on their causes. Interestingly, advanced model-optimization techniques use free-energy optimization to eliminate redundant model parameters61, suggesting that free-energy optimization might provide a nice explanation for the synaptic pruning and homeostasis that take place in the brain during neurodevelopment62 and sleep63.

[Figure 1 graphic not reproduced. In part c, the left graph plots the estimated causes against time (s) and the right graph has v1 on its horizontal axis.]

Figure 1 | Birdsongs and perceptual categorization. a | The generative model of birdsong used in this simulation comprises a Lorenz attractor with two control parameters (or causal states) (v1, v2), which, in turn, delivers two control parameters (not shown) to a synthetic syrinx to produce ‘chirps’ that were modulated in amplitude and frequency (an example is shown as a sonogram). The chirps were then presented as a stimulus to a synthetic bird to see whether it could infer the underlying causal states and thereby categorize the song. This entails minimizing free energy by changing the internal representation (μ_v1, μ_v2) of the control parameters. Examples of this perceptual inference or categorization are shown below. b | Three simulated songs are shown in sonogram format. Each comprises a series of chirps, the frequency and number of which fall progressively from song a to song c, as a causal state (known as the Rayleigh number; v1 in part a) is decreased. c | The graph on the left depicts the conditional expectations (μ_v1, μ_v2) of the causal states, shown as a function of peristimulus time for the three songs. It shows that the causes are identified after around 600 ms with high conditional precision (90% confidence intervals are shown in grey). The graph on the right shows the conditional density on the causes shortly before the end of the peristimulus time (that is, the dotted line in the left panel). The blue dots correspond to conditional expectations and the grey areas correspond to the 90% conditional confidence regions. Note that these encompass the true values (red dots) of (v1, v2) that were used to generate the songs. These results illustrate the nature of perceptual categorization under the inference scheme in BOX 2: here, recognition corresponds to mapping from a continuously changing and chaotic sensory input to a fixed point in perceptual space. Figure is reproduced, with permission, from REF. 130 © (2009) Elsevier.

The infomax principle pertains to a forward mapping from sensory input to representations. How does this square with optimizing generative models, which map from causes to sensory inputs? These perspectives can be
reconciled by noting that all recognition schemes based on infomax can be cast as optimizing the parameters of a generative model64. For example, in sparse coding models55, the implicit priors posit independent causes that are sampled from a heavy-tailed or sparse distribution42. The fact that these models predict empirically observed receptive fields so well suggests that we are endowed with (or acquire) prior expectations that the causes of our sensations are largely independent and sparse.

In summary, the principle of efficient coding says that the brain should optimize the mutual information between its sensory signals and some parsimonious neuronal representations. This is the same as optimizing the parameters of a generative model to maximize the accuracy of predictions, under complexity constraints. Both are mandated by the free-energy principle, which can be regarded as a probabilistic generalization of the infomax principle. We now turn to more biologically inspired ideas about brain function that focus on neuronal dynamics and plasticity. This takes us deeper into neurobiological mechanisms and the implementation of the theoretical principles outlined above.

Sufficient statistics
Quantities that are sufficient to parameterize a probability density (for example, mean and covariance of a Gaussian density).

Laplace assumption
(Or Laplace approximation or method.) A saddle-point approximation of the integral of an exponential function, that uses a second-order Taylor expansion. When the function is a probability density, the implicit assumption is that the density is approximately Gaussian.

Predictive coding
A tool used in signal processing for representing a signal using a linear predictive (generative) model. It is a powerful speech analysis technique and was first considered in vision to explain lateral interactions in the retina.

Infomax
An optimization principle for neural networks (or functions) that map inputs to outputs. It says that the mapping should maximize the Shannon mutual information between the inputs and outputs, subject to constraints and/or noise processes.

The cell assembly and correlation theory

The cell assembly theory was proposed by Hebb65 and entails Hebbian (or associative) plasticity, which is a cornerstone of use-dependent or experience-dependent plasticity66, the correlation theory of von der Malsburg67,68 and other formal refinements to Hebbian plasticity per se69. The cell assembly theory posits that groups of interconnected neurons are formed through a strengthening of synaptic connections that depends on correlated pre- and postsynaptic activity; that is, ‘cells that fire together wire together’. This enables the brain to distil statistical regularities from the sensorium. The correlation theory considers the selective enabling of synaptic efficacy and its plasticity (also known as metaplasticity70) by fast synchronous activity induced by different perceptual attributes of the same object (for example, a red bus in motion). This resolves a putative deficiency of classical plasticity, which cannot ascribe a presynaptic input to a particular cause (for example, redness) in the world67. The correlation theory underpins theoretical treatments of synchronized brain activity and its role in associating or binding attributes to specific objects or causes68,71. Another important field that rests on associative plasticity is the use of attractor networks as models of memory formation and retrieval72–74. So how do correlations and associative plasticity figure in the free-energy formulation?

Hitherto, we have considered only inference on states of the world that cause sensory signals, whereby conditional expectations about states are encoded by synaptic activity. However, the causes covered by the recognition density are not restricted to time-varying states (for example, the motion of an object in the visual field): they also include time-invariant regularities that endow the world with causal structure (for example, objects fall with constant acceleration). These regularities are parameters of the generative model and have to be inferred by the brain; in other words, the conditional expectations of these parameters that may be encoded by synaptic efficacy (these are μ_θ in BOX 2) have to be optimized. This corresponds to optimizing connection strengths in the brain, that is, plasticity that underlies learning. So what form would this learning take? It transpires that a gradient descent on free energy (that is, changing connections to reduce free energy) is formally identical to Hebbian plasticity28,42 (BOX 2). This is because the parameters of the generative model determine how expected states (synaptic activity) are mixed to form predictions. Put simply, when the presynaptic predictions and postsynaptic prediction errors are highly correlated, the connection strength increases, so that predictions can suppress prediction errors more efficiently.

In short, the formation of cell assemblies reflects the encoding of causal regularities. This is just a restatement of cell assembly theory in the context of a specific implementation (predictive coding) of the free-energy principle. It should be acknowledged that the learning rule in predictive coding is really a delta rule, which rests on Hebbian mechanisms; however, Hebb’s wider notions of cell assemblies were formulated from a non-statistical perspective. Modern reformulations suggest that both inference on states (that is, perception) and inference on parameters (that is, learning) minimize free energy (that is, minimize prediction error) and serve to bound surprising exchanges with the world. So what about synchronization and the selective enabling of synapses?

Biased competition and attention

Causal regularities encoded by synaptic efficacy control the deterministic evolution of states in the world. However, stochastic (that is, random) fluctuations in these states play an important part in generating sensory data. Their amplitude is usually represented as precision (or inverse variance), which encodes the reliability of prediction errors. Precision is important, especially in hierarchical schemes, because it controls the relative influence of bottom-up prediction errors and top-down predictions. So how is precision encoded in the brain? In predictive coding, precision modulates the amplitude of prediction errors (these are μ_γ in BOX 2), so that prediction errors with high precision have a greater impact on units that encode conditional expectations. This means that precision corresponds to the synaptic gain of prediction error units. The most obvious candidates for controlling gain (and implicitly encoding precision) are classical neuromodulators like dopamine and acetylcholine, which provides a nice link to theories of attention and uncertainty75–77. Another candidate is fast synchronized presynaptic input that lowers effective postsynaptic membrane time constants and increases synchronous gain78. This fits comfortably with the correlation theory and speaks to recent ideas about the role of synchronous activity in mediating attentional gain79,80.

In summary, the optimization of expected precision in terms of synaptic gain links attention to synaptic gain and synchronization. This link is central to theories of attentional gain and biased competition80–85, particularly in the context of neuromodulation86,87. The theories considered so far have dealt only with perception. However, from the point of view of the free-energy principle, perception just makes free energy a good proxy for surprise. To actually reduce surprise we need to act. In the next section, we retain a focus on cell assemblies but move to the selection and reinforcement of stimulus–response links.

Stochastic
Governed by random effects.

Biased competition
An attentional effect mediated by competitive interactions among neurons representing visual stimuli; these interactions can be biased in favour of behaviourally relevant stimuli by both spatial and non-spatial and both bottom-up and top-down processes.

Neural Darwinism and value learning

In the theory of neuronal group selection88, the emergence of neuronal assemblies is considered in the light of selective pressure. The theory has four elements: epigenetic mechanisms create a primary repertoire of neuronal connections, which are refined by experience-dependent plasticity to produce a secondary repertoire of neuronal groups. These are selected and maintained through reentrant signalling among neuronal groups. As in cell assembly theory, plasticity rests on correlated pre- and postsynaptic activity, but here it is modulated by value. Value is signalled by ascending neuromodulatory transmitter systems and controls which neuronal groups are selected and which are not. The beauty of neural Darwinism is that it nests distinct selective processes within each other. In other words, it eschews a single unit of selection and exploits the notion of meta-selection (the selection of selective mechanisms; for example, see REF. 89). In this context, (neuronal) value confers evolutionary value (that is, adaptive fitness) by selecting neuronal groups that mediate adaptive stimulus–stimulus associations and stimulus–response links. The capacity of value to do this is assured by natural selection, in the sense that neuronal value systems are themselves subject to selective pressure.

This theory, particularly value-dependent learning90, has deep connections with reinforcement learning and related approaches in engineering (see below), such as dynamic programming and temporal difference models91,92. This is because neuronal value systems reinforce connections to themselves, thereby enabling the brain to label a sensory state as valuable if, and only if, it leads to another valuable state. This ensures that agents move through a succession of states that have acquired value to access states (rewards) with genetically specified innate value. In short, the brain maximizes value, which may be reflected in the discharge of value systems (for example, dopaminergic systems92–96). So how does this relate to the optimization of free energy?

The answer is simple: value is inversely proportional to surprise, in the sense that the probability of a phenotype being in a particular state increases with the value of that state. Furthermore, the evolutionary value of a phenotype is the negative surprise averaged over all the states it experiences, which is simply its negative entropy. Indeed, the whole point of minimizing free energy (and implicitly entropy) is to ensure that agents spend most of their time in a small number of valuable states. This means that free energy is the complement of value, and its long-term average is the complement of adaptive fitness (also known as free fitness in evolutionary biology97). But how do agents know what is valuable? In other words, how does one generation tell the next which states have value (that is, are unsurprising)? Value or surprise is determined by the form of an agent’s generative model and its implicit priors; these specify the value of sensory states and, crucially, are heritable through genetic and epigenetic mechanisms. This means that prior expectations (that is, the primary repertoire) can prescribe a small number of attractive states with innate value. In turn, this enables natural selection to optimize prior expectations and ensure they are consistent with the agent’s phenotype. Put simply, valuable states are just the states that the agent expects to frequent. These expectations are constrained by the form of its generative model, which is specified genetically and fulfilled behaviourally, under active inference.

It is important to appreciate that prior expectations include not just what will be sampled from the world but also how the world is sampled. This means that natural selection may equip agents with the prior expectation that they will explore their environment until states with innate value are encountered. We will look at this more closely in the next section, where priors on motion through state space are cast in terms of policies in reinforcement learning.

Both neural Darwinism and the free-energy principle try to understand somatic changes in an individual in the context of evolution: neural Darwinism appeals to selective processes, whereas the free-energy formulation considers the optimization of ensemble or population dynamics in terms of entropy and surprise. The key theme that emerges here is that (heritable) prior expectations can label things as innately valuable (unsurprising); but how can simply labelling states engender adaptive behaviour? In the next section, we return to reinforcement learning and related formulations of action that try to explain adaptive behaviour purely in terms of labels or cost functions.

Reentrant signalling
Reciprocal message passing among neuronal groups.

Reinforcement learning
An area of machine learning concerned with how an agent maximizes long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to actions performed by the agent.

Optimal control theory and game theory

Value is central to theories of brain function that are based on reinforcement learning and optimum control. The basic notion that underpins these treatments is that the brain optimizes value, which is expected reward or utility (or its complement, expected loss or cost). This is seen in behavioural psychology as reinforcement learning98, in computational neuroscience and machine learning as variants of dynamic programming such as temporal difference learning99–101, and in economics as expected utility theory102. The notion of an expected reward or cost is crucial here; this is the cost expected over future states, given a particular policy that prescribes action or choices. A policy specifies the states to which an agent will move from any given state (‘motion through state space in continuous time’). This policy has to access sparse rewarding states using a cost function, which only labels states as costly or not. The problem of how the policy is optimized is formalized in optimal control theory as the Bellman equation and its variants99 (see Supplementary information S4 (box)), which express value as a function of the optimal policy and a cost function. If one can solve the Bellman equation, one can associate each sensory state with a value and optimize the policy by ensuring that the next state is the most valuable of the available states. In general, it is impossible to solve the Bellman equation exactly, but several approximations exist, ranging from simple Rescorla–Wagner models98 to more comprehensive formulations like Q-learning100. Cost also has a key role in Bayesian decision theory, in which optimal decisions minimize expected cost in the context of uncertainty about outcomes; this is central to optimal decision theory (game theory) and behavioural economics102–104.

Optimal control theory
An optimization method (based on the calculus of variations) for deriving an optimal control law in a dynamical system. A control problem includes a cost function that is a function of state and control variables.

Bellman equation
(Or dynamic programming equation.) Named after Richard Bellman, it is a necessary condition for optimality associated with dynamic programming in optimal control theory.

Optimal decision theory
(Or game theory.) An area of applied mathematics concerned with identifying the values, uncertainties and other constraints that determine an optimal decision.

Gradient ascent
(Or method of steepest ascent.) A first-order optimization scheme that finds a maximum of a function by changing its arguments in proportion to the gradient of the function at the current value. In short, a hill-climbing scheme. The opposite scheme is a gradient descent.

So what does free energy bring to the table? If one assumes that the optimal policy performs a gradient ascent on value, then it is easy to show that value is inversely proportional to surprise (see Supplementary information S4 (box)). This means that free energy is (an upper bound on) expected cost, which makes sense as optimal control theory assumes that action minimizes expected cost, whereas the free-energy principle states that it minimizes free energy. This is important because it explains why agents must minimize expected cost. Furthermore, free energy provides a quantitative and seamless connection between the cost functions of reinforcement learning and value in evolutionary biology. Finally, the dynamical perspective provides a mechanistic insight into how policies are specified in the brain: according to the principle of optimality99, cost is the rate of change of value (see Supplementary information S4 (box)), which depends on changes in sensory states. This suggests that optimal policies can be prescribed by prior expectations about the motion of sensory states. Put simply, priors induce a fixed-point attractor, and when the states arrive at the fixed point, value will stop changing and cost will be minimized. A simple example is shown in FIG. 2, in which a cued arm movement is simulated using only prior expectations that the arm will be drawn to a fixed point (the target). This figure illustrates how computational motor control105–109 can be formulated in terms of priors and the suppression of sensory prediction errors (K.J.F., J. Daunizeau, J. Kilner and S.J. Kiebel, unpublished observations).
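A minimal sketch of the Bellman-equation logic described above: assign each state a value, then move to the most valuable of the available states. The chain of states, the sparse reward that labels a single goal state, the discount factor `gamma` and the function names are all assumptions made for illustration. The text concerns motion through state space in continuous time, so this discrete, discounted version is a stand-in rather than the formulation used in the Review.

```python
def value_iteration(n_states, goal, gamma=0.9, tol=1e-10):
    """Solve V(s) = r(s) + gamma * max over reachable s' of V(s') on a 1-D chain.

    Hypothetical setup: from state s the agent may stay or step to s-1 or s+1,
    and only the goal state is rewarded (a sparse labelling of states).
    """
    reward = [1.0 if s == goal else 0.0 for s in range(n_states)]
    neighbours = [
        [t for t in (s - 1, s, s + 1) if 0 <= t < n_states]
        for s in range(n_states)
    ]
    value = [0.0] * n_states
    while True:
        # Synchronous Bellman backup over all states.
        new_value = [
            reward[s] + gamma * max(value[t] for t in neighbours[s])
            for s in range(n_states)
        ]
        if max(abs(a - b) for a, b in zip(new_value, value)) < tol:
            return new_value, neighbours
        value = new_value


def greedy_policy(value, neighbours):
    """Policy: from each state, move to the most valuable reachable state."""
    return [max(reachable, key=lambda t: value[t]) for reachable in neighbours]


if __name__ == "__main__":
    value, neighbours = value_iteration(n_states=7, goal=5)
    print([round(v, 3) for v in value])
    print(greedy_policy(value, neighbours))
```

In this toy chain, value decays with distance from the goal, so the greedy policy simply steps towards it and then stays there.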
More generally, FIG. 2 shows how rewards and goals can be considered as prior expectations that an action is obliged to fulfil16 (see also REF. 110). It also suggests how natural selection could optimize behaviour through the genetic specification of inheritable or innate priors that constrain the learning of empirical priors (BOX 2) and subsequent goal-directed action.

[Figure 2 graphic not reproduced. Its labels include prediction errors, motor signals, action, the jointed arm and the movement trajectory. The equations shown are $s_{visual} = \begin{bmatrix} V \\ J \end{bmatrix} + w_{visual}$, $s_{prop} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + w_{prop}$, $V = (v_1, v_2, v_3)$, $J = J_1 + J_2 = (j_1, j_2)$ and $\dot{a} = -\partial_a \varepsilon^{T} \xi$.]

Figure 2 | A demonstration of cued reaching movements. The lower right part of the figure shows a motor plant, comprising a two-jointed arm with two hidden states, each of which corresponds to a particular angular position of the two joints; the current position of the finger (red circle) is the sum of the vectors describing the location of each joint. Here, causal states in the world are the position and brightness of the target (green circle). The arm obeys Newtonian mechanics, specified in terms of angular inertia and friction. The left part of the figure illustrates that the brain senses hidden states directly in terms of proprioceptive input (S_prop) that signals the angular positions (x1, x2) of the joints and indirectly through seeing the location of the finger in space (J1, J2). In addition, through visual input (S_visual) the agent senses the target location (v1, v2) and brightness (v3). Sensory prediction errors are passed to higher brain levels to optimize the conditional expectations of hidden states (that is, the angular position of the joints) and causal (that is, target) states. The ensuing predictions are sent back to suppress sensory prediction errors. At the same time, sensory prediction errors are also trying to suppress themselves by changing sensory input through action. The grey and black lines denote reciprocal message passing among neuronal populations that encode prediction error and conditional expectations; this architecture is the same as that depicted in BOX 2. The blue lines represent descending motor control signals from sensory prediction-error units. The agent’s generative model included priors on the motion of hidden states that effectively engage an invisible elastic band between the finger and target (when the target is illuminated). This induces a prior expectation that the finger will be drawn to the target, when cued appropriately. The inset shows the ensuing movement trajectory caused by action. The red circles indicate the initial and final positions of the finger, which reaches the target (green circle) quickly and smoothly; the blue line is the simulated trajectory.

It should be noted that just expecting to be attracted to some states may not be sufficient to attain those states. This is because one may have to approach attractors vicariously through other states (for example, to avoid obstacles) or conform to physical constraints on action. These are some of the more difficult problems of accessing distal rewards that reinforcement learning and optimum control contend with. In these circumstances, an examination of the density dynamics, on which the free-energy principle is based, suggests that it is sufficient to keep moving until an a priori attractor is encountered (see Supplementary information S5 (box)). This entails destroying unexpected (costly) fixed points in the environment by making them unstable (like shifting to a new position when sitting uncomfortably). Mathematically, this means adopting a policy that ensures a positive divergence in costly states (intuitively, this is like being pushed through a liquid with negative viscosity or friction). See FIG. 3 for a solution to the classical mountain car problem using a simple prior that induces this sort of policy. This prior is on motion through state space (that is, changes in states) and enforces exploration until an attractive state is found. Priors of this sort may provide a principled way to understand the exploration–exploitation trade-off111–113 and related issues in evolutionary biology114. The implicit use of priors to induce dynamical instability also provides a key connection to dynamical systems theory approaches to the brain that emphasize the importance of itinerant dynamics, metastability, self-organized criticality and winnerless competition115–123. These dynamical phenomena have a key role in synergetic and autopoietic accounts of adaptive behaviour5,124,125.

Principle of optimality
An optimal policy has the property that whatever the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

Exploration–exploitation trade-off
Involves a balance between exploration (of uncharted territory) and exploitation (of current knowledge).

[Figure 3 graphic not reproduced. Part a, ‘The mountain car problem’, plots height φ(x) against position (x) and gives the equations of motion $f = \begin{bmatrix} \dot{x} \\ \dot{x}' \end{bmatrix} = \begin{bmatrix} x' \\ -\nabla\varphi - \tfrac{1}{8}x' + \sigma(a) \end{bmatrix}$. Part b has four panels: ‘Loss functions (priors)’, force and c(x) against position (x); ‘Conditional expectations’, estimated states against time (seconds); ‘Trajectories’, velocity against position (x); and ‘Action’, the control signal a(t) against time (seconds).]
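The mountain car equations of motion given with Figure 3, ẋ = x′ and ẋ′ = −∇φ − x′/8 + σ(a), can be integrated numerically. The following is a minimal sketch under several assumptions: a hypothetical quadratic landscape with its minimum at x = −0.5 (not the landscape used in the Review), tanh as the squashing function σ (the Review states only that −1 ≤ σ ≤ 1), forward-Euler integration, and illustrative function names. It simulates the plant only, not the active inference scheme.

```python
import math


def grad_phi(x):
    """Gradient of a hypothetical quadratic landscape with its minimum at x = -0.5."""
    return x + 0.5


def step(x, v, a, dt=0.01):
    """One forward-Euler step of x' = v, v' = -grad_phi(x) - v/8 + sigma(a)."""
    # tanh is an assumed squashing function; it keeps the applied force within (-1, 1).
    force = -grad_phi(x) - v / 8.0 + math.tanh(a)
    return x + dt * v, v + dt * force


def simulate(x0, v0, action, t_end=200.0, dt=0.01):
    """Integrate the plant from (x0, v0) under a control signal action(t)."""
    x, v = x0, v0
    for k in range(int(round(t_end / dt))):
        x, v = step(x, v, action(k * dt), dt)
    return x, v


if __name__ == "__main__":
    x, v = simulate(1.0, 0.0, action=lambda t: 0.0)
    print(f"no action:        x = {x:.3f}, v = {v:.3f}")
    x, v = simulate(1.0, 0.0, action=lambda t: 10.0)
    print(f"saturated action: x = {x:.3f}, v = {v:.3f}")
```

With a constant, saturated action the car in this toy landscape settles where the bounded force balances the gradient. This illustrates why a squashed action cannot push the car directly up a slope that is steeper than its maximum force.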
Figure 3 | Solving the mountain car problem with prior expectations. a | How paradoxical but adaptive behaviour (for example, moving away from a target to ensure that it is secured later) emerges from simple priors on the motion of hidden states in the world. Shown is the landscape or potential energy function (with a minimum at position x = −0.5) that exerts forces on a mountain car. The car is shown at the target position on the hill at x = 1, indicated by the red circle. The equations of motion of the car are shown below the plot. Crucially, at x = 0 the force on the car cannot be overcome by the agent, because a squashing function −1 ≤ σ ≤ 1 is applied to action to prevent it being greater than 1. This means that the agent can access the target only by starting halfway up the left hill to gain enough momentum to carry it up the other side. b | The results of active inference under priors that destabilize fixed points outside the target domain. The priors are encoded in a cost function c(x) (top left), which acts like negative friction. When ‘friction’ is negative the car expects to go faster (see Supplementary information S5 (box) for details). The inferred hidden states (upper right: position in blue, velocity in green and negative dissipation in red) show that the car explores its landscape until it encounters the target, and that friction then increases (that is, cost decreases) dramatically to prevent the car from escaping the target (by falling down the hill). The ensuing trajectory is shown in blue (bottom left). The paler lines provide exemplar trajectories from other trials, with different starting positions. In the real world, friction is constant. However, the car ‘expects’ friction to change as it changes position, thus enforcing exploration or exploitation. These expectations are fulfilled by action (lower right).

In summary, optimal control and decision (game) theory start with the notion of cost or utility and try to construct value functions of states, which subsequently guide action. The free-energy formulation starts with a free-energy bound on the value of states, which is specified by priors on the motion of hidden environmental states. These priors can incorporate any cost function to ensure that costly states are avoided. States with minimum cost can be set (by learning or evolution) in terms of prior expectations about motion and the attractors that ensue. In this view, the problem of finding sparse rewards in the environment is nature’s solution to the problem of how to minimize the entropy (average surprise or free energy) of an agent’s states: by ensuring they occupy a small set of attracting (that is, rewarding) states.

Exploration–exploitation trade-off
In reinforcement learning, this trade-off has been studied mainly through the multi-armed bandit problem.

Dynamical systems theory
An area of applied mathematics that describes the behaviour of complex (possibly chaotic) dynamical systems as described by differential or difference equations.

Synergetics
Concerns the self-organization of patterns and structures in open systems far from thermodynamic equilibrium. It rests on the order parameter concept, which was generalized by Haken to the enslaving principle: that is, the dynamics of fast-relaxing (stable) modes are completely determined by the ‘slow’ dynamics of order parameters (the amplitudes of unstable modes).

Autopoietic
Referring to the fundamental dialectic between structure and function.

Helmholtzian
Refers to a device or scheme that uses a generative model to furnish a recognition density and learns hidden structures in data by optimizing the parameters of generative models.

Conclusions and future directions

Although contrived to highlight commonalities, this Review suggests that many global theories of brain function can be united under a Helmholtzian perspective of the brain as a generative model of the world it inhabits18,20,21,25 (FIG. 4); notable examples include the integration of the Bayesian brain and computational motor control theory, the objective functions shared by predictive coding and the infomax principle, hierarchical inference and theories of attention, the embedding of perception in natural selection and the link between optimum control and more exotic phenomena in dynamical systems theory. The constant theme in all these theories is that the brain optimizes a (free-energy) bound on surprise or its complement, value. This manifests as perception (so as to change predictions) or action (so as to change the sensations that are predicted). Crucially, these predictions depend on prior expectations (that furnish policies), which are optimized at different (somatic and evolutionary) timescales and define what is valuable.

[Figure 4 graphic not reproduced. Its boxes read as follows.]

The free-energy principle: $a, \mu, m = \arg\min F(\tilde{s}, \mu \mid m)$. Minimization of the free energy of sensations and the representation of their causes.
Attention and biased competition: $\mu_\gamma = \arg\min \int dt\, F$. Optimization of synaptic gain representing the precision (salience) of predictions.
Computational motor control: $\dot{a} = -\partial_a \varepsilon^{T} \xi$. Minimization of sensory prediction errors.
Predictive coding and hierarchical inference: $\dot{\mu}_v^{(i)} = D\mu_v^{(i)} - \partial_v \varepsilon^{(i)T} \xi^{(i)} - \xi_v^{(i+1)}$. Minimization of prediction error with recurrent message passing.
Associative plasticity: $\ddot{\mu}_{\theta_{ij}} = -\partial_{\theta_{ij}} \varepsilon^{T} \xi$. Optimization of synaptic efficacy.
Optimal control and value learning: $a, \mu = \arg\max V(\tilde{s} \mid m)$. Optimization of a free-energy bound on surprise or value.
The Bayesian brain hypothesis: $\mu = \arg\min D_{KL}\big(q(\vartheta) \,\|\, p(\vartheta \mid \tilde{s})\big)$. Minimizing the difference between a recognition density and the conditional density on sensory causes.
Perceptual learning and memory: $\mu_\theta = \arg\min \int dt\, F$. Optimization of synaptic efficacy to represent causal structure in the sensorium.
Infomax and the redundancy minimization principle: $\mu = \arg\max \{ I(\tilde{s}, \mu) - H(\mu) \}$. Maximization of the mutual information between sensations and representations.
Probabilistic neuronal coding: $q(\vartheta) = N(\mu, \Sigma)$. Encoding a recognition density in terms of conditional expectations and uncertainty.
Model selection and evolution: $m = \arg\min \int dt\, F$. Optimizing the agent’s model and priors through neurodevelopment and natural selection.

Figure 4 | The free-energy principle and other theories. Some of the theoretical constructs considered in this Review and how they relate to the free-energy principle (centre). The variables are described in BOXES 1,2 and a full explanation of the equations can be found in the Supplementary information S1–S4 (boxes).

What does the free-energy principle portend for the future? If its main contribution is to integrate established theories, then the answer is probably ‘not a lot’. Conversely, it may provide a framework in which current debates could be resolved, for example whether dopamine encodes reward prediction error or surprise126,127; this is particularly important for understanding conditions like addiction, Parkinson’s disease and schizophrenia. Indeed, the free-energy formulation has already been used to explain the positive symptoms of schizophrenia in terms of false inference128. The free-energy formulation could also provide new approaches to old problems that might call for a reappraisal of conventional notions, particularly in reinforcement learning and motor control.

If the arguments underlying the free-energy principle hold, then the real challenge is to understand how it manifests in the brain. This speaks to a greater appreciation of hierarchical message passing41, the functional role of specific neurons and microcircuits and the dynamics they support (for example, what is the relationship between predictive coding, attention and dynamic coordination in the brain?129). Beyond neuroscience, many exciting applications in engineering, robotics, embodied cognition and evolutionary biology suggest themselves; although fanciful, it is not difficult to imagine building little free-energy machines that garner and model sensory information (like our children) to maximize the evidence for their own existence.

1. Huang, G. Is this a unified theory of the brain? New Scientist 2658, 30–33 (2008).
2. Friston, K., Kilner, J. & Harrison, L. A free energy principle for the brain. J. Physiol. Paris 100, 70–87 (2006).
An overview of the free-energy principle that describes its motivation and relationship to generative models and predictive coding. This paper focuses on perception and the neurobiological infrastructures involved.
3. Ashby, W. R. Principles of the self-organising dynamic system. J. Gen. Psychol. 37, 125–128 (1947).
4. Nicolis, G. & Prigogine, I. Self-Organisation in Non-Equilibrium Systems (Wiley, New York, 1977).
5. Haken, H. Synergetics: an Introduction. Non-Equilibrium Phase Transition and Self-Organisation in Physics, Chemistry and Biology 3rd edn (Springer, New York, 1983).
6. Kauffman, S. The Origins of Order: Self-Organization and Selection in Evolution (Oxford Univ. Press, Oxford, 1993).
7. Bernard, C. Lectures on the Phenomena Common to Animals and Plants (Thomas, Springfield, 1974).
8. Applebaum, D. Probability and Information: an 36. Zemel, R., Dayan, P. & Pouget, A. Probabilistic 60. Laughlin, S. B.
Efficiency and complexity in neural + Integrated Approach (Cambridge Univ. Press, interpretation of population code. Neural Comput. 10, coding. Novartis Found. Symp. 239, 177–187 + Cambridge, UK, 2008). 403–430 (1998). (2001). + 9. Evans, D. J. A non-equilibrium free energy theorem 37. Paulin, M. G. Evolution of the cerebellum as a 61. Tipping, M. E. Sparse Bayesian learning and the + for deterministic systems. Mol. Physics 101, neuronal machine for Bayesian state estimation. Relevance Vector Machine. J. Machine Learn. Res. 1, + 15551–11554 (2003). J. Neural Eng. 2, S219–S234 (2005). 211–244 (2001). + 10. Crauel, H. & Flandoli, F. Attractors for random 38. Ma, W. J., Beck, J. M., Latham, P. E. & Pouget, A. 62. Paus, T., Keshavan, M. & Giedd, J. N. Why do many + dynamical systems. Probab. Theory Relat. Fields 100, Bayesian inference with probabilistic population psychiatric disorders emerge during adolescence? + 365–393 (1994). codes. Nature Neurosci. 9, 1432–1438 (2006). Nature Rev. Neurosci. 9, 947–957 (2008). + 11. Feynman, R. P. Statistical Mechanics: a Set of Lectures 39. Friston, K., Mattout, J., Trujillo-Barreto, N., 63. Gilestro, G. F., Tononi, G. & Cirelli, C. Widespread + (Benjamin, Reading, Massachusetts, 1972). Ashburner, J. & Penny, W. Variational free energy and changes in synaptic markers as a function of sleep and + 12. Hinton, G. E. & von Cramp, D. Keeping neural the Laplace approximation. Neuroimage 34, wakefulness in Drosophila. Science 324, 109–112 + networks simple by minimising the description length 220–234 (2007). (2009). + of weights. Proc. 6th Annu. ACM Conf. Computational 40. Rao, R. P. & Ballard, D. H. Predictive coding in the 64. Roweis, S. & Ghahramani, Z. A unifying review of + Learning Theory 5–13 (1993). visual cortex: a functional interpretation of some linear Gaussian models. Neural Comput. 11, 305–345 + 13. MacKay. D. J. C. Free-energy minimisation algorithm extra-classical receptive field effects. Nature Neurosci. (1999). 
+ for decoding and cryptoanalysis. Electron. Lett. 31, 2, 79–87 (1998). 65. Hebb, D. O. The Organization of Behaviour (Wiley, + 445–447 (1995). Applies predictive coding to cortical processing to New York, 1949). + 14. Neal, R. M. & Hinton, G. E. in Learning in Graphical provide a compelling account of extra-classical 66. Paulsen, O. & Sejnowski, T. J. Natural patterns of + Models (ed. Jordan, M. I.) 355–368 (Kluwer receptive fields in the visual system. It emphasizes activity and long-term synaptic plasticity. Curr. Opin. + Academic, Dordrecht, 1998). the importance of top-down projections in Neurobiol. 10, 172–179 (2000). + 15. Itti, L. & Baldi, P. Bayesian surprise attracts human providing predictions, by modelling perceptual 67. von der Malsburg, C. The Correlation Theory of Brain + attention. Vision Res. 49, 1295–1306 (2009). inference. Function. Internal Report 81–82, Dept. Neurobiology, + 16. Friston, K., Daunizeau, J. & Kiebel, S. Active inference 41. Mumford, D. On the computational architecture of the Max-Planck-Institute for Biophysical Chemistry + or reinforcement learning? PLoS ONE 4, e6421 neocortex. II. The role of cortico-cortical loops. Biol. (1981). + (2009). Cybern. 66, 241–251 (1992). 68. Singer, W. & Gray, C. M. Visual feature integration and + 17. Knill, D. C. & Pouget, A. The Bayesian brain: the role 42. Friston, K. Hierarchical models in the brain. PLoS the temporal correlation hypothesis. Annu. Rev. + of uncertainty in neural coding and computation. Comput. Biol. 4, e1000211 (2008). Neurosci. 18, 555–586 (1995). + Trends Neurosci. 27, 712–719 (2004). 43. Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P. 69. Bienenstock, E. L., Cooper, L. N. & Munro, P. W. + A nice review of Bayesian theories of perception & Woods, D. L. Shape perception reduces activity in Theory for the development of neuron selectivity: + and sensorimotor control. Its focus is on Bayes human primary visual cortex. Proc. Natl Acad. Sci. 
orientation specificity and binocular interaction in + optimality in the brain and the implicit nature of USA 99, 15164–15169 (2002). visual cortex. J. Neurosci. 2, 32–48 (1982). + neuronal representations. 44. Garrido, M. I., Kilner, J. M., Kiebel, S. J. & Friston, 70. Abraham, W. C. & Bear, M. F. Metaplasticity: the + 18. von Helmholtz, H. in Treatise on Physiological Optics K. J. Dynamic causal modeling of the response to plasticity of synaptic plasticity. Trends Neurosci. 19, + Vol. III 3rd edn (Voss, Hamburg, 1909). frequency deviants. J. Neurophysiol. 101, 126–130 (1996). + 19. MacKay, D. M. in Automata Studies (eds Shannon, 2620–2631 (2009). 71. Pareti, G. & De Palma, A. Does the brain oscillate? + C. E. & McCarthy, J.) 235–251 (Princeton Univ. Press, 45. Sherman, S. M. & Guillery, R. W. On the actions that The dispute on neuronal synchronization. Neurol. Sci. + Princeton, 1956). one nerve cell can have on another: distinguishing 25, 41–47 (2004). + 20. Neisser, U. Cognitive Psychology “drivers” from “modulators”. Proc. Natl Acad. Sci. USA 72. Leutgeb, S., Leutgeb, J. K., Moser, M. B. & Moser, E. I. + (Appleton-Century-Crofts, New York, 1967). 95, 7121–7126 (1998). Place cells, spatial maps and the population code for + 21. Gregory, R. L. Perceptual illusions and brain models. 46. Angelucci, A. & Bressloff, P. C. Contribution of memory. Curr. Opin. Neurobiol. 15, 738–746 + Proc. R. Soc. Lond. B Biol. Sci. 171, 179–196 (1968). feedforward, lateral and feedback connections to the (2005). + 22. Gregory, R. L. Perceptions as hypotheses. Philos. classical receptive field center and extra-classical 73. Durstewitz, D. & Seamans, J. K. Beyond bistability: + Trans. R. Soc. Lond. B Biol. Sci. 290, 181–197 (1980). receptive field surround of primate V1 neurons. biophysics and temporal dynamics of working memory. + 23. Ballard, D. H., Hinton, G. E. & Sejnowski, T. J. Parallel Prog. Brain Res. 154, 93–120 (2006). Neuroscience 139, 119–133 (2006). + visual computation. 
Nature 306, 21–26 (1983). 47. Grossberg, S. Towards a unified theory of neocortex: 74. Anishchenko, A. & Treves, A. Autoassociative memory + 24. Kawato, M., Hayakawa, H. & Inui, T. A forward-inverse laminar cortical circuits for vision and cognition. retrieval and spontaneous activity bumps in small- + optics model of reciprocal connections between visual Prog. Brain Res. 165, 79–104 (2007). world networks of integrate-and-fire neurons. + areas. Network: Computation in Neural Systems 4, 48. Grossberg, S. & Versace, M. Spikes, synchrony, and J. Physiol. Paris 100, 225–236 (2006). + 415–422 (1993). attentive learning by laminar thalamocortical circuits. 75. Abbott, L. F., Varela, J. A., Sen, K. & Nelson, S. B. + 25. Dayan, P., Hinton, G. E. & Neal, R. M. The Helmholtz Brain Res. 1218, 278–312 (2008). Synaptic depression and cortical gain control. Science + machine. Neural Comput. 7, 889–904 (1995). 49. Barlow, H. in Sensory Communication (ed. Rosenblith, W.) 275, 220–224 (1997). + This paper introduces the central role of generative 217–234 (MIT Press, Cambridge, Massachusetts, 76. Yu, A. J. & Dayan, P. Uncertainty, neuromodulation + models and variational approaches to hierarchical 1961). and attention. Neuron 46, 681–692 (2005). + self-supervised learning and relates this to the 50. Linsker, R. Perceptual neural organisation: some 77. Doya, K. Metalearning and neuromodulation. Neural + function of bottom-up and top-down cortical approaches based on network models and Netw. 15, 495–506 (2002). + processing pathways. information theory. Annu. Rev. Neurosci. 13, 78. Chawla, D., Lumer, E. D. & Friston, K. J. The + 26. Lee, T. S. & Mumford, D. Hierarchical Bayesian 257–281 (1990). relationship between synchronization among neuronal + inference in the visual cortex. J. Opt. Soc. Am. A Opt. 51. Oja, E. Neural networks, principal components, and populations and their mean activity levels. Neural + Image Sci. Vis. 20, 1434–1448 (2003). subspaces. Int. J. Neural Syst. 
1, 61–68 (1989). Comput. 11, 1389–1411 (1999). + 27. Kersten, D., Mamassian, P. & Yuille, A. Object 52. Bell, A. J. & Sejnowski, T. J. An information 79. Fries, P., Womelsdorf, T., Oostenveld, R. & Desimone, R. + perception as Bayesian inference. Annu. Rev. Psychol. maximisation approach to blind separation and blind The effects of visual stimulation and selective visual + 55, 271–304 (2004). de-convolution. Neural Comput. 7, 1129–1159 attention on rhythmic neuronal synchronization in + 28. Friston, K. J. A theory of cortical responses. Philos. (1995). macaque area V4. J. Neurosci. 28, 4823–4835 + Trans. R. Soc. Lond. B Biol. Sci. 360, 815–836 53. Atick, J. J. & Redlich, A. N. What does the retina know (2008). + (2005). about natural scenes? Neural Comput. 4, 196–210 80. Womelsdorf, T. & Fries, P. Neuronal coherence during + 29. Beal, M. J. Variational Algorithms for Approximate (1992). selective attentional processing and sensory-motor + Bayesian Inference. Thesis, University College London 54. Optican, L. & Richmond, B. J. Temporal encoding of integration. J. Physiol. Paris 100, 182–193 (2006). + (2003). two-dimensional patterns by single units in primate 81. Desimone, R. Neural mechanisms for visual memory + 30. Efron, B. & Morris, C. Stein’s estimation rule and its inferior cortex. III Information theoretic analysis. and their role in attention. Proc. Natl Acad. Sci. USA + competitors – an empirical Bayes approach. J. Am. J. Neurophysiol. 57, 132–146 (1987). 93, 13494–13499 (1996). + Stats. Assoc. 68, 117–130 (1973). 55. Olshausen, B. A. & Field, D. J. Emergence of simple- A nice review of mnemonic effects (such as + 31. Kass, R. E. & Steffey, D. Approximate Bayesian cell receptive field properties by learning a sparse repetition suppression) on neuronal responses and + inference in conditionally independent hierarchical code for natural images. Nature 381, 607–609 how they bias the competitive interactions between + models (parametric empirical Bayes models). 
J. Am. (1996). stimulus representations in the cortex. It provides + Stat. Assoc. 407, 717–726 (1989). 56. Simoncelli, E. P. & Olshausen, B. A. Natural image a good perspective on attentional mechanisms in + 32. Zeki, S. & Shipp, S. The functional logic of cortical statistics and neural representation. Annu. Rev. the visual system that is empirically grounded. + connections. Nature 335, 311–317 (1988). Neurosci. 24, 1193–1216 (2001). 82. Treisman, A. Feature binding, attention and object + Describes the functional architecture of cortical A nice review of information theory in visual perception. Philos. Trans. R. Soc. Lond. B Biol. Sci. + hierarchies with a focus on patterns of anatomical processing. It covers natural scene statistics and 353, 1295–1306 (1998). + connections in the visual cortex. It emphasizes the empirical tests of the efficient coding hypothesis in 83. Maunsell, J. H. & Treue, S. Feature-based attention in + role of functional segregation and integration (that individual neurons and populations of neurons. visual cortex. Trends Neurosci. 29, 317–322 (2006). + is, message passing among cortical areas). 57. Friston, K. J. The labile brain. III. Transients and 84. Spratling, M. W. Predictive-coding as a model of + 33. Felleman, D. J. & Van Essen, D. C. Distributed spatio-temporal receptive fields. Philos. Trans. R. Soc. biased competition in visual attention. Vision Res. 48, + hierarchical processing in the primate cerebral cortex. Lond. B Biol. Sci. 355, 253–265 (2000). 1391–1408 (2008). + Cereb. Cortex 1, 1–47 (1991). 58. Bialek, W., Nemenman, I. & Tishby, N. Predictability, 85. Reynolds, J. H. & Heeger, D. J. The normalization + 34. Mesulam, M. M. From sensation to cognition. Brain complexity, and learning. Neural Comput. 13, model of attention. Neuron 61, 168–185 (2009). + 121, 1013–1052 (1998). 2409–2463 (2001). 86. Schroeder, C. E., Mehta, A. D. & Foxe, J. J. + 35. Sanger, T. Probability density estimation for the 59. Lewen, G. D., Bialek, W. 
& de Ruyter van Steveninck, Determinants and mechanisms of attentional + interpretation of neural population codes. R. R. Neural coding of naturalistic motion stimuli. modulation of neural processing. Front. Biosci. 6, + J. Neurophysiol. 76, 2790–2793 (1996). Network 12, 317–329 (2001). D672–D684 (2001). + NATuRE REvIEWs | NeuroscieNce voluME 11 | FEBRuARy 2010 | 137 + © 2010 Macmillan Publishers Limited. All rights reserved + REVIEWS + 87. Hirayama, J., Yoshimoto, J. & Ishii, S. Bayesian 106. Todorov, E. & Jordan, M. I. Smoothness maximization 119. Bressler, S. L. & Tognoli, E. Operational principles of + representation learning in the cortex regulated by along a predefined path accurately predicts the speed neurocognitive networks. Int. J. Psychophysiol. 60, + acetylcholine. Neural Netw. 17, 1391–1400 (2004). profiles of complex arm movements. J. Neurophysiol. 139–148 (2006). + 88. Edelman, G. M. Neural Darwinism: selection and 80, 696–714 (1998). 120. Werner, G. Brain dynamics across levels of + reentrant signaling in higher brain function. Neuron 107. Tseng, Y. W., Diedrichsen, J., Krakauer, J. W., organization. J. Physiol. Paris 101, 273–279 (2007). + 10, 115–125 (1993). Shadmehr, R. & Bastian, A. J. Sensory prediction- 121. Pasquale, V., Massobrio, P., Bologna, L. L., + 89. Knobloch, F. Altruism and the hypothesis of meta- errors drive cerebellum-dependent adaptation of Chiappalone, M. & Martinoia, S. Self-organization and + selection in human evolution. J. Am. Acad. reaching. J. Neurophysiol. 98, 54–62 (2007). neuronal avalanches in networks of dissociated cortical + Psychoanal. 29, 339–354 (2001). 108. Bays, P. M. & Wolpert, D. M. Computational neurons. Neuroscience 153, 1354–1369 (2008). + 90. Friston, K. J., Tononi, G., Reeke, G. N. Jr, Sporns, O. & principles of sensorimotor control that minimize 122. Kitzbichler, M. G., Smith, M. L., Christensen, S. R. & + Edelman, G. M. Value-dependent selection in the uncertainty and variability. J. Physiol. 
578, 387–396 Bullmore, E. Broadband criticality of human brain + brain: simulation in a synthetic neural model. (2007). network synchronization. PLoS Comput. Biol. 5, + Neuroscience 59, 229–243 (1994). A nice overview of computational principles in e1000314 (2009). + 91. Sutton, R. S. & Barto, A. G. Toward a modern theory of motor control. Its focus is on representing 123. Rabinovich, M., Huerta, R. & Laurent, G. Transient + adaptive networks: expectation and prediction. uncertainty and optimal estimation when dynamics for neural processing. Science 321 48–50 + Psychol. Rev. 88, 135–170 (1981). extracting the sensory information required for (2008). + 92. Montague, P. R., Dayan, P., Person, C. & Sejnowski, motor planning. 124. Tschacher, W. & Hake, H. Intentionality in non- + T. J. Bee foraging in uncertain environments using 109. Shadmehr, R. & Krakauer, J. W. A computational equilibrium systems? The functional aspects of self- + predictive Hebbian learning. Nature 377, 725–728 neuroanatomy for motor control. Exp. Brain Res. 185, organised pattern formation. New Ideas Psychol. 25, + (1995). 359–381 (2008). 1–15 (2007). + A computational treatment of behaviour that 110. Verschure, P. F., Voegtlin, T. & Douglas, R. J. 125. Maturana, H. R. & Varela, F. De máquinas y seres + combines ideas from optimal control theory and Environmentally mediated synergy between vivos (Editorial Universitaria, Santiago, 1972). + dynamic programming with the neurobiology of perception and behaviour in mobile robots. Nature English translation available in Maturana, H. R. & + reward. This provided an early example of value 425, 620–624 (2003). Varela, F. in Autopoiesis and Cognition (Reidel, + learning in the brain. 111. Cohen, J. D., McClure, S. M. & Yu, A. J. Should I stay Dordrecht, 1980). + 93. Schultz, W. Predictive reward signal of dopamine or should I go? How the human brain manages the 126. Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete + neurons. J. Neurophysiol. 
80, 1–27 (1998). trade-off between exploitation and exploration. Philos. coding of reward probability and uncertainty by + 94. Daw, N. D. & Doya, K. The computational Trans. R. Soc. Lond. B Biol. Sci. 362, 933–942 dopamine neurons. Science 299, 1898–1902 + neurobiology of learning and reward. Curr. Opin. (2007). (2003). + Neurobiol. 16, 199–204 (2006). 112. Ishii, S., Yoshida, W. & Yoshimoto, J. Control of 127. Niv, Y., Duff, M. O. & Dayan, P. Dopamine, + 95. Redgrave, P. & Gurney, K. The short-latency dopamine exploitation-exploration meta-parameter in uncertainty and TD learning. Behav. Brain Funct. 1, 6 + signal: a role in discovering novel actions? Nature Rev. reinforcement learning. Neural Netw. 15, 665–687 (2005). + Neurosci. 7, 967–975 (2006). (2002). 128. Fletcher, P. C. & Frith, C. D. Perceiving is believing: a + 96. Berridge, K. C. The debate over dopamine’s role in 113. Usher, M., Cohen, J. D., Servan-Schreiber, D., Bayesian approach to explaining the positive + reward: the case for incentive salience. Rajkowski, J. & Aston-Jones, G. The role of locus symptoms of schizophrenia. Nature Rev. Neurosci. 10, + Psychopharmacology (Berl.) 191, 391–431 (2007). coeruleus in the regulation of cognitive performance. 48–58 (2009). + 97. Sella, G. & Hirsh, A. E. The application of statistical Science 283, 549–554 (1999). 129. Phillips, W. A. & Silverstein, S. M. Convergence of + physics to evolutionary biology. Proc. Natl Acad. Sci. 114. Voigt, C. A., Kauffman, S. & Wang, Z. G. Rational biological and psychological perspectives on cognitive + USA 102, 9541–9546 (2005). evolutionary design: the theory of in vitro protein coordination in schizophrenia. Behav. Brain Sci. 26, + 98. Rescorla, R. A. & Wagner, A. R. in Classical evolution. Adv. Protein Chem. 55, 79–160 (2000). 65–82 (2003). + Conditioning II: Current Research and Theory (eds 115. Freeman, W. J. Characterization of state transitions in 130. Friston, K. & Kiebel, S. 
Cortical circuits for perceptual + Black, A. H. & Prokasy, W. F.) 64–99 (Appleton spatially distributed, chaotic, nonlinear, dynamical inference. Neural Netw. 22, 1093–1104 (2009). + Century Crofts, New York, 1972). systems in cerebral cortex. Integr. Physiol. Behav. Sci. + 99. Bellman, R. On the Theory of Dynamic Programming. 29, 294–306 (1994). Acknowledgments + Proc. Natl Acad. Sci. USA 38, 716–719 (1952). 116. Tsuda, I. Toward an interpretation of dynamic neural This work was funded by the Wellcome Trust. I would like to + 100. Watkins, C. J. C. H. & Dayan, P. Q-learning. Mach. activity in terms of chaotic dynamical systems. Behav. thank my colleagues at the Wellcome Trust Centre for + Learn. 8, 279–292 (1992). Brain Sci. 24, 793–810 (2001). Neuroimaging, the Institute of Cognitive Neuroscience and the + 101. Todorov, E. in Advances in Neural Information 117. Jirsa, V. K., Friedrich, R., Haken, H. & Kelso, J. A. Gatsby Computational Neuroscience Unit for collaborations + Processing Systems (eds Scholkopf, B., Platt, J. & A theoretical model of phase transitions in the human and discussions. + Hofmann T.) 19, 1369–1376 (MIT Press, 2006). brain. Biol. Cybern. 71, 27–35 (1994). + 102. Camerer, C. F. Behavioural studies of strategic thinking This paper develops a theoretical model (based on Competing interests statement + in games. Trends Cogn. Sci. 7, 225–231 (2003). synergetics and nonlinear oscillator theory) that The author declares no competing financial interests. + 103. Smith, J. M. & Price, G. R. The logic of animal conflict. reproduces observed dynamics and suggests a + Nature 246, 15–18 (1973). formulation of biophysical coupling among brain SUPPLEMENTARY INFORMATION + 104. Nash, J. Equilibrium points in n-person games. systems. See online article: S1 (box) | S2 (box) | S3 (box) | S4 (box) | + Proc. Natl Acad. Sci. USA 36, 48–49 (1950). 118. Breakspear, M. & Stam, C. J. Dynamics of a S5 (box) + 105. Wolpert, D. M. & Miall, R. C. 
Forward models for neural system with a multiscale architecture. Philos. + physiological motor control. Neural Netw. 9, Trans. R. Soc. Lond. B Biol. Sci. 360, 1051–1074 All liNks Are AcTive iN The oNliNe pdf + 1265–1279 (1996). (2005). + 138 | FEBRuARy 2010 | voluME 11 www.nature.com/reviews/neuro + © 2010 Macmillan Publishers Limited. All rights reserved + + SUPPLEMENTARY INFORMATION In format provided by Friston (FEBRUARY 2010) + Supplementary information S1 (box): The entropy of sensory states and their causes + This box shows that the entropy of hidden states in the environment is bounded by the + entropy of sensory states. This means that if the entropy of sensory signals is minimised, so + is the entropy of the environmental states that caused them. For any agent or model m the + entropy of generalised sensory states % ′ ′′ T is simply their average surprise + s(t) =[s,s ,s ,K] + % + −ln p(s |m) (with a sight abuse of notion) + + T + % % % % % S1.1 + H(s|m):=∫−p(s|m)ln p(s|m)ds = lim∫−ln p(s(t)|m)dt + T→• + 0 + + Under ergodic assumptions, this is just the long-term time or path-integral of surprise. We will + assume sensory states are an analytic function of hidden environmental states plus some + generalised random fluctuations + + % % % + s = g(x,θ)+ z + & S1.2 + % % % + x = f (x,θ)+w + + Here, hidden states change according to the stochastic differential equations of motion (with + % % + parameters θ ) in S1.2. Because x and z are statistically independent, we have (see Eq. + 6.4.6 in Jones 1979, p149) + + % % % % % % + I(s,z) = H(s |m)−H(x|m)− p(x|m)ln|∂%g|dx S1.3 + ∫ x + + % % + Here, I(s,z) ≥ 0 is the mutual information between the sensory states and noise. By Gibb’s + inequality this cross-entropy or Kullback-Leibler divergence is non-negative (Theorem 6.5; + Jones 1979, p151). This means the entropy of the sensory states is greater than the entropy + of the sensory mapping. Here. 
$\partial_{\tilde{x}} g$ is the sensitivity or gradient of the sensory mapping with respect to the hidden states. The integral in S1.3 reflects the fact that entropy is not invariant to a change of variables and assumes that the sensory mapping $g : \tilde{x} \to \tilde{s}$ is diffeomorphic (i.e., bijective and smooth). This requires the hidden and sensory state-spaces to have the same dimension, which can be assured by truncating generalised states at an appropriately high order. For example, if we had n hidden states in m generalised coordinates of motion, we would consider m sensory states in n generalised coordinates; so that $\dim(\tilde{x}) = \dim(\tilde{s}) = n \times m$. Finally, rearranging S1.3 gives

$$H(\tilde{x} \mid m) \le H(\tilde{s} \mid m) - \int p(\tilde{x} \mid m) \ln \lvert \partial_{\tilde{x}} g \rvert \, d\tilde{x} \tag{S1.4}$$

In conclusion, the entropy of hidden states is upper-bounded by the entropy of sensations, assuming their sensitivity to hidden states is constant, over the range of states encountered.

Clearly, the ergodic assumption in S1.1 only holds over certain temporal scales for real organisms that are on a trajectory from birth to death. This scale can be somatic (e.g., over days or months, where development is locally stationary) or evolutionary (e.g., over generations, where evolution is locally stationary).

Reference
Jones, D. S. (1979). Elementary information theory. Oxford: Clarendon Press; New York: Oxford University Press.

Supplementary information S2 (box): Variational free energy

Here, we derive the free-energy and show how its various formulations relate to each other.
We start with the quantity we want to bound; namely, the surprise or log-evidence associated with sensory states $\tilde{s}(t)$ that have been caused by some unknown quantities $\vartheta \supset \{\tilde{x}, \theta\}$, which include the hidden states and parameters in box (S1)

$$-\ln p(\tilde{s}(t)) = -\ln \int p(\tilde{s}(t), \vartheta)\, d\vartheta \tag{S2.1}$$

To create a free-energy bound on surprise $F(\tilde{s}(t), q(\vartheta))$, we simply add a non-negative cross-entropy (the Kullback-Leibler divergence $D$) between an arbitrary (recognition) density on the causes $q(\vartheta)$ and their posterior density $p(\vartheta \mid \tilde{s})$ (dropping the dependency on $m$ for clarity).

$$\begin{aligned} F &= \int q(\vartheta) \ln \frac{q(\vartheta)}{p(\vartheta \mid \tilde{s})}\, d\vartheta - \ln p(\tilde{s}) \\ &= D\big(q(\vartheta) \,\|\, p(\vartheta \mid \tilde{s})\big) - \ln p(\tilde{s}) \end{aligned} \tag{S2.2}$$

The cross-entropy term is non-negative by Gibbs’ inequality. In short, free-energy is cross-entropy plus surprise. Because surprise depends only on sensory states, we can bring it inside the integral and use $p(\vartheta, \tilde{s}) = p(\vartheta \mid \tilde{s})\, p(\tilde{s})$ to show free-energy is expected energy minus entropy

$$\begin{aligned} F &= \int q(\vartheta) \ln \frac{q(\vartheta)}{p(\vartheta \mid \tilde{s})\, p(\tilde{s})}\, d\vartheta \\ &= \int q(\vartheta) \ln q(\vartheta)\, d\vartheta - \int q(\vartheta) \ln p(\vartheta, \tilde{s})\, d\vartheta \\ &= -\big\langle \ln p(\vartheta, \tilde{s}) \big\rangle_{q} - \big\langle -\ln q(\vartheta) \big\rangle_{q} \end{aligned} \tag{S2.3}$$

where $-\ln p(\vartheta, \tilde{s})$ is Gibbs energy. A final rearrangement, using $p(\vartheta, \tilde{s}) = p(\tilde{s} \mid \vartheta)\, p(\vartheta)$, shows free-energy is also complexity minus accuracy, where complexity is the cross-entropy between the recognition density $q(\vartheta)$ and prior density $p(\vartheta)$

$$\begin{aligned} F &= \int q(\vartheta) \ln \frac{q(\vartheta)}{p(\tilde{s} \mid \vartheta)\, p(\vartheta)}\, d\vartheta \\ &= \int q(\vartheta) \ln \frac{q(\vartheta)}{p(\vartheta)}\, d\vartheta - \int q(\vartheta) \ln p(\tilde{s} \mid \vartheta)\, d\vartheta \\ &= D\big(q(\vartheta) \,\|\, p(\vartheta)\big) - \big\langle \ln p(\tilde{s} \mid \vartheta) \big\rangle_{q} \end{aligned} \tag{S2.4}$$
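The three formulations S2.2 to S2.4 can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the causes are restricted to three discrete values (so integrals become sums), and the prior, likelihood and recognition density are made-up numbers that do not come from the Review. The function names are hypothetical.

```python
import math


def kl_divergence(a, b):
    """Kullback-Leibler divergence D(a || b) between two discrete densities."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0.0)


def free_energy_forms(q, prior, likelihood):
    """Return surprise, F as in S2.2, S2.3 and S2.4, and the posterior.

    q, prior   -- densities over a finite set of causes
    likelihood -- p(s | cause) for one fixed, observed sensation s
    """
    joint = [p * l for p, l in zip(prior, likelihood)]   # p(cause, s)
    evidence = sum(joint)                                 # p(s)
    posterior = [j / evidence for j in joint]             # p(cause | s)
    surprise = -math.log(evidence)

    # S2.2: divergence from the posterior plus surprise
    f_s22 = kl_divergence(q, posterior) + surprise

    # S2.3: expected (Gibbs) energy minus entropy of q
    energy = -sum(qi * math.log(j) for qi, j in zip(q, joint))
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0.0)
    f_s23 = energy - entropy

    # S2.4: complexity (divergence from the prior) minus accuracy
    accuracy = sum(qi * math.log(l) for qi, l in zip(q, likelihood))
    f_s24 = kl_divergence(q, prior) - accuracy

    return surprise, f_s22, f_s23, f_s24, posterior


# Illustrative numbers only (assumptions, not values from the paper).
PRIOR = [0.5, 0.3, 0.2]
LIKELIHOOD = [0.10, 0.60, 0.30]
Q = [0.2, 0.5, 0.3]

surprise, f22, f23, f24, posterior = free_energy_forms(Q, PRIOR, LIKELIHOOD)
print(surprise, f22, f23, f24)
```

All three values of F agree and lie above the surprise; passing the exact posterior as `q` makes the bound tight, which is the sense in which minimising free-energy with respect to the recognition density approximates Bayesian inference.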
Supplementary information S3 (box): The free-energy principle and infomax

Here, we show that the free-energy principle is a probabilistic generalisation of the infomax principle. The infomax principle requires the mutual information $I(\tilde{s}, \mu)$ between sensory data and their conditional representation $\mu(t)$ to be maximal, under prior constraints on the representations; e.g., $p(\mu) = N(0, I)$. This can be stated as an optimisation of an infomax criterion

$$\begin{aligned} \mu^{*} &= \arg\max_{\mu} G \\ G &= I(\tilde{s}, \mu) - H(\mu) \\ &= H(\tilde{s}) - H(\tilde{s} \mid \mu) - H(\mu) \end{aligned} \tag{S3.1}$$

Because the representations do not change sensory data, they are only required to minimise the average surprise about them, given the representations; and the average surprise about the representations, given their prior constraints. These are the last two terms in (S3.1). If the recognition density is a point mass at $\mu(t)$; i.e., $q(\vartheta) = \delta(\vartheta - \mu)$, the free-energy from (S2.4) reduces to

$$F = -\ln p(\tilde{s} \mid \mu) - \ln p(\mu) \tag{S3.2}$$

From (S1.1), the path-integral of free-energy (also known as free-action) becomes

$$\mathcal{A} = \int dt\, F(\tilde{s}(t), \mu(t)) \propto H(\tilde{s} \mid \mu) + H(\mu) \tag{S3.3}$$

This means optimising the conditional expectations with respect to free-energy and (by the fundamental lemma of variational calculus) free-action is exactly the same as optimising the infomax criterion

$$\mu^{*} = \arg\min_{\mu} F = \arg\min_{\mu} \mathcal{A} = \arg\max_{\mu} G \tag{S3.4}$$

In short, the infomax principle is a special case of the free-energy principle that obtains when we discount uncertainty and represent sensory data with point estimates of their causes. Alternatively, the free-energy is a generalisation of the infomax principle that covers probability densities on the unknown causes of data.
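As a concrete reading of S3.2, the following sketch minimises the point-mass free-energy for a single scalar sensation. Everything specific here is an assumption made for illustration and is not taken from the Review: a linear-Gaussian likelihood $s \sim N(g\mu, \sigma^{2})$ with hypothetical `gain` and `noise_var`, the unit Gaussian prior mentioned in the text, and plain gradient descent as the optimiser.

```python
import math


def free_energy(mu, s, gain=2.0, noise_var=0.5):
    """Point-mass free-energy F = -ln p(s | mu) - ln p(mu), as in S3.2.

    Assumes s ~ N(gain * mu, noise_var) and a unit Gaussian prior on mu.
    """
    neg_log_likelihood = (0.5 * math.log(2.0 * math.pi * noise_var)
                          + (s - gain * mu) ** 2 / (2.0 * noise_var))
    neg_log_prior = 0.5 * math.log(2.0 * math.pi) + mu ** 2 / 2.0
    return neg_log_likelihood + neg_log_prior


def minimise_free_energy(s, gain=2.0, noise_var=0.5, rate=0.05, steps=2000):
    """Gradient descent on F with respect to the representation mu."""
    mu = 0.0
    for _ in range(steps):
        # dF/dmu: precision-weighted prediction error plus the prior term
        gradient = -gain * (s - gain * mu) / noise_var + mu
        mu -= rate * gradient
    return mu


def closed_form_minimum(s, gain=2.0, noise_var=0.5):
    """Analytic minimiser of F for this linear-Gaussian case."""
    return gain * s / (gain ** 2 + noise_var)


mu_star = minimise_free_energy(1.8)
print(mu_star, closed_form_minimum(1.8))
```

The first term of the gradient is a prediction error scaled by the sensory precision, and the second pulls the estimate towards the prior mean, so the minimum trades accuracy against the prior constraint.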
In this context, high mutual information is assured by maximising accuracy (e.g., minimising prediction error) and the prior constraints are enforced by minimising complexity (see S2.4).

Supplementary information S4 (box): Value and surprise

Here, we compare and contrast optimal control and free-energy formulations of dynamics on hidden or sensory states. To keep things simple, we will assume the hidden states are known (as is usually assumed in control theory) and ignore random fluctuations; i.e., $\tilde{w}(t) = 0$ (see box S1). In optimal control, one starts with a loss or cost-function (negative reward or utility), $c(\tilde{x})$, and optimises the motion of states to maximise value or expected reward over time

$$\begin{aligned} a^{*} &= \arg\max_{a} f(\tilde{x}, a) \cdot \nabla V(\tilde{x}) \\ V(\tilde{x}(0)) &= \int_{0}^{\infty} -c(\tilde{x}(t))\, dt \;\Rightarrow\; \dot{V}(\tilde{x}(t)) = c(\tilde{x}) \end{aligned} \tag{S4.1}$$

The first equality says that motion ascends the gradients of the value-function and the second just defines value as reward that will be accumulated in the future. Note the equations of motion $\dot{\tilde{x}} = f(\tilde{x}, a)$ now include action. The value-function is the solution to the celebrated Hamilton-Jacobi-Bellman equation

$$\begin{aligned} \max_{a} \big\{ \dot{V}(\tilde{x}(t)) - c(\tilde{x}) \big\} = 0 \;\Rightarrow \\ \max_{a} \big\{ f(\tilde{x}, a) \cdot \nabla V(\tilde{x}) - c(\tilde{x}) \big\} = 0 \end{aligned} \tag{S4.2}$$

This solution ensures that the rate of change of value is cost, as required by the definition of value. In summary, (S4.1) says that action maximises value and (S4.2) means that value is the reward expected under this policy. This ensures low-cost regions attract all trajectories through state-space.

We now revisit value from the perspective of surprise and free-energy.
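Before the stochastic case is taken up, the deterministic relations S4.1 and S4.2 can be checked on a toy problem with a known solution. The example is an assumption chosen for illustration, not one used in the Review: a scalar state with motion $f(x, a) = a$, actions restricted to $\{-1, 0, 1\}$, and cost $c(x) = x^{2}$, for which the value-function is $V(x) = -\lvert x \rvert^{3}/3$.

```python
def cost(x):
    """Assumed cost-function c(x) = x**2."""
    return x * x


def value(x):
    """Value-function for this toy problem, V(x) = -|x|**3 / 3."""
    return -abs(x) ** 3 / 3.0


def value_gradient(x):
    """Gradient dV/dx = -x * |x|."""
    return -x * abs(x)


def best_action(x, actions=(-1.0, 0.0, 1.0)):
    """S4.1: the action maximising f(x, a) * dV/dx, with f(x, a) = a."""
    return max(actions, key=lambda a: a * value_gradient(x))


def hjb_residual(x):
    """Left-hand side of S4.2 under the best action; zero if V solves it."""
    return best_action(x) * value_gradient(x) - cost(x)


def accumulated_reward(x0, dt=1e-3, horizon=5.0):
    """Euler integration of -c(x(t)) along the trajectory chosen by S4.1."""
    x, total = x0, 0.0
    for _ in range(int(horizon / dt)):
        total -= cost(x) * dt
        x += best_action(x) * dt
    return total


print(best_action(2.0), hjb_residual(0.8), accumulated_reward(1.5), value(1.5))
```

Following the value gradient drives the state to the zero-cost point at the origin, and the negative cost accumulated along the way approximates $V(x_{0})$ up to the integration error of the Euler scheme.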
If we put the random fluctuations back and assume a general form (the Helmholtz decomposition) for motion: $f = \nabla V + \nabla\times W$, it is fairly easy to relate value and surprise (using the Fokker-Planck equation, subject to $\nabla V\cdot(\nabla\times W) = 0$)
+
+ $$\begin{aligned} V(\tilde{x}) &= \gamma \ln p(\tilde{x}\mid m) \\ c(\tilde{x}) &= f\cdot\nabla V + \gamma\nabla^{2} V \end{aligned} \tag{S4.3}$$
+
+ Here, $\gamma > 0$ encodes the amplitude of the random fluctuations (and is known as an inverse sensitivity or temperature parameter). The first equality shows that value is inversely proportional to surprise, where free-energy is surprise because we know the true states. This means the value of a state is proportional to the log-probability of finding an agent $m$ in that state. This is also the log-sojourn time or the proportion of time the state is occupied by that agent.
+
+ In the limit of small fluctuations $\gamma \to 0$, the ensemble density $p(\tilde{x}\mid m) = \exp(\gamma^{-1} V(\tilde{x}))$ becomes a point mass at the minimum of the cost-function. This somewhat trivial case serves to connect optimal control theory to the equilibrium treatment that underpins the free-energy scheme. In this limit, cost is just the rate of change of value: $c(\tilde{x}) = f\cdot\nabla V = \dot{V}(\tilde{x}(t))$, as mandated by the definition of value in Equation S4.1, which is the solution to the (deterministic) Hamilton-Jacobi-Bellman equation (S4.2).
+
+ Crucially, Equation S4.3 also shows that peaks of the equilibrium density can only exist where cost is zero or less
+
+ $$\left.\begin{aligned} \nabla V(\tilde{x}) &= 0 \\ \nabla^{2} V(\tilde{x}) &\le 0 \end{aligned}\right\} \Rightarrow c(\tilde{x}) \le 0 \tag{S4.4}$$
+
+ with $c(\tilde{x}) = 0$ in the limit $\gamma \to 0$.
+
+ In summary, optimal control theory starts with a cost-function and solves for a value-function that guides the flow or policy to minimise expected cost.
Conversely, the equilibrium perspective starts with flow and derives the implicit value and cost-functions, where value is inversely proportional to surprise. In the last supplementary information box (S5), we show how cost can define policies, without solving the (generally intractable) Hamilton-Jacobi-Bellman equation.
+
+ Supplementary information S5 (box): Policies and cost
+ This box describes a scheme that ensures agents are attracted to locations in state-space, using prior expectations about the motion of hidden states; $\tilde{x}(t) = [x, x']^{T} \in X$ comprising position and velocity. This formulation of how an ensemble density can be restricted to an attractive subset of state-space $A \subset X$ rests on the Fokker-Planck description (see Frank 2004) of how the density changes with time
+
+ $$\dot{p}(\tilde{x}\mid m) = \gamma\nabla^{2}p - p\,\nabla\cdot f - f\cdot\nabla p$$
+
+ At equilibrium, $\dot{p}(\tilde{x}\mid m) = 0$ and
+
+ $$p(\tilde{x}\mid m) = \frac{\gamma\nabla^{2}p - f\cdot\nabla p}{\nabla\cdot f} \tag{S5.1}$$
+
+ Notice that as the divergence $\nabla\cdot f$ increases, the sojourn time (i.e., the proportion of time a state is occupied) falls. Crucially, at the peaks of the ensemble density, the gradient is zero and its curvature is negative, which means the divergence must be negative (from Equation S5.1)
+
+ $$\left.\begin{aligned} p &> 0 \\ \nabla p &= 0 \\ \nabla^{2}p &< 0 \end{aligned}\right\} \Rightarrow \nabla\cdot f < 0 \tag{S5.2}$$
+
+ This provides a simple and general mechanism to ensure peaks of the ensemble density lie in, and only in $A \subset X$. This is assured if $\nabla\cdot f(\tilde{x}) < 0$ when $\tilde{x} \in A$ and $\nabla\cdot f(\tilde{x}) \ge 0$ otherwise.
We can exploit this using the generic equations of motion
+
+ $$f = \begin{bmatrix} x' \\ c\,x' - \partial_{x}\varphi \end{bmatrix} \;\Rightarrow\; \nabla\cdot f = c \tag{S5.3}$$
+
+ This flow describes the Newtonian motion of a unit mass in a potential energy well $\varphi(x,\theta)$, where cost plays the role of negative dissipation or friction. Crucially, under this policy or flow, divergence is simply cost; meaning the associated ensemble density can only have maxima in regions of negative cost. This provides a means to specify attractive regions $A \subset X$ by assigning them negative cost
+
+ $$\begin{aligned} c(\tilde{x}) &\le 0: \; \tilde{x} \in A \\ c(\tilde{x}) &> 0: \; \tilde{x} \notin A \end{aligned} \tag{S5.4}$$
+
+ Put simply, this scheme ensures that agents are expelled from high-cost regions of state-space and get ‘stuck’ in attractive regions.
+
+ In summary, the previous supplementary information box (S4) showed that any flow can be described in terms of a scalar value-function (and vector potential $W$), from which an implicit cost-function can be derived. In this box (S5), we have addressed the inverse problem of how cost can be used to constrain flow, ensuring that it leads to attractive, low-cost states. The ensuing policy or flow can be used in a generative model of flow or state-transitions to provide predictions that action fulfils, under the free-energy principle. A full discussion of these and related ideas will be presented in Friston et al (in preparation).
+
+ Reference
+ Frank TD (2004). Nonlinear Fokker-Planck Equations: Fundamentals and Applications. Springer Series in Synergetics (Berlin: Springer)
+
+ NATURE REVIEWS | NEUROSCIENCE www.nature.com/reviews/neuro
+ © 2010 Macmillan Publishers Limited. All rights reserved.
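The statement in (S5.3) that the divergence of the flow equals cost can be checked numerically. The sketch below is illustrative only: the quadratic cost in `cost`, the quadratic potential behind `potential_grad`, and the finite-difference step are assumptions chosen for the example and are not taken from the text.

```python
def cost(x):
    # Hypothetical cost: negative inside the attractive set A = (-1, 1), positive outside.
    return x * x - 1.0


def potential_grad(x):
    # Gradient of an assumed quadratic potential phi(x) = 0.5 * x**2.
    return x


def flow(x, v):
    # Flow of the form used in (S5.3): f = [x', c(x) * x' - d(phi)/dx],
    # with x the position and v the velocity x'.
    return (v, cost(x) * v - potential_grad(x))


def divergence(x, v, h=1e-5):
    # Central finite differences for d f_1 / dx + d f_2 / dv.
    df1_dx = (flow(x + h, v)[0] - flow(x - h, v)[0]) / (2 * h)
    df2_dv = (flow(x, v + h)[1] - flow(x, v - h)[1]) / (2 * h)
    return df1_dx + df2_dv


def in_attractive_set(x):
    # (S5.4): attractive regions are those assigned non-positive cost.
    return cost(x) <= 0.0


if __name__ == "__main__":
    for x, v in [(0.0, 0.3), (0.5, -1.0), (2.0, 0.7)]:
        print(x, v, divergence(x, v), cost(x), in_attractive_set(x))
```

With these assumed functions the numerical divergence matches `cost(x)` at every sampled point, so it is negative only for positions inside the hypothetical attractive set.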
diff --git a/archive/friston/The_Free_Energy_Principle_Friston_2010.pdf b/archive/friston/The_Free_Energy_Principle_Friston_2010.pdf new file mode 100644 index 00000000..cf4717c9 --- /dev/null +++ b/archive/friston/The_Free_Energy_Principle_Friston_2010.pdf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1015762480c0db36f5701f9d6048bf0e785bdec892b00777153ab6afdec2d4c1 +size 2407414 diff --git a/archive/tononi/Consciousness_as_Integrated_Information_Tononi_2008.md b/archive/tononi/Consciousness_as_Integrated_Information_Tononi_2008.md new file mode 100644 index 00000000..3f9c4cf8 --- /dev/null +++ b/archive/tononi/Consciousness_as_Integrated_Information_Tononi_2008.md @@ -0,0 +1,1577 @@ + Reference: Biol. Bull. 215: 216–242. (December 2008) + ©2008 Marine Biological Laboratory + Consciousness as Integrated Information: + a Provisional Manifesto + GIULIO TONONI + Department of Psychiatry, University of Wisconsin, Madison, Wisconsin + Abstract. The integrated information theory (IIT) starts INTRODUCTION + from phenomenology and makes use of thought experi- + ments to claim that consciousness is integrated information. Everybody knows what consciousness is: it is what van- + Specifically: (i) the quantity of consciousness corresponds ishes every night when we fall into dreamless sleep and + to the amount of integrated information generated by a reappears when we wake up or when we dream. It is also all + complex of elements; (ii) the quality of experience is spec- we are and all we have: lose consciousness and, as far as + ified by the set of informational relationships generated you are concerned, your own self and the entire world + within that complex. Integrated information () is defined dissolve into nothingness. 
+ as the amount of information generated by a complex of elements, above and beyond the information generated by its parts. Qualia space (Q) is a space where each axis represents a possible state of the complex, each point is a probability distribution of its states, and arrows between points represent the informational relationships among its elements generated by causal mechanisms (connections). Together, the set of informational relationships within a complex constitute a shape in Q that completely and univocally specifies a particular experience. Several observations concerning the neural substrate of consciousness fall naturally into place within the IIT framework. Among them are the association of consciousness with certain neural systems rather than with others; the fact that neural processes underlying consciousness can influence or be influenced by neural processes that remain unconscious; the reduction of consciousness during dreamless sleep and generalized seizures; and the distinct role of different cortical architectures in affecting the quality of experience. Equating consciousness with integrated information carries several implications for our view of nature.
+
+ Received 20 August 2008; accepted 10 October 2008.
+ * To whom correspondence should be addressed. E-mail: gtononi@wisc.edu
+ Abbreviations: Φ, integrated information; IIT, integrated information theory; MIP, minimum information partition.
+ This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM
+ All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c).
+
+ Yet almost everybody thinks that understanding consciousness at the fundamental level is currently beyond the reach of science. The best we can do, it is often argued, is gather more and more facts about the neural correlates of consciousness (those aspects of brain function that change when some aspects of consciousness change) and hope that one day we will come up with an explanation. Others are more pessimistic: we may learn all about the neural correlates of consciousness and still not understand why certain physical processes seem to generate experience while others do not.
+
+ It is not that we do not know relevant facts about consciousness. For example, we know that the widespread destruction of the cerebral cortex leaves people permanently unconscious (vegetative), whereas the complete removal of the cerebellum, even richer in neurons, hardly affects consciousness. We also know that neurons in the cerebral cortex remain active throughout sleep, yet at certain times during sleep consciousness fades, while at other times we dream. Finally, we know that different parts of the cortex influence different qualitative aspects of consciousness: damage to certain parts of the cortex can impair the experience of color, whereas other lesions may interfere with the perception of shapes. In fact, increasingly refined neuroscientific tools are uncovering increasingly precise aspects of the neural correlates of consciousness (Koch, 2004). And yet, when it comes to explaining why experience blossoms in the cortex and not in the cerebellum, why certain stages of sleep are experientially underprivileged, or why some cortical areas endow our experience with colors and others with sound, we are still at a loss.
+
+ Our lack of understanding is manifested most clearly when scientists are asked questions about consciousness in “difficult” cases. For example, is a person with akinetic mutism (awake with eyes open, but mute, immobile, and nearly unresponsive) conscious or not? How much consciousness is there during sleepwalking or psychomotor seizures? Are newborn babies conscious, and to what extent? Are animals conscious? If so, are some animals more conscious than others? Can they feel pain? Does a bat feel space the same way we do? Can bees experience colors, or merely react to them? Can a conscious artifact be constructed with non-neural ingredients? I believe it is fair to say that no consciousness expert, if there is such a job description, can be confident about the correct answer to such questions. This is a remarkable state of affairs. Just consider comparable questions in physics: Do stars have mass? Do atoms? How many different kinds of atoms and elementary particles are there, and of what are they made? Is energy conserved? And how can it be measured? Or consider biology: What are species, and how do they evolve? How are traits inherited? How do organisms develop? How is energy produced from nutrients? How does echolocation work in bats? How do bees distinguish among colors? And so on. Obviously, we expect satisfactory answers by any competent physicist and biologist.
+
+ What’s the matter with consciousness, then, and how should we proceed? Early on, I came to the conclusion that a genuine understanding of consciousness is possible only if empirical studies are complemented by a theoretical analysis. Indeed, neurobiological facts constitute both challenging paradoxes and precious clues to the enigma of consciousness. This state of affairs is not unlike the one faced by biologists when, knowing a great deal about similarities and differences between species, fossil remains, and breeding practices, they still lacked a theory of how evolution might occur. What was needed, then as now, were not just more facts, but a theoretical framework that could make sense of them.
+
+ In what follows, I discuss the integrated information theory of consciousness (IIT; Tononi, 2004), an attempt to understand consciousness at the fundamental level. To present the theory, I first consider phenomenological thought experiments indicating that subjective experience has to do with the generation of integrated information. Next, I consider how integrated information can be defined mathematically. I then show how basic facts about consciousness and the brain can be accounted for in terms of integrated information. Finally, I discuss how the quality of consciousness can be captured geometrically by the shape of informational relationships within an abstract space called qualia space. I conclude by examining some implications of the theory concerning the place of experience in our view of the world.
+
+ A Phenomenological Analysis: Consciousness as Integrated Information
+ The integrated information theory (IIT) of consciousness claims that, at the fundamental level, consciousness is integrated information, and that its quality is given by the informational relationships generated by a complex of elements (Tononi, 2004). These claims stem from realizing that information and integration are the essential properties of our own experience. This may not be immediately evident, perhaps because, being endowed with consciousness most of the time, we tend to take its gifts for granted. To regain some perspective, it is useful to resort to two thought experiments, one involving a photodiode and the other a digital camera.
+
+ Information: the photodiode thought experiment
+ Consider the following: You are facing a blank screen that is alternately on and off, and you have been instructed to say “light” when the screen turns on and “dark” when it turns off. A photodiode, a simple light-sensitive device, has also been placed in front of the screen. It contains a sensor that responds to light with an increase in current and a detector connected to the sensor that says “light” if the current is above a certain threshold and “dark” otherwise. The first problem of consciousness reduces to this: when you distinguish between the screen being on or off, you have the subjective experience of seeing light or dark. The photodiode can also distinguish between the screen being on or off, but presumably it does not have a subjective experience of light and dark. What is the key difference between you and the photodiode?
+
+ According to the IIT, the difference has to do with how much information is generated when that distinction is made. Information is classically defined as reduction of uncertainty: the more numerous the alternatives that are ruled out, the greater the reduction of uncertainty, and thus the greater the information. It is usually measured using the entropy function, which is the logarithm of the number of alternatives (assuming they are equally likely). For example, tossing a fair coin and obtaining heads corresponds to $\log_2(2) = 1$ bit of information, because there are just two alternatives; throwing a fair die yields $\log_2(6) \approx 2.58$ bits of information, because there are six.
+
+ Let us now compare the photodiode with you. When the blank screen turns on, the mechanism in the photodiode tells the detector that the current from the sensor is above rather than below the threshold, so it reports “light.” In performing this discrimination between two alternatives, the detector in the photodiode generates $\log_2(2) = 1$ bit of information. When you see the blank screen turn on, on the other hand, the situation is quite different.
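The bit counts for the coin, the die, and the photodiode follow directly from the logarithm of the number of equally likely alternatives. A minimal sketch of that arithmetic (the function names are illustrative and do not come from the article):

```python
import math


def bits_for_equally_likely(n_alternatives):
    # Information, in bits, from singling out one of n equally likely alternatives.
    return math.log2(n_alternatives)


def shannon_entropy_bits(probabilities):
    # General entropy in bits; reduces to log2(n) for a uniform distribution.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


coin_bits = bits_for_equally_likely(2)        # fair coin, or the photodiode's light/dark
die_bits = bits_for_equally_likely(6)         # fair die
uniform_die_entropy = shannon_entropy_bits([1 / 6] * 6)

if __name__ == "__main__":
    print(coin_bits, die_bits, uniform_die_entropy)
```

The second function is included only to show that the "logarithm of the number of alternatives" is the special case of entropy in which every alternative has the same probability.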
Though you may think you are performing the same discrimination between light and dark as the photodiode, you are in fact discriminating among a much larger number of alternatives, thereby generating many more bits of information.
+
+ This is easy to see. Just imagine that, instead of turning light and dark, the screen were to turn red, then green, then blue, and then display, one after the other, every frame from every movie that was ever produced. The photodiode, inevitably, would go on signaling whether the amount of light for each frame is above or below its threshold: to a photodiode, things can only be one of two ways, so when it reports “light,” it really means just “this way” versus “that way.” For you, however, a light screen is different not only from a dark screen, but from a multitude of other images, so when you say “light,” it really means this specific way versus countless other ways, such as a red screen, a green screen, a blue screen, this movie frame, that movie frame, and so on for every movie frame (not to mention for a sound, smell, thought, or any combination of the above). Clearly, each frame looks different to you, implying that some mechanism in your brain must be able to tell it apart from all the others. So when you say “light,” whether you think about it or not (and you typically won’t), you have just made a discrimination among a very large number of alternatives, and thereby generated many bits of information.
+
+ This point is so deceivingly simple that it is useful to elaborate a bit on why, although a photodiode may be as good as we are in detecting light, it cannot possibly see light the way we do; in fact, it cannot possibly “see” anything at all. Hopefully, by realizing what the photodiode lacks, we may appreciate what allows us to consciously “see” the light.
+
+ The key is to realize how the many discriminations we can do, and the photodiode cannot, affect the meaning of the discrimination at hand, the one between light and dark. For example, the photodiode has no mechanism to discriminate colored from achromatic light, even less to tell which particular color the light might be. As a consequence, all light is the same to it, as long as it exceeds a certain threshold. So for the photodiode, “light” cannot possibly mean achromatic as opposed to colored, not to mention of which particular color. Also, the photodiode has no mechanism to distinguish between a homogeneous light and a bright shape (any bright shape) on a darker background. So for the photodiode, light cannot possibly mean full field as opposed to a shape, any of countless particular shapes. Worse, the photodiode does not even know that it is detecting a visual attribute (the “visualness” of light) as it has no mechanism to tell visual attributes, such as light or dark, from non-visual ones, such as hot and cold, light or heavy, loud or soft, and so on. As far as it knows, the photodiode might just as well be a thermistor: it has no way of knowing whether it is sensing light versus dark or hot versus cold.
+
+ In short, the only specification a photodiode can make is whether things are this or that way: any further specification is impossible because it does not have mechanisms for it. Therefore, when the photodiode detects “light,” such “light” cannot possibly mean what it means for us; it does not even mean that it is a visual attribute. By contrast, when we see “light” in full consciousness, we are implicitly being much more specific: we simultaneously specify that things are this way rather than that way (light as opposed to dark), that whatever we are discriminating is not colored (in any particular color), does not have a shape (any particular one), is visual as opposed to auditory or olfactory, sensory as opposed to thought-like, and so on. To us, then, light is much more meaningful precisely because we have mechanisms that can discriminate this particular state of affairs we call “light” against a large number of alternatives.
+
+ According to the IIT, it is all this added meaning, provided implicitly by how we discriminate pure light from all these alternatives, that increases the level of consciousness. This central point may be appreciated either by “subtraction” or by “addition.” By subtraction, one may realize that our being conscious of “light” would degrade more and more (would lose its non-coloredness, its non-shapedness, would even lose its visualness) as its meaning is progressively stripped down to just “one of two ways,” as with the photodiode. By addition, one may realize that we can only see “light” as we see it, as progressively more and more meaning is added by specifying how it differs from countless alternatives. Either way, the theory says that the more specifically one’s mechanisms discriminate between what pure light is and what it is not (the more they specify what light means), the more one is conscious of it.
+
+ Integration: the camera thought experiment
+ Information (the ability to discriminate among a large number of alternatives) may thus be essential for consciousness. However, information always implies a point of view, and we need to be careful about what that point of view might be. To see why, consider another thought experiment, this time involving a digital camera, say one whose sensor chip is a collection of a million binary photodiodes, each sporting a sensor and a detector. Clearly, taken as a whole, the camera’s detectors could distinguish among $2^{1,000,000}$ alternative states, an immense number, corresponding to 1 million bits of information. Indeed, the camera would easily respond differently to every frame from every movie that was ever produced. Yet few would argue that the camera is conscious. What is the key difference between you and the camera?
+
+ According to the IIT, the difference has to do with integrated information. From the point of view of an external observer, the camera may be considered as a single system with a repertoire of $2^{1,000,000}$ states. In reality, however, the chip is not an integrated entity: since its 1 million photodiodes have no way to interact, each photodiode performs its own local discrimination between a low and a high current completely independent of what every other photodiode might be doing. In reality, the chip is just a collection of 1 million independent photodiodes, each with a repertoire of two states. In other words, there is no intrinsic point of view associated with the camera chip as a whole. This is easy to see: if the sensor chip were cut into 1 million pieces each holding its individual photodiode, the performance of the camera would not change at all.
+
+ By contrast, you discriminate among a vast repertoire of states as an integrated system, one that cannot be broken down into independent components each with its own separate repertoire. Phenomenologically, every experience is an integrated whole, one that means what it means by virtue of being one, and that is experienced from a single point of view. For example, the experience of a red square cannot be decomposed into the separate experience of red and the separate experience of a square.
Similarly, experiencing the full visual field cannot be decomposed into experiencing separately the left half and the right half: such a possibility does not even make sense to us, since experience is always whole. Indeed, the only way to split an experience into independent experiences seems to be to split the brain in two, as in patients who underwent the section of the corpus callosum to treat severe epilepsy (Gazzaniga, 2005). Such patients do indeed experience the left half of the visual field independently of the right side, but then the surgery has created two separate consciousnesses instead of one. Mechanistically then, underlying the unity of experience must be causal interactions among certain elements within the brain. This means that these elements work together as an integrated system, which is why their performance, unlike that of the camera, breaks down if they are disconnected.
+
+ Figure 1. Effective information. (A) A “photodiode” consisting of a sensor and detector unit. The photodiode’s mechanism is such that the detector unit turns on if the sensor’s current is above a threshold. Here both units are on (binary 1, indicated in gray). (B) For the entire system (sensor unit, detector unit) there are four possible states: (00,01,10,11). The potential distribution $p(X_0(\text{maxH})) = (1/4,1/4,1/4,1/4)$ is the maximum entropy distribution on the four states. Given the photodiode’s mechanism and the fact that the detector is on, the sensor must have been on. Thus, the photodiode’s mechanism and its current state specifies the following distribution: two of the four possible states (00,01) are ruled out; the other two states (10,11) are equally likely since they are indistinguishable to the mechanism (the prior state of the detector makes no difference to the current state of the sensor). The actual distribution is therefore $p(X_0(\text{mech}, x_1)) = (0,0,1/2,1/2)$. Relative entropy (Kullback-Leibler divergence) between two probability distributions $p$ and $q$ is $H[p\,\|\,q] = \sum_i p_i \log_2 (p_i/q_i)$, so the effective information $ei(X(\text{mech}, x_1)) = H[p(X_0(\text{mech}, x_1))\,\|\,p(X_0(\text{maxH}))]$ associated with output $x_1 = 11$ is 1 bit (effective information is the entropy of the actual relative to the potential distributions).
+
+ A Mathematical Analysis: Quantifying Integrated Information
+ This phenomenological analysis suggests that, to generate consciousness, a physical system must be able to discriminate among a large repertoire of states (information) and it must be unified; that is, it should be doing so as a single system, one that is not decomposable into a collection of causally independent parts (integration). But how can one measure integrated information? As I explain below, the central idea is to quantify the information generated by a system, above and beyond the information generated independently by its parts (Tononi, 2001, 2004; Balduzzi and Tononi, 2008).¹
+
+ Information
+ First, we must evaluate how much information is generated by the system. Consider the system of two binary units in Figure 1, which can be thought of as an idealized version of a photodiode composed of a sensor S and a detector D. The system is characterized by a state it is in, which in this case is 11 (first digit for the sensor, second digit for the detector), and by a mechanism. This is mediated by a connection (arrow) between the sensor and the detector that implements a causal interaction: in this case, the elementary mechanism of the system is that the detector checks the state of the sensor and turns on if the sensor is on, and off otherwise (more generally, the specific causal interaction can be described by an input-output table).
+
+ Potentially, a system of two binary elements could be in any of four possible states (00,01,10,11) with equal probability: $p = (1/4,1/4,1/4,1/4)$. Formally, this potential (a priori) repertoire is represented by the maximum entropy or uniform distribution of possible system states at time $t_0$, which expresses complete uncertainty ($p(X_0(\text{maxH}))$). Considering the potential repertoire as the set of all possible input states, the particular mechanism $X(\text{mech})$ of this system can be thought of as specifying a forward repertoire: the probability distribution of output states produced by the system when perturbed with all possible input states. But the system is actually in a particular output state (in this case, at time $t_1$, $x_1 = 11$). In actuality, a system with this mechanism being in state 11 specifies that the previous system state $x_0$ must have been either 11 or 10, rather than 00 or 01, corresponding to $p = (0,0,1/2,1/2)$ (in this system, there is no mechanism to specify the detector state, which remains uncertain). Formally, then, the mechanism and the state 11 specify an actual (a posteriori) distribution or repertoire of system states $p(X_0(\text{mech},x_1))$ at time $t_0$ that could have caused (led to) $x_1$ at time $t_1$, while ruling out (giving probability zero to) states that could not. In this way, the system’s mechanism and state constitute information (about the system’s previous state), in the classic sense of reduction of uncertainty or ignorance. More precisely, the system’s mechanism and state generate 1 bit of information by distinguishing between things being one way (11 or 10, which remain indistinguishable to it) rather than another way (00 or 01, which also remain indistinguishable to it).
+
+ In general, the information generated by a system characterized by a certain mechanism in a particular state can be measured by the relative entropy $H$ between the actual and the potential repertoires (“relative to” is indicated by $\|$), captured by the effective information ($ei$):
+
+ $$ei(X(\text{mech},x_1)) = H[\,p(X_0(\text{mech},x_1)) \,\|\, p(X_0(\text{maxH}))\,]$$
+
+ Relative entropy, also known as Kullback-Leibler divergence, is a difference between probability distributions (Cover and Thomas, 2006): if the distributions are identical, relative entropy is zero; the more different they are, the higher the relative entropy.² Figuratively, the system’s mechanism and state generate information by sharpening the uniform distribution into a less uniform one; this is how much uncertainty is reduced. Clearly, the amount of effective information generated by a system is high if it has a large potential repertoire and a small actual repertoire, since a large number of initial states are ruled out. By contrast, the information generated is little if the system’s repertoire is small, or if many states could lead to the current outcome, since few states are ruled out. For instance, if noise dominates (any state could have led to the current one), no alternatives are ruled out, and no information is generated.
+
+ Since effective information is implicitly specified once a mechanism and state are specified, it can be considered to be an “intrinsic” property of a system. To calculate it explicitly, from an extrinsic perspective, one can perturb the system in all possible ways (i.e., try out all possible input states, corresponding to the maximum entropy distribution or potential repertoire) to obtain the forward repertoire of output states given the system’s mechanism. Finally one can calculate, using Bayes’ rule, the actual repertoire given the system’s state (Balduzzi and Tononi, 2008).³
+
+ Integration
+ Second, we must find out how much of the information generated by a system is integrated information; that is, how much information is generated by a single entity, as opposed to a collection of independent parts. The idea here is to consider the parts of the system independently, ask how much information they generate by themselves, and compare it with the information generated by the system as a whole. This can be done by resorting again to relative entropy to measure the difference between the probability distribution generated by the system as a whole ($p(X_0(\text{mech},x_1))$, the actual repertoire of the system $x$) and the probability distribution generated by the parts considered independently ($\prod p({}^{k}M_0(\text{mech},\mu_1))$, the product of the actual repertoires of the parts ${}^{k}M$). Integrated information is indicated with the symbol $\Phi$ (the vertical bar “I” stands for information, the circle “O” for integration):
+
+ $$\Phi(X(\text{mech},x_1)) = H\big[\,p(X_0(\text{mech},x_1)) \,\big\|\, \textstyle\prod p({}^{k}M_0(\text{mech},\mu_1))\,\big] \quad \text{for } {}^{k}M_0 \in \text{MIP}$$
+
+ That is, the actual repertoire for each part is specified by causal interactions internal to each part, considered as a system in its own right, while external inputs are treated as a source of extrinsic noise. The comparison is made with the particular decomposition of the system into parts that leaves the least information unaccounted for. This minimum information partition (MIP) decomposes the system into its minimal parts.
+
+ To see how this works, consider two of the million photodiodes in the digital camera (Fig. 2, left). By turning on or off depending on its input, each photodiode generates 1 bit of information, just as we saw before. Considered independently, then, two photodiodes generate 2 bits of information, and 1 million photodiodes generate 1 million bits of information. However, as shown in the figure, the product of the actual distributions generated independently by the parts is identical to the actual distribution for the system. Therefore, the relative entropy between the two distributions is zero: the system generates no integrated information ($\Phi(X(\text{mech},x_1)) = 0$) above and beyond what is generated by its parts.
+
+ Clearly, for integrated information to be high, a system must be connected in such a way that information is generated by causal interactions among rather than within its parts. Thus, a system can generate integrated information only to the extent that it cannot be decomposed into informationally independent parts. A simple example of such a system is shown in Figure 2 (right). In this case, the interaction between the minimal parts of the system generates information above and beyond what is accounted for by the
+
+ network) with functional integration (there are many pathways for interactions among the elements, Fig. 4A.). In very rough terms, this kind of architecture is characteristic of the mammalian corticothalamic system: different parts of the cerebral cortex are specialized for different functions, yet a vast network of connections allows these parts to interact profusely.
And indeed, as much neurological evidence in- + parts by themselves ( (X(mech,x ))  0). dicates (Posner and Plum, 2007), the corticothalamic system + 1 + In short, integrated information captures the information is precisely the part of the brain that cannot be severely + generated by causal interactions in the whole, over and impaired without loss of consciousness. + 4 + above the information generated by the parts. Conversely,  is low for systems that are made up of + small, quasi-independent modules (Fig. 4B; Tononi, 2004; + Complexes Balduzzi and Tononi, 2008). This may be why the cerebel- + Finally, by measuring  values for all subsets of elements lum, despite its large number of neurons, does not contrib- + within a system, we can determine which subsets form ute much to consciousness: its synaptic organization is such + complexes. Specifically, a complex X is a set of elements that individual patches of cerebellar cortex tend to be acti- + that generate integrated information (0) that is not fully vated independently of one another, with little interaction + contained in some larger set of higher  (Fig. 3). A com- between distant patches (Bower, 2002). + plex, then, can be properly considered to form a single Computer simulations also show that units along multi- + entity having its own, intrinsic “point of view” (as opposed ple, segregated incoming or outgoing pathways are not + to being treated as a single entity from an outside, extrinsic incorporated within the repertoire of the main complex (Fig. + point of view). Since integrated information is generated 4C; Tononi, 2004; Balduzzi and Tononi, 2008). This may + within a complex and not outside its boundaries, experience be why neural activity in afferent pathways (perhaps as far + is necessarily private and related to a single point of view or as V1), though crucial for triggering this or that conscious + perspective (Tononi and Edelman, 1998; Tononi, 2004). 
A experience, does not contribute directly to conscious expe- + given physical system, such as a brain, is likely to contain rience; nor does activity in efferent pathways (perhaps start- + more than one complex, many small ones with low  ing with primary motor cortex), though it is crucial for + values, and perhaps a few large ones (Tononi and Edelman, reporting each different experience. + 1998; Tononi, 2004). In fact, at any given time there may be The addition of many parallel cycles also generally does + a single main complex of comparatively much higher  that not change the composition of the main complex, although + underlies the dominant experience (a main complex is such  values can be altered (Fig. 4D). Instead, cortical and + that its subsets have strictly lower ). As shown in Figure subcortical cycles or loops implement specialized subrou- + 3, a main complex can be embedded into larger complexes tines that are capable of influencing the states of the main + of lower . Thus, a complex can be casually connected, corticothalamic complex without joining it. Such informa- + through ports-in and ports-out, to elements that are not part tionally insulated cortico-subcortical loops could constitute + of it. According to the IIT, such elements can indirectly the neural substrates for many unconscious processes that + influence the state of the main complex without contributing can affect and be affected by conscious experience (Baars, + directly to the conscious experience it generates (Tononi 1988; Tononi, 2004), such as those that enable object rec- + and Sporns, 2003). ognition, language parsing, or translating our vague inten- + tions into the right words. + ANeurobiological Reality Check: Accounting for At this stage, it is hard to say precisely which cortical + Empirical Observations circuits may work as a large complex of high , and which + instead may remain informationally insulated. 
Does the + Can this approach account, at least in principle, for some dense mesial connectivity revealed by diffusion spectral + of the basic facts about consciousness that have emerged imaging (Hagmann et al., 2008) constitute the “backbone” + from decades of clinical and neurobiological observations? of a corticothalamic main complex? Do parallel loops + Measuring  and finding complexes is not easy for realistic through basal ganglia implement informationally insulated + systems, but it can be done for simple networks that bear subroutines? Are primary sensory cortices organized like + some structural resemblance to different parts of the brain massive afferent pathways to a main complex higher up in + (Tononi, 2004; Balduzzi and Tononi, 2008). the cortical hierarchy (Koch, 2004)? Is much of prefrontal + For example, by using computer simulations, it is possi- cortex organized like a massive efferent pathway? Do cer- + ble to show that high  requires networks that conjoin tain cortical areas, such as those belonging to the dorsal + functional specialization (due to its specialized connectiv- visual stream, remain partly segregated from the main com- + ity; each element has a unique functional role within the plex? Unfortunately, answering these questions and prop- + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + 222 G. 
erly testing the predictions of the theory requires a much better understanding of cortical neuroanatomy than is currently available.

Other simulations show that the effects of cortical disconnections are readily captured in terms of integrated information (Tononi, 2004): a “callosal” cut produces, out of a large complex corresponding to the connected corticothalamic system, two separate complexes, in line with many studies of split-brain patients (Gazzaniga, 2005). However, because there is great redundancy between the two hemispheres, their $\varphi$ value is not greatly reduced compared to when they form a single complex. Functional disconnections may also lead to a restriction of the neural substrate of consciousness, as is seen in neurological neglect phenomena, in psychiatric conversion and dissociative disorders, and possibly during dreaming and hypnosis. It is also likely that certain attentional phenomena may correspond to changes in the composition of the main complex underlying consciousness (Koch and Tsuchiya, 2007). The attentional blink,⁵ where a fixed sensory input may at times make it to consciousness and at times not, may also be due to changes in functional connectivity: access to the main corticothalamic complex may be enabled or not based on dynamics intrinsic to the complex (Dehaene et al., 2003). Similarly, binocular rivalry⁶ may be related, at least in part, to dynamic changes in the composition of the main corticothalamic complex caused by transient changes in functional connectivity. Computer simulations confirm that functional disconnection can reduce the size of a complex and reduce its capacity to integrate information (Tononi, 2004). While it is not easy to determine, at present, whether a particular group of neurons is excluded from the main complex because of hard-wired anatomical constraints or is transiently disconnected due to functional changes, the set of elements underlying consciousness is not static, but forms a “dynamic complex” or “dynamic core” (Tononi and Edelman, 1998).

Computer simulations also indicate that the capacity to integrate information is reduced if neural activity is extremely high and near-synchronous, due to a dramatic decrease in the repertoire of discriminable states (Fig. 4E; Balduzzi and Tononi, 2008). This reduction in degrees of freedom could be the reason that consciousness is reduced or eliminated in absence seizure (petit mal) and other conditions during which neural activity is both high and synchronous (Blumenfeld and Taylor, 2003).

The most common example of a marked change in the level of experience is the fading of consciousness that occurs during certain periods of sleep. Subjects awakened in deep NREM (non-rapid eye movement) sleep, especially early in the night, often report that they were not aware of themselves or of anything else, though cortical and thalamic neurons remain active. Awakened at other times, mainly during REM sleep or during lighter periods of NREM sleep later in the night, they report dreams characterized by vivid images (Hobson et al., 2000). From the perspective of integrated information, a reduction of consciousness during early sleep would be consistent with the bistability of cortical circuits during deep NREM sleep. Due to changes in intrinsic and synaptic conductances triggered by neuromodulatory changes (e.g., low acetylcholine), cortical neurons cannot sustain firing for more than a few hundred milliseconds and invariably enter a hyperpolarized down-state. Shortly afterward, they inevitably return to a depolarized up-state (Steriade et al., 2001). Indeed, computer simulations show that values of $\varphi$ are low in systems with such bistable dynamics (Fig. 4F; Balduzzi and Tononi, 2008). Consistent with these observations, studies using TMS, a technique for stimulating the brain non-invasively, in conjunction with high-density EEG, show that early NREM sleep is associated either with a breakdown of the effective connectivity among cortical areas, and thereby with a loss of integration (Massimini et al., 2005, 2007), or with a stereotypical global response suggestive of a loss of repertoire and thus of information (Massimini et al., 2007). Similar changes are seen in animal studies of anesthesia (Alkire et al., 2008).

Finally, consciousness not only requires a neural substrate with appropriate anatomical structure and appropriate physiological parameters, it also needs time (Bachmann, 2000). The theory predicts that the time requirement for the generation of conscious experience in the brain emerges directly from the time requirements for the build-up of an integrated repertoire among the elements of the corticothalamic main complex so that discriminations can be highly informative (Tononi, 2004; Balduzzi and Tononi, unpubl.). To give an obvious example, if one were to perturb half of the elements of the main complex for less than a millisecond, no perturbations would produce any effect on the other half within this time window, and $\varphi$ would be zero. After, say, 100 ms, however, there is enough time for differential effects to be manifested, and $\varphi$ should grow.

[Figure 2 (image not reproduced). Panel rows: INFORMATION GENERATED BY THE SYSTEM (A, A′): $\mathrm{ei}(X(\mathrm{mech}, x_1))$ = 2 bits (left), 4 bits (right). INFORMATION GENERATED BY THE PARTS (B, B′): $\mathrm{ei}({}^aM(\mathrm{mech}, \mu_1))$ = 1 bit and $\mathrm{ei}({}^bM(\mathrm{mech}, \mu_1))$ = 1 bit (left); 1.1 bits and 1 bit (right). INTEGRATED INFORMATION GENERATED BY THE SYSTEM ABOVE AND BEYOND THE PARTS (C, C′): $\varphi(X(\mathrm{mech}, x_1))$ = 0 bits (left), 2 bits (right).]

Figure 2. Integrated information. Left-hand side: two photodiodes in a digital camera. (A) Information generated by the system as a whole. The system as a whole generates 2 bits of effective information by specifying that $n_1$ and $n_3$ must have been on. (B) Information generated by the parts. The minimum information partition (MIP) is the decomposition of a system into (minimal) parts, that is, the decomposition that leaves the least information unaccounted for. Here the parts are two photodiodes. (C) The information generated by the system as a whole is completely accounted for by the information generated by its parts. In this case, the actual repertoire of the whole is identical to the combined actual repertoires of the parts (the product of their respective probability distributions), so that relative entropy is zero.
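As a rough numerical check of the photodiode case just described, the following Python sketch computes effective information and integrated information as relative entropies. It is an illustration under stated assumptions, not the paper's own code: the assignment of elements to parts (photodiode a = $n_1$, $n_2$; photodiode b = $n_3$, $n_4$) and the helper names `relative_entropy` and `uniform` are choices made here.

```python
import itertools
import math


def relative_entropy(p, q):
    """Relative entropy H[p || q] in bits; p and q map state -> probability."""
    return sum(pi * math.log2(pi / q[s]) for s, pi in p.items() if pi > 0)


def uniform(states):
    """Maximum-entropy (potential) repertoire over the given states."""
    states = list(states)
    return {s: 1.0 / len(states) for s in states}


# Whole system: four binary elements (n1, n2, n3, n4), hence 16 prior states.
states = list(itertools.product((0, 1), repeat=4))
potential = uniform(states)

# Actual repertoire (assumed reading of the caption): only prior states in
# which n1 and n3 were on could have led to the current state.
compatible = [s for s in states if s[0] == 1 and s[2] == 1]
actual = {s: (1.0 / len(compatible) if s in compatible else 0.0) for s in states}
ei_whole = relative_entropy(actual, potential)

# Parts (assumed partition): photodiode a = (n1, n2), photodiode b = (n3, n4).
part_states = list(itertools.product((0, 1), repeat=2))
part_potential = uniform(part_states)
part_actual = {s: (0.5 if s[0] == 1 else 0.0) for s in part_states}
ei_part = relative_entropy(part_actual, part_potential)

# Product of the parts' actual repertoires, expressed over whole-system states.
product = {s: part_actual[(s[0], s[1])] * part_actual[(s[2], s[3])] for s in states}
phi = relative_entropy(actual, product)

print(ei_whole, ei_part, phi)
```

Under these assumptions the sketch gives 2 bits for the whole, 1 bit for each photodiode, and 0 bits of integrated information, in line with the values stated for the left-hand side of Figure 2.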
The system generates no information above and beyond the parts, so it cannot be considered a single entity. Right-hand side: an integrated system. Elements in the system are on if they receive two or more spikes. The system is in state $x_1 = 1000$. (A′) The mechanism specifies a unique prior state that can cause state $x_1$, so the system generates 4 bits of effective information. All other initial states are ruled out, since they cause different outputs. (B′) Effective information generated by the two minimal parts, considered as systems in their own right. External inputs are treated as extrinsic noise. (C′) Integrated information is information generated by the whole (black arrows) over and above the parts (gray arrows). In this case, the actual repertoire of the whole is different from the combined actual repertoires of the parts, and the relative entropy is 2 bits. The system generates information above and beyond the parts, so it can be considered a single entity (a complex).

[Figure 3 (image not reproduced). Legible labels include Φ(b) = 2, Φ(s) = 1, and Φ(a) = 3.]

Figure 3. Complexes. In this system, the mechanism is that elements fire in response to an odd number of spikes on their afferent connections (links without arrows are bidirectional connections). Analyzing the system in terms of integrated information shows that the system constitutes a complex (x, light gray) that contains three smaller complexes (s, a, b, in different shades of gray). Observe that (i) complexes can overlap; (ii) a complex can interact causally with elements not part of it; (iii) groups of elements with identical architectures (a and b) generate different amounts of integrated information, depending on their ports-in and ports-out.

The Quality of Consciousness: Characterizing Informational Relationships

If the amount of integrated information generated by different brain structures (or by the same structure functioning in different ways) can in principle account for changes in the level of consciousness, what is responsible for the quality of each particular experience? What determines that colors look the way they do and are different from the way music sounds? Once again, empirical evidence indicates that different qualities of consciousness must be contributed by different cortical areas. Thus, damage to certain parts of the cerebral cortex forever eliminates our ability to experience color (whether perceived, imagined, remembered, or dreamt), whereas damage to other parts selectively eliminates our ability to experience visual shapes. There is obviously something about different parts of the cortex that can account for their different contribution to the quality of experience. What is this something?

The IIT claims that, just as the quantity of consciousness generated by a complex of elements is determined by the amount of integrated information it generates above and beyond its parts, the quality of consciousness is determined by the set of all the informational relationships its mechanisms generate. That is, how integrated information is generated within a complex determines not only the amount of consciousness it has, but also what kind of consciousness.

Consider again the photodiode thought experiment. As I discussed before, when the photodiode reacts to light, it can only tell that things are one way rather than another way. On the other hand, when we see “light,” we discriminate against many more states of affairs, and thus generate much more information. In fact, I argued that “light” means what it means and becomes conscious “light” by virtue of being not just the opposite of dark, but also different from any color, any shape, any combination of colors and shapes, any frame of every possible movie, any sound, smell, thought, and so on.

What needs to be emphasized at this point is that discriminating “light” against all these alternatives implies not just picking one thing out of “everything else” (an undifferentiated bunch), but distinguishing at once, in a specific way, between each and every alternative. Consider a very simple example: a binary counter capable of discriminating among the four numbers: 00, 01, 10, 11. When the counter says binary “3,” it is not just discriminating 11 from everything else as an undifferentiated bunch, otherwise it would not be a counter, but a 11 detector. To be a counter, the system must be able to tell 11 apart from 00 as well as from 10 as well as from 01 in different, specific ways. It does so, of course, by making choices through its mechanisms; for example: is this the first or the second digit? Is it a 0 or a 1? Each mechanism adds its specific contribution to the discrimination they perform together. Similarly, when we see light, mechanisms in our brain are not just specifying “light” with respect to a bunch of undifferentiated alternatives. Rather, these mechanisms are specifying that light is what it is by virtue of being different, in this and that specific way, from every other alternative: from dark to any color, to any shape, movie frame, sound or smell, and so on.

In short, generating a large amount of integrated information entails having a highly structured set of mechanisms that allow us to make many nested discriminations (choices) as a single entity. According to the IIT, these mechanisms working together generate integrated information by specifying a set of informational relationships that completely and univocally determine the quality of experience.

Experience as a shape in qualia space

To see how this intuition can be given a mathematical formulation, let us consider again a complex of $n$ binary elements $X(\mathrm{mech}, x_1)$ having a particular mechanism and being in a particular state. The mechanism of the system is implemented by a set of connections $X^{\mathrm{conn}}$ among its elements. Let us now suppose that each possible state of the system constitutes an axis or dimension of a qualia space (Q) having $2^n$ dimensions. Each axis is labeled with the probability $p$ for that state, going from 0 to 1, so that a repertoire (i.e., a probability distribution on the possible states of the complex) corresponds to a point in Q (Fig. 5).

Let us now examine how the connections among the elements of the complex specify probability distributions; that is, how a set of mechanisms specifies a set of informa-

[Figure 4 (image not reproduced). INTEGRATED INFORMATION & NEUROANATOMY: (A) CORTICOTHALAMIC SYSTEM; (B) CEREBELLAR SYSTEM; (C) AFFERENT PATHWAYS; (D) CORTICAL-SUBCORTICAL LOOPS. INTEGRATED INFORMATION & NEUROPHYSIOLOGY: (E) COMATOSE, BALANCED & EPILEPTIC SYSTEMS; (F) SLEEPING SYSTEM. Axis labels: Φ, elements firing, % activity, time (ticks).]

Figure 4.
Relating integrated information to neuroanatomy and neurophysiology. Elements fire in response to two or more spikes (except elements targeted by a single connection, which copy their input); links without arrows are bidirectional. (A) Computing $\varphi$ in simple models of neuroanatomy suggests that a functionally integrated and functionally specialized network, like the corticothalamic system, is well suited to generating high values of $\varphi$. (B, C, D) Architectures modeled on the cerebellum, afferent pathways, and cortical-subcortical loops give rise to complexes containing more elements, but with reduced $\varphi$ compared to the main corticothalamic complex. (E) $\varphi$ peaks in balanced states; if too many or too few elements are active, $\varphi$ collapses. (F) In a bistable (“sleeping”) system (same as in (E)), $\varphi$ collapses when the number of firing elements (dotted line) is too high (high % activity), remains low during the “DOWN” state (zero % activity), and only recovers at the onset of the next “UP” state.

[Figure 5 (image not reproduced). Panels A and B show the four-element system, its 16 possible states (0000 to 1111), and q-arrows labeled with effective information values in bits.]

Figure 5. Qualia. (A) The system in the inset is the same as in Fig. 2A′. Qualia (Q)-space for a system of four units is 16-dimensional (one axis per possible state; since axes are displayed flattened onto the page, and points and arrows cannot be properly drawn in 2 dimensions, their position and direction is for illustration only). In state $x_1 = 1000$, the complex generates a quale or shape in Q, as follows. The maximum entropy distribution (the “bottom” of the quale, indicated by a black square) is a point assigning equal probability ($p = 1/16 = 0.0625$) to all 16 system states, close to the origin of the 16-dimensional space. Engaging a single connection “r” between elements 4 and 3 ($c_{43}$) specifies that, since element $n_3$ has not fired, the probability of element $n_4$ having fired in the previous time step is reduced to $p = 0.25$ compared to its maximum entropy value ($p = 0.5$), while the probability of $n_4$ not having fired is increased to $p = 0.75$. The actual probability distribution of the 16 system states is modified accordingly. Thus, the connection r “sharpens” the maximum entropy distribution into an actual distribution, which is another point in Q. The q-arrow linking the two distributions geometrically realizes the informational relationship specified by the connection. The length (divergence) of the q-arrow expresses how much the connection specifies the distribution (the effective information it generates or relative entropy between the two distributions); the direction in Q expresses the particular way in which the connection specifies the distribution. (B) Engaging more connections further sharpens the actual repertoire, specifying new points in Q and the corresponding q-arrows. The figure shows 16 out of the 399 points in the quale, generated by combinations of the four sets of connections.
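The length of the q-arrow for connection r can be checked with a short Python sketch. It assumes, as a simplification made here, that engaging r only changes the probability that $n_4$ fired (0.25 instead of 0.5) and leaves the other three elements at maximum entropy; the variable names are illustrative and not taken from the paper.

```python
import itertools
import math

# All 16 prior states of the four binary elements (n1, n2, n3, n4).
states = list(itertools.product((0, 1), repeat=4))

# Bottom of the quale: maximum entropy distribution, p = 1/16 per state.
max_entropy = {s: 1 / 16 for s in states}

# Actual repertoire with only connection r engaged (assumption: n4 fired with
# p = 0.25, did not fire with p = 0.75; the other elements remain uniform,
# so each of the 8 states sharing a value of n4 gets an equal share).
p_n4_fired = 0.25
actual = {
    s: (p_n4_fired if s[3] == 1 else 1 - p_n4_fired) / 8
    for s in states
}

# Length of the q-arrow: relative entropy (in bits) between the two points in Q.
q_arrow_bits = sum(p * math.log2(p / max_entropy[s]) for s, p in actual.items())

print(round(q_arrow_bits, 3))
```

This gives roughly 0.19 bits, close to the 0.18 bits that the text attributes to connection r when it is considered in the null context.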
The probability distributions depicted around the quale are representative of the repertoires generated by two q-edges formed by q-arrows that engage the four sets of connections in two different orders (the two representative q-edges start at bottom left; one goes clockwise, the other counter-clockwise; black connections represent those whose contribution is being evaluated; gray connections those whose contribution has already been considered and which provide the context on top of which the q-arrow generated by a black connection begins). Repertoires corresponding to certain points of the quale are shown alongside, as in previous figures. Effective information values (in bits) of the q-arrows in the two q-edges are shown alongside. Together, the q-edges enclose a shape, the quale, which completely specifies the quality of the experience.

tional relationships. First, consider the complex with all connections among its elements disengaged, thus discounting any causal interactions (Fig. 5A). In the absence of a mechanism, the state $x_1$ provides no information about the system’s previous state: from the perspective of a system without causal interactions, all previous states are equally likely, corresponding to the maximum entropy or uniform distribution (the potential repertoire). In Q, this probability distribution is a point projecting onto all axes at $p = 1/2^n$ (probabilities must sum to 1).

Next, consider engaging a single connection (Fig. 5A, the other connections are treated as extrinsic noise). As with the photodiode, the mechanism implemented by that connection and the state the system is in rule out states that could not have caused $x_1$ and increase the actual probability of states that could have caused $x_1$, yielding an actual repertoire. In Q, the actual repertoire specified by this connection corresponds to a point projecting onto higher $p$ values on some axes and onto lower $p$ values (or zero) on other axes. Thus, the connection shapes the uniform distribution into a more specific distribution, and thereby generates information (reduces uncertainty). More generally, we can say that the connection specifies an informational relationship, that is, a relationship between two probability distributions. This informational relationship can be represented as an arrow in Q (q-arrow) that goes from the point corresponding to the maximum entropy distribution ($p = 1/2^n$) to the point corresponding to the actual repertoire specified by that connection. The length (divergence) of the q-arrow expresses how much the connection specifies the distribution (the effective information it generates, i.e., the relative entropy between the two distributions); the direction in Q expresses the particular way in which the connection specifies the distribution, i.e., a change in position in Q. Similarly, if one considers all other connections taken in isolation, each will specify another q-arrow of a certain length, pointing in a different direction.

Next, consider all possible combinations of connections (Fig. 5B). For instance, consider adding the contribution of the second connection to that of the first. Together, the first and second connections specify another actual repertoire (another point in Q-space) and thereby generate more information than either connection alone as they shape the uniform distribution into a more specific distribution. To the tip of the q-arrow specified by the first connection, one can now add a q-arrow bent in the direction contributed by the second connection, forming an “edge” of two q-arrows in Q-space (the same final point is reached by adding the q-arrow due to the first connection on top of the q-arrow specified by the second one). Each combination of connections therefore specifies a q-edge made of concatenated q-arrows (component q-arrows). In general, the more connections one considers together, the more the actual repertoire will take shape and differ from the uniform (potential) distribution.

Finally, consider the joint contribution of all connections of the complex (Fig. 5B). As was discussed above, all connections together specify the actual repertoire of the whole. This is the point where all q-edges converge. Together, these q-edges in Q delimit a quale, that is, a shape in Q, a kind of $2^n$-dimensional solid (technically, in more than three dimensions, the “body” of a polytope). The bottom of the quale is the maximum entropy distribution, its edges are q-edges made of concatenated q-arrows, and its top is the actual repertoire of the complex as a whole. The shape of this solid (polytope) is specified by all informational relationships that are generated within the complex by the interactions among its elements (the effective information matrix; Tononi, 2004).⁷ Note that the same complex of elements, endowed with the same mechanism, will typically generate a different quale or shape in Q depending on the particular state it is in.

It is worth considering briefly a few relevant properties of informational relationships or q-arrows. First, informational relationships are context-dependent (Fig. 6), in the following sense. A context can be any point in Q corresponding to the actual repertoire generated by a particular subset of connections. It can be shown that the q-arrow generated by considering the effects of an additional connection (how it further sharpens the actual repertoire) can change in both magnitude and direction depending on the context in which it is considered. In Figure 6, when considered in isolation (null context), the connection “r” between elements 4 and 3 generates a short q-arrow (0.18 bits) pointing in a certain direction. When considered in the full context provided by all other connections (not-r or ¬r), the same connection “r” generates a longer q-arrow (1 bit) pointing in a different direction.

Another property is how removing or adding a set of connections folds or unfolds a quale. The portion of the quale that is generated by a set of connections r (acting in all contexts) is called a q-fold. If we remove connection r from the system, all the q-arrows generated by that connection, in all possible contexts, vanish, so the shape of the quale “folds” along the q-fold specified by that connection. Conversely, when the connection is added to a system, the shape of the quale unfolds.

Another important property of q-arrows is entanglement ($\gamma$, Balduzzi and Tononi, unpubl.). A q-arrow is entangled ($\gamma > 0$) if the underlying connections considered together generate information above and beyond the information they generate separately (note the analogy with $\varphi$). Thus, entanglement characterizes informational relationships (q-arrows) that are more than the sum of their component relationships (component q-arrows, Fig. 6B), just like $\varphi$ characterizes systems that are more than the sum of their parts. Geometrically, entanglement “warps” the shape of the quale away from a simple hypercube (where q-arrows are orthogonal to each other). Entanglement has several relevant consequences (Balduzzi and Tononi, unpubl.). For example, an entangled q-arrow can be said to specify a concept, in that it groups together certain states of affairs in a way that cannot be decomposed into the mere sum of simpler groupings (see also Feldman, 2003). Moreover, just as $\varphi$ can be used to identify complexes, entanglement $\gamma$ can be used to identify modes. By analogy with complexes, modes are sets of q-arrows that are more densely entangled than surrounding q-arrows: they can be considered as clusters of informational relationships constituting distinctive “sub-shapes” in Q (see Fig. 8). By analogy with a main complex, an elementary mode is such that its component q-arrows have strictly lower $\gamma$. As will be briefly discussed below, modes play an important role in understanding the structure of experience.

[Figure 6 (image not reproduced). (A) Connection r in the NULL CONTEXT (.18 bits) and in the FULL CONTEXT (1 bit). (B) Entanglement γ = .42 bits.]

Figure 6. Context and entanglement. (A) Context. The same connection (black arrow between elements 3 and 4) considered in two contexts. At the bottom of the quale (null context, corresponding to the maximum entropy distribution when no other connections are engaged), the connection r generates a q-arrow (called down-set of r, or ↓r) corresponding to 0.18 bits of information pointing up-left in Q. Near the top of the quale (full context, corresponding to the actual distribution specified by all other connections except for r, indicated as ¬r), r generates a q-arrow (called up-set of not-r, or ↑¬r) corresponding to 1 bit of information pointing up-right in Q. (B) Entanglement. Left: the q-arrow generated by the connection “r” and the q-arrow generated by the complementary connections “¬r” at the bottom of the quale (null context). Right: The product of the two q-arrows (corresponding to independence between the informational relationships specified by the two sets of connections) would be a point corresponding to the vertex of the dotted parallelogram opposite to the bottom. However, “r” and “¬r” jointly specify the actual distribution corresponding to the top of the quale (black triangle). The distance between the probability distribution in Q specified jointly by two sets of connections and their product distribution (zigzag arrow) is the entanglement between the two corresponding q-arrows (how much the composite q-arrow specifies above and beyond its component q-arrows).

Some properties of qualia space

What is the relevance of these constructs to understanding the quality of consciousness? It is not easy to become familiar with a complicated multidimensional space nearly impossible to draw, so it may be useful to resort to some metaphors. I have argued that the set of informational relationships in Q generated by the mechanisms of a complex in a given state (q-arrows between repertoires) specify a shape in Q (a quale). Perhaps the most important notion emerging from this approach is that an experience is a shape in Q. According to the IIT, this shape completely and univocally specifies the quality of experience.⁸ It follows that different experiences are, literally, different shapes in Q.

and differences can in principle be quantified as similarities and differences between shapes. The set of all shapes generated by the same system in different states provides a geometrical depiction of all its possible experiences.⁹ Note that a quale can only be specified by a mechanism and a particular state; it does not make sense to ask about the quale generated by a mechanism in isolation, or by a
For example, when the same system is in a state (firing pattern) in isolation. A consequence is that two + different state (firing pattern), it will typically generate a different systems in the same state can generate two differ- + different shape or quale (even for the same value of ). ent experiences (i.e., two different shapes). As an extreme + Importantly, if an element turns on, it generates information example, a system that was to copy one by one the state of + and meaning not by signifying something (say “red”), the neurons in a human brain, but had no internal connec- + which in isolation it cannot, but by changing the shape of tions of its own, would generate no consciousness and no + the quale. Moreover, experiences are similar if their shape is quale (Tononi, 2004; Balduzzi and Tononi, 2008). + similar, and different to the extent that their shapes are By the same token, it is possible that two different sys- + different. This means that phenomenological similarities tems generate the same experience (i.e., the same shape). + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + CONSCIOUSNESS AS INTEGRATED INFORMATION 229 + For example, consider again the photodiode, whose mech- tion (MIP) is just another point in Q: the one specified by + anism determines that if the current in the sensor exceeds a the connections within the minimal parts only, leaving out + threshold, the detector turns on. This simple causal interac- the contribution of the connections among the parts. This + tion is all there is, and when the photodiode turns on it point is the actual repertoire corresponding to the product of + merely specifies an actual repertoire where states the actual repertoires of the parts taken independently.  + (00,01,10,11) have, respectively, probability (0,0,1/2,1/2). 
corresponds then to an arrow linking this point to the top of + This corresponds in Q to a single q-arrow, one bit long, the solid. In this view, the q-edges leading to the minimum + going from the potential, maximum entropy repertoire (1/ information bipartition provide the natural “base” upon + 4,1/4,1/4,1/4) to (0,0,1/2,1/2). Now imagine the light sensor which the solid rests—the informational relationships gen- + is substituted by a temperature sensor with the same thresh- erated within the parts upon which are built the informa- + old and dynamic range—we have a thermistor rather than a tional relationships among the parts. The -arrow can then + photodiode. Although the physical device has changed, be thought of as the height of the solid—or rather, to + according to the IIT the experience, minimal as it is, has to employ a metaphor, as the highest pole holding up a tent. + be the same, since the informational relationship that is For example, if  is zero (say a system decomposes into + generated by the two devices is identical. Similarly, an two independent complexes as in Fig. 7B), the tent corre- + AND gate when silent and an OR gate when firing also sponding to the system is flat—it has no shape—since the + generate the same shape in Q, and therefore must generate actual repertoire of the system collapses onto its base (MIP). + the same minimal experience (it can be shown that the two This is precisely what it means when 0. Conversely, + shapes are isomorphic, that is, have the same symmetries; the higher the  value of a complex (the higher the tent or + Balduzzi and Tononi, unpubl.). In other words, different solid), the more “breathing room” there is for the various + “physical” systems (possibly in different states) generate the informational relationships within the complex (the edges of + same experience if the shape of the informational relation- the solid or the seams of the tent) to express themselves. + ships they specify is the same. 
On the other hand, more In summary, and not very rigorously, the generation of an + complex networks of causal interactions are likely to create experience can be thought of as the erection of a tent with + highly idiosyncratic shapes, so systems of high  are un- a very complex structure: the edges are the tension lines + likely to generate exactly identical experiences. generated by each subset of connections (the respective + If experience is integrated information, it follows that q-arrow or informational relationship). The tent literally + only the informational relationships within a complex (those takes shape when the connections are engaged and specify + that give the quale its shape) contribute to experience. actual repertoires. Perhaps an even more daring metaphor + Conversely, the informational relationships that exist out- would be the following: whenever the mechanisms of a + side the main complex—for example, those involving sen- complex unfold and specify informational relationships, the + sory afferents or cortico-subcortical loops implementing flower of experience blooms. + informationally insulated subroutines—do not make it into + the quale, and therefore do not contribute either to the From phenomenology to geometry + quantity or to the quality of consciousness. + Note also that informational relationships, and thus the The notions just sketched aim at providing a framework + shape of the quale, are specified both by the elements that for translating the seemingly ineffable qualitative properties + are firing and by those that are not. This is natural consid- of phenomenology into the language of mathematics, spe- + ering that an element that does not fire will typically rule out cifically, the language of informational relationships (q- + someprevious states of affairs (those that would have made arrows) in Q. 
Ideally, when sufficiently developed, such + it fire), and thereby it will contribute to specifying the actual language should permit the geometric characterization of + repertoire. Indeed, many silent elements can rule out, in phenomenological properties generated by the human brain. + combination, a vast number of previous states and thus be In principle, it should also allow us to characterize the + highly informative. From a neurophysiological point of phenomenology of other systems. After all, in this frame- + view, such a corollary may lead to counterintuitive predic- work the experience of a bat echo-locating in a cave is just + tions. For example, take elements (neurons) within the main another shape in Q and, at least in principle, shapes can be + complex that happen to be silent when one is having a compared objectively. + particular experience. If one were to temporarily disable At present, due to the combinatorial problems posed by + these neurons (e.g., make them incapable of firing), the deriving the shape of the quale produced by systems of just + prediction is that, though the system state (firing pattern) a few elements, and to the additional difficulties posed by + would remain the same, the quantity and quality of experience representing such high-dimensional objects, the best one + would change (Tononi, 2004; Balduzzi and Tononi, 2008). can hope for is to show that the language of Q can capture, + It is important to see what  corresponds to in this in principle, some of the basic distinctions that can be made + representation (Fig. 7A). The minimum information parti- in our own phenomenology, as well as some key neuropsy- + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + 230 G. 
TONONI + A B + 1111 1110 1101 + 1100 1011 + 1010 + 1001 0 φ = 2 bits + 1 3 100 + MIP 0111 + 0110 + 0101 + 2 4 0100 + 0011 + 0010 MIP + 0001 + 0000 + 0 1/16 1/2 1 + C D + 1111 1110 1000 + 1100 1001 + MIP 1101 + {c ,c }1011 + 12 34 1010 + 1 3 0111 + {c } 0110 + COPYCOPY 12 + {c } 0101 + 34 + 2 4 0100 + 0011 + { } 0010 + 0001 + 0000 + Figure 7. The tent analogy. (A) The system of Fig. 2A / Fig. 5. (B) The q-edges converging on the + minimuminformation partition of the system (MIP) form the natural base on which the complex rests, depicted + as a “tent.” The informational relationships among the parts are built on top of the informational relationships + generated independently within the minimal parts. From this perspective the  q-arrow (in black) is simply the + tent pole holding the quale up above its base; the length (divergence) of the pole expresses the breathing room + in the system. The thick gray q-arrow represents the information generated by the entire system. (C) The system + of Fig. 2A. The quale (not) generated by the two photodiodes considered as a single system. As shown in Fig. + 2A, the system reduces to two independent parts, so it does not exist as a single entity. (D) Note that in this case + the quale reduces to the MIP: the “tent” collapses onto its base, so there is no breathing room for informational + relationships within the system. The quale generated by each part considered in isolation does exist, corre- + sponding to an identical q-arrow for each couple. + chological observations (Balduzzi and Tononi, unpubl.). A shapes, yet they are all part of the same landmass, just as + short list includes the following: modalities are parts of the same consciousness. 
Moreover, + (i) Experience is divided into modalities, like the classic within each continent there are peninsulas (sub-sub-shapes), + senses of sight, hearing, touch, smell, and taste (and several like Italy in Europe, just as there are submodalities within + others), as well as submodalities, like visual color and visual modalities. + shape. What do these broad distinctions correspond to in Q? (ii) Some experiences appear to be “elementary,” in that + According to the IIT, modalities are sets of densely entan- they cannot be further decomposed. A typical example is + gled q-arrows (modes) that form distinct sub-shapes in the what philosophers call a “quale” in the narrow sense—say a + quale; submodalities are subsets of even more densely en- pure color like red, or a pain, or an itch: it is difficult, if not + tangled q-arrows (sub-modes) within a larger mode, thus impossible, to identify any further phenomenological struc- + forming distinct sub-sub-shapes (Fig. 8). As a two-dimen- ture within the experience of red. According to the IIT, such + sional analog, imagine a given multimodal experience as the elementary experiences correspond to sub-modes that do + shape of the three-continent complex constituted by Europe, not contain any more densely entangled sub-sub-modes + Asia, and Africa. The three continents are distinct sub- (elementary modes, Fig. 8). + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + CONSCIOUSNESS AS INTEGRATED INFORMATION 231 + to-articulate phenomenological differences correspond to + n + different basic sub-shapes in Q, such as 2 -dimensional + grid-like structures and pyramid-like structures, which + emerge naturally from the underlying neuroanatomy. + (vi) Some experiences are more alike than others. 
Blue is + certainly different from red (and irreducible to red), but + Red clearly it seems even more different from middle C on the + oboe. In the IIT framework, in Q colors correspond to + different sub-shapes of the same kind (say pyramids point- + Color ing in different directions) and sounds to very different + Form sub-shapes (say tetrahedra). In principle, such subjective + similarities and differences can be investigated by employ- + ing objective measures of similarity between shapes (e.g., + Sound Sight considering the number and kinds of symmetries involved + in specifying shapes that are generated in Q by different + Quale neuroanatomical circuits). + (vii) Experiences can be refined through learning and + Figure 8. Modes. Schematic depiction of modes and sub-modes. A changes in connectivity. Suppose one learns to distinguish + mode, indicated by a polygon within the quale (light gray with black wine from water, then red wines from whites, then different + border), is a set of q-arrows that are more densely entangled than surround- + ing q-arrows, and can be considered as clusters of informational relation- varietals. Presumably, underlying this phenomenological + ships constituting distinctive “sub-shapes” in Q. Two different modes refinement is a neurobiological refinement: neurons that + could correspond, for example, to the modalities of sight and sound. A initially were connected indiscriminately to the same affer- + sub-mode within a mode is a set of q-arrows that is even more densely ents become more specialized and split into sub-groups with + entangled (a sub-sub-shape in Q). Color and form could correspond to two partially segregated afferents. This process has a straight- + sub-modes within the visual mode. The thin black polygon represents an + elementary mode, which does not contain more densely entangled q-arrows. 
forward equivalent in Q: the single q-arrow generated ini- + Elementary modes could correspond to experiential qualities that cannot be tially by those afferents splits into two or more q-arrows + further decomposed, such as the color “red” (qualia in the narrow sense.) pointing in different directions, and the overall sub-shape of + the quale is correspondingly refined. + (iii) Some experiences are homogeneous and others are (viii) Qualia in the narrow sense (elementary modes) + composite: for example, a full-field experience of blue, as exist “at the top of experience” and not at its bottom. + when watching a cloudless sky, compared to that of a busy Consider the experience of seeing a pure color, such as red. + market street. In Q, homogeneous experiences translate to a Theevidencesuggests that the “neural correlate” (Crick and + single homogeneous shape, and composite ones into a com- Koch, 2003) of color, including red, is probably a set of + posite shape with many distinguishable sub-shapes (modes neurons and connections in the fusiform gyrus, maybe in + and sub-modes). area V8 (ideally, neurons in this area are activated whenever + (iv) Some experiences are hierarchically organized. Take a subject sees red and not otherwise, if stimulated trigger the + seeing a face: we see at once that as a whole it is some- experience of red, and if lesioned abolish the capacity to see + body’s face, but we also see that it has parts such as hair, red). Certain achromatopsic subjects with dysfunctions in + eyes, nose, and mouth, and that those are made in turn of this general area seem to lack the feeling of what it is like + specifically oriented segments. The subjective experience is to see color, its “coloredness,” including the “redness” of + constructed from informational relationships (q-arrows) that red. 
They cannot experience, imagine, remember, or even + are entangled (not reducible to a product of independent dream of color, though they may talk about it, just as we + components) across hierarchical levels. For example, infor- could talk about echolocation, from a third-person perspec- + mational relationships constituting “face” would be more tive (van Zandvoort et al., 2007). Contrast such subjects, + densely tangled than unnatural combinations such as seen in who are otherwise perfectly conscious, with vegetative pa- + certain Cubist paintings. The sub-shape of the quale corre- tients, who are for all intents and purposes unconscious. + sponding to the experience of seeing a face is then an Some of these patients may show behavioral and neuro- + overlapping hierarchy of tangled q-arrows, embodying re- physiological evidence for residual function in an isolated + lationships within and across levels. brain area (Posner and Plum, 2007). Yet it seems highly + (v) We recognize intuitively that the way we perceive unlikely that a vegetative patient with residual activity ex- + taste, smell, and maybe color, is organized phenomenolog- clusively in V8 should enjoy the vivid perceptions of color + ically in a “categorical” manner, quite different from, say, just as we do, while being otherwise unconscious. + the “topographical” manner in which we perceive space in The IIT provides a straightforward account for this dif- + vision, audition, or touch. According to the IIT, these hard- ference. To see how, consider again Figure 6A: call “r” the + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + 232 G. TONONI + connections targeting the “red” neurons in V8 that confer work can be extended to begin translating phenomenology + them their selectivity, and non-r (¬r) all the other connec- into the language of mathematics. 
+ tions within the main corticothalamic complex. Adding r in At present, the very notion of a theoretical approach to + isolation at the bottom of Q (null context) yields a small consciousness may appear far-fetched, yet the nature of the + q-arrow (called the down-set of red or 2r) that points in a problems posed by a science of consciousness requires a + direction representing how r by itself shapes the maximum combination of experiment and theory: one could say that + entropy distribution into an actual repertoire. Schematically, theories without experiments are lame, but experiments + this situation resembles that of a vegetative patient with V8 without theories are blind. For instance, only a theoretical + and its afferents intact but the rest of the corticothalamic framework can go beyond a provisional list of candidate + system destroyed. The shape of the experience or quale mechanisms or brain areas and provide a principled expla- + reduces to this q-arrow, so its quantity is minimal ( for this nation of why they may be relevant. Also, only a theory can + q-arrow is obviously low) and its quality minimally speci- account, in a coherent manner, for key but puzzling facts + fied: as we have seen with the photodiode, r by itself cannot aboutconsciousnessandthebrain,suchastheassociationof + specify whether the experience is a color rather than some- consciousness with the corticothalamic but not the cerebel- + thing else such as a shape, whether it is visual or not, lar system, the “unconscious” functioning of many cortico- + sensory or not, and so on. subcortical circuits, or the fading of consciousness during + Bycontrast, subtract r from the set of all connections, so certain stages of sleep or epilepsy. + one is left with ¬r. This “lesion” collapses the q-fold spec- A theory should also generate relevant corollaries. 
For + ified by r in all contexts, including the q-arrow, called the example, the IIT predicts that consciousness depends exclu- + up-set of non-red (1¬r), which starts from the full context sively on the ability of a system to generate integrated + provided by all other connections ¬r and reaches the top of information: whether or not the system is interacting with + the quale.10 This q-arrow will typically be much longer and the environment on the sensory and motor side, it deploys + point in a different direction than the q-arrow generated by language, capacity for reflection, attention, episodic mem- + r at the bottom of the quale. This is because, the fuller the ory, a sense of space, of the body, and of the self. These are + context, the more r can shape the actual repertoire. Sche- obviously important functions of complex brains and help + matically, removing r from the top resembles the situation shape its connectivity. Nevertheless, contrary to some com- + of an achromatopsic patient with a selective lesion of V8: mon intuitions, but consistent with the overall neurological + the bulk of the experience or quale remains intact ( re- evidence, none of these functions seems absolutely neces- + mains high), but a noticeable feature of its shape collapses sary for the generation of consciousness “here and now” + (the upset of non-red). According to the IIT, the feature of (Tononi and Laureys, 2008). + the shape of the quale specified by “the upset of non-red” Finally, a theory should be able to help in “difficult” cases + 11 + captures the very quality or “redness” of red. that challenge our intuition or our standard ways to assess + It is worth remarking that the last example also shows consciousness. 
For instance, the IIT says that the presence + why specific qualities of consciousness, such as the “red- and extent of consciousness can be determined, in principle, + ness” of red, while generated by a local mechanism, cannot also in cases in which we have no verbal report, such as + be reduced to it. If an achromatopsic subject without the r infants or animals, or in neurological conditions such as + connections lacks precisely the “redness” of red, whereas a minimally conscious states, akinetic mutism, psychomotor + vegetative patient with just the r connections is essentially seizures, and sleepwalking. In practice, of course, measur- + unconscious, then the redness of red cannot map directly to ing  accurately in such systems will not be easy, but + the mechanism implemented by the r connections. How- approximations and informed estimates are certainly con- + ever, the redness of red can map nicely onto the informa- ceivable. Whether these and other predictions turn out to be + tional relationships specified by r, as these change dramat- compatible with future clinical and experimental evidence, + ically between the null context (vegetative patient) and the a coherent theoretical framework should at least help to + full context (achromatopsic subject). systematize a number of neuropsychological and neurobio- + logical results that might otherwise seem disparate (Albus et + AProvisional Manifesto al., 2007). + In the remaining part of this article, I briefly consider + To recapitulate, the IIT claims that the quantity of con- some implications of the IIT for the place of experience in + sciousness is given by the integrated information () gen- our view of the world. + erated by a complex of interacting elements, and its quality + by the shape in Q specified by their informational relation- Consciousness as a fundamental property + ships. 
As I have tried to indicate here, this theoretical + framework can account for basic neurobiological and neu- According to the IIT, consciousness is one and the same + ropsychological observations. Moreover, the same frame- thing as integrated information. This identity, which is + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + CONSCIOUSNESS AS INTEGRATED INFORMATION 233 + predicated on the phenomenological thought experiments at Consciousness as an intrinsic property + the origin of the IIT, has ontological consequences. Con- + sciousness exists beyond any doubt (indeed, it is the only Consciousness, as a fundamental property, is also an + thing whose existence is beyond doubt). If consciousness is intrinsic property. This simply means that a complex + integrated information, then integrated information exists. generating integrated information is conscious in a cer- + Moreover, according to the IIT, it exists as a fundamental tain way regardless of any extrinsic perspective. This + quantity—as fundamental as mass, charge, or energy. As point is especially relevant if we consider how difficult it + long as there is a functional mechanism in a certain state, it is to measure the quantity of integrated information, not + must exist ipso facto as integrated information; specifically, to mention the shape of a quale, for any realistic system. + it exists as an experience of a certain quality (the shape of If we want to know what are the borders of a certain + 12 + the quale it generates) and quantity (its “height” ). complex, the amount of integrated information it gener- + If one accepts these premises, a useful way of thinking ates, the set of informational relationships it specifies, + about consciousness as a fundamental property is as fol- and the spatio-temporal grain at which  is highest (see + lows. 
We are by now used to considering the universe as a below), we need to perform a prohibitively large set of + vast empty space that contains enormous conglomerations computations. One would need to perturb a system in all + of mass, charge, and energy—giant bright entities (where possible ways and use Bayes’ rule to keep track of the + brightness reflects energy or mass) from planets to stars to probabilities of the previous states given the current + galaxies. In this view (that is, in terms of mass, charge, or output, and then calculate the relative entropy between + energy), each of us constitutes an extremely small, dim the potential and the actual distributions. Moreover, this + portion of what exists—indeed, hardly more than a speck of + dust. must be done for all possible subsets of a system (to find + However, if consciousness (i.e., integrated information) complexes) and for all combinations of connections (to + exists as a fundamental property, an equally valid view of obtain the shape of each quale). Finally, the calculations + the universe is this: a vast empty space that contains mostly must be repeated at multiple spatial and temporal scales + nothing, and occasionally just specks of integrated informa- to determine what is the optimal grain size, in space and + tion ()—mere dust, indeed—even there where the mass- time, for generating integrated information (see below). It + charge–energy perspective reveals huge conglomerates. On goes without saying that these calculations are presently + the other hand, one small corner of the known universe unfeasible for anything but the smallest systems. It also goes + contains a remarkable concentration of extremely bright without saying that a complex itself cannot and need not go + entities (where brightness reflects high ), orders of mag- through such calculations: it is intrinsically conscious in this + nitude brighter than anything around them. Each bright or that way. 
In fact, it needs as little to “calculate” all the + “-star” is the main complex of an individual human being relevant probability distributions to generate consciousness + 13 + (and most likely, of individual animals). I argue that such and specify its quality, as a body of a certain mass needs to + -centric view is at least as valid as that of a universe “calculate” how much gravitational mass it has in order to + dominated by mass, charge, and energy. In fact, it may be attract other bodies. + more valid, since to be highly conscious (to have high ) Another way to express this aspect of integrated infor- + implies that there is something it is like to be you, whereas mation is to say that consciousness can be characterized + if you just have high mass, charge, or energy, there may be extrinsically as a disposition or potentiality –in this case as + little or nothing it is like to be you. From this standpoint, it the potential discriminations that a complex can do on its + would seem that entities with high  exist in a stronger possible states, through all combinations of its mechanisms, + sense than entities of high mass. yet from an intrinsic perspective it is undeniably actual. + Intriguingly, it has been suggested, from a different per- While this may sound strange, fundamental quantities asso- + spective, that information may be, in an ontological sense, ciated with physical systems can also be characterized as + prior to conventional physical properties (the it from bit dispositions or potentialities, yet have actual effects. For + perspective; Wheeler and Ford, 1998). This may well be example, mass can be characterized as a potentiality—say + true but, according to the IIT, only if one substitutes “inte- + 14 the resistance that a body would offer to acceleration by a + grated information” for information. 
Information that is + not integrated, I have argued, is not associated with expe- force—yet it exerts undeniably actual effects, such as actu- + rience, and thus does not really exist as such: it can only be ally attracting other masses if these turn out to be there. + given a vicarious existence by a conscious observer who Similarly, a mechanism’s potential for integrated informa- + exploits it to achieve certain discriminations within his main tion becomes actual by virtue of the fact that the mechanism + complex. Indeed, the same “information” may produce very is actually in a particular state. Paraphrasing E. M. Forster, + different consequences in different observers, so it only one could express this fact as follows: How do I know what + exists through them but not in and of itself. I am till I see what I do? + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + 234 G. TONONI + Being and describing 2. Note that, if we try to “integrate” the couples by adding + According to the IIT, a full description of the set of horizontal connections between elements, we reduce the + informational relationships generated by a complex at a available information. Thus, integrated information has to + given time should say all there is to say about the experience be evaluated from the perspective of the system itself, + 17 starting from its elementary, indivisible components (see + it is having at that time: nothing else needs to be added. also the next point), and not by arbitrarily imposing “units” + Nevertheless, the IIT also implies that to be conscious—say from the perspective of an observer. + to have a vivid experience of pure red—one needs to be a Figure 9B (top) illustrates a similar problem with respect + complex of high ; there is no other way. Obviously, to elementary operations. 
The system contains n + 1 binary + although a full description can provide understanding of components, with a single component receiving inputs from + what experience is and how it can be generated, it cannot the other n; the component fires if all n inputs are active. + substitute for it: being is not describing. This point should The minimum information partition is the total partition P = + be uncontroversial, but it is worth mentioning because of a {X} and Φ = n bits when the top component is firing, since + well-known argument against a scientific explanation of it uniquely specifies the prior state of the other n compo- + consciousness, best exemplified by a thought experiment nents. Increasing the number of inputs feeding into the top + involving Mary, a neuroscientist in the 23rd century (Jack- component while maintaining the same rule—fire if and + son, 1986). Mary knows everything about the brain pro- only if all inputs are active—seems to provide a method for + cesses responsible for color vision, but has lived her whole constructing systems with high Φ15 using binary compo- + life in a black-and-white room and has never seen any + color.18 The argument goes that, despite her complete nents and a basic architecture that is certainly easy to + knowledge of color vision, Mary does not know what it is describe. The difficulty once again lies in physically imple- + like to experience a color: it follows that there is some menting a component that processes n inputs at a single + knowledge about conscious experience that cannot be de- point in space and at a single instant in time for large n. + duced from knowledge about brain processes. The argument Figure 9B (bottom) shows a possible internal architecture of + loses its strength the moment one realizes that conscious- the component, constructed using a hierarchy of logical + ness is a way of being rather than a way of knowing. AND-gates.
When analyzed at this level, it is apparent that + According to the IIT, being implies “knowing” from the the system generates 1 bit of integrated information regard- + inside, in the sense of generating information about one’s less of the number of inputs that feed into the top compo- + previous state. Describing, instead, implies “knowing” from nent, since the bipartition framed by the dashed cut forms a + the outside. This conclusion is in no way surprising: just bottleneck. As in the previous example, integrated informa- + consider that though we understand quite well how energy tion has to be evaluated from the perspective of the system + is generated by atomic fission, unless atomic fission occurs, itself, based on the elementary causal interactions its ele- + no energy is generated—no amount of description will ments can perform, and not by arbitrarily imposing “rules” + substitute. from the perspective of an observer with no regard to their + actual implementation. It is well known that all computa- + Observer pitfalls: minimal elements and minimal tions (or Boolean functions) can be performed by elemen- + interactions tary logical gates such as NOR or NAND gates acting on + elementary binary elements. In principle, then, a system + Because integrated information is an intrinsic property, it should be decomposed into minimal elements and minimal + is especially important that one avoid the observer fallacy in interactions—as elementary as they come in terms of phys- + estimating how much of it is generated by a system. Con- ical implementation—before any pronouncement is made + sider the system in Figure 9A (top). An observer might on its capacity to generate integrated information and + 16 + assume that the system is made up of two units, each with thereby consciousness. + n + a repertoire of 2 states. 
If the lower unit copies the output + of the upper unit, then this two-unit system generates n bits Consciousness and the spatiotemporal grain of reality + of integrated information—it would seem trivial to imple- + ment systems with arbitrarily large values of . But how is An outstanding issue is finding a principled way to de- + the system really built? Figure 9A (bottom) shows a possi- termine the proper spatial and temporal scale to measure + ble architecture: each “unit” is actually not a unit at all, but informational relationships and integrated information. + it contains n binary elements. Each upper element is then What are the elements upon which probability distributions + connected to the corresponding lower element. Seen this of states are to be evaluated? For example, are they mini- + way, it becomes obvious that the system is not a complex columns or neurons? And what about molecules, atoms, or + generating n bits of integrated information, but rather a subatomic particles? Similarly, what is the “clock” to use to + collection of independent couples (or photodiodes) each identify system states? Does it run in seconds, hundreds of + generating 1 bit of integrated information, just as in Figure milliseconds, milliseconds, or microseconds? + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + CONSCIOUSNESS AS INTEGRATED INFORMATION 235 + AA B Φ = n bits + 1 ... n + Φ = n bits + A‘A‘ B‘ + B‘ + Φ = 1 bit + 1 n + ... + ... ... ... ... + 1 ... n + Φ = 1 bit MIP + Φ = 0 bits + Figure 9. Analyzing systems in terms of elementary components and operations. (A) and (B) show + systems that on the surface appear to generate a large amount of integrated information. The units in (A) have + n + a repertoire of 2 outputs, with the bottom unit copying the top. Integrated information is n bits. 
By analyzing + the internal structure of the system in (A′) we find n disjoint couples, each integrating 1 bit of information; the + entire system, however, is not integrated. (B) shows a system of binary units. The top unit receives inputs from + eight other units and performs an AND-gate like operation, firing if and only if all eight inputs are spikes. + Increasing the number of inputs appears to easily increase Φ without limit. (B′) examines a possible imple- + mentation of the internal architecture of the top unit using binary AND-gates. The architecture has a bottleneck, + shown as the MIP line, so that Φ = 1 bit regardless of the number of input units. + Properly addressing this issue requires a comprehensive Tononi, unpubl.). The working hypothesis is as follows + theoretical approach to the relationship between integrated (Tononi, 2004): In general, for any system, integrated in- + information, emergence, and memory (Balduzzi and formation is generated at multiple spatiotemporal scales. In + particular, however, there will often be a privileged spatio- sible quality—that is captured by a single q-arrow of length + temporal “grain size” at which a given system forms a 1 bit.19 + complex of highest Φ—the spatiotemporal scale at which it How close is this position to panpsychism, which holds + “exists” the most in terms of integrated information, and that everything in the universe has some kind of conscious- + ness?
Certainly, the IIT implies that many entities, as long + For example, while in the brain there are many more as they include some functional mechanisms that can make + atoms than neurons, it is likely that complexes at the spatial choices between alternatives, have some degree of con- + scale of atoms are exceedingly small, or at any rate that they sciousness. Unlike traditional panpsychism, however, the + cannot maintain both functional specialization and long- IIT does not attribute consciousness indiscriminately to all + range integration, thus yielding low values of .Atthe things. For example, if there are no interactions, there is no + other extreme, the spatial scale of cortical areas is almost consciousness whatsoever. For the IIT, a camera sensor as + certainly too coarse for yielding high values of . Some- such is completely unconscious (in fact, it does not exist as + where in between, most naturally at the grain size of neu- an entity). Moreover, panpsychism hardly has a solid con- + rons or minicolumns, the neuroanatomical arrangement en- ceptual foundation. The attribution of consciousness to all + sures an ideal mix of functional specialization and kinds of things is based more on an attempt to avoid dualism + integration, leading to the formation of a large complex of than on a principled analysis of what consciousness is. + high . Similarly, panpsychism offers hardly any guidance as to + Similarly, with respect to time, neurons would yield zero what would determine the amount of consciousness associ- +  at the scale of microseconds, since there is simply not + enough time for engaging their mechanisms. At long time ated with different things (such as humans, animals, plants, + scales, say hours,  would also be low, as output states or rocks), or with the same thing at different times (say + would bear little relationship to input states. 
Somewhere in wakefulness and sleep), not to mention that it says nothing + between, at a time scale of tens to hundreds of milliseconds, about what would determine the quality of experience. + the firing pattern of a large complex of neurons should be A more relevant issue is the following: How can the + maximally predictive of its previous state, thus yielding theory attribute consciousness (albeit minimal) to a photo- + high . It is not by chance, according to the IIT, that this is diode, while acknowledging that we “lose” consciousness + both the time scale at which experience seems to flow every night when falling into dreamless sleep? After all, the + (Bachmann, 2000) and that at which long-range neuronal sleeping brain likely generates more integrated information + 21 + interactions occur (Dehaene et al., 2003; Koch, 2004). than a photodiode. Two considerations are in order. First, + This working hypothesis also suggests that the generation we have first-hand “experience” that consciousness can be + of integrated information may set an intrinsic framework for graded: falling asleep is often a rapid process but, before we + both space and time. With respect to time, for example, are “gone” altogether, we occasionally do go through some + consider a complex generating a certain shape in Q through degree of restriction in the field of consciousness, where we + a fast mechanism, and another complex that generates ex- are progressively less aware of ourselves and the environ- + actly the same shape, but through a slower mechanism. It ment. Something similar also happens at certain stages of + would seem that these two complexes should generate ex- alcohol intoxication. So the level of consciousness can + actly the same experience, except that time would flow indeed change around our typical waking baseline, allowing + faster in one case and slower in the other. Similar consid- for some gradation. + erations may apply to space. 
Also, according to the IIT, Below a certain level of consciousness, however, it truly + what constitutes a “state” of the system is not an arbitrary feels as if we fade away completely. But is consciousness + choice from an extrinsic perspective, but rather the spatio- really annihilated? Is it likely that when we “lose” con- + temporal grain size at which the system can best generate sciousness the amount of integrated information generated + information about its past: what is, is what can make a by the corticothalamic main complex decreases nonlin- + difference. early? Computer simulations indicate that when the overall + Consciousness as a graded quantity activation of corticothalamic networks goes below a certain + level, there is a sudden drop in the average effective infor- + The IIT claims that consciousness is not an all-or-none mation between distant parts of the cortex (Tononi, unpubl. + property, but is graded: specifically, it increases in propor- obs.). In other words, below a certain threshold of activation + tion to a system’s repertoire of discriminable states. Strictly the corticothalamic system breaks down into nearly inde- + speaking, then, the IIT implies that even a binary photo- pendent pieces and cannot sustain integrated patterns of + diode is not completely unconscious, but rather enjoys ex- firing. This could explain why it feels as if consciousness is + actly 1 bit of consciousness. Moreover, the photodiode’s vanishing in an almost all-or-none manner rather than di- + consciousness has a certain quality to it—the simplest pos- minishing progressively.20
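The 1-bit figure for the photodiode can be checked with a small calculation. The sketch below is illustrative only and is not taken from the paper: it assumes that the information a mechanism generates is measured as the relative entropy, in bits, between the repertoire of prior states it specifies and the maximum entropy (uniform) repertoire, and it leaves out the partitioning step needed to compute integrated information proper. The names `relative_entropy_bits`, `uniform`, and `and_gate_bits` are hypothetical.

```python
import math


def relative_entropy_bits(posterior, prior):
    """Relative entropy H[posterior || prior] in bits (terms with p = 0 contribute 0)."""
    return sum(p * math.log2(p / q) for p, q in zip(posterior, prior) if p > 0)


def uniform(n_states):
    """Maximum entropy repertoire: every state equally likely."""
    return [1.0 / n_states] * n_states


# Photodiode: two possible prior states ("dark", "light").
# Being "on" rules out "dark", so the specified repertoire is [0, 1].
photodiode_bits = relative_entropy_bits([0.0, 1.0], uniform(2))


def and_gate_bits(n_inputs, firing):
    """Information an AND-like element generates about its n inputs (assumed model).

    The last state index stands for "all inputs active".
    """
    n_states = 2 ** n_inputs
    if firing:
        # Firing is compatible with exactly one prior input state.
        posterior = [0.0] * (n_states - 1) + [1.0]
    else:
        # Silence rules out only the all-active state.
        posterior = [1.0 / (n_states - 1)] * (n_states - 1) + [0.0]
    return relative_entropy_bits(posterior, uniform(n_states))


if __name__ == "__main__":
    print("photodiode:", photodiode_bits)
    print("AND gate, 8 inputs, firing:", and_gate_bits(8, True))
    print("AND gate, 8 inputs, silent:", and_gate_bits(8, False))
```

Under these assumptions a photodiode that is on generates 1 bit, and an AND-like element with n inputs generates n bits when it fires but far less when it is silent, because silence rules out only one of the 2^n input states. Whether those bits count as integrated depends on the physical implementation, which this sketch does not model.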
+ CONSCIOUSNESS AS INTEGRATED INFORMATION 237 + The limited capacity of consciousness such as fruit flies, or even more when one considers man- + It is often stated that the brain discards most of the made artifacts, arguments from analogy lose their strength, + incoming information, and that only a very small portion and it is hard to know what to think. The IIT has a straight- + trickles into consciousness. Thus, though the retina can forward position on this issue: to the extent that a mecha- + transmit millions of bits per second, some estimates suggest nism is capable of generating integrated information, no + that just a few bits per second make it to consciousness matter whether it is organic or not, whether it is built of + (Nørretranders, 1998), which is abysmally little by engi- neurons or of silicon chips, and independent of its ability to + neering standards. Indeed, as shown by classic experiments, report, it will have consciousness. Thus, the theory implies + we cannot keep in mind more than a few things at a time. that it should be possible to construct highly conscious + For the IIT, however, the informativeness of conscious- artifacts by endowing them with a complex of high  (Koch + ness is not related to how many chunks of information a and Tononi, 2008). Moreover, it should be possible to + single experience might contain. Instead, it relates to how design the quality of their conscious experience by appro- + many different states are ruled out. Since we can easily priately structuring their effective information matrix. + discriminate among trillions of conscious states within a Such a position should not be read as implying that + fraction of a second, the informativeness of conscious ex- building conscious artifacts may be easy, or that many + perience must be considerable. 
Presumably, the so-called existing man-made products, especially “complicated” + capacity limitation of consciousness reflects an upper bound ones, should be expected to have high values of . The + on how many partially independent subprocesses can be conditions needed to build complexes of high , such as a + sustained within the main complex without compromising combination of functional specialization and integration, are + its integration. apparently not easy to achieve. Moreover, computer simu- + Another consequence of the need for integration is the lations suggest that seemingly “complicated” networks with + seemingly serial nature of consciousness. Since a complex many nodes and connections, whose connection diagram + constitutes a single entity, it must move from one global superficially suggests a high level of “integration,” usually + state to another, and its temporal evolution must follow a turn out to break down into small local complexes of low , + single trajectory. Indeed, dual-task paradigms and the psy- or to form a single entity with a small repertoire of states + chological refractory period show that decisions or choices and therefore also of low : a paradigmatic example is a + can only occur one at a time (Pashler, 1998). Such choices network with full connectivity, which can be shown to + take around 150 milliseconds, a figure remarkably close to generate at most 1 bit of integrated information (Balduzzi + the lower limit of the time typically needed for conscious and Tononi, 2008). Though we do not know how to calcu- + integration. late the amount of integrated information, not to mention the + More generally, although transmitting and storing infor- shape of the qualia, generated by structures such as a + mation is relatively cheap and easy, generating integrated computer chip, the World Wide Web, or the proverbial + information would seem to be more expensive and difficult. 
network of Chinese talking on the phone (Block, 1978), it is + Ensuring that a system forms a complex (integration) re- likely that the same principles apply: high  requires a very + quires many connections per element, and connections are special kind of complexity, not just having many elements + usually expensive. At the same time, ensuring that the intricately linked. Just think of something as complex as the + complex can discriminate among a large number of states cerebellum and its negligible contribution to consciousness. + (information) requires that connections are patterned so that Whether certain kinds of random networks (Tononi and + elements are both functionally specialized and capable of Sporns, 2003), or even periodic network such as grids + acting as a single entity, which is usually difficult. Thus, it (Balduzzi and Tononi, 2008), could achieve high values of + may be more fitting to say that the brain, rather than dis- (albeit inefficiently) by simply increasing the number of + carding information, sifts through the chaff to extract pre- elements remains to be determined. The brain certainly + cious kernels of integrated information. To use another exploits grid-like arrangements (as in early sensory areas) + metaphor, if information were like carbon, mere informa- and certain kinds of near-random connectivity (as in pre- + tion would be like a heap of coal, and integrated information frontal areas and perhaps, at a finer scale, everywhere else). + like a precious diamond. Moreover, the small world architecture of the cerebral cor- + Conscious artifacts? tex and its hub-like backbone may be especially well-suited + to integrating information (Sporns et al., 2000; Hagmann et + Many scientists think that other species beyond humans al., 2008). 
At present, even for very small networks of just + are likely to be conscious (Koch, 2004) based on common- a dozen elements, the only way to increase Φ is by brute- + alities of behavior and on the overall similarity between force optimization, which is clearly unfeasible for more + their corticothalamic system and ours. But when it comes to realistic networks, or through adaptation to a rich environ- + species that have radically different neural organization, ment (Tononi et al., 1996). + Consciousness and meaning + The notion of integrated information and, more generally, + the set of informational relationships that constitute a quale, + are closely related to the notion of meaning and, more + generally, semantics. Here I briefly discuss how meaning + requires a system capable of integrating information and, + more specifically, how meaning is captured by concepts. + For the IIT, mechanisms generate meanings. Moreover, + only the mechanisms within a single complex do so. A + mechanism modifies a probability distribution (the context + to which it is applied) into another distribution, thereby + specifying an informational relationship. In essence, then, a + mechanism rules out certain states and rules in others. Note + the parallel with semantics, where a sentence’s meaning is + specified by the possible worlds in which it is true and false. + Also, as in semantics, the meaning changes depending on + the context in which the mechanism acts. For the IIT, + however, meaning is only meaningful within a complex— + mechanisms belonging to disjoint complexes do not gener- + ate meaning.
In fact, what is meaningful is each individual + experience, and its meaning is completely and univocally + Concepts + specified by the shape of its quale. For example, a photo- 5678 + 22 + diode generating a single q-arrow means (i.e., specifies) parity symmetry contiguity balance + very little, whereas a large and complex quale means (i.e., + specifies) much more. The IIT is also precise about the Figure 10. Meaning. (A) The “copy system.” Each output element is + possible worlds that need be considered: they are the states connected to a different input element, implementing for each sensor- + encompassed by the maximum entropy distribution of a detector couple the function “D  S.” The copy system relays all four bits + in the input but, since it decomposes into four separate complexes, it + complex. How meanings “in the head” of different subjects generates no integrated information. Each sensor-detector couple generates + refer to the external world is a different matter, which 1 bit of integrated information and a single informational relationship + requires considering the matching between internal and (q-arrow), corresponding to the simplest possible concept: that things are + external relationships (see below). one way rather than another way (just like the photodiode in Fig. 1). (B) + Recall that concepts are entangled q-arrows that group The “conceptual” system. Each output element receives connections from + all four input elements, and performs a more complex Boolean function on + together certain states of affairs in a way that cannot be the input. The q-arrow generated by each output element (i.e., by its + decomposed into the mere sum of simpler groupings (see afferent connections) is entangled (the information generated jointly by its + also Feldman, 2003). 
Figure 10 shows two systems com- four afferent connections is higher than the sum of the information gen- + prising four input elements (sensors) and four output ele- erated by each connection independently). An entangled q-arrow consti- + ments (detectors). The “copy” system (Fig. 10A, similar to tutes a concept. In this case, the first element being off means “even” input, + the second on means “symmetrical,” the third off “non-contiguous,” the + the camera example in Fig. 2, left side) is such that each fourth on “balanced.” The q-arrow generated by all afferents to output + output element is connected to a different input element, elements considered together is also entangled, and means something like + implementing for each sensor-detector couple the function this: things are this particular way—an even, symmetrical, non-contiguous, + “D = S.” The copy system relays all 4 bits in the input but, balanced input—rather than many different ways. The conceptual system + since it decomposes into four separate complexes, it gener- has literally added meaning to the input string. Moreover, the conceptual + system realizes this concept as a single entity—a complex having high + ates no integrated information. Each sensor-detector couple integrated information—rather than as a collection of smaller entities, each + generates 1 bit of integrated information and a single infor- of which realizes only a partial concept. + mational relationship (q-arrow), corresponding to the sim- + plest possible concept: that things are one way rather than + another way (just like the photodiode in Fig. 1). otherwise); element 6 a “symmetry” function (on if the + Consider now the “conceptual” system (Fig. 10B).
In this arrangement of on-and-off inputs is symmetric); element 7 a + case, each output element receives connections from all four “contiguity” function (on if on-or-off input elements are not + input elements, and performs a more complex Boolean separated by an element of the other sign); and element 8 a + 23 + function on the input. For example, output element 5 “balance” function (on if there are an equal number of on + 24 + could be implementing a “parity” function on the four input and off input elements). - + In this case, the q-arrow gener + elements (it is on if an odd number of inputs are on, and off ated by each output element (i.e., by its afferent connec- + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + CONSCIOUSNESS AS INTEGRATED INFORMATION 239 + tions) is entangled: the information generated jointly by its tainly true of nonliving things, at multiple scales: think of + four afferent connections is higher than the sum of the crystals or, at a much grander scale, of mountains. But it is + information generated by each connection independently spectacularly true of living organisms, also at multiple + (for example, the parity function can only be computed scales: from the vast catalog of proteins and protein com- + when all inputs are considered together). As I mentioned plexes—all of different shapes—to the inventory of cells, to + above, an entangled q-arrow constitutes a concept in Q, here that of organs, to the ramified tree of species, and within + embodied in single output elements integrating globally each species, to the panoply of different individuals. One + over all four input elements. 
Moreover, in this case the four could go on, and note how much of our own creations in + output elements specify different concepts, and thus gener- engineering, science, and art also represent the generation of + 25 + ate information about different aspects of the input string. novel shapes, never seen before, again in astonishing vari- + Thus, the first element being off means “even” input, the ety. Perhaps most relevant in this context is to consider how + second on means “symmetrical,” the third off “non-contig- even more extraordinary shapes would appear if we could + uous,” the fourth on “balanced.” The q-arrow generated by look at them in more than just three dimensions and at the + all afferents to the output elements taken together is also most appropriate level of organization. Take the brain at the + entangled: the information generated jointly by all afferent synaptic level, and disentangle its connectional organization + connections is higher than the sum of the information gen- in all its complexity: if one could visualize the intricacy of + erated independently by the afferents to each output ele- the “connectome” (Sporns et al., 2005) in a space of appro- + 26 + ment, - priate dimensionality, it would make for a remarkable shape + meaning something like this: things are this partic + ular way—an even, symmetrical, non-contiguous, balanced indeed. + input—rather than many different ways. The conceptual I mention all of this to come to a key aspect of the IIT: + system has literally added meaning to the input string. that experiences (i.e., qualia) are shapes too. As remarkable + Moreover, the conceptual system realizes this concept as a as the “enchanted loom” of anatomical connectivity and + single entity—a complex having high integrated informa- firing patterns is, it pales compared to the shape of an + tion—rather than as a collection of smaller entities, each of experience in qualia space. 
For example, the complex gen- + which realizes only a partial concept. erating the quale in Figure 5 has four elements (one of them + Indeed, meaning is truly in the eye of the beholder: an firing) and nine connections among them. This simple sys- + input string as such is meaningless, but becomes meaningful tem specifies a quale or shape that is described by 399 + the moment it is “read” by a complex with a rich conceptual points in a 16-dimensional qualia space. It is hard to imag- + structure (corresponding to high Φ). Moreover, a complex ine what may be the complexity of the quale generated by a + with many different concepts will “read” meaning into sizable portion of our brain. Add to this that the main + anything, whether the meaning is there or not. It goes complex within our brain, whatever its precise makeup in + without saying that it is a good idea to build such complexes terms of neurons and connections, is presumably generating + in such a way that their concepts are meaningful for interpret- a different shape, just as remarkable, every few hundred + ing the environment (for example, because they help predict milliseconds, often morphing smoothly into another shape + future inputs). Finally, the more a system is able to concep- as new informational relationships are specified through its + tualize, the more it “understands”; or, if it was built to mechanisms entering new states. Of course, we cannot + predict an environment, the more it “knows.” Imagine that dream of visualizing such shapes as qualia diagrams (we + you do not know Chinese and are presented with a large have a hard time with shapes generated by three elements). + number of Chinese characters. By and large, you will group And yet, from a different perspective, we see and hear such + them into the category (concept) of “must be something in shapes all the time, from the inside, as it were, since such + Chinese,” since they are all equivalent to you.
After you shapes are actually the stuff our dreams are made of— + have learned Chinese, however, each of the characters ac- indeed the stuff all experience is made of. + quires a new, individual meaning (this one is a this, and that + one is a that)—the input is the same, but the meaning has Consciousness and the world: matching informational + 27 + grown. relationships + The richness of qualia space Consciousness qua integrated information is intrinsic and + thus solipsistic. In principle, it could exist in and of itself, + People often marvel at the immensity of the known without requiring anything extrinsic to it, not even a func- + universe, and wonder about other possible universes that we tion or purpose. For the IIT, as long as a system has the right + may never know. But perhaps even more awe-inspiring is internal architecture and forms a complex capable of dis- + the variety and complexity of nature around us. Just think of criminating a large number of internal states, it would be + the number of different shapes that surround us, and their highly conscious. Such a system would not even need any + remarkable internal organization (see cover). This is cer- contact with the external world, and it could be completely + This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM + All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). + 240 G. TONONI + passive, watching its own states change without having to “inflate” along certain dimensions when the complex is + act.28 - presented with appropriate stimuli. + Depending on the informational relationships gener + ated by its architecture, its qualia could be just as interesting This working hypothesis also suggests that morphogene- + as ours, whether or not they have anything to do with the sis and natural selection may be responsible for a progres- + causal architecture of the external world. 
Strange as this sive increase in the amount of integrated information gen- + may sound, the theory says that it may be possible one day erated by biological brains, and thus for the evolution of + to construct a highly conscious, solipsistic entity. consciousness. This is because, in organisms exposed to a + Nevertheless, it is unlikely that a system having high Φ rich environment, plastic processes tend to increase func- + and interesting qualia would come to be by chance, but only tional specialization, while the brain’s massive interconnec- + by design or selection. Brain mechanisms, including those tivity ensures neural and behavioral integration. In fact, it + inside the main complex, are what they are by virtue of a appears that as a system incorporates statistical regularities + long evolutionary history, individual development, and from its environment and learns to predict it, its capacity for + learning. Evolutionary history leads to the establishment of integrated information may grow (Tononi et al., 1996). It + certain species-specific traits encoded in the genome, in- remains to be seen whether, based on the same principles, + cluding brains and means to interact with the environment. the construction of shapes even more extensive and com- + Development and epigenetic processes lead to an appropri- plex may be achieved through nonbiological means. + ate scaffold of anatomical connections. Experience then Finally, the integrated information approach offers a + refines neural connectivity in an ongoing manner through straightforward perspective on why consciousness would be + plastic processes, leading to the idiosyncrasies of the indi- useful (Dennett, 1991). By definition, a highly conscious + vidual “connectome” and the memories it embeds.
experience is a discrimination among trillions of alterna- + Since for the IIT, experiences are informational relation- tives—it specifies that what is the case is this particular state + ships generated by mechanisms, what is the relationship of affairs, which differs from a trillion other states of affairs + between the structure of experience and the structure of the in its own peculiar way, and in a way that is imbued with + world? Again, this issue requires a comprehensive theoret- evolutionary value. Equivalently, one can say that a quale of + ical approach (Tononi et al., 1996; Balduzzi and Tononi, high Φ represents a discrimination that is extremely con- + unpubl.), but the main idea is simple enough. Through text-sensitive, and thus likely to be useful. Experience is + natural selection, epigenesis, and learning, informational choice, and a highly conscious choice is a choice that is both + relationships in the world mold informational relationships highly informed and highly integrated. + within the main complex that “resonate” best on a commen- Recall the photodiode. For it, turning on specifies that + surate spatial and temporal scale. Moreover, over time these things are one way rather than another. What things might + relationships will be shaped by an organism’s values, to be like, it has 1 bit of a notion. For each of us, when the + reflect relevance for survival. This process can be envi- screen light turns on, the movie is about to begin. + sioned as the experiential analog of natural selection. As is + well known, selective processes act on organisms through Acknowledgments + differential survival to modify gene frequencies (genotype), + which in turn leads to the evolution of certain body forms I thank David Balduzzi, Chiara Cirelli, and Lice Ghilardi + and behaviors (extrinsic phenotype). Similarly, selective for their help, and the McDonnell Foundation for support.
+ processes (Edelman, 1987) acting on synaptic connections + through plastic changes modify brain mechanisms (neuro- Notes + type), which in turn modifies informational relationships + 1 One could say that the theory starts from two basic phenomenological + inside the main complex (intrinsic phenotype29) and thereby + consciousness itself. In this way, qualia—the shapes of postulates—(i) experience is informative; (ii) experience is integrated— + experience—come to be molded, sculpted, and refined by which are assumed to be immediately evident (or at least should be after + the informational structure of events in the world. going through the two thought experiments). In principle, the theory, + including the mathematical formulation and its corollaries, should be + A working hypothesis is that the quantity of “matching” derivable from these postulates. + between the informational relationships inside a complex 2 Note that two different distributions over the same states have relative + and the informational structure of the world can be evalu- entropy > 0 even if they have the same entropy. + ated, at least in principle, by comparing the value of Φ when 3 One could paraphrase a classic definition of information (Bateson, 1972) + a complex is exposed to the environment, to the value of Φ and say that information is a difference that made a difference (the actual + when the complex is isolated or “dreaming” (Tononi et al., repertoire that can be discriminated by a given mechanism in a given state). + 1996). Similarly, the quality of matching can be evaluated 4 In other words, integrated information is a difference that made a + by how the shapes of qualia “resonate” with the environ- difference to a system, to the extent that the system constitutes a single + ment: for example, certain sub-shapes within a quale should entity.
+ 5 A phenomenon in which an observer may fail to perceive an image that useful to consider some of the paradoxes of information in physics from the + is presented after a rapid succession of other images. intrinsic perspective, that is, as integrated information, where the observer + 6 A condition in which, when different images are presented to each eye, is one and the same as the observed. + instead of seeing them superimposed, one perceives one image at a time, 15 Φ would be high for one specific firing pattern; for all other ones it + and which image one perceives switches every 2 seconds. would be very low. + 7 The set of all subsets of connections forms a lattice (or more precisely a 16 Here I ignore the issue of whether serial and parallel mechanisms are + logic, characterized by an ordering relationship, join and meet operators, equivalent from the perspective of integrated information, as well as the + and a complement operator). issue of analog and digital computation (or quantum computation). In + 8 Univocally implies, for example, that the “inverted spectrum” is impos- general, it must be asked to what extent two systems that are implemented + sible: a given shape (quale) specifies red and only red, another one green differently actually specify the same complex and qualia when analyzed at + and only green. In turn, this implies that the neural mechanisms underlying the proper spatio-temporal grain. + the perception of red and green cannot be completely symmetric (Palmer, 17 It is worth reiterating that a full description is practically out of the + 1999). question for any realistic system.
+ 9 The set of all possible shapes generated by all possible systems corre- 18 More appropriately, Mary should be like the achromatopsic patient + sponds to the set of all possible experiences. mentioned above, since otherwise she might be able to dream in color. + 10 More precisely, the lesion collapses all q-arrows generated by r starting + 19 Although the quality of the photodiode’s consciousness is the same + from any context; that is, it folds the quale along the q-fold specified by r. quality generated by a binary thermistor, and many other simple mecha- + 11 In lattices there is often a duality between elements (extensions) and nisms. + attributes (intensions). Going up the lattice we move from elementary 20 Our ability to judge gradations in the level of consciousness when + connections taken in isolation to all connections taken together. Going + down the lattice, or up its dual, we move from the elementary attributes of absolute levels are low may also be poor. As a loose metaphor, consider + a fully specified experience (the redness of red) to an undifferentiated temperature. We are good at judging temperature as long as it fluctuates + experience, all of whose attributes are unspecified. around the usual range, say between −50 and +100 °C. However, when + temperature falls below that range, we become much less precise: both + 12 In essence, the very existence of a functional mechanism in a given state −200 and −273 °C are inconceivably cold to us, and we certainly would + is saying something like this: Given that I am a certain mechanism in good not judge −200 to be much warmer than absolute zero. Similarly, a + order, and that I am a certain state, things must have been this way, rather complex generating 1 or 10 bits of integrated information may feel a bit + than other ways.
In this sense, the information the mechanism generates is different (or rather 9 bits different), but it may feel like so little that, + a statement about the universe made from its own intrinsic perspective— compared to our usual levels of consciousness, it essentially feels like + indeed, the only statement it can possibly make. Another way of saying this nothing. Which is why, of course, it is good to have a thermometer or a + is that the mechanism is generating information by making an observation Φ-meter. + or measurement—where the mechanism is both the observer and the + observed. In short, every (integrated) mechanism is an observer (of itself), 21 An optical metaphor can again be useful: things come crisply into + and the state it is in is the result of that observation. existence at a certain focal distance, and with a certain exposure time. At + 13 There may be concentrations of such bright objects elsewhere in the shorter or longer focal distances things vanish out of focus: if exposure + universe, but at present we have no positive evidence. time is too short, they do not register; if it is too long, they blur. + 14 The notion of integrated information can in principle be extended to 22 A photodiode or any other complex generating a quale consisting of just + encompass quantum information. There are intriguing parallels between a single q-arrow. + integrated information and quantum notions. Consider for example: (i) 23 Here I ignore the issue of decomposing complex Boolean functions into + quantum superposition and the potential repertoire of a mechanism (in a elementary mechanisms.
+ sense, before it is engaged, a mechanisms exists in a superposition of all its + possible output states); (ii) decoherence and the actual repertoire of a 24 Note that each of these functions should be thought of as implemented + mechanism (when the mechanism is engaged and enters a certain state, it according to its minimal formula (of shortest description length, i.e., of + collapses the potential repertoire into the actual repertoire); (iii) quantum minimal complexity). Clearly, minimal formulas that involve four inputs + entanglement and integrated information (to the extent that one cannot are more complex than formulas involving just one input (the parity + perturb two elements independently, they are informationally one). function, for instance, is notoriously incompressible). + There are also some points of contact between the notion of integrated 25 While the particular combination of concepts described here was chosen + information and the approach advocated by relational quantum mechanics for its familiarity (parity, symmetry, contiguousness, balance) rather than + (Rovelli, 1996). The relational approach claims that system states exist for informational efficiency, one can envision Boolean functions that + only in relation to an observer, where an observer is another system (or a realize “optimal” sets of concepts from the point of view of integrated + part of the same system). By contrast, the IIT says that a system can information. For example, the four functions may be chosen so that, on + observe itself, though it can only do so by “measuring” its previous state. 
average, the set of four output units jointly generate as much integrated + More generally, for the IIT, only complexes, and not arbitrary collections information as possible, up to the theoretical maximum of 4 bits of Φ for + of elements, are real observers, whereas physics is usually indifferent to every input string (by contrast, the “copy system,” while transmitting all 4 + whether information is integrated or not. bits in the input, would generate 4 times 1 bit of integrated information). + Other interesting issues concern the relation between the conservation of Obviously, building a system that could respond optimally to a large set of + information and the apparent increase in integrated information, and the input strings is exceedingly difficult (if at all possible), especially consid- + finiteness of information (even in terms of qubits, the amount of informa- ering the need to build such a system using simple Boolean functions as + tion available to a physical system is finite). More generally, it seems building blocks. + 26 Again, it is difficult to build an optimal conceptual system that can Gazzaniga, M. S. 2005. Forty-five years of split-brain research and still + preserve all the information in the input, corresponding in this case to 4 bits going strong. Nat. Rev. Neurosci. 6: 653–659. + of integrated information for every input string. Hagmann, P., L. Cammoun, X. Gigandet, R. Meuli, C. J. Honey, V. J. + 27 The extreme case is watching noisy “snow” patterns flickering on a TV Wedeen, et al. 2008. Mapping the structural core of human cerebral + screen. We treat the overwhelming majority of TV frames as equivalent, cortex. PLoS Biol. 6: e159. + under the concept of “TV snow.” If one were an optimal conceptual Hobson, J. A., E. F. Pace-Schott, and R.
Stickgold. 2000. Dreaming + system, however, each frame would be conceptualized as its own very and the brain: toward a cognitive neuroscience of conscious states. + particular kind of pattern (say exhibiting a certain amount of 17th order Behav. Brain Sci. 23: 793–842. + symmetries, another amount of 11th order symmetries, belonging to the 6th Jackson, F. 1986. What Mary didn’t know. J. Philos. 83: 291–295. + class of contiguousness, etc.). In a sense, every noisy frame would be read Koch, C. 2004. The Quest for Consciousness: A Neurobiological Ap- + as an astonishingly deep, rich, meaningful and unique pattern, perhaps as proach. Roberts, Denver, CO. + a work of art. Koch, C., and G. Tononi. 2008. Can machines be conscious? Spectrum + IEEE 45: 55–59. + 28 Dreams prove that an adult brain does not need the outside world to Koch, C., and N. Tsuchiya. 2007. Attention and consciousness: two + generate experience “here and now”: the mechanisms of the main complex distinct brain processes. Trends Cogn. Sci. 11: 16–22. + within the brain are sufficient, all by themselves, to generate the informa- Massimini, M., F. Ferrarelli, R. Huber, S. K. Esser, H. Singh, and G. + tional relationships that constitute experience. Not to mention that in Tononi. 2005. Breakdown of cortical effective connectivity during + dreams we tend to be remarkably passive. sleep. Science 309: 2228–2232. + 29 Indeed, the shape of experience can be said to be the quintessential Massimini, M., F. Ferrarelli, S. K. Esser, B. A. Riedner, R. Huber, M. + “phenotype.” Murphy, et al. 2007. Triggering sleep slow waves by transcranial + magnetic stimulation. Proc. Natl. Acad. Sci. USA 104: 8496–8501. + Literature Cited Nørretranders, T. 1998. The User Illusion: Cutting Consciousness + Down to Size. Viking, New York. + Albus, J. S., G. A. Bekey, J. H. Holland, N. G. Kanwisher, J. L. Palmer, S. E. 1999. Color, consciousness, and the isomorphism con- + Krichmar, M. Mishkin, et al. 2007. 
A proposal for a Decade of the straint. Behav. Brain Sci. 22: 923–943; discussion 944–989. + Mind initiative. Science 317: 1321. Pashler, H. E. 1998. The Psychology of Attention. MIT Press, Cam- + Alkire, M. T., A. G. Hudetz, and G. Tononi. 2008. Consciousness and bridge, MA. + anesthesia. Science 322: 876–880. Posner, J. B., and F. Plum. 2007. Plum and Posner’s Diagnosis of + Baars, B. J. 1988. A Cognitive Theory of Consciousness. Cambridge Stupor and Coma, 4th ed. Oxford University Press, New York. + University Press, New York. Rovelli, C. 1996. Relational quantum mechanics. Int. J. Theor. Phys. 35: + Bachmann, T. 2000. Microgenetic Approach to the Conscious Mind. 1637–1678. + John Benjamins, Philadelphia. Sporns, O., G. Tononi, and G. M. Edelman. 2000. Theoretical neuro- + Balduzzi, D., and G. Tononi. 2008. Integrated information in discrete anatomy: relating anatomical and functional connectivity in graphs and + dynamical systems: motivation and theoretical framework. PLoS Com- cortical connection matrices. Cereb. Cortex 10: 127–141. + put. Biol. 4: e1000091. Sporns, O., G. Tononi, and R. Kotter. 2005. The human connectome: + Bateson, G. 1972. Steps to an Ecology of Mind: Collected Essays in a structural description of the human brain. PLoS Comput. Biol. 1: e42. + Anthropology, Psychiatry, Evolution, and Epistemology. Chandler, San Steriade, M., I. Timofeev, and F. Grenier. 2001. Natural waking and + Francisco. sleep states: a view from inside neocortical neurons. J. Neurophysiol. + Block, N., ed. 1978. Trouble with Functionalism, Vol. 9. Minnesota 85: 1969–1985. + University Press, Minneapolis. Tononi, G. 2001. Information measures for conscious experience. Arch. + Blumenfeld, H., and J. Taylor. 2003. Why do seizures cause loss of Ital. Biol. 139: 367–371. + consciousness? Neuroscientist 9: 301–310. Tononi, G. 2004. An information integration theory of consciousness. + Bower, J. M. 2002. The organization of cerebellar cortical circuitry BMC Neurosci. 5: 42.
+ revisited: implications for function. Ann. N.Y. Acad. Sci. 978: 135–155. Tononi, G., and G. M. Edelman. 1998. Consciousness and complexity. + Cover, T. M., and J. A. Thomas. 2006. Elements of Information Theory, Science 282: 1846–1851. + 2nd ed. Wiley-Interscience, Hoboken, NJ. Tononi, G., and S. Laureys. 2008. The neurology of consciousness: an + Crick, F., and C. Koch. 2003. A framework for consciousness. Nat. + Neurosci. 6: 119–126. overview. Pp. 375–412 in The Neurology of Consciousness, S. Laureys + Dehaene, S., C. Sergent, and J. P. Changeux. 2003. A neuronal net- and G. Tononi, eds. Elsevier, Oxford. + work model linking subjective reports and objective physiological data Tononi, G., and O. Sporns. 2003. Measuring information integration. + during conscious perception. Proc. Natl. Acad. Sci. USA 100: 8520– BMC Neurosci. 4: 31. + 8525. Tononi, G., O. Sporns, and G. M. Edelman. 1996. A complexity + Dennett, D. C. 1991. Consciousness Explained. Little, Brown, Boston, measure for selective matching of signals by the brain. Proc. Natl. Acad. + MA. Sci. USA 93: 3422–3427. + Edelman, G. M. 1987. Neural Darwinism: The Theory of Neuronal van Zandvoort, M. J., T. C. Nijboer, and E. de Haan. 2007. Devel- + Group Selection. BasicBooks, New York. opmental colour agnosia. Cortex 43: 750–757. + Feldman, J. 2003. A catalog of Boolean concepts. J. Math. Psychol. 47: Wheeler, J. A., and K. W. Ford. 1998. Geons, Black Holes, and + 75–89. Quantum Foam: A Life in Physics, 1st ed. Norton, New York.
diff --git a/archive/tononi/Consciousness_as_Integrated_Information_Tononi_2008.pdf b/archive/tononi/Consciousness_as_Integrated_Information_Tononi_2008.pdf new file mode 100644 index 00000000..a232440d --- /dev/null +++ b/archive/tononi/Consciousness_as_Integrated_Information_Tononi_2008.pdf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:afe93ff5ff17cc860cef7f1305d2e6fab4ac4b99477e1859dcd3dbe8b4ed2e5c +size 659720 diff --git a/archive/zurek/Quantum_Darwinism_Zurek_2009.md b/archive/zurek/Quantum_Darwinism_Zurek_2009.md new file mode 100644 index 00000000..f7c25350 --- /dev/null +++ b/archive/zurek/Quantum_Darwinism_Zurek_2009.md @@ -0,0 +1,668 @@ + Quantum Darwinism + Wojciech Hubert Zurek + Theory Division, MS B213, LANL Los Alamos, NM, 87545, U.S.A. + Quantum Darwinism describes the proliferation, in the environment, of multiple records of selected + states of a quantum system. It explains how the fragility of a state of a single quantum system can + lead to the classical robustness of states of their correlated multitude; shows how effective ‘wave- + packet collapse’ arises as a result of proliferation throughout the environment of imprints of the + states of quantum system; and provides a framework for the derivation of Born’s rule, which relates + probability of detecting states to their amplitude. Taken together, these three advances mark + considerable progress towards settling the quantum measurement problem. + The quantum principle of superposition implies that I. DECOHERENCE AND EINSELECTION + any combination of quantum states is also a legal state. + This seems to be in conflict with everyday reality: States Decoherence turns one of the two problems we noted + we encounter are localized. Classical objects can be ei- above–fragility of quantum states – into a solution of the + ther here or there, but never both here and there. Yet, the other.
Environment-induced decoherence recognizes that + principle of superposition says that localization should be if a measurement can put a state at risk and re-prepare + a rare exception and not a rule for quantum systems. it, so can accidental information transfers that happen + Fragility of states is the second problem with quantum- whenever a system interacts with its environment. + classical correspondence: Upon measurement, a general Decoherence is by now well understood [3, 4, 5]: + preexisting quantum state is erased – it “collapses” into Fragility of states makes quantum systems very difficult + an eigenstate of the measured observable. How is it then to isolate. Transfer of information (which has no effect on + possible that objects we deal with can be safely observed, classical states) has dramatic consequences in the quan- + even though their basic building blocks are quantum? tum realm. So, while fundamental problems of classical + To bypass these obstacles Bohr [1] followed Alexander physics were always solved in isolation (it sufficed to pre- + the Great’s example: Rather than try disentangling the vent energy loss) this is not so in quantum physics (leaks + Gordian Knot at the beginning of his conquest, he cut of information are much harder to plug). + it. The cut separates the quantum from the classical. When a quantum system gives up information, its own + Bohr’s Universe consists of two realms, each governed by state becomes consistent with the information that was + its own laws. Fragile superpositions were banished from disseminated. “Collapse” in measurements is an extreme + the classical realm deemed more fundamental and indis- example, but any interaction that leads to a correlation + pensable to interpret or even practice quantum theory.
can contribute to such re-preparation: Interactions that + Thus, instead of trying to understand Universe (includ- depend on a certain observable correlate it with the en- + ing “the classical”) in quantum terms one “quantized” vironment, so its eigenstates are singled out, and phase + this and that, always starting from the classical base. relations between such pointer states are lost [6]. + Negative selection due to decoherence is the essence of + This was a brilliant tactical move: Physicists could environment-induced superselection, or einselection [7]: + conquer the quantum realm without getting distracted by Under scrutiny of the environment, only pointer states + interpretational worries. In those days only gedankenex- remain unchanged. Other states decohere into mixtures + periments like the famous Schrödinger cat [2] were truly of stable pointer states that can persist, and, in this sense, + disturbing: Real experiments dealt with electrons, pho- exist: They are einselected. + tons, atoms, or other microscopic systems. Bohr’s rule of These ideas can be made precise. The basic tool is the + arXiv:0903.5082v1 [quant-ph] 29 Mar 2009 + thumb – that the macroscopic is classical – was enough. reduced density matrix ρS. It represents the state of the + Moreover, many (including Einstein) believed that quan- system that obtains from the composite state ΨSE of S + tum physics is just a step on a way to a deeper theory and E by tracing out the environment E: + that will solve or bypass interpretational conundrums. $\rho_S = \mathrm{Tr}_E\,|\Psi_{SE}\rangle\langle\Psi_{SE}|$. (1) + That did not happen. Instead, old gedankenexperi- + ments were carried out. They confirmed validity of quan- Evolution of ρS reveals preferred states: It is most pre- + tum laws on scales that have, of recent, begun to infringe dictable when the system starts in a pointer state. To + on “the macroscopic”. Quantum theory is here to stay.
quantify this one can use (von Neumann) entropy, + It is also increasingly clear that its weirdest predictions $H_S = H(\rho_S) = -\mathrm{Tr}\,\rho_S \lg \rho_S$, as a function of time. Pointer + – superpositions and entanglement – are experimental states result in smallest entropy increase. By contrast, + facts, in principle relevant also for macroscopic objects. their superpositions produce entropy rapidly, at decoher- + Therefore, questions about the origin of “the classical”, ence rates, especially when S is macroscopic. + with its restriction to localized states that are robust, un- When pure states of the system are sorted by pre- + perturbed by measurements, can no longer be dismissed. dictability, according to entropy of the evolved ρS, + pointer states are at the top. This criterion – the pre- + dictability sieve [4, 8, 9] – yields a short list of candidates + for effectively classical states: A cat can persist in one + of the two obvious stable states, but their superposition + would deteriorate into a mixture of |dead⟩ and |alive⟩ + when initiated in a way envisaged by Schrödinger [2]. + The special role of position is traced to the nature of + the SE interactions: They tend to depend on distance. + Hence, information about position is most readily passed + on to the environment. This is why localized states sur- + vive while nonlocal superpositions decay into their mix- + tures. For example, in a weakly damped harmonic os- + cillator the minimum uncertainty wavepackets – familiar + coherent states, best quantum approximation of classical + points in phase space – are einselected [9, 10, 11]. + II. ENVIRONMENT AS A WITNESS + Monitoring by the environment means that informa- + tion about S is deposited in E. What role does it play, + and what is its fate?
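As an illustration of Eq. (1) and of the entropy criterion behind the predictability sieve, the following is a minimal sketch in Python with NumPy. The two-qubit toy model (a single environment qubit that perfectly copies the pointer basis of a system qubit) and all function names are assumptions made for this example; they do not come from the paper. Entropy is taken in bits, matching the "lg" in the text.

```python
import numpy as np


def von_neumann_entropy(rho):
    """H(rho) = -Tr(rho lg rho) in bits, computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop numerically zero eigenvalues (0 lg 0 = 0)
    return float(-np.sum(evals * np.log2(evals)))


def reduced_system_state(psi, dim_s, dim_e):
    """rho_S = Tr_E |psi><psi| for a pure state psi on S (x) E, as in Eq. (1)."""
    m = psi.reshape(dim_s, dim_e)  # m[s, e] = <s, e|psi>
    return m @ m.conj().T          # sum over e implements the partial trace


def monitored_state(a, b):
    """Hypothetical post-interaction state a|0>_S|0>_E + b|1>_S|1>_E."""
    psi = np.zeros(4, dtype=complex)
    psi[0] = a  # |0>_S |0>_E
    psi[3] = b  # |1>_S |1>_E
    return psi


# A pointer state stays pure; an equal superposition of pointer states decoheres.
h_pointer = von_neumann_entropy(
    reduced_system_state(monitored_state(1.0, 0.0), 2, 2))
h_super = von_neumann_entropy(
    reduced_system_state(monitored_state(1 / np.sqrt(2), 1 / np.sqrt(2)), 2, 2))
print(h_pointer, h_super)
```

In this toy model the pointer state produces no entropy in the system, while the superposition of pointer states becomes maximally mixed (one bit), which is the ordering the predictability sieve relies on.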
Decoherence theory ignores it. En- + vironment is “traced out”. Information it contains is FIG. 1: Quantum Darwinism and the structure of the envi- + treated as inaccessible and irrelevant: E is a “rug to sweep ronment. Decoherence theory distinguishes between a system + under” the data that might endanger classicality. (S) and its environment (E) as in (a), but makes no further + Quantum Darwinism recognizes that “tracing out” is recognition of the structure of E; it could as well be mono- + not what we do: Observers eavesdrop on the environ- lithic. In Quantum Darwinism the focus is on redundancy. + ment. Vast majority of our data comes from fragments We recognize the subdivision of E into subsystems, as in (b). + of E. Environment is a witness to the state of the system. The only requirement for a subsystem is that it should be + For example, this very moment you intercept a fraction individually accessible to measurements; observables of dif- + of the photon environment emitted by a screen or scat- ferent subsystems commute. To obtain information about S + tered by a page. We never access all of E. Tiny fractions from E one can then measure fragments F of the environ- + suffice to reveal the state of various “systems of interest”. ment – non-overlapping collections of subsystems of E, (c). + This insight captures the essence of Quantum Darwin- ically, there are many copies of the information about S in E + ism: Only states that produce multiple informational off- – “progeny” of the “fittest observable” that survived monitor- + spring – multiple imprints on the environment – can be ing by E proliferates throughout E. This proliferation of the + multiple informational offspring defines Quantum Darwinism. + found out from small fragments of E. The origin of the The environment becomes a witness with redundant copies of + emergent classicality is then not just survival of the fittest information about the preferred observable.
This leads to the + states (the idea already captured by einselection), but objective existence of pointer states: Many can find out the + their ability to “procreate”, to deposit multiple records state of the system independently, without prior information, + – copies of themselves – throughout E. and they can do it indirectly, without perturbing S. + Proliferation of records allows information about S to + be extracted from many fragments of E (in the example of the system was the basic tool of decoherence. To study + above, photon E). Thus, E acquires redundant records of Quantum Darwinism we focus on correlations between + S. Now, many observers can find out the state of S in- fragments of the environment and the system. The rele- + dependently, and without perturbing it. This is how pre- vant reduced density matrix $\rho_{SF}$ is given by: + ferred states of S become objective. Objective existence + – hallmark of classicality – emerges from the quantum $\rho_{SF} = \mathrm{Tr}_{E/F}\,|\Psi_{SE}\rangle\langle\Psi_{SE}|$. (2) + substrate as a consequence of redundancy. + Decoherence theory was focused on the system. Its aim Above, trace is over “E less F”, or E/F – all of E except + was to determine what states survive information leaks for the fragment F. How much F knows about S can be + to E. Now we ask: What information about the system quantified using mutual information: + can be found out from fragments of E? This change of + focus calls for a more realistic model of the environment $I(S:F) = H_S + H_F - H_{S,F}$, (3) + (Fig. 1): Instead of a monolithic E we recognize that envi- + ronments consist of subsystems that comprise fragments defined as the difference between entropies of two sys- + independently accessible to observers. tems (here S and F) treated separately and jointly. For + The reduced density matrix $\rho_S$ representing the state example, the mutual information between an original and + and indirectly – without perturbing S. + Rapid rise and gradual leveling of $I(S:F_f)$, Fig.
2, + implies redundancy. The information in $F_f$ allows one + to determine the state of S as it reaches redundancy + plateau. Observables of different F’s commute – such + measurements are independent. Yet, underlying corre- + lations mean that their outcomes imply the same state + of the system, as if S were classical: The redundancy + plateau is a classical plateau. Its level $H_S$ is the classical + information accessible from a small fraction of E. + Redundancy allows for objective existence of the state + of S: It can be found out indirectly, so there is no danger + of perturbing S with a measurement. Error correction al- + FIG. 2: Information about S stored in E and its redundancy. lowed by redundancy is also important: Fragility of quan- + Mutual information is monotonic in f. When global state of tum states means that copies in F’s are damaged by mea- + SE is pure, $I(S:F_f)$ in a typical fraction f of the environ- surements (we destroy photons!), and may be measured + in a “wrong” basis. One cannot access records in E with- + ment is antisymmetric around f = 0.5 [13]. For pure states + picked out at random from the combined Hilbert space $\mathcal{H}_{SE}$, out endangering their existence. But with many ($R_\delta$) + there is little mutual information between S and a typical F copies, state of S can be found out by ∼ $R_\delta$ observers + smaller than half of E. However, once a threshold $f = \tfrac{1}{2}$ is who can get their information independently, and with- + attained, nearly all information is in principle at hand. Thus, out prior knowledge about S. Consensus between copies + such random states (green line) exhibit no redundancy. By suggests objective existence of the state of S. + contrast, states of SE created by decoherence (where the en- The mutual information $I(S:F_f)$ computed in mod- + vironment monitors preferred observable of S) contain almost els of decoherence exhibits behavior illustrated by the red + all (all but δ) of the information about S in small fractions plot of Fig. 2.
In the family of models representing spin + $f_\delta$ of E. The corresponding $I(S:F_f)$ (red line) quickly rises + S surrounded by environments of many spins [12, 13, 14] + to $H_S$ (entropy of S due to decoherence), which is all of the the same number of spins suffices to reach the plateau: + information about S available from either E or S. (More, up Adding more spins to E only extends length of the plateau + to $2H_S$, can be obtained only through global measurements + on S and nearly all E). $H_S$ is therefore the classically acces- measured in “absolute units” – in the number of the en- + sible information. As $(1-\delta)H_S$ of information is contained + vironment spins. In this model (that can be viewed as + in $f_\delta = 1/R_\delta$ of E, there are $R_\delta$ such fragments in E: $R_\delta$ + a simplified model of a photon environment) redundancy + is the redundancy of the information about S. Large redun- is then proportional to the number of the environment + dancy implies objectivity: The state of the system can be subsystems that interact with the system of interest. + found out indirectly and independently by many observers, Quantum Brownian motion – harmonic oscillator sur- + who will agree about their conclusions. Thus, Quantum Dar- rounded by many environmental oscillators – is the other + winism accounts for the emergence of objective existence. well known model of decoherence. It is exactly solvable, + and the case of an underdamped oscillator yields sur- + a perfect copy (of, say, a book) is equal to the entropy of prisingly simple results [15, 16]: (i) Mutual information + the original, as either contains the same text. So, every is approximately given by $I(S:F) \approx H_S + \tfrac{1}{2}\ln\frac{f}{1-f}$, + bit of information in the first copy reveals a bit of infor- + and; (ii) Redundancy for an initially squeezed state of S + mation in the original. However, having extra copies does + reaches $R_\delta \approx s^{2\delta}$, where s, the squeeze factor, quantifies + not increase the information about the original.
Yet, it + determines how many can independently access this in- delocalization of the state. Similar equation should hold + formation. The number of copies defines redundancy. for more general “Schrödinger cat” states, with s quan- + Similar ideas apply to the quantum case. Initially, ev- tifying the separation of the two localized alternatives. + ery bit of information gained from a fraction $f \ll 1$ of These results confirm intuitions that originally moti- + E that was pure before it monitored (and decohered) the vated Quantum Darwinism [4, 17]: Monitoring of the + system is a bit about S. The red plot in Fig. 2 starts with system by the environment can deposit multiple records + this steep “bit for bit” slope, but moderates as $I(S:F_f)$ of preferred states of S in E. States of SE that arise from + approaches redundancy plateau at $H_S$, where additional decoherence are special [13, 14], as $I(S:F_f)$ for a typ- + ical pure state selected with Haar measure in the whole + bits only confirm what is already known. Hilbert space of SE (green plot in Fig. 2) shows. In + Redundancy is the number of independent fragments such random states small fragments reveal almost noth- + of the environment that supply almost all classical infor- ing about the rest of the state. Only when half of E is + mation about S, i.e., $(1-\delta)H_S$. In other words; + found out the whole state is suddenly revealed. + $R_\delta = 1/f_\delta$. (4) States that arise from decoherence are then far from + random. Roughly speaking, they have a branch structure. + $R_\delta$ is the number of times one can acquire (1−δ) of the This is why the rest of such a branch including the state + information about S independently (from distinct F’s) of the system – the “bud” from which this branch has + originated – can be deduced from its fragment. We shall not interact with each other. This is why light deliv- + see how such branches grow in the next section. ers most of our information.
Moreover, photons emitted + Plots of I(S : Ff) for pure SE are antisymmetric by the usual sources (e.g., sun) are far from equilibrium + around the point {H ,f = 1} for typical fragments of with our surroundings. Thus, even when decoherence is + S 2 dominated by other environments (e.g., air) photons are + E [13]. Thus, rapid rise for small f must be matched at muchbetter in passing on information they acquire while + the other end, for f ∼ 1. This is a signature of entan- “monitoring the system of interest”: Air molecules scat- + glement that allows state to be known “as the whole”, + while states of subsystems are unknown. The joint state ter from one another, so that whatever record they may + of SE is then pure, so that H =0, and I(S : F ) have gathered becomes effectively undecipherable. + S,F=E f Stability of the level of the redundancy plateau at H , + must rise to H +H =2H when f approaches 1. S + S E S even for mixed E’s, is a compelling reason to think of it as + This is a very quantum aspect of information. In clas- + sical physics knowing a composite object implies knowing “classical”. The question we shall now address concerns + each of its subsystems. This is not so in quantum physics, the nature of that information – what does the environ- + where composite states are given by tensor (rather than ment know about the system, and why? + Cartesian) products of their constituents. Thus, one can + know perfectly quantum state of the whole, but know + nothing about states of parts. We shall see in Section IV III. FROMCOPYINGTOQUANTUMJUMPS + how this feature can be used to derive Born’s rule [18] + that relates probabilities with wavefunctions. Quantum Darwinism leads to appearance, in the en- + To reveal this latent quantumness one would have to vironment, of multiple copies of the state of the system. + measure the right global observable on all of SE. For However, the no-cloning theorem [20, 21] prohibits copy- + example, when mutual information, Eq. 
(3), is defined ing of unknown quantum states. If cloning is outlawed, + using Shannon entropy with probabilities corresponding how can redundancy seen in Fig. 2 be possible? + to optimal observables in S and in E, the resulting Shan- Quickansweristhatcloningrefers to (unknown) quan- + non I(S : Ff) graph for small f would look very similar tum states. So, copying of observables evades the theo- + to Fig. 2. However, using Shannon entropy involves lo- rem. Nevertheless, the tension between the prohibition + cal probabilities (precluding global observables), so such on cloning and the need for copying is revealing: It leads + Shannon I(S : F ) never exceeds H , antisymmetry is + f S to breaking of unitary symmetry implied by the super- + lost, and the plateau continues until the end at f ∼ 1. position principle, accounts for quantum jumps, and sug- + Effective unattainability of the f ∼ 1 part of the plot gests origin of the “wavepacket collapse”, setting stage for + also shows why decoherence is so hard to undo: Correla- the study of quantum origins of probability in Section IV. + tions that reveal coherence can be usually detected only Quantum physics is based on several “textbook” pos- + by such global measurements of whole SE. We intercept tulates [22]. The first two; (i) States are represented by + small fractions of E, and never have the luxury of perfect vectors in Hilbert space, and; (ii) Evolutions are unitary – + global measurements needed to undo decoherence. Yet, give complete account of mathematics of quantum theory, + because of redundancy, we get ∼ HS information with but make no connection with physics. For that one needs + “sloppy” measurements of f  1. to relate calculations made possible by the superposition + Quantum Darwinism does not require pure E. Mixed principle of (i) and unitarity of (ii) to experiments. 
+ environmentisanoisycommunicationchannel: Itsinitial Postulate (iii) Immediate repetition of a measurement + entropy of h per bit can still increase after interaction yields the same outcome starts this task. This is the only + with S, reflecting mutual information buildup. However, uncontroversial measurement postulate (even if it is diffi- + nowabitgainedfromE yieldsonly1−hofabitaboutS. cult to approximate in the laboratory): Such repeatability + So, a completely mixed E (h = 1) is useless (even though or predictability is behind the very idea of “a state”. + it can still induce decoherence!). For a partly mixed E In contrast to (i)-(iii), collapse postulate (iv) Outcomes + mutual information will increase more slowly, pure case correspond to eigenstates of the measured observable, and + “bit per bit” rate tempered to ∼ 1 − h. Yet, it can still only one of them is detected in any given run of the ex- + climb the same redundancy plateau at H [19]. periment, is inconsistent with (i) and (ii). Conflict arises + S + These conclusions apply when E is initially mixed, but for two reasons: Restriction to a preferred set of outcome + are also relevant when this channel is noisy for other rea- states seems at odds with with the egalitarian principle + sons (e.g., imperfect measurements). In all such cases one of superposition, embodied in (i). This restriction pre- + can still reach the same redundancy plateau, although vents one from finding out unknown quantum states, so + now a proportionally larger fragment of the environment it is responsible for their fragility. And a single outcome + is needed to get the same information about S. per run is at odds with unitarity (and, hence, linearity) + Suitability of the environment as a channel depends of quantum dynamics that preserves superpositions. + on whether it provides a direct and easy access to the The last axiom; (v) Probability of an outcome is given + records of the system. 
This depends on the structure by the square of the associated amplitude, p = |ψ |2, + k k + and evolution of E. Photons are ideal in this respect: is known as Born’s rule [18]. It completes the relation + They interact with various systems, but, in effect, do between mathematics of (i) and (ii) and the experiments. + 5 + a) b) c) + 1.0 50 1.0 + 0.8 ) 40 )0.8 + ) σ e + σ0.6 ( 30 :0.6 + 1 + ( . + 0 σ + ˆN0.4 R (0.4 + I 20 I + 0.2 µ=0.23 0.2 + 10 + 0 0 0 + 0 0 0 + π/4 π/4 π/4 40 50 + µ µ π/4 π/4 µ 30 + π/8 π/8 π/2 10 20 m + π/2 0 a a 0 + π/2 0 + FIG. 3: Quantum Darwinism in a simple model of decoherence [12]. The spin-1 S interacts with N = 50 spin-1 subsystems of E + P 2 2 + N S E 1 E E + withanIsingHamiltonianHSE = g σ ⊗σ k. TheinitialstateofS⊗E is √ (|0i+|1i)⊗|0i 1⊗...⊗|0i N. Couplingsg are + k=1 k z y 2 k + distributed randomly in the interval (0,1]. All the plotted quantities are a function of the observable σ(µ) = cos(µ)σ +sin(µ)σ , + z x + where µ is the angle between its eigenstates and the pointer states of S – eigenstates of σS. a) Information acquired by the + z + ˆ + optimal measurement on the whole environment, IN(σ), as a function of the inferred observable σ(µ) and the average interaction + action hgkti = a. A lot of information is accessible in the whole E about any observable σ(µ) except when a is so small that + there was no decoherence. b) Redundancy of the information about S as a function of the inferred observable σ(µ) and the + average action hg ti = a. R (σ) counts the number of times 90% of the total information can be “read off” independently + k δ=0.1 + by measuring distinct fragments of E. It is sharply peaked around the pointer observable: Redundancy is a very selective + criterion – the number of copies of relevant information is high only for the observables σ(µ) inside the theoretical bound (see + Ref.[12]) indicated by the dashed line. c) Information about σ(µ) extracted by local random measurements on m environmental + subsystems. 
Because of redundancy, pointer states – and only pointer states – can be found out through this far-from-optimal strategy. Information about any other observable σ(µ) is restricted to what can be inferred from the pointer observable [12].

Bohr bypassed conflict of (i) and (ii) with (iv) by insisting that apparatus is classical, so unitarity and the principle of superposition need not apply to measurements. But this is an excuse, not an explanation. We are dealing with a quantum environment, and redundancy of previous section strengthened motivation for postulate (iii) – repeatability. Let us see where this demand takes us in a purely quantum setting of postulates (i), (ii), and (iii).

Suppose there are states of S (say, $|u\rangle$ and $|v\rangle$) that produce an imprint in a subsystem of E (which plays a role of an apparatus), but remain unperturbed (so they can produce more imprints). This repeatability implies: $|u\rangle|e_0\rangle \Rightarrow |u\rangle|e_u\rangle$, $|v\rangle|e_0\rangle \Rightarrow |v\rangle|e_v\rangle$ in obvious notation. In a unitary process scalar product is preserved. Thus;

$$\langle u|v\rangle = \langle u|v\rangle\langle e_u|e_v\rangle , \qquad (5)$$

where we have set $\langle e_0|e_0\rangle = 1$. This simple equation can be satisfied only when; (a) either $\langle e_u|e_v\rangle = 1$ (which means that copying was completely unsuccessful), or; (b) $\langle u|v\rangle = 0$, i.e., they are orthogonal. In that case $\langle e_u|e_v\rangle$ is arbitrary – perfect record $\langle e_u|e_v\rangle = 0$ is also possible.

It follows that multiple (perfect or imperfect) copies of $|u\rangle$ and $|v\rangle$ can be imprinted in disjoint F’s. As a consequence of unitarity, only sets of orthogonal states (that define Hermitean observables [22]) can be so copied, explaining selection of a set of outcomes – terminal points of quantum jumps [23]. Before, they had to be postulated by the first part of axiom (iv). We emphasize that this result relies on just two values of the scalar product – 0 and 1 – and, thus, does not appeal to Born’s rule.

This breaking of unitary symmetry (choice of preferred states in an egalitarian Hilbert space) is induced by repeatability of the information transfer. It is a “nonlinear demand”: As in cloning, one asks for “two (or more) of the same”. Its conflict with linearity of quantum theory can be resolved only by restricting states that can be copied. Such pointer states then act as “buds” of branches that grow by reproducing, in E, multiple copies of the original in S. Interaction Hamiltonians do not perturb observables that commute with them. So, buds of branches coincide with the einselected pointer states.

Evidence of such symmetry breaking is seen in Fig. 3. Mutual information and redundancy shown there are obtained using Eq. (3), but with Shannon (rather than von Neumann) entropies of specific observables of S and F, i.e., using probabilities of their eigenstates. While von Neumann-based $I(S:F_f)$ and $R_\delta$ characterized total information, Shannon-based counterparts are well suited to enquire: What observable is this information about?

It turns out that the environment as a whole “knows” many observables of S, as is seen in Fig. 3a. By contrast, in Fig. 3b symmetry breaking is evident: The ridge of redundancy appears abruptly only when test observable σ(µ) and the preferred pointer observable $\sigma_z$ (that remains unperturbed by the environment) nearly coincide.

Why are pointer states favored? Commonsense says that, to be reproduced, state must survive copying. This leads to a theorem [12, 24] that only pointer states can be discovered from fractions of E. Other observables (such as σ(µ) in Fig. 3) can be deduced only to the extent they are correlated with the pointer observable. So, fragments of the environment offer a very narrow, projective point of view. Redundant imprinting of some observables happens at the expense of their complements.

Structure of branching state betrays its origin and foreshadows “collapse”.
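The orthogonality condition behind Eq. (5) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the environment is a single qubit, the copying interaction is a CNOT, and the helper names `imprint` and `schmidt_rank` are hypothetical, none of which comes from the text. It checks that the scalar product is preserved, that the orthogonal states |0⟩ and |1⟩ leave a record while remaining in a product state with E, and that the non-orthogonal state |+⟩ becomes entangled with E instead of being copied.

```python
import numpy as np

# Assumed copying interaction: CNOT with S as control and one environment
# qubit E as target, written in the basis |s e> = |00>, |01>, |10>, |11>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def imprint(system_state):
    """Let E (initially |0>) record the state of S; return the joint 4-vector."""
    e0 = np.array([1, 0], dtype=complex)
    return CNOT @ np.kron(system_state, e0)


def schmidt_rank(joint_state, tol=1e-12):
    """Count non-zero Schmidt coefficients (1 means a product state of S and E)."""
    singular_values = np.linalg.svd(joint_state.reshape(2, 2), compute_uv=False)
    return int(np.sum(singular_values > tol))


zero = np.array([1, 0], dtype=complex)
one = np.array([0, 1], dtype=complex)
plus = (zero + one) / np.sqrt(2)

out_zero, out_one, out_plus = imprint(zero), imprint(one), imprint(plus)

# Unitarity preserves the scalar product, as used to obtain Eq. (5).
overlap_before = np.vdot(zero, plus)
overlap_after = np.vdot(out_zero, out_plus)

print("rank for |0>:", schmidt_rank(out_zero))
print("rank for |1>:", schmidt_rank(out_one))
print("rank for |+>:", schmidt_rank(out_plus))
print("overlap preserved:", np.isclose(overlap_before, overlap_after))
```

A Schmidt rank of 1 means the state of S survived the imprint unperturbed; a rank of 2 means S and E ended up entangled, so the repeatability premise fails for that state.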
Starting from $|\psi_S\rangle = \sum_k^n \psi_k|s_k\rangle$,

$$|\Psi_{SE}\rangle = \sum_k^n \psi_k|s_k\rangle|e_k^{(1)}\rangle \dots |e_k^{(N)}\rangle = \sum_k^n \psi_k|s_k\rangle|\varepsilon_k\rangle \qquad (6)$$

branches grow to include N subsystems of E. Branch fragments can be nearly orthogonal; $\prod_{j=1}^{J}\langle e_k^{(j)}|e_{k'}^{(j)}\rangle \simeq \delta_{kk'}$ for large enough J. This means that a pointer state $|s_k\rangle$ of S can be determined (along with the rest of the branch) from a sufficiently long fragment (which may still be short compared to the length of the branch, J ≪ N).

In the huge Hilbert space $H_{SE}$ branching state is a very atypical minimally entangled superposition of only n product “branches” labelled by the pointer states of the system. This is tiny compared to the dimension of $H_{SE}$ that exceeds n by a factor exponential in N. This is why the two plots in Fig. 2 are so different: Branching state is, to a good approximation, a multi-system Schmidt decomposition, with long branch fragments constituting “systems”. In a Schmidt decomposition, states of partners are in one-to-one correspondence. Thus, in Eq. (6), $|s_k\rangle$ implies $|\varepsilon_k\rangle$ (and vice versa), and measuring a branch fragment F can reveal the whole branch.

Initial part of $I(S:F_f)$, Fig. 2, represents the buildup of this correlation: When f = 0, observer is ignorant of what branch he will find out, but the structure of the correlations within $|\Psi_{SE}\rangle$ leaves no doubt of what these branches are. Using Born’s rule one could assign to them probabilities $p_k = |\psi_k|^2$ and the corresponding entropy $H_S$. Next section shows how one can deduce these probabilities without axiom (v) – how symmetries of entanglement imply Born’s rule.

When observer measures enough of E, he finds out the branch (and what the state of S is). Additional data are redundant. They only confirm what is already known. Probabilities associated with $|\Psi_{SE}\rangle$ are replaced with certainty of a branch. This transition from uncertainty (initial presence of many branches – potential for multiple outcomes) to certainty (once a sufficiently long branch fragment becomes known) accounts for perception of “collapse”. The initial, steeply rising, part of $I(S:F_f)$ “resolves” it: Collapse is brief compared to the ensuing period of certainty about the outcome, as $f_\delta \ll 1$, but, nevertheless, not instantaneous.

Assumptions that lead from copying to preferred states can be relaxed. Thus, E need not be initially pure [23]. Moreover, it suffices that the records (e.g., in the apparatus A) are “repeatably accessible”. Transfer of responsibility for repeatability from a quantum S to a (still quantum) A allows one to model non-orthogonal measurement outcomes (POVM’s): A entangles with the system, and then acts as ancilla. Its orthogonal pointer states $|A_k\rangle$ correlate with non-orthogonal $|\varsigma_k\rangle$ of S, $\sum_k \tilde{\psi}_k|\varsigma_k\rangle|A_k\rangle$. Interaction of A with the environment results in multiple copies of $|A_k\rangle$. The usual projective measurement implementation of POVM’s (see e.g. [25]) is now straightforward. Branches are labelled by $|A_k\rangle$. Indeed, we usually experience “quantum jumps” via an apparatus pointer.

Selection of the set of outcomes by the proliferation of information essential for Quantum Darwinism parallels Bohr’s insistence [1] that a “classical apparatus” should determine the outcomes. However, it follows from purely quantum Eq. (5), and is caused by a unitary evolution responsible for the information transfer. Nevertheless, as classical apparatus would, preferred pointer states designate possible future outcomes, precluding measurements of complementary observables or determining preexisting state of the system. Thus, information acquisition – a copying process – results in preferred states.

Consensus between records deposited in fragments of E looks like “collapse”. In this sense we have accounted for postulate (iv) using only very quantum postulates (i)-(iii). In particular, in deriving and analyzing Eq. (5) we have not employed Born’s rule, axiom (v). We shall be therefore able to use our results as a starting point for such a derivation in the next section.

There was nothing nonunitary above – unitarity was the crux of our argument, and orthogonality of branch seeds our main result. Relative states of Everett [26, 27, 28] come to mind. One could speculate about reality of branches with other outcomes. We abstain from this – our discussion is interpretation-free, and this is a virtue. Indeed, “reality” or “existence” of universal state vector seems problematic. Quantum states acquire objective existence when reproduced in many copies. Individual states – one might say with Bohr – are mostly information, too fragile for objective existence. And there is only one copy of the Universe. Treating its state as if it really existed [26, 27, 28] seems unwarranted and “classical”.

IV. PROBABILITIES FROM ENTANGLEMENT

Observer prepared S in a state $|\psi_S\rangle$, but wants to measure observable with eigenstates $\{|s_k\rangle\}$. This will lead to entangled $|\Psi_{SE}\rangle$ with branch structure, Eq. (6). Pointer states $\{|s_k\rangle\}$ define the outcomes, but, as yet, observer has not measured E, and does not know the result. Given $|\Psi_{SE}\rangle$, what is the probability of, say, $|s_{17}\rangle$?

To derive it we cannot use reduced density matrices, Eqs. (1,2). Tracing out is averaging [25, 29, 30] – it relies on $p_k = |\psi_k|^2$, Born’s rule we want to derive. We have imposed that ban while deriving and analyzing Eq. (5), but relaxed it to plot Fig. 3. Now we reimpose it. So, Born’s rule and standard tools of decoherence are off limits – using them courts circularity. Our derivation will rest instead on certainty and symmetry, cornerstones that mark two extremal cases of probability.

The case of certainty was just settled without Born’s rule using Eq. (5). When one re-measures an observable, the same outcome will be seen again. Thus, when $\{|s_k\rangle\}$ includes $|\psi_S\rangle$ (e.g., $|\psi_S\rangle = |s_{17}\rangle$), newly added copies just extend the branch already correlated with observer’s state, and the outcome is certain; $p_{17} = 1$. Certainty of correlations between partners in Schmidt decomposition, Eq. (6), is another important example.

[Figure 4: card diagrams, panels a), b) and c); images not reproduced here.]

FIG. 4: Probabilities and symmetry: (a) Laplace used subjective ignorance to define probability. Player who does not know face values of the cards, but knows that one of them is a spade will infer probability $p_\spadesuit = \frac{1}{2}$ for the top card. (b) The real physical state of the system is however altered by the swap, illustrating subjective nature of Laplace’s approach, and demonstrating its unsuitability for physics. (c) Perfectly known entangled states have objective symmetries that allow one to rigorously deduce probabilities. When two systems are maximally entangled as above, probabilities of Schmidt partners are equal, $p_\heartsuit = p_\diamondsuit$, and $p_\spadesuit = p_\clubsuit$. After a swap $u_S = |\spadesuit\rangle\langle\heartsuit| + |\heartsuit\rangle\langle\spadesuit|$ in S, the resulting state $|\spadesuit\rangle|\diamondsuit\rangle + |\heartsuit\rangle|\clubsuit\rangle$ must have $p'_\spadesuit = p_\diamondsuit$, and $p'_\heartsuit = p_\clubsuit$. (We ‘primed’ probabilities in S, as it was acted upon by a swap, so they might have changed.) A counterswap $u_E = |\diamondsuit\rangle\langle\clubsuit| + |\clubsuit\rangle\langle\diamondsuit|$ in E restores the original entangled state, proving that $p'_\heartsuit = p_\heartsuit$ and $p'_\spadesuit = p_\spadesuit$, after all (as counterswap $u_E$ leaves S untouched). This sequence of equalities implies $p_\spadesuit = p_\diamondsuit = p_\heartsuit$, so that $p_\spadesuit = p_\heartsuit = \frac{1}{2}$, as probabilities in S must add up to 1.

Certainty seems trivial but is important.
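The swap and counterswap of Fig. 4(c) can be checked with a few lines of linear algebra. This is a minimal sketch, assuming a two-dimensional S with basis (spade, heart) and a two-dimensional E with basis (diamond, club); the index assignment and variable names are illustrative choices, not part of the original argument.

```python
import numpy as np

# Swap of the two basis states of a two-level system.
X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Assumed labelling: S basis index 0 = spade, 1 = heart;
# E basis index 0 = diamond, 1 = club.
spade, heart = np.eye(2, dtype=complex)
diamond, club = np.eye(2, dtype=complex)

# Entangled state before the swap: |heart>|diamond> + |spade>|club>.
original = (np.kron(heart, diamond) + np.kron(spade, club)) / np.sqrt(2)

# Swap acting on S only, then counterswap acting on E only.
swapped = np.kron(X, I2) @ original
restored = np.kron(I2, X) @ swapped

# State quoted in the caption after the swap: |spade>|diamond> + |heart>|club>.
expected_swapped = (np.kron(spade, diamond) + np.kron(heart, club)) / np.sqrt(2)

print("swap gives the caption state:", np.allclose(swapped, expected_swapped))
print("counterswap restores the original:", np.allclose(restored, original))
```

The check only confirms the two state identities the caption relies on; the step from those identities to equal probabilities is the envariance argument itself.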
Confirmation that a state “is what it is” – postulate (iii) – is a part of standard quantum lore [22]. We re-affirmed it, but with a key insight: Redundancy allows observers to discover (and not just confirm) that S is in a certain pointer state.

We now turn to the opposite case of complete indeterminacy. Its connection with symmetry was noted by Laplace. He wrote: “The theory of chance consists in reducing all the events ... to a certain number of cases that are equally possible... The ratio of this number to that of all the cases possible is the measure of probability” [31]. Figure 4 illustrates how this classical intuition yields – far more convincingly – quantum probabilities.

Symmetry is probed by invariance. Transformations that respect it take system between states that exhibit no measurable differences. For example, change of phase in the coefficients in the Schmidt decomposition $|\Psi_{SE}\rangle = \sum_k^n \psi_k|s_k\rangle|\varepsilon_k\rangle$ cannot influence the state of S: It is induced by $u_S = e^{i\phi_k}|s_k\rangle\langle s_k|$, local unitary on S, that can be “undone” by $u_E = e^{-i\phi_k}|\varepsilon_k\rangle\langle\varepsilon_k|$ on E, or;

$$u_S \otimes 1_E |\Psi_{SE}\rangle = |\Phi_{SE}\rangle ; \quad 1_S \otimes u_E |\Phi_{SE}\rangle = |\Psi_{SE}\rangle \qquad (7)$$

So, phases of $\psi_k$ cannot matter for a local state or influence probabilities in S. This symmetry, Eq. (7), is the entanglement-assisted invariance or envariance [32, 33]. Such loss of phase significance for S entangled with E implies decoherence [33]. We arrived at its essence using envariance, without reduced density matrices, trace, etc.

We now use phase envariance to show that equal absolute values of the coefficients $\psi_k$ imply equal probabilities. For equal $|\psi_k|$ any orthogonal basis of S is “Schmidt” (i.e., has an orthogonal partner in E). Thus, $|\bar{\varphi}_{SE}\rangle = \frac{|0\rangle_S|0\rangle_E + |1\rangle_S|1\rangle_E}{\sqrt{2}} = \frac{|+\rangle_S|+\rangle_E + |-\rangle_S|-\rangle_E}{\sqrt{2}}$, where $|\pm\rangle = \frac{|0\rangle \pm |1\rangle}{\sqrt{2}}$. Sign change induced by $e^{i\pi}|-\rangle\langle -|$ acting on S produces $|\bar{\eta}_{SE}\rangle = \frac{|+\rangle_S|+\rangle_E - |-\rangle_S|-\rangle_E}{\sqrt{2}} = \frac{|1\rangle_S|0\rangle_E + |0\rangle_S|1\rangle_E}{\sqrt{2}}$. In other words, one can swap $|0\rangle_S$ with $|1\rangle_S$ by rotating phase in a $|\pm\rangle$ basis by π. Yet, we just saw that phases of Schmidt coefficients do not matter for the state of S, so probabilities of 0 and 1 in S must have remained the same. Moreover, probabilities of paired up Schmidt states are equal, so that $p_S(0) = p_E(0)$ in $|\bar{\varphi}_{SE}\rangle$ and $p_S(1) = p_E(0)$ in $|\bar{\eta}_{SE}\rangle$. Hence, $p_S(0) = p_S(1) = \frac{1}{2}$, where we assumed that probabilities add up to 1.

In contrast to Laplace’s subjective “ignorance-based” approach, we obtained objective probabilities for a completely known entangled state. Phase envariance implied equiprobability in S. To paraphrase the Beatles, “All you need is phase...”. We rotated phases of the coefficients to induce a swap in a complementary basis. Another proof (that implements swap more directly) is given in Fig. 4.

This equiprobability case is the difficult part of the proof. Instead of subjectivity (that undermined applicability of Laplace’s approach to physics) we relied on objective symmetries of entangled quantum states. This was made possible by the nature of quantum states of composite systems. Classically, pure states have structure of a Cartesian product – knowing the whole implies knowledge of each subsystem. In quantum theory they are tensor products – one can know state of the whole, and thus know nothing about parts, as envariance shows.

This was the basis of our proof of equiprobability. We assumed unitarity. Moreover, we assumed; (1) When a system is not acted upon by a unitary transformation, its state remains unaffected. This state is a property of S alone, so; (2) Predictions regarding measurement outcomes on S (including their probabilities) can be inferred from the state of S. Last but not least; (3) When S is entangled with other systems (e.g., the environment) the state of S alone is determined by the state of the whole SE.

These “facts of life” are accepted properties of systems and states, but given the fundamental nature of our discussion it seems a good idea to make them explicit [33]. For instance, to establish independence from phases of the coefficients $\psi_k$ we noted that the state of S is unaffected by the unitaries $u_S$ diagonal in Schmidt basis acting on S (like changes of Schmidt coefficient phases) that would normally affect isolated S: The global state $\Psi_{SE}$ is restored by $u_E$. Thus, by fact (3), so is local state of S. However, this is done by a unitary “countertransformation” acting solely on E. Hence, by fact (1), state of S must have been unaffected by $u_S$ in the first place. So, by fact (2), phases of $\psi_k$ cannot change outcomes of any measurement on S. Equiprobability follows.

One can now derive Born’s rule, $p_k = |\psi_k|^2$, with straightforward algebra from the above two simple cases of complete certainty ($p_k = 1$) and equiprobability ($p_k = \frac{1}{n}$): The general case can be always reduced to the case of equal coefficients by “finegraining” (see Box).

The origin of probability is a fascinating problem that is older than quantum measurement problem, and is forgotten primarily because it is so old. We have seen how quantum physics sheds a new, very fundamental, light on probability. We cannot do justice to the history of this subject here, but Ref. [34] provides a basic overview and exhaustive set of references. In particular, envariant derivation is very different from the classic proof of Gleason [35] in that it sheds light on the physical significance of the resulting measure. Moreover, it does not assume probabilities are additive (except to posit that probability of an event and its complement are certain, i.e., to establish normalization; see Box and Ref. [33, 38]). Bypassing additivity of probabilities is essential when dealing with a theory with another principle of additivity – the quantum superposition principle – which trumps additivity of probabilities or at least classical intuitions about it (e.g., in the double-slit experiment). Discussion of the implications of envariance has already started, with [36, 37], and [5] providing insightful commentary.

BOX

We show here how “finegraining” reduces the case of arbitrary $\psi_k$ to equiprobability. To illustrate general strategy consider state in a 2D Hilbert space $H_S$ of S spanned by orthonormal $\{|0\rangle, |2\rangle\}$ and (at least) 3D $H_E$:

$$|\psi_{SE}\rangle \propto \sqrt{\tfrac{2}{3}}\,|0\rangle_S|+\rangle_E + \sqrt{\tfrac{1}{3}}\,|2\rangle_S|2\rangle_E .$$

The state $|+\rangle_E = \frac{|0\rangle_E + |1\rangle_E}{\sqrt{2}}$ exists in (at least 2D) subspace of E orthogonal to $|2\rangle_E$, i.e., $\langle 0|1\rangle = \langle 0|2\rangle = \langle 1|2\rangle = \langle +|2\rangle = 0$. We know we can ignore phases.

To reduce $|\psi_{SE}\rangle$ to equal coefficients case we “extend it” to a state $|\bar{\Psi}_{SEC}\rangle$ by letting E act on an ancilla C. (S is not acted upon, so, by fact (1), probabilities for S cannot change.) This can be done by a generalization of controlled-not acting between E (control) and C (target), so that (in obvious notation) $|k\rangle|0'\rangle \Rightarrow |k\rangle|k'\rangle$, leading to

$$\sqrt{2}\,|0\rangle|+\rangle|0'\rangle + |2\rangle|2\rangle|0'\rangle \Rightarrow \sqrt{2}\,|0\rangle\frac{|0\rangle|0'\rangle + |1\rangle|1'\rangle}{\sqrt{2}} + |2\rangle|2\rangle|2'\rangle .$$

Above, and from now on we skip subscripts: The state of S will be listed first, and the state of C will be primed. The cancellation of $\sqrt{2}$ yields an equal coefficient state:

$$|\bar{\Psi}_{SCE}\rangle \propto |0,0'\rangle|0\rangle + |0,1'\rangle|1\rangle + |2,2'\rangle|2\rangle .$$

We have combined S and C in a single ket and (below) we shall swap states of SC as if it was a single system.

Clearly, this is a Schmidt decomposition of (SC)E. Three orthonormal product states have coefficients with the same absolute value. Therefore, they can be envariantly swapped. Thus, the probabilities of states $|0\rangle|0'\rangle$, $|0\rangle|1'\rangle$, and $|2\rangle|2'\rangle$ are all equal. By normalization they are $\frac{1}{3}$. So, probability of detecting state $|2\rangle$ of S is $\frac{1}{3}$. Moreover, $|0\rangle$ and $|2\rangle$ are the only two outcome states for S. It follows that probability of $|0\rangle$ must be $\frac{2}{3}$;

$$p_0 = \tfrac{2}{3}; \quad p_2 = \tfrac{1}{3} .$$

This is Born’s rule. We have just seen why the amplitudes in the initial $|\psi_{SE}\rangle$ “get squared” to yield probabilities.

Note that we have avoided assuming additivity of probabilities: $p_0 = \frac{2}{3}$ not because it is a sum of two fine-grained alternatives for SE, each with probability of $\frac{1}{3}$, but rather because there are only two (mutually exclusive and exhaustive) alternatives for S; $|0\rangle$ and $|2\rangle$, and $p_2 = \frac{1}{3}$. Therefore, by normalization, $p_0 = 1 - \frac{1}{3}$. Probabilities of Schmidt states can be added because of the loss of phase coherence that follows directly from phase envariance established earlier (see also Ref. [32, 33]).

Extension of this proof to the case where probabilities are commensurate is conceptually straightforward but notationally cumbersome. The case of non-commensurate probabilities is settled with an appeal to continuity. Frequency of the outcomes can be also deduced, allowing one to establish connection with the familiar relative frequency approach to probabilities [32, 33, 38], but in a quantum setting probability arises as a consequence of symmetries of a single entangled state.

We end by noting that the finegraining discussed above does not need to be carried out experimentally each time probabilities are discussed: Rather, it is a way to deduce a measure that is consistent with the geometry of the Hilbert spaces using entanglement as a tool. Still, given fundamental implications of envariance experimental tests would be most useful.

V. DISCUSSION

We derived the two controversial quantum postulates from the first three. We have thus seen how classical domain of the Universe arises from the superposition principle (postulate (i)) and unitarity (postulate (ii)) as well as rudimentary assumptions about information flows (postulate (iii)), and a few basic facts about states of composite quantum systems (including their tensor nature, often cited as additional “axiom (0)”).

The essence of the measurement problem – accounting for axioms (iv) and (v) – has been largely settled. It is of course likely one may be able to clarify assumptions and simplify proofs. Much work remains to be done on Quantum Darwinism and envariance. Nevertheless, nature of the quantum-classical correspondence has been clarified.

Physicists take it for granted that even hard problems are solved by a single good idea. Therefore, when a single idea does not do the whole job, often our first instinct is to dismiss it. Measurement problem does not fall into this “single idea” category. Several ideas, applied in the right order, led to advances described here. Logically, we may well have started with the derivation of Eq. (5) and the analysis of quantum jumps. Their randomness leads to probabilities. And symmetries of entangled states (that arise in decoherence and Quantum Darwinism) allow one to derive Born’s rule. As we have seen, phase envariance is (nearly) “all you need”. With probabilities at hand one has then every right to use reduced density matrices to analyze Quantum Darwinism and decoherence.

Our presentation was “historical”. We started with decoherence, and used it to introduce Quantum Darwinism. Analysis of copying essential to information flows in both of these phenomena led to quantum jumps. This in turn motivated entanglement-based derivation of Born’s rule. Quantum Darwinism – upgrade of E to a communication channel from a mundane role it played in decoherence – tied together all of the other developments. This order had the advantage of making motivations clear, but it is different from more logical presentation where postulates (i)-(iii) are the starting point (strategy followed in [38]).

The collection of ideas discussed here allows one to understand how “the classical” emerges from the quantum substrate starting from more basic assumptions than decoherence. We have bypassed a related question of why is our Universe quantum to the core. The nature of quantum state vectors is a part of this larger mystery. Our focus was not on what quantum states are, but on what they do. Our results encourage a view one might describe (with apologies to Bohr) as “complementary”. Thus, $|\psi\rangle$ is in part information (as, indeed, Bohr thought), but also the obvious quantum object to explain “existence”. We have seen how Quantum Darwinism accounts for the transition from quantum fragility (of information) to the effectively classical robustness. One can think of this transition as “It from bit” of John Wheeler [39].

In the end one might ask: “How Darwinian is Quantum Darwinism?”. Clearly, there is survival of the fittest, and fitness is defined as in natural selection – through the ability to procreate. The no-cloning theorem implies competition for resources – space in E – so that only pointer states can multiply (at the expense of their complementary competition). There is also another aspect of this competition: Huge memory available in the Universe as a whole is nevertheless limited. So the question arises: What systems get to be “of interest”, and imprint their state on their obliging environments, and what are the environments? Moreover, as the Universe has a finite memory, old events will be eventually “overwritten” by new ones, so that some of the past will gradually cease to be reflected in the present record. And if there is no record of an event, has it really happened? These questions seem far more interesting than deciding closeness of the analogy with natural selection [40]. They suggest one more question: Is Quantum Darwinism (a process of multiplication of information about certain favored states that seems to be a “fact of quantum life”) in some way behind the familiar natural selection? I cannot answer this question, but neither can I resist raising it.

[1] Bohr, N. The quantum postulate and the recent development of atomic theory. Nature 121, 580-590 (1928).
[2] Schrödinger, E. Die gegenwärtige Situation in der Quantenmechanik. Naturwissenschaften 807-812; 823-828; 844-849 (1935).
[3] Joos, E., Zeh, H. D., Kiefer, C., Giulini, D., Kupsch, J., and Stamatescu, I.-O., Decoherence and the Appearance of a Classical World in Quantum Theory (Springer, Berlin, 2003).
[4] Zurek, W. H. Decoherence, einselection, and the quantum origins of the classical. Rev. Mod. Phys. 75, 715-775 (2003).
[5] Schlosshauer, M. Decoherence and the Quantum-to-Classical Transition (Springer, Berlin, 2007).
[6] Zurek, W. H. Pointer basis of a quantum apparatus: Into what mixture does the wavepacket collapse? Phys. Rev. D 24, 1516-1525 (1981).
[7] Zurek, W. H. Environment-induced superselection rules. Phys. Rev. D 26, 1862-1880 (1982).
[22] Dirac, P. A. M., Quantum Mechanics (Clarendon Press, Oxford, 1958).
[23] Zurek, W. H., Quantum origin of quantum jumps: Breaking of unitary symmetry induced by information transfer and the transition from quantum to classical. Phys. Rev. A 76, 052110 (2007).
[24] Ollivier, H., Poulin, D., and Zurek, W. H., Environment as a Witness: Selective Proliferation of Information and Emergence of Objectivity in a Quantum Universe. Phys. Rev. A 72, 423113 (2005).
[25] Nielsen, M. A., and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000).
[26] Everett III, H., Relative state formulation of quantum theory. Rev. Mod. Phys. 29, 454-462 (1957).
[27] Everett III, H., 1957b, Ph. D. Dissertation, Princeton University.
[28] DeWitt, B. S., and Graham, N., eds., The Many-Worlds Interpretation of Quantum Mechanics (Princeton Univer-
[8] Paz, J.-P., and Zurek, W. 
H., Environment-induced deco- sity Press, Princeton, 1973). + herence and the transition from quantum to classical. pp. [29] Landau. L., Das D¨ampfungsproblem in der Wellen- + 533-614 in Coherent Atomic Matter Waves, Les Houches mechanik. Zeits. Phys. 45, 430-441 (1927). + Lectures, R. Kaiser, C. Westbrook, and F. David, eds. [30] von Neumann, J. 1932, Mathematical Foundations of + (Springer, Berlin, 2001). Quantum Theory, translated from German original by R. + [9] Zurek, W. H., Habib, S., and Paz, J.-P., Coherent states T. Beyer (Princeton University Press, Princeton, 1955). + via decoherence Phys. Rev. Lett. 70, 1187-1190 (1993). [31] Laplace, P. S,. 1820, A Philosophical Essay on Probabil- + [10] Tegmark, M., and Shapiro, H. S., Decoherence produces ities, English translation of the French original by F. W. + coherent states: An explicit proof for harmonic chains. Truscott and F. L. Emory (Dover, New York, 1951). + Phys. Rev. E50, 2538-2547 (1994). [32] Zurek, W. H., Environment-assisted invariance, causal- + [11] Gallis, M. R., The emergence of classicality via decoher- ity, and probabilities in quantum physics. Phys. Rev. + ence described by Lindblad operators. Phys. Rev. A53, Lett. 90, 120404 (2003). + 655 (1996). [33] Zurek, W. H., Probabilities from entanglement, Born’s + [12] Ollivier, H., Poulin, D, and Zurek, W. H., Objective rule from envariance. Phys. Rev. A71, 052105 (2005). + properties from subjective quantum states: Environment [34] Auletta, G., Foundations and Interpretation of Quantum + as a witness. Phys. Rev. Lett. 93, 220401 (2004). Theory (World Scientific, Singapore, 2000). + [13] Blume-Kohout, R., and Zurek, W. H., A simple example [35] Gleason, A. M., Measures on closed subspaces of Hilbert + of “Quantum Darwinism”: Redundant information stor- space, J. Math. Mech. 6, 855-893 (1957). + age in many-spin environments Found. Phys. 35, 1857 [36] Schlosshauer, M, and Fine, A., On Zurek’s derivation of + (2005). the Born rule. Found. Phys. 
35(2), 197-213 (2005) + [14] Blume-Kohout, R., and Zurek, W. H., Quantum Darwin- [37] Barnum, H., No-signalling-based version of Zurek’s + ism: Entanglement, branches, and the emergent classi- derivation of quantum probabilities: A note on + cality of redundantly stored quantum information. Phys. “Environment-assisted invariance, entanglement, + Rev. A73, 062310 (2006). and probabilities in quantum physics”, arXiv:quant- + [15] Blume-Kohout, R., and Zurek, W. H., Quantum Darwin- ph/0312150 (2003). + ism in quantum Brownian motion. Phys. Rev. Lett., 101, [38] Zurek, W. H., Relative States and the Environment: Ein- + 240405 (2008). selection, Envariance, Quantum Darwinism, and the Ex- + [16] J. P. Paz and A. Roncaglia, in preparation. istential Interpretation, arXiv:0707.2832 (2007). + [17] Zurek, W. H., Einselection and decoherence from an in- [39] Wheeler, J. A., It from Bit. p. 3 in Complexity, Entropy, + formation theory perspective. Ann. Physik (Leipzig), 9, and the Physics of Information, Zurek, W. H., ed. (Ad- + 822 (2000). dison Wesley, Redwood City, 1990). + [18] Born, M., Zur Quantenmechanik der Stossvorg¨ange [40] Darwin, C., The Origin of the Species. (1859). + Zeits. Phys. 37, 863-867 (1926). Acknowledgments: I am grateful to Robin Blume- + [19] M. Zwolak, H. T. Quan, and W. H. Zurek, in preparation. Kohout, Fernando Cucchietti, Juan Pablo Paz, David + [20] Wootters, W. K., and Zurek, W. H., A single quantum Poulin, Hai-Tao Quan, Michael Zwolak for stimulating + cannot be cloned. Nature 299, 802-803 (1982). discussions. This research was supported by an LDRD + [21] Dieks, D., Communication by EPR devices. Phys. Lett. + 92A, 271 (1982). grant at Los Alamos and, in part, by FQXi. 
diff --git a/archive/zurek/Quantum_Darwinism_Zurek_2009.pdf b/archive/zurek/Quantum_Darwinism_Zurek_2009.pdf new file mode 100644 index 00000000..e9a53453 --- /dev/null +++ b/archive/zurek/Quantum_Darwinism_Zurek_2009.pdf @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c887b195b8b2e23ccc436e153fc4e7506c75aa7f8ae997ca2da4e6489b9fd17 +size 843308