6. An Empirical and Formal Research Program

A philosophical reconstruction earns its place in an academic monograph only if it clarifies what could be tested, falsified, or revised. The revised Volume 2 thesis makes several distinct claims: candidate agents exhibit scale-relative Markov blankets; some candidates actively maintain those boundaries; some contain robustly integrated cause-effect structures; and some of those structures are associated with phenomenal subjecthood. Each claim requires different evidence.

The first research task is blanket discovery. Given multivariate neural and bodily time series, investigators can estimate conditional dependencies across candidate partitions. Precision matrices, conditional mutual information, dynamic Bayesian networks, and state-space models offer complementary methods. The objective is not to find a perfect zero, which is unlikely in biological data, but to identify partitions with unusually low internal-external dependence conditional on blanket states.
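
As a minimal sketch of the precision-matrix route, the following Python example simulates a toy linear-Gaussian chain (external to sensory to internal to active) and compares the marginal correlation between internal and external states with their partial correlation given the blanket. The variable names, coupling strengths, and sample size are illustrative assumptions, not values drawn from any dataset.

```python
import numpy as np

def partial_correlations(samples):
    """Partial correlation matrix computed from the inverse sample covariance."""
    precision = np.linalg.inv(np.cov(samples, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial

# Toy chain: external -> sensory -> internal -> active (hypothetical coefficients).
rng = np.random.default_rng(0)
n = 20_000
external = rng.normal(size=n)
sensory = 0.8 * external + rng.normal(size=n)
internal = 0.8 * sensory + rng.normal(size=n)
active = 0.8 * internal + rng.normal(size=n)

data = np.column_stack([external, sensory, internal, active])
EXT, INT = 0, 2  # column indices of the external and internal states

# Marginal dependence ignores the blanket; partial dependence conditions on it.
marginal = np.corrcoef(data, rowvar=False)[INT, EXT]
conditional = partial_correlations(data)[INT, EXT]

print(f"marginal corr(internal, external):     {marginal:.3f}")
print(f"partial corr given sensory and active: {conditional:.3f}")
```

In this toy case the marginal correlation is substantial while the partial correlation is close to zero, which is the pattern the text describes: not a perfect zero, but unusually low dependence once blanket states are conditioned on. Real neural data would also require regularized precision estimates and nonlinear dependence measures.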

Blanket discovery must be multiscale. Variables should be coarse-grained at cellular, microcircuit, regional, whole-brain, organismic, and potentially interpersonal levels. Temporal windows should range from milliseconds to developmental timescales where feasible. The key empirical object is therefore a blanket landscape \(\mathcal{B}_{\ell,\tau}\), indexed by grain \(\ell\) and temporal window \(\tau\), rather than a single boundary. Stable minima in this landscape, where internal-external dependence conditional on blanket states stays low across neighboring grains and windows, identify candidate units.
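
One way to make the landscape explicit, offered as an assumption rather than as the monograph's own definition, is to score each candidate partition by the conditional mutual information between its internal and external states given its blanket states. The symbols \(\pi\) (a candidate partition), \(\mu\) (internal states), \(\eta\) (external states), and \(b\) (blanket states) are introduced here for illustration only.

```latex
\mathcal{B}_{\ell,\tau}(\pi)
  \;=\; I\!\left(\mu^{\pi}_{\ell,\tau}\,;\,\eta^{\pi}_{\ell,\tau}\,\middle|\,b^{\pi}_{\ell,\tau}\right),
\qquad
\pi^{*}_{\ell,\tau} \;=\; \arg\min_{\pi}\,\mathcal{B}_{\ell,\tau}(\pi)
```

Under this reading, a candidate unit is a partition \(\pi^{*}\) whose score remains low when \(\ell\) and \(\tau\) are varied around it, not merely a partition that minimizes the score at one grain and one window.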

The second task is causal validation. Observational conditional independence does not determine causal structure. Perturbations should target proposed sensory, active, internal, and external states. If the partition is correct, interventions should propagate according to the proposed mediation structure. Direct internal-external effects that bypass blanket states would weaken the model. Conversely, selective effects through sensory and active channels would support it.
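
The mediation logic can be illustrated with a small simulation. The sketch below pulses an external state and asks whether the effect on the internal state survives when the sensory channel is clamped. The dynamics, the coefficients, and the `leak` parameter (a hypothetical direct external-to-internal coupling) are assumptions chosen for illustration.

```python
import random

def mean_internal_state(pulse, clamp_sensory, leak, steps=200, seed=0):
    """Average internal state under do(external drive = pulse).

    clamp_sensory=True fixes the sensory state at zero (a lesion-like intervention).
    leak is a hypothetical direct external-to-internal coupling that bypasses the blanket.
    """
    rng = random.Random(seed)  # same seed gives the same noise in every compared run
    external = sensory = internal = 0.0
    total = 0.0
    for _ in range(steps):
        n_e, n_s, n_i = (rng.gauss(0.0, 0.1) for _ in range(3))
        next_external = 0.5 * external + pulse + n_e
        next_sensory = 0.0 if clamp_sensory else 0.8 * external + n_s
        next_internal = 0.5 * internal + 0.8 * sensory + leak * external + n_i
        external, sensory, internal = next_external, next_sensory, next_internal
        total += internal
    return total / steps

def intervention_effect(clamp_sensory, leak):
    """Change in mean internal state caused by pulsing the external state."""
    return (mean_internal_state(1.0, clamp_sensory, leak)
            - mean_internal_state(0.0, clamp_sensory, leak))

mediated = intervention_effect(clamp_sensory=False, leak=0.0)  # path runs through the blanket
blocked = intervention_effect(clamp_sensory=True, leak=0.0)    # blanket clamped, no bypass
bypass = intervention_effect(clamp_sensory=True, leak=0.3)     # blanket clamped, direct leak

print(f"mediated={mediated:.3f}  blocked={blocked:.3f}  bypass={bypass:.3f}")
```

If the proposed partition is correct, clamping the sensory channel should abolish the external effect (`blocked`). A residual effect under clamping (`bypass`) is the kind of direct internal-external influence that would weaken the model.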

Neuroscience already supplies relevant tools: optogenetics, transcranial magnetic stimulation, electrical stimulation, lesions, pharmacological perturbation, and closed-loop experiments. The challenge is to integrate these methods with explicit blanket models. A successful study would preregister a partition, predict intervention responses, and compare them with alternatives.

The third task is autonomy measurement. Researchers should perturb environmental conditions and quantify whether internal-active dynamics restore or preserve a viability-relevant boundary organization. In neural systems, viability may be operationalized through stable functional regimes, task performance, or contribution to organismic regulation. In cells or artificial agents, energy balance and structural maintenance may be more direct.

Autonomy claims should fail when boundary restoration is entirely attributable to external control. For example, if a neural preparation exhibits a blanket only because laboratory feedback holds it in a narrow regime, its autonomy is limited. Likewise, an artificial system repeatedly reset by an operator should not receive the same autonomy score as one that detects and repairs its own failures.
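
A toy scoring rule shows how external control could be discounted. The function below is a hypothetical index, not an established measure: it tracks how much of the recovery from a perturbation is produced by the system's own repair dynamics and how much by an operator reset. The parameter names and the reset threshold are assumptions.

```python
def autonomy_score(self_repair_gain, operator_resets, steps=50, shock=1.0):
    """Fraction of boundary restoration attributable to the system itself.

    deviation is the remaining distance from the viable boundary organization.
    An operator reset (if enabled) removes any deviation above a small tolerance.
    """
    deviation = shock
    internal_work = 0.0
    external_work = 0.0
    for _ in range(steps):
        fix = self_repair_gain * deviation  # the system's own corrective step
        deviation -= fix
        internal_work += fix
        if operator_resets and deviation > 0.05:
            external_work += deviation      # the operator restores the rest
            deviation = 0.0
    restored = internal_work + external_work
    return internal_work / restored if restored > 0 else 0.0

self_repairing = autonomy_score(self_repair_gain=0.3, operator_resets=False)
operator_held = autonomy_score(self_repair_gain=0.0, operator_resets=True)
mixed = autonomy_score(self_repair_gain=0.3, operator_resets=True)

print(self_repairing, operator_held, round(mixed, 3))
```

A system that repairs itself scores near one, a system held in place entirely by the operator scores zero, and a mixed case falls in between. Any real measure would need a principled definition of the viability-relevant boundary and of what counts as external control.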

The fourth task is integrated causal analysis. Rather than inferring \(\Phi > 0\) from recurrence, researchers should construct interventionally grounded transition models and evaluate partition effects. Exact IIT calculations are computationally difficult for large systems, so approximations will be necessary. The approximations should be validated on tractable subsystems and reported with sensitivity analyses across grains and timescales.
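
To illustrate what evaluating a partition effect can mean on a tractable system, the sketch below computes a simple whole-minus-sum quantity for two-node binary networks: the information the whole system's past carries about its future, minus the sum of the same quantity for each part taken alone, with past states set by uniform intervention. This is a deliberately crude proxy chosen for illustration. It is not IIT's \(\Phi\), and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """I(X;Y) in bits for a list of equally likely (x, y) pairs."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def whole_minus_sum(update, n_nodes, parts):
    """Past-future information of the whole minus that of the parts in isolation."""
    states = list(product((0, 1), repeat=n_nodes))  # uniform intervention on the past
    transitions = [(s, update(s)) for s in states]
    whole = mutual_information(transitions)
    part_sum = 0.0
    for part in parts:
        projected = [(tuple(s[k] for k in part), tuple(t[k] for k in part))
                     for s, t in transitions]
        part_sum += mutual_information(projected)
    return whole - part_sum

def swap(s):       # each node copies the other: cross-partition causation
    return (s[1], s[0])

def self_copy(s):  # each node copies itself: effectively decomposable
    return (s[0], s[1])

bipartition = [(0,), (1,)]
coupled = whole_minus_sum(swap, 2, bipartition)
decomposable = whole_minus_sum(self_copy, 2, bipartition)
print(coupled, decomposable)
```

Both toy systems transmit two bits from past to future, but only the coupled one loses that information when the parts are considered separately. The contrast shows why partition effects, rather than recurrence or total information flow, are the relevant evidence.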

A crucial prediction of the synthesis is partial convergence: robust blanket interiors should often correspond to strongly integrated complexes, but not always. Cases of convergence support the idea that autonomous boundaries enclose intrinsic causal units. Cases of divergence reveal where the theory needs refinement. The divergence itself is scientifically valuable.

The fifth task concerns consciousness. No formal measure can be validated without independent evidence. In humans, reports, metacognitive performance, and behavioral responsiveness provide imperfect but important indicators. In non-reporting subjects, perturbational complexity, neural signatures, and comparative neurobiology offer indirect evidence. The theory should predict changes across wakefulness, sleep, anesthesia, seizures, and disorders of consciousness.

A strong test would compare four measures across conditions: blanket robustness, autonomy, causal integration, and consciousness indicators. If all four track together, the unified framework gains support. If blanket robustness remains stable while consciousness disappears under anesthesia, then blanket existence is not sufficient for consciousness. If integration falls while organismic autonomy persists, the distinction between agent and subject gains support. Such dissociations are not failures of the layered framework; they are its central predictions.

The framework also generates negative tests. The claim that recurrent cortical microcircuits guarantee positive intrinsic information would be weakened if interventionally grounded partitions reveal effective decomposability. The claim that a cortical column is a minimal viable agent would be weakened if its proposed boundary is unstable across realistic coupling conditions or if its maintenance is wholly organism-dependent. The claim that blankets identify subjects would be weakened if many nonconscious systems exhibit equally robust blankets and integration.

Artificial systems provide a controlled domain. Researchers can construct agents with known architectures, manipulate feedback and recurrence, and measure boundary maintenance directly. Systems can be designed to vary independently in blanket strength, autonomy, and integration. For example, one system might have a strong input-output boundary but feed-forward internal processing; another might have recurrent integration but no self-maintenance; a third might maintain its hardware and policies under perturbation. Comparing these systems clarifies which properties support adaptive agency.

Collective systems test scale. Social groups, colonies, and organizations can exhibit conditional boundaries and self-maintaining dynamics. The framework should not automatically classify them as subjects. It should ask whether they possess integrated cause-effect structures and temporally coherent memory at the collective level. The answer may vary by organization and timescale. This is a feature of the theory's scale sensitivity.

Formal work is needed on approximate blankets. Exact conditional independence is brittle, while arbitrary tolerance risks blanket inflation. Thresholds should be justified through predictive performance, intervention stability, and model comparison. Bayesian model selection may quantify whether a blanket partition compresses the data better than alternatives. Information bottleneck methods may identify boundaries that preserve behaviorally relevant information while screening off irrelevant environmental detail.
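
As a rough illustration of model comparison, the sketch below uses the Bayesian information criterion (BIC), a standard large-sample approximation to Bayesian model selection, to ask whether a model with a direct external-to-internal term is worth its extra parameter. The data are simulated, and the linear-Gaussian form, the coefficients, and the function names are assumptions.

```python
import numpy as np

def gaussian_bic(y, X):
    """BIC (up to an additive constant) of a least-squares fit with Gaussian residuals."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return n * np.log(rss / n) + (k + 1) * np.log(n)  # +1 counts the noise variance

def bic_gain_from_bypass(leak, n=5000, seed=0):
    """BIC(blanket-only model) minus BIC(model with a direct external term).

    Positive values favor the bypass model, i.e. evidence against the blanket partition.
    """
    rng = np.random.default_rng(seed)
    external = rng.normal(size=n)
    sensory = 0.8 * external + rng.normal(size=n)
    internal_next = 0.8 * sensory + leak * external + rng.normal(size=n)
    blanket_only = np.column_stack([sensory])
    with_bypass = np.column_stack([sensory, external])
    return gaussian_bic(internal_next, blanket_only) - gaussian_bic(internal_next, with_bypass)

no_leak = bic_gain_from_bypass(leak=0.0)
strong_leak = bic_gain_from_bypass(leak=0.5)
print(f"no leak: {no_leak:.1f}   strong leak: {strong_leak:.1f}")
```

The complexity penalty plays the role of a principled tolerance: a small residual dependence does not justify abandoning the blanket partition, while a strong direct coupling does. Full Bayesian model comparison would replace BIC with marginal likelihoods under explicit priors.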

Formal work is also needed on nested blankets. If cells, circuits, brains, and organisms each exhibit boundaries, their relations should be modeled explicitly. Higher-level boundaries may constrain lower-level dynamics, while lower-level failures can disrupt higher-level autonomy. Multilevel causal models can test whether higher-level variables have explanatory and intervention value beyond aggregated microstates.

The relation between free-energy formulations and causal claims requires special care. Variational free energy is a mathematical functional used in inference; physical free energy is a thermodynamic quantity; expected free energy is used in policy selection. Conflating them produces rhetorical unity at the cost of precision. A rigorous research program should state which functional is used, what variables it describes, and what empirical predictions follow.
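
For reference, the three quantities can be written side by side in their common textbook forms. The notation is an assumption introduced here: \(s\) denotes hidden states, \(o\) observations, \(q\) an approximate posterior, \(p\) a generative model, \(\pi\) a policy, and \(U\), \(T\), \(S\) internal energy, temperature, and entropy.

```latex
% Variational free energy (inference): a functional of a belief q(s) and data o
F[q, o] = \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
        = D_{\mathrm{KL}}\!\left[q(s)\,\|\,p(s \mid o)\right] - \ln p(o)

% Thermodynamic (Helmholtz) free energy: a state function of a physical system
A = U - TS

% Expected free energy (policy selection): scores a policy \pi over predicted outcomes
G(\pi) = \mathbb{E}_{q(o, s \mid \pi)}\!\left[\ln q(s \mid \pi) - \ln p(o, s)\right]
```

The first and third are defined relative to a model and a belief distribution, whereas the second is defined by physical state variables. Any claimed link between them is a substantive hypothesis that needs its own argument and evidence.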

Likewise, integrated information calculations must distinguish theoretical definitions from practical proxies. Neural complexity, perturbational complexity, recurrence, and synchronization are not interchangeable with \(\Phi\). Proxies may be useful, but their relationship to the target construct must be validated.

The layered framework provides a hierarchy of evidence. Level one evidence establishes an approximate statistical boundary. Level two shows counterfactual maintenance. Level three demonstrates robust causal integration. Level four links the organization to credible indicators of phenomenal subjecthood. Claims should be calibrated to the highest level actually supported.
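
The calibration rule can be stated as a small procedure. The sketch below assumes that the hierarchy is cumulative, so that a level counts as supported only if every lower level is also supported. That reading, along with the class and function names, is an assumption made for illustration.

```python
from enum import IntEnum

class EvidenceLevel(IntEnum):
    NONE = 0
    STATISTICAL_BOUNDARY = 1        # approximate statistical boundary
    COUNTERFACTUAL_MAINTENANCE = 2  # boundary restored under perturbation
    CAUSAL_INTEGRATION = 3          # robust integrated cause-effect structure
    SUBJECTHOOD_INDICATORS = 4      # credible indicators of phenomenal subjecthood

def supported_level(evidence):
    """Highest level for which this level and all lower levels have evidence."""
    level = EvidenceLevel.NONE
    for candidate in list(EvidenceLevel)[1:]:
        if not evidence.get(candidate, False):
            break
        level = candidate
    return level

# Hypothetical study: levels one, two, and four reported, level three missing.
study = {
    EvidenceLevel.STATISTICAL_BOUNDARY: True,
    EvidenceLevel.COUNTERFACTUAL_MAINTENANCE: True,
    EvidenceLevel.SUBJECTHOOD_INDICATORS: True,
}
claim = supported_level(study)
print(claim.name)
```

On the cumulative reading, a system with consciousness indicators but no demonstrated causal integration would warrant only a level-two claim about the proposed unit, which keeps the indicators from being credited to an organization that has not been shown to bear them.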

This hierarchy disciplines the intellecton concept without emptying it. It turns a metaphysical proposal into a sequence of tractable questions. It also allows the theory to succeed partially. Markov blankets may prove highly valuable for identifying autonomous organization even if IIT's identity claim is rejected. Integrated causal analysis may illuminate neural unity even if no unique minimal intellecton exists. A serious framework should permit such differentiated outcomes.

The empirical program therefore replaces proclamation with risk. Volume 2 becomes stronger when it states what observations would force revision. The most important risk is that its proposed properties fail to converge: statistical boundaries, autonomous units, integrated complexes, and subjects may occupy different scales. If so, the unified intellecton would need to become a relational architecture among distinct processes rather than a single unit. That possibility should be investigated, not excluded by definition.

Reproducibility and Governance

Reproducibility requires shared benchmarks containing simultaneous neural, bodily, behavioral, and environmental measurements under controlled perturbations. Competing blanket partitions and integration methods should be evaluated on common tasks. Synthetic systems with known causal graphs should accompany biological datasets so inference methods can be tested where ground truth exists.

Discovery must be separated from confirmation. Exploratory methods may identify candidate partitions and timescales. Confirmatory studies should freeze those choices before testing new conditions. Without this separation, flexible coarse-graining can make almost any system appear to satisfy the theory. Transparent reporting of failed partitions and negative results is essential.
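
One lightweight way to freeze analytic choices is to fingerprint the preregistered specification and check the fingerprint before any confirmatory analysis runs. The sketch below uses only the Python standard library. The field names in the specification are hypothetical.

```python
import hashlib
import json

def freeze(spec):
    """SHA-256 fingerprint of a specification, independent of key order."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def matches_registration(spec, digest):
    """True only if the specification is identical to the one that was frozen."""
    return freeze(spec) == digest

# Exploratory phase: choose a partition and timescale, then record the fingerprint.
registered = {
    "internal": ["region_a", "region_b"],
    "blanket": ["sensory_in", "motor_out"],
    "grain": "regional",
    "window_ms": 250,
}
digest = freeze(registered)

# Confirmatory phase: the same choices pass, a quietly adjusted window does not.
same_choices = dict(reversed(list(registered.items())))
adjusted = {**registered, "window_ms": 300}

print(matches_registration(same_choices, digest))
print(matches_registration(adjusted, digest))
```

Publishing the fingerprint alongside the preregistration makes later changes to the coarse-graining detectable, which addresses the flexibility the paragraph warns about.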

Conceptual interoperability is equally important. Active inference, IIT, causal emergence, and enactivism use terms such as "intrinsic," "information," and "boundary" differently. Formal definitions and operational procedures should accompany empirical claims. Ethical caution is also warranted because candidate-subject measures may affect how patients, animals, and artificial systems are treated. False negatives and false positives both carry costs. The layered hierarchy guards against a single noisy metric deciding status, and it turns Volume 2 into a demanding but feasible research program.

The program should culminate in adversarial collaborations. Proponents of blanket-based agency, IIT, enactivism, and skeptical alternatives should agree in advance on discriminating experiments and interpretation rules. This is particularly important because each framework can often redescribe unexpected findings after the fact. An adversarial design forces theories to risk distinct predictions.

One useful benchmark would manipulate recurrence and closed-loop autonomy independently in artificial neural agents. Another would compare proposed cortical boundaries during active behavior and passive replay. A third would examine whether the maximally integrated complex shifts with the most stable autonomous blanket across anesthesia and recovery. None of these experiments alone decides subjecthood. Together they test whether the convergence assumed by Volume 2 is a real feature of organized systems or an artifact of combining vocabularies.