AGI & CoCoSci
The reciprocation of Artificial General Intelligence (AGI) and Computational Cognitive Sciences (CoCoSci).
Contents
- Theory
- Non-Verbal Communication
- Pragmatics
- Coordination
- Problem Solving
- Reinforcement Learning
- Neural-Symbolic AI
- Explainable Deep Learning
- Methodologies for Experiments
- Meta Learning
- Marr's Levels of Analysis
- Gestalt
- The Aha! Moment
- Rationality
- Cognitive Architecture
- Theory of Mind
- Intuitive Physics
- Knowledge Representation
- Language Compositionality
- Scaling Up Behavioral Studies
- Causality
- AI Commonsense Reasoning
Generative Model
AI Concept Representation
This work presents ImageBind, an approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. The authors show that all combinations of paired data are not necessary to train such a joint embedding, and only image-paired data is sufficient to bind the modalities together. ImageBind can leverage recent large scale vision-language models, and extends their zero-shot capabilities to new modalities just by using their natural pairing with images. It enables novel emergent applications 'out-of-the-box' including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation. The emergent capabilities improve with the strength of the image encoder and this work sets a new state-of-the-art on emergent zero-shot recognition tasks across modalities, outperforming specialist supervised models. Finally, the authors show strong few-shot recognition results outperforming prior work, and that ImageBind serves as a new way to evaluate vision models for visual and non-visual tasks.
Odorous compounds with similar POM representations are more likely to co-occur within a substance and be metabolically closely related; metabolic reaction sequences also follow smooth paths in POM despite large jumps in molecular structure.
Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. This work investigates the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, this work introduces new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish the goals, the authors first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, the authors present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that the model can produce realistic visual images from tactile data and vice versa.
Papers
Non-Verbal Communication
ACM SIGGRAPH'20, 2020. [All Versions]. [Project]. Rationality in feature sketching.
ACM SIGGRAPH'20, 2020. [All Versions]. [Project]. Rationality in feature sketching.
Pragmatics
Urban Experience and Design: Contemporary Perspectives on Improving the Public Realm, 2020. [All Versions]. [OSMnx Tool]. [OpenStreetMap Website].
Humanities and Social Sciences Communications, 2020. [All Versions]. [Code & Data]. [Dialogue Experimental Toolkit(DiET)]. The present study sets out to experimentally investigate how environmental factors come to shape the emergence of linguistic conventions. To this end, the authors adapt the classical Maze Game task to test the hypothesis that participants routinise different linguistic strategies to communicate positions in the maze contingent on particular environmental affordances (i.e. structure of the mazes). The results confirm that subtle environmental motivations drive the emergence of different communicative conventions in an otherwise identical task, suggesting that linguistic adaptations are highly sensitive to factors of the shared task environment.
ICML'23 Workshop on Theory-of-Mind, 2023. [All Versions]. [Project].
Coordination
AAMAS'26, 2026. [All Versions]. Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on micromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, this work introduces HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. The authors also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. The authors conduct a large-scale evaluation of 21 state-of-the-art MARL algorithms and LLM-based agents, with additional multi-seed analysis for relatively better-performing methods. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.
AAMAS'26, 2026. [All Versions]. Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on micromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, this work introduces HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. The authors also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. The authors conduct a large-scale evaluation of 21 state-of-the-art MARL algorithms and LLM-based agents, with additional multi-seed analysis for relatively better-performing methods. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.
Problem Solving
Reinforcement Learning
Neural-Symbolic AI
The original paper on Abductive Learning, a derivative-free approach for neuro-symbolic learning.
[Project].
[Project].
Explainable Deep Learning
2021. Class Activation Map methods implemented in Pytorch, with many elegant features.
CVPR'17, 2017. [All Versions]. [Project]. [Dataset: Places365]. The original paper on visualizing the class activation maps to explain convolutional neural networks.
NeurIPS'20, 2020. [All Versions]. [Project]. A concept-composition version of network dissection.
ICLR'21, 2021. [All Versions]. [Code & Data]. [Project]. A perspective on image background provides strong clue for foreground classification.
Methodologies for Experiments
Meta Learning
A survey of meta-learning in the context of reinforcement learning.
Chelsea Finn's original paper on Model-Agnostic Meta-Learning (MAML).
A Bayesian account on MAML.
The milestone paper on context Meta-RL.
Marr's Levels of Analysis
A Marr's paradigm account on computational social science.
A Marr's paradigm account on machine learning.
The Aha! Moment
A computational account on insights for scientific discovery.
Rationality
The original paper that relates cognitive resource limitation with Bayesian rational analysis, in the case of categorization behavior.
The original paper on ``switch cost'', where subjects' responses are substantially slower and, usually, more error-prone immediately after a task switch.
Introducing the computational rationality framework for including information-processing bounds in rational analyses, which emphasizes the incorporation of computational mechanism into the definition of rational action.
A comprehensive review on the rationality of Bayesian computational models.
A resource-rational account on interpreting human intelligence.
An earlier version of the paper above.
Recent progress in artificial intelligence provides the opportunity to ask the question of what is unique about human intelligence, but with a new comparison class. The author argues that we can understand human intelligence, and the ways in which it may differ from artificial intelligence, by considering the characteristics of the kind of computational problems that human minds have to solve. The author claims that these problems acquire their structure from three fundamental limitations that apply to human beings: limited time, limited computation, and limited communication. From these limitations we can derive many of the properties we associate with human intelligence, such as rapid learning, the ability to break down problems into parts, and the capacity for cumulative cultural evolution.
Evidences support that people have some of the foundations for 'intuitive power analyses', which help people use intuitive statistical reasoning and metacognitive strategies to estimate how much information they might need to solve different discrimination problems.
The review focuses on how cognitive science can provide forward models of human decision-making and inverse models of how humans think about others’ decision-making. The authors highlight relevant recent developments, including approaches that synthesize black box and theory-driven modeling, accounts that recast heuristics and biases as forms of bounded optimality, and models that characterize human theory of mind and communication in decision-theoretic terms.
Cognitive Architecture
A neuroscience account on brain as a generative model.
The original paper introducing the adaptation perspective of human intelligence, the theoretical basis of the ACT cognitive architecture.
Theory of Mind
This paper presents a formal theory of the Naïve Utility Calculus as a probabilistic generative model, which highlights the role of cost and reward tradeoffs in a Bayesian framework for action-understanding. The model predicts with quantitative accuracy how people infer agents’ subjective costs and rewards based on their observable actions. By distinguishing between desires, goals, and intentions, the model extends to complex action scenarios unfolding over space and time in scenes with multiple objects and multiple action episodes.
[Project].
Intuitive Physics
Knowledge Representation
Language Compositionality
Scaling Up Behavioral Studies
Causality
A computatinoal account on causal transfer.
A piece of evidence for the capability of causal reasoning in intelligent animals.
AI Commonsense Reasoning
AAAI'20, 2020. [All Versions].
ECCV'20, 2020. [All Versions].
ECCV'22, 2022. [All Versions]. [Preprint]. This paper presents Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents. The corpus construction process adopts a free-viewing paradigm: participants first observe and identify salient clues within images (e.g., objects, actions) and then provide a plausible inference about the scene, given the clue.
Sumit Gulwani's comprehensive review on program synthesis.
The original paper on second-order metarules.
Stephen Muggleton's original paper on Meta-Interpretive Learning (MIL).
A 30-year comprehensive review on Inductive Logic Programming.
This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages.
Rational meaning construction, a computational framework for language-informed thinking that combines neural language models with probabilistic models for rational inference. Linguistic meaning is framed as a context-sensitive mapping from natural language into a probabilistic language of thought (PLoT)--a general-purpose symbolic substrate for generative world modeling.
A paper presenting a comprehensive survey of 27 existing large language models for NL2Code, and also review benchmarks and metrics, suggesting that the key factors contributing to the success of large language models for NL2Code are “Large Size, Premium Data, Expert Tuning”.
A systematic literature review on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes.
DSL Program Synthesis
This paper shows that deep learning methods can be leveraged to train a model end-to-end to automatically reverse engineer user interfaces and generate code from a single input image with over 77% of accuracy for three different platforms (i.e. iOS, Android and web-based technologies).
Recent development in Artificial Intelligence (AI) models has propelled their application in scientific discovery, but the validation and exploration of these discoveries require subsequent empirical experimentation. The concept of self-driving laboratories promises to automate and thus boost the experimental process following AI-driven discoveries. However, the transition of experimental protocols, originally crafted for human comprehension, into formats interpretable by machines presents significant challenges, which, within the context of specific expert domain, encompass the necessity for structured as opposed to natural language, the imperative for explicit rather than tacit knowledge, and the preservation of causality and consistency throughout protocol steps. Presently, the task of protocol translation predominantly requires the manual and labor-intensive involvement of domain experts and information technology specialists, rendering the process time-intensive. To address these issues, this work proposes a framework that automates the protocol translation process through a three-stage workflow, which incrementally constructs Protocol Dependence Graphs (PDGs) that approach structured on the syntax level, completed on the semantics level, and linked on the execution level. Quantitative and qualitative evaluations have demonstrated its performance at par with that of human experts, underscoring its potential to significantly expedite and democratize the process of scientific discovery by elevating the automation capabilities within self-driving laboratories.
This paper proposes CLAIRIFY, an approach that combines automatic iterative prompting with program verification to ensure programs written in data-scarce domain-specific language are syntactically valid and incorporate environment constraints.
AI Assisted Research
AI Assisted Research
A survey on the performance of LLMs within the context of scientific discovery, focusing on GPT-4.
This paper presents an automatic evaluation framework for the task of planning experimental protocols, and introduces BioProt: a dataset of biology protocols with corresponding pseudocode representations.
Functional proteomics provides critical insights into cancer mechanisms, facilitating the discovery of novel biomarkers and therapeutic targets. The authors have developed a comprehensive cancer functional proteomics resource using reverse phase protein arrays, incorporating data from nearly 8000 patient samples from The Cancer Genome Atlas and approximately 900 samples from the Cancer Cell Line Encyclopedia. The dataset includes a curated panel of nearly 500 high-quality antibodies, covering all major cancer hallmark pathways. To enhance the accessibility and analytic power of this resource, this work introduces DrBioRight 2.0, an intuitive bioinformatic platform powered by state-of-the-art large language models. DrBioRight enables researchers to explore protein-centric cancer omics data, perform advanced analyses, visualize results, and engage in interactive discussions using natural language. By streamlining complex proteogenomic analyses, this tool accelerates the translation of large-scale functional proteomics data into meaningful biomedical insights.
Imperative DSL Applications
This paper introduces BioCoder, a C++ library that enables biologists to express the exact steps needed to execute a protocol. In addition to being suitable for automation, BioCoder converts the code into a readable, English-language description for use by biologists.
The Corel DSL for cooking recipes enables understanding of and computation with ingredients, and can construct a nutrition label for the recipe.
This paper introduces Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition.
This work introduces Infinigen Indoors, a Blender-based procedural generator of photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on natural scenes, but expands its coverage to indoor scenes by introducing a diverse library of procedural indoor assets, including furniture, architecture elements, appliances, and other day-to-day objects. It also introduces a constraint-based arrangement system, which consists of a domain-specific language for expressing diverse constraints on scene composition, and a solver that generates scene compositions that maximally satisfy the constraints. The authors provide an export tool that allows the generated 3D objects and scenes to be directly used for training embodied agents in real-time simulators such as Omniverse and Unreal. Infinigen Indoors is open-sourced under the BSD license.
Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. This work surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. The authors identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adhere to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. The authors conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.
The material acceleration platform, empowered by robotics and artificial intelligence, is a transformative approach for expediting material discovery processes across diverse domains. However, the development of an operating system for material acceleration platform faces challenges in simultaneously managing diverse experiments from multiple users. Specifically, when it is utilized by multiple users, the overlapping challenges of experimental modules or devices can lead to inefficiencies in both resource utilization and safety hazards. To overcome these challenges, this work presents an operation control system for material acceleration platform, namely, OCTOPUS, which is an acronym for operation control system for task optimization and job parallelization via a user-optimal scheduler. OCTOPUS streamlines experiment scheduling and optimizes resource utilization through integrating its interface node, master node and module nodes. Leveraging process modularization and a network protocol, OCTOPUS ensures the homogeneity, scalability, safety and versatility of the platform. In addition, OCTOPUS embodies a user-optimal scheduler. Job parallelization and task optimization techniques mitigate delays and safety hazards within realistic operational environments, while the closed-packing schedule algorithm efficiently executes multiple jobs with minimal resource waste. Copilot of OCTOPUS is developed to promote the reusability of OCTOPUS for potential users with their own sets of lab resources, which substantially simplifies the process of code generation and customization through GPT recommendations and client feedback. This work offers a solution to the challenges encountered within the platform accessed by multiple users, and thereby will facilitate its widespread adoption in material development processes.
Declarative DSL Applications
Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity making it difficult to accommodate a broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. This work shows that a domain specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. The authors demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization---although the authors provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, this work shows how the CMDL tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.
It is now possible to model all or parts of legal agreements using code (smart contracts), decreasing the cost and friction of creating, securing, and generating binding legal agreements. Lawyers lack basic tools to build these dynamic, “smart” contracts in a way that is enforceable and understandable to a legal professional. OpenLaw is a technology stack to help power next generation "smart" legal agreements, with a domain-specific markup language, a integration framework, and a series of general applications.
This paper presents Omega, a probabilistic programming language with support for counterfactual inference. This feature is accomplished by introducing a new operator to probabilistic programming akin to Pearl’s do.
People are remarkably capable of generating their own goals, beginning with child’s play and continuing into adulthood. Despite considerable empirical and computational work on goals and goal-oriented behaviour, models are still far from capturing the richness of everyday human goals. This work bridges this gap by collecting a dataset of human-generated playful goals (in the form of scorable, single-player games), modelling them as reward-producing programs and generating novel human-like goals through program synthesis. Reward-producing programs capture the rich semantics of goals through symbolic operations that compose, add temporal constraints and allow program execution on behavioural traces to evaluate progress. To build a generative model of goals, the authors learn a fitness function over the infinite set of possible goal programs and sample novel goals with a quality-diversity algorithm. Human evaluators found that model-generated goals, when sampled from partitions of program space occupied by human examples, were indistinguishable from human-created games. The authors also discovered that the model’s internal fitness scores predict games that are evaluated as more fun to play and more human-like.
This paper introduces the Scene Language, a visual scene representation that concisely and precisely describes the structure, semantics, and identity of visual scenes. It represents a scene with three key components: a program that specifies the hierarchical and relational structure of entities in the scene, words in natural language that summarize the semantic class of each entity, and embeddings that capture the visual identity of each entity. This representation can be inferred from pre-trained language models via a training-free inference technique, given text or image inputs.
This work proposes a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms. In particular, the authors devise a learning-based pipeline of algorithms capable of automatically generating and rendering a potentially infinite variety of indoor scenes by using a stochastic grammar, represented as an attributed Spatial And-Or Graph, in conjunction with state-of-the-art physics-based rendering. The pipeline is capable of synthesizing scene layouts with high diversity, and it is configurable inasmuch as it enables the precise customization and control of important attributes of the generated scenes. It renders photorealistic RGB images of the generated scenes while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normal, object identity, and material information (detailed to object parts), as well as environments (e.g., illuminations and camera viewpoints). The authors demonstrate the value of the synthesized dataset, by improving performance in certain machine-learning-based scene understanding tasks—depth and surface normal prediction, semantic segmentation, reconstruction, etc.---and by providing benchmarks for and diagnostics of trained models by modifying object attributes and scene properties in a controllable manner.
This research addresses the understanding hardness raised from the conceptual discrepancy between contractual clauses and corresponding code of the Solidity programming language, by the design and study of a domain-specific smart contract language based on higher level of abstraction that can be automatically transformed to an implementation.
This paper investigates the application of domain-specific languages in product line engineering (PLE). It starts by analyzing the limits of expressivity of feature models. Feature models correspond to context-free grammars without recursion, which prevents the expression of multiple instances and references. The authors then show how domain-specific languages (DSLs) can serve as a middle ground between feature modeling and programming. They can be used in cases where feature models are too limited, while keeping the separation between problem space and solution space provided by feature models. This work then categorizes useful combinations between configuration with feature model and construction with DSLs and provide an integration of DSLs into the conceptual framework of PLE. Finally the authors show how use of a consistent, unified formalism for models, code, and configuration can yield important benefits for managing variability and trace ability.