Project Awesome project awesome

AGI & CoCoSci

The reciprocation of Artificial General Intelligence (AGI) and Computational Cognitive Sciences (CoCoSci).

Collection 371 stars GitHub

AI Concept Representation

ImageBind: One Embedding Space To Bind Them All 9.0k updated 4mo ago

This work presents ImageBind, an approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. The authors show that all combinations of paired data are not necessary to train such a joint embedding, and only image-paired data is sufficient to bind the modalities together. ImageBind can leverage recent large scale vision-language models, and extends their zero-shot capabilities to new modalities just by using their natural pairing with images. It enables novel emergent applications 'out-of-the-box' including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation. The emergent capabilities improve with the strength of the image encoder and this work sets a new state-of-the-art on emergent zero-shot recognition tasks across modalities, outperforming specialist supervised models. Finally, the authors show strong few-shot recognition results outperforming prior work, and that ImageBind serves as a new way to evaluate vision models for visual and non-visual tasks.

Metabolic activity organizes olfactory representations 30 updated 2y ago

Odorous compounds with similar POM representations are more likely to co-occur within a substance and be metabolically closely related; metabolic reaction sequences also follow smooth paths in POM despite large jumps in molecular structure.

Connecting Touch and Vision via Cross-Modal Prediction 77 updated 6y ago

Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. This work investigates the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, this work introduces new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish the goals, the authors first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, the authors present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that the model can produce realistic visual images from tactile data and vice versa.

Papers

Theory

On the Complexity of Bayesian Generalization 8 updated 2y ago

This work examines concept generalization at a large scale in the natural visual spectrum. Established computational modes (i.e., rule-based or similarity-based) are primarily studied isolated, focusing on confined and abstract problem spaces. This work studies these two modes when the problem space scales up and when the complexity of concepts becomes diverse. At the representational level, the authors investigate how the complexity varies when a visual concept is mapped to the representation space. Prior literature has shown that two types of complexities build an inverted-U relation. Leveraging Representativeness of Attribute (RoA), the authors computationally confirm: Models use attributes with high RoA to describe visual concepts, and the description length falls in an inverted-U relation with the increment in visual complexity. At the computational level, the authors examine how the complexity of representation affects the shift between the rule- and similarity-based generalization. The authors hypothesize that category-conditioned visual modeling estimates the co-occurrence frequency between visual and categorical attributes, thus potentially serving as the prior for the natural visual world. Experimental results show that representations with relatively high subjective complexity outperform those with relatively low subjective complexity in rule-based generalization, while the trend is the opposite in similarity-based generalization.

Coordination

HLSMAC: A New StarCraft Multi-Agent Challenge for High-Level Strategic Decision-Making

AAMAS'26, 2026. [All Versions]. Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on micromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, this work introduces HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. The authors also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. The authors conduct a large-scale evaluation of 21 state-of-the-art MARL algorithms and LLM-based agents, with additional multi-seed analysis for relatively better-performing methods. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.

HLSMAC: A New StarCraft Multi-Agent Challenge for High-Level Strategic Decision-Making

AAMAS'26, 2026. [All Versions]. Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on micromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, this work introduces HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. The authors also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. The authors conduct a large-scale evaluation of 21 state-of-the-art MARL algorithms and LLM-based agents, with additional multi-seed analysis for relatively better-performing methods. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.

Rationality

The Adaptive Nature of Human Categorization Behavior

The original paper that relates cognitive resource limitation with Bayesian rational analysis, in the case of categorization behavior.

Task switching

The original paper on ``switch cost'', where subjects' responses are substantially slower and, usually, more error-prone immediately after a task switch.

Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization

Introducing the computational rationality framework for including information-processing bounds in rational analyses, which emphasizes the incorporation of computational mechanism into the definition of rational action.

Computational rationality: A converging paradigm for intelligence in brains, minds, and machines

A comprehensive review on the rationality of Bayesian computational models.

Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources

A resource-rational account on interpreting human intelligence.

Rational Use of Cognitive Resources: Levels of Analysis Between the Computational and the Algorithmic

An earlier version of the paper above.

Understanding Human Intelligence through Human Limitations

Recent progress in artificial intelligence provides the opportunity to ask the question of what is unique about human intelligence, but with a new comparison class. The author argues that we can understand human intelligence, and the ways in which it may differ from artificial intelligence, by considering the characteristics of the kind of computational problems that human minds have to solve. The author claims that these problems acquire their structure from three fundamental limitations that apply to human beings: limited time, limited computation, and limited communication. From these limitations we can derive many of the properties we associate with human intelligence, such as rapid learning, the ability to break down problems into parts, and the capacity for cumulative cultural evolution.

Foundations of intuitive power analyses in children and adults

Evidences support that people have some of the foundations for 'intuitive power analyses', which help people use intuitive statistical reasoning and metacognitive strategies to estimate how much information they might need to solve different discrimination problems.

Cognitive Science as a Source of Forward and Inverse Models of Human Decisions for Robotics and Control

The review focuses on how cognitive science can provide forward models of human decision-making and inverse models of how humans think about others’ decision-making. The authors highlight relevant recent developments, including approaches that synthesize black box and theory-driven modeling, accounts that recast heuristics and biases as forms of bounded optimality, and models that characterize human theory of mind and communication in decision-theoretic terms.

Program Synthesis

Sumit Gulwani's comprehensive review on program synthesis.

The Discovery of the Equator or Concept Driven Learning

The original paper on second-order metarules.

Towards combining inductive logic programming with Bayesian networks
Meta-interpretive learning: application to grammatical inference

Stephen Muggleton's original paper on Meta-Interpretive Learning (MIL).

Learning Efficient Logical Robot Strategies Involving Composable Objects
Learning Higher-Order Logic Programs through Abstraction and Invention
How Much Can Experimental Cost Be Reduced in Active Learning of Agent Strategies?
Meta-Interpretive Learning from noisy images
Learning efficient logic programs
Learning higher-order logic programs
Logical reduction of metarules
Playgol: Learning Programs Through Play
Machine Discovery of Comprehensible Strategies for Simple Games Using Meta-interpretive Learning
Forgetting to Learn Logic Programs
Turning 30: New Ideas in Inductive Logic Programming
Inductive logic programming at 30: a new introduction

A 30-year comprehensive review on Inductive Logic Programming.

Learning programs by learning from failures
Complete Bottom-Up Predicate Invention in Meta-Interpretive Learning
Meta-Interpretive Learning as Metarule Specialisation
Qualitative choice logic
Derivative-free optimization of high-dimensional non-convex functions by sequential random embeddings
Finitely Generated Groups and First-Order Logic
Program Synthesis Guided Reinforcement Learning
Program Synthesis with Large Language Models

This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages.

From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought

Rational meaning construction, a computational framework for language-informed thinking that combines neural language models with probabilistic models for rational inference. Linguistic meaning is framed as a context-sensitive mapping from natural language into a probabilistic language of thought (PLoT)--a general-purpose symbolic substrate for generative world modeling.

Large Language Models Meet NL2Code: A Survey

A paper presenting a comprehensive survey of 27 existing large language models for NL2Code, and also review benchmarks and metrics, suggesting that the key factors contributing to the success of large language models for NL2Code are “Large Size, Premium Data, Expert Tuning”.

Large Language Models for Software Engineering: A Systematic Literature Review

A systematic literature review on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes.

DSL Program Synthesis

pix2code: Generating Code from a Graphical User Interface Screenshot

This paper shows that deep learning methods can be leveraged to train a model end-to-end to automatically reverse engineer user interfaces and generate code from a single input image with over 77% of accuracy for three different platforms (i.e. iOS, Android and web-based technologies).

Expert-level protocol translation for self-driving labs

Recent development in Artificial Intelligence (AI) models has propelled their application in scientific discovery, but the validation and exploration of these discoveries require subsequent empirical experimentation. The concept of self-driving laboratories promises to automate and thus boost the experimental process following AI-driven discoveries. However, the transition of experimental protocols, originally crafted for human comprehension, into formats interpretable by machines presents significant challenges, which, within the context of specific expert domain, encompass the necessity for structured as opposed to natural language, the imperative for explicit rather than tacit knowledge, and the preservation of causality and consistency throughout protocol steps. Presently, the task of protocol translation predominantly requires the manual and labor-intensive involvement of domain experts and information technology specialists, rendering the process time-intensive. To address these issues, this work proposes a framework that automates the protocol translation process through a three-stage workflow, which incrementally constructs Protocol Dependence Graphs (PDGs) that approach structured on the syntax level, completed on the semantics level, and linked on the execution level. Quantitative and qualitative evaluations have demonstrated its performance at par with that of human experts, underscoring its potential to significantly expedite and democratize the process of scientific discovery by elevating the automation capabilities within self-driving laboratories.

Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting 47 updated 1y ago

This paper proposes CLAIRIFY, an approach that combines automatic iterative prompting with program verification to ensure programs written in data-scarce domain-specific language are syntactically valid and incorporate environment constraints.

AI Assisted Research

AI Assisted Research

The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 77 updated 1y ago

A survey on the performance of LLMs within the context of scientific discovery, focusing on GPT-4.

BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology 28 updated 1y ago

This paper presents an automatic evaluation framework for the task of planning experimental protocols, and introduces BioProt: a dataset of biology protocols with corresponding pseudocode representations.

DrBioRight 2.0: an LLM-powered bioinformatics chatbot for large-scale cancer functional proteomics analysis

Functional proteomics provides critical insights into cancer mechanisms, facilitating the discovery of novel biomarkers and therapeutic targets. The authors have developed a comprehensive cancer functional proteomics resource using reverse phase protein arrays, incorporating data from nearly 8000 patient samples from The Cancer Genome Atlas and approximately 900 samples from the Cancer Cell Line Encyclopedia. The dataset includes a curated panel of nearly 500 high-quality antibodies, covering all major cancer hallmark pathways. To enhance the accessibility and analytic power of this resource, this work introduces DrBioRight 2.0, an intuitive bioinformatic platform powered by state-of-the-art large language models. DrBioRight enables researchers to explore protein-centric cancer omics data, perform advanced analyses, visualize results, and engage in interactive discussions using natural language. By streamlining complex proteogenomic analyses, this tool accelerates the translation of large-scale functional proteomics data into meaningful biomedical insights.

Imperative DSL Applications

Biocoder: A programming language for standardizing and automating biology protocols 1 updated 7y ago

This paper introduces BioCoder, a C++ library that enables biologists to express the exact steps needed to execute a protocol. In addition to being suitable for automation, BioCoder converts the code into a readable, English-language description for use by biologists.

Corel: A DSL for Cooking Recipes

The Corel DSL for cooking recipes enables understanding of and computation with ingredients, and can construct a nutrition label for the recipe.

Infinite Photorealistic Worlds Using Procedural Generation

This paper introduces Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition.

Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

This work introduces Infinigen Indoors, a Blender-based procedural generator of photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on natural scenes, but expands its coverage to indoor scenes by introducing a diverse library of procedural indoor assets, including furniture, architecture elements, appliances, and other day-to-day objects. It also introduces a constraint-based arrangement system, which consists of a domain-specific language for expressing diverse constraints on scene composition, and a solver that generates scene compositions that maximally satisfy the constraints. The authors provide an export tool that allows the generated 3D objects and scenes to be directly used for training embodied agents in real-time simulators such as Omniverse and Unreal. Infinigen Indoors is open-sourced under the BSD license.

"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output

Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. This work surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. The authors identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adhere to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. The authors conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.

OCTOPUS: operation control system for task optimization and job parallelization via a user-optimal scheduler 1.7k updated yesterday

The material acceleration platform, empowered by robotics and artificial intelligence, is a transformative approach for expediting material discovery processes across diverse domains. However, the development of an operating system for material acceleration platform faces challenges in simultaneously managing diverse experiments from multiple users. Specifically, when it is utilized by multiple users, the overlapping challenges of experimental modules or devices can lead to inefficiencies in both resource utilization and safety hazards. To overcome these challenges, this work presents an operation control system for material acceleration platform, namely, OCTOPUS, which is an acronym for operation control system for task optimization and job parallelization via a user-optimal scheduler. OCTOPUS streamlines experiment scheduling and optimizes resource utilization through integrating its interface node, master node and module nodes. Leveraging process modularization and a network protocol, OCTOPUS ensures the homogeneity, scalability, safety and versatility of the platform. In addition, OCTOPUS embodies a user-optimal scheduler. Job parallelization and task optimization techniques mitigate delays and safety hazards within realistic operational environments, while the closed-packing schedule algorithm efficiently executes multiple jobs with minimal resource waste. Copilot of OCTOPUS is developed to promote the reusability of OCTOPUS for potential users with their own sets of lab resources, which substantially simplifies the process of code generation and customization through GPT recommendations and client feedback. This work offers a solution to the challenges encountered within the platform accessed by multiple users, and thereby will facilitate its widespread adoption in material development processes.

Declarative DSL Applications

Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language 16 updated 6mo ago

Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity making it difficult to accommodate a broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. This work shows that a domain specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. The authors demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization---although the authors provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, this work shows how the CMDL tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.

OpenLaw

It is now possible to model all or parts of legal agreements using code (smart contracts), decreasing the cost and friction of creating, securing, and generating binding legal agreements. Lawyers lack basic tools to build these dynamic, “smart” contracts in a way that is enforceable and understandable to a legal professional. OpenLaw is a technology stack to help power next generation "smart" legal agreements, with a domain-specific markup language, a integration framework, and a series of general applications.

A Language for Counterfactual Generative Models 178 updated 5mo ago

This paper presents Omega, a probabilistic programming language with support for counterfactual inference. This feature is accomplished by introducing a new operator to probabilistic programming akin to Pearl’s do.

Goals as reward-producing programs 13 updated 1y ago

People are remarkably capable of generating their own goals, beginning with child’s play and continuing into adulthood. Despite considerable empirical and computational work on goals and goal-oriented behaviour, models are still far from capturing the richness of everyday human goals. This work bridges this gap by collecting a dataset of human-generated playful goals (in the form of scorable, single-player games), modelling them as reward-producing programs and generating novel human-like goals through program synthesis. Reward-producing programs capture the rich semantics of goals through symbolic operations that compose, add temporal constraints and allow program execution on behavioural traces to evaluate progress. To build a generative model of goals, the authors learn a fitness function over the infinite set of possible goal programs and sample novel goals with a quality-diversity algorithm. Human evaluators found that model-generated goals, when sampled from partitions of program space occupied by human examples, were indistinguishable from human-created games. The authors also discovered that the model’s internal fitness scores predict games that are evaluated as more fun to play and more human-like.

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

This paper introduces the Scene Language, a visual scene representation that concisely and precisely describes the structure, semantics, and identity of visual scenes. It represents a scene with three key components: a program that specifies the hierarchical and relational structure of entities in the scene, words in natural language that summarize the semantic class of each entity, and embeddings that capture the visual identity of each entity. This representation can be inferred from pre-trained language models via a training-free inference technique, given text or image inputs.

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

This work proposes a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms. In particular, the authors devise a learning-based pipeline of algorithms capable of automatically generating and rendering a potentially infinite variety of indoor scenes by using a stochastic grammar, represented as an attributed Spatial And-Or Graph, in conjunction with state-of-the-art physics-based rendering. The pipeline is capable of synthesizing scene layouts with high diversity, and it is configurable inasmuch as it enables the precise customization and control of important attributes of the generated scenes. It renders photorealistic RGB images of the generated scenes while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normal, object identity, and material information (detailed to object parts), as well as environments (e.g., illuminations and camera viewpoints). The authors demonstrate the value of the synthesized dataset, by improving performance in certain machine-learning-based scene understanding tasks—depth and surface normal prediction, semantic segmentation, reconstruction, etc.---and by providing benchmarks for and diagnostics of trained models by modifying object attributes and scene properties in a controllable manner.

Domain Specific Language for Smart Contract Development

This research addresses the understanding hardness raised from the conceptual discrepancy between contractual clauses and corresponding code of the Solidity programming language, by the design and study of a domain-specific smart contract language based on higher level of abstraction that can be automatically transformed to an implementation.

Product Line Engineering Using Domain-Specific Languages

This paper investigates the application of domain-specific languages in product line engineering (PLE). It starts by analyzing the limits of expressivity of feature models. Feature models correspond to context-free grammars without recursion, which prevents the expression of multiple instances and references. The authors then show how domain-specific languages (DSLs) can serve as a middle ground between feature modeling and programming. They can be used in cases where feature models are too limited, while keeping the separation between problem space and solution space provided by feature models. This work then categorizes useful combinations between configuration with feature model and construction with DSLs and provide an integration of DSLs into the conceptual framework of PLE. Finally the authors show how use of a consistent, unified formalism for models, code, and configuration can yield important benefits for managing variability and trace ability.

Design Automation