RemyxCodeExecutor: Agentic Code Exploration and Execution

Check out the Remyx Docs for newly released utilities!

Discover → Explore → Experiment with novel codebases through AI-guided exploration of Remyx-built Docker images.

The Problem

Experimenting with a novel codebase often means spending hours resolving dependencies (CUDA versions, conflicting packages, undocumented requirements), plus manual code archaeology to find what matters in an unfamiliar repo.

The Solution

RemyxCodeExecutor provides:

Remyx-built Docker images that reproduce codebases from 7000+ arXiv papers (expanding to other resources):
* Complete environments with dependencies resolved
* Pull and execute immediately
* Reproduce codebases and adapt them to your use case

AI agents via AG2 (AutoGen) that guide exploration:
* Navigate repos and identify key implementations
* Explain architecture and core logic
* Execute code and interpret results
* Modify code for custom experiments

This notebook shows you how to search for Docker images, launch AI-guided exploration, and run experiments on reproduced codebases in minutes instead of hours.

Prerequisites

Install AG2 with Remyx support (quote the extra so shells such as zsh do not expand the brackets):

pip install "ag2[remyx]"

Make sure you have the following:
* A Remyx AI API key
* Docker, installed and running
* An OpenAI API key for the LLM agents

Set your API tokens as environment variables:

export REMYXAI_API_KEY=your_remyxai_token
export OPENAI_API_KEY=your_openai_key

import os

# Ensure you have your API tokens set
assert os.getenv("REMYXAI_API_KEY"), "Please set REMYXAI_API_KEY environment variable"
assert os.getenv("OPENAI_API_KEY"), "Please set OPENAI_API_KEY environment variable"
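The asserts above stop at the first missing key. As a sketch (not part of the Remyx or AG2 APIs), the hypothetical helper below reports every missing prerequisite at once, including whether a docker executable is on your PATH. Note that shutil.which only locates the CLI; it does not confirm that the Docker daemon is running.

```python
import os
import shutil
from typing import List, Mapping, Optional

# Hypothetical list of required variables, taken from the asserts above
REQUIRED_ENV_VARS = ("REMYXAI_API_KEY", "OPENAI_API_KEY")


def missing_prerequisites(env: Mapping[str, str], docker_path: Optional[str]) -> List[str]:
    """Return every missing prerequisite; an empty list means all checks passed."""
    problems = [f"environment variable {name} is not set" for name in REQUIRED_ENV_VARS if not env.get(name)]
    if docker_path is None:
        # shutil.which only finds the executable; it does not check the daemon
        problems.append("docker executable not found on PATH")
    return problems


for problem in missing_prerequisites(os.environ, shutil.which("docker")):
    print("Missing:", problem)
```

Passing the environment and the docker path in as arguments keeps the check easy to test without touching your real shell setup.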

Step 1: Discover Papers

Search 1000+ research papers with pre-built Docker environments:

from remyxai.client.search import SearchClient

client = SearchClient()
query = "CLIP semantic alignment"

# Search for papers
papers = client.search(query=query, has_docker=True, max_results=5)

# Browse results
for paper in papers:
    print(f"📖 {paper.title[:50]}...")
    print(f"   arXiv: {paper.arxiv_id}")
    print(f"   image: {paper.docker_image}")
    print(f"   abstract: {paper.abstract}\n")
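The loop above always appends "..." to the title, even when the title is shorter than 50 characters. A small variation uses the standard library's textwrap.shorten, which cuts at word boundaries and adds the placeholder only when text was actually removed. PaperRow is a hypothetical stand-in carrying just the four attributes the loop reads; in practice you would call format_paper(paper) on each result from client.search(...), since it only reads those attributes.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class PaperRow:
    """Hypothetical stand-in exposing only the attributes the search loop reads."""

    title: str
    arxiv_id: str
    docker_image: str
    abstract: str


def format_paper(paper: PaperRow, title_width: int = 50, abstract_width: int = 300) -> str:
    # textwrap.shorten collapses runs of whitespace (including the line breaks in
    # abstracts), cuts at word boundaries, and adds "..." only if text was removed
    title = textwrap.shorten(paper.title, width=title_width, placeholder="...")
    abstract = textwrap.shorten(paper.abstract, width=abstract_width, placeholder="...")
    return (
        f"📖 {title}\n"
        f"   arXiv: {paper.arxiv_id}\n"
        f"   image: {paper.docker_image}\n"
        f"   abstract: {abstract}\n"
    )


example = PaperRow(
    title="A short title",
    arxiv_id="2508.06434v1",
    docker_image="remyxai/2508.06434v1:latest",
    abstract="An example abstract used only to demonstrate the formatting.",
)
print(format_paper(example))
```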

You’ll see results like:

📖 CLIPin: A Non-contrastive Plug-in to CLIP for Mult...
   arXiv: 2508.06434v1
   image: remyxai/2508.06434v1:latest
   abstract: Large-scale natural image-text datasets, especially those automatically
collected from the web, often suffer from loose semantic alignment due to weak
supervision, while medical datasets tend to have high cross-modal correlation
but low content diversity. These properties pose a common challenge for
contrastive language-image pretraining (CLIP): they hinder the model's ability
to learn robust and generalizable representations. In this work, we propose
CLIPin, a unified non-contrastive plug-in th

📖 COOkeD: Ensemble-based OOD detection in the era of...
   arXiv: 2507.22576v1
   image: remyxai/2507.22576v1:latest
   abstract: Out-of-distribution (OOD) detection is an important building block in
trustworthy image recognition systems as unknown classes may arise at
test-time. OOD detection methods typically revolve around a single classifier,
leading to a split in the research field between the classical supervised
setting (e.g. ResNet18 classifier trained on CIFAR100) vs. the zero-shot
setting (class names fed as prompts to CLIP). In both cases, an overarching
challenge is that the OOD detection performance is implici

📖 Mammo-CLIP Dissect: A Framework for Analysing Mamm...
   arXiv: 2509.21102v1
   image: remyxai/2509.21102v1:latest
   abstract: Understanding what deep learning (DL) models learn is essential for the safe
deployment of artificial intelligence (AI) in clinical settings. While previous
work has focused on pixel-based explainability methods, less attention has been
paid to the textual concepts learned by these models, which may better reflect
the reasoning used by clinicians. We introduce Mammo-CLIP Dissect, the first
concept-based explainability framework for systematically dissecting DL vision
models trained for mammograp

📖 CLASP: General-Purpose Clothes Manipulation with S...
   arXiv: 2507.19983v1
   image: remyxai/2507.19983v1:latest
   abstract: Clothes manipulation, such as folding or hanging, is a critical capability
for home service robots. Despite recent advances, most existing methods remain
limited to specific tasks and clothes types, due to the complex,
high-dimensional geometry of clothes. This paper presents CLothes mAnipulation
with Semantic keyPoints (CLASP), which aims at general-purpose clothes
manipulation over different clothes types, T-shirts, shorts, skirts, long
dresses, ... , as well as different tasks, folding, flatt

📖 Personalized Education with Ranking Alignment Reco...
   arXiv: 2507.23664v1
   image: remyxai/2507.23664v1:latest
   abstract: Personalized question recommendation aims to guide individual students
through questions to enhance their mastery of learning targets. Most previous
methods model this task as a Markov Decision Process and use reinforcement
learning to solve, but they struggle with efficient exploration, failing to
identify the best questions for each student during training. To address this,
we propose Ranking Alignment Recommendation (RAR), which incorporates
collaborative ideas into the exploration mechanism,

Step 2: Fast Exploration

You can quickly explore the contents of a codebase and its environment using the explore() method of RemyxCodeExecutor.

How it works:

1. Pulls the Docker image containing the paper’s code and dependencies
2. Creates two AI agents: one explores, one executes
3. Starts an interactive session in which you guide the exploration
4. Lets you ask free-form questions about the code, write your own tests, and build on the research

You can either launch an interactive session and chat with the agents, or run in batch mode, where the agents proceed without pausing and run the default tests and exploration.

Quick Start (Default Exploration)

from autogen.coding import RemyxCodeExecutor

arxiv_id = papers[0].arxiv_id
executor = RemyxCodeExecutor(arxiv_id=arxiv_id)
executor.explore()

Batch Mode (Automated)

# Runs automatically without pausing
executor = RemyxCodeExecutor(arxiv_id=arxiv_id)
result = executor.explore(goal="Run the default example quickstart", interactive=False, max_turns=10)

print(f"✅ Completed {len(result.chat_history)} steps")
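In batch mode, result.chat_history is what you inspect afterwards. Assuming it follows AG2's ChatResult convention of a list of message dicts with "role", "content", and (usually) "name" keys, the hypothetical helpers below count messages per agent and pull out the final message. The toy history is illustrative only, not real output.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_chat(chat_history: List[Dict[str, Any]]) -> Counter:
    """Count messages per speaker, falling back to the role when a message has no name."""
    return Counter(msg.get("name") or msg.get("role", "unknown") for msg in chat_history)


def last_message(chat_history: List[Dict[str, Any]]) -> str:
    """Return the text of the final message, or an empty string if there is none."""
    if not chat_history:
        return ""
    content = chat_history[-1].get("content")
    return content if isinstance(content, str) else ""


# Toy history for illustration only; in practice pass result.chat_history
toy_history = [
    {"role": "user", "name": "code_executor", "content": "Begin exploring the contents in /app"},
    {"role": "assistant", "name": "research_explorer", "content": "Let's list the repository first."},
    {"role": "user", "name": "code_executor", "content": "exitcode: 0"},
]
print(summarize_chat(toy_history))
print(last_message(toy_history))
```

After a real run, summarize_chat(result.chat_history) shows how many turns each agent (for example research_explorer and code_executor) took.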

Real-world Example: Exploring CLIPin

Let’s explore the CLIPin paper, a method that improves CLIP’s semantic alignment using non-contrastive learning.

from autogen.coding import RemyxCodeExecutor

# Create an executor for the CLIPin paper (arXiv 2508.06434v1 in the search results above)
arxiv_id = "2508.06434v1"
executor = RemyxCodeExecutor(arxiv_id=arxiv_id)

# Start interactive exploration
result = executor.explore(
    goal="""Explore CLIPin step-by-step:

    Phase 1: Understanding
    - Show repository structure
    - Read the README
    - Find the CLIPin model code

    Phase 2: Architecture
    - Explain the non-contrastive approach
    - Show the loss function
    - Compare with standard CLIP

    Work step-by-step. Explain clearly.
    """,
    interactive=True,
)

You can expect the output after multiple turns to look like:

...

--------------------------------------------------------------------------------
Replying as research_explorer. Provide feedback to code_executor. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:

>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...
research_explorer (to code_executor):

The `model.py` file contains the architecture of the CLIPin model, which expands upon the original CLIP architecture. Here’s a summary of the key components and classes that it implements:

### Key Components and Classes:

1. **Bottleneck Class**:
   - Implements a residual block used in building the ResNet architecture. Each block consists of three convolutional layers and has a skip connection to facilitate training deeper networks.

2. **AttentionPool2d Class**:
   - Implements an attention pooling layer that uses multi-head attention to aggregate features spatially, enhancing the model's capability to capture relationships in the data.

3. **ModifiedResNet Class**:
   - Implements a modified version of the ResNet architecture tailored for the model. It features three "stem" convolutions and an attention-based pooling layer instead of an average pool at the end.

4. **Transformers and Vision Transformers**:
   - **Transformer Class**: Implements multi-layer transformer blocks equipped with residual connections.
   - **VisionTransformer Class**: Specializes in extracting features from images and integrates a transformer mechanism for better contextual understanding.

5. **TextEncoder Class**:
   - Encodes textual data using embeddings and a transformer architecture. It incorporates positional embeddings to maintain the order of the tokens.

6. **CLIP Class**:
   - This is the main class that integrates both vision and text encoders, using components previously defined.
   - It implements various projections and transformation layers that were specifically tailored for contrastive and non-contrastive learning approaches.
   - This class also contains the forward method which computes embeddings for images and text, as well as similarity metrics.

7. **Initialization and Parameter Management**:
   - Several methods handle initialization, copying of parameters for momentum models, and setting grad checkpointing to reduce memory usage.
   - The `initialize_parameters` method sets the correct weights for different sections in the model.

8. **Utility Functions**:
   - Functions like `convert_weights` convert the model parameters to half precision for performance optimization during inference.

### Next Steps:

Now that we have outlined the structure and functionality of the CLIPin model, the next phase is to understand its non-contrastive approach and how it compares with the standard CLIP model in terms of loss function and training strategy.

Shall we explore its non-contrastive approach next?

Building on Research

Use a paper’s code as a starting point for your own projects and research.

from autogen import ConversableAgent
from autogen.coding import RemyxCodeExecutor

# Start with paper's environment
executor = RemyxCodeExecutor(arxiv_id=arxiv_id)

# Create your own agent for custom experiments
agent = ConversableAgent(
    "my_researcher", llm_config=False, code_execution_config={"executor": executor}, human_input_mode="NEVER"
)

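# Run your custom code in the paper's environment.
# With llm_config=False, the agent replies by executing the code blocks in the
# message instead of calling an LLM.
reply = agent.generate_reply(
    messages=[
        {
            "role": "user",
            "content": """```python
# Your custom experiment here
# (illustrative names: check the repo, e.g. clip/model.py, for the actual
# module, class, and loading function)
from clip.model import CLIPin
model = CLIPin.load_pretrained()
# ... your modifications ...
```""",
        }
    ]
)
print(reply)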
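The agent above executes the fenced code blocks it finds in the message. AG2's code executors locate those blocks with a code extractor (for example MarkdownCodeExtractor). The sketch below is a simplified regex version of that idea, handy for checking what a message will run before you send it; it is not AG2's implementation and ignores edge cases such as nested or indented fences.

```python
import re
from typing import List, Tuple

# Simplified fence pattern: three backticks + optional language tag, the body,
# then a closing fence. Nested or indented fences are not handled.
FENCE_RE = re.compile(r"```[ \t]*(\w*)[ \t]*\n(.*?)\n```", re.DOTALL)


def extract_code_blocks(message: str) -> List[Tuple[str, str]]:
    """Return (language, code) pairs for each fenced block, in order."""
    # Blocks without a language tag are reported as "text"
    return [(lang or "text", code) for lang, code in FENCE_RE.findall(message)]


example = "```python\nfrom clip.model import CLIPin\n# ... your modifications ...\n```"
print(extract_code_blocks(example))
```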

Advanced Features

Custom Docker Args

You can pass additional arguments through container_create_kwargs to further customize the container, for example to add environment variables, set a memory limit, or switch to a GPU-enabled container runtime.

executor = RemyxCodeExecutor(
    arxiv_id=arxiv_id,
    timeout=600,
    container_create_kwargs={
        "environment": {
            "HF_TOKEN": os.getenv("HF_TOKEN"),
            "WANDB_API_KEY": os.getenv("WANDB_API_KEY"),
        },
        "mem_limit": "16g",
    },
)
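Because os.getenv returns None for unset variables, the dictionary above can include entries with no value. A hedged alternative: the hypothetical build_container_kwargs helper below forwards only variables that are actually set, plus an optional memory limit, and returns a dict you can pass as container_create_kwargs.

```python
import os
from typing import Any, Dict, Mapping, Optional, Sequence


def build_container_kwargs(
    env_names: Sequence[str],
    environ: Mapping[str, str],
    mem_limit: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: forward only the environment variables that are set and non-empty."""
    environment = {name: environ[name] for name in env_names if environ.get(name)}
    kwargs: Dict[str, Any] = {"environment": environment}
    if mem_limit is not None:
        kwargs["mem_limit"] = mem_limit  # docker-py memory limit string, e.g. "16g"
    return kwargs


container_kwargs = build_container_kwargs(["HF_TOKEN", "WANDB_API_KEY"], os.environ, mem_limit="16g")
# executor = RemyxCodeExecutor(arxiv_id=arxiv_id, timeout=600, container_create_kwargs=container_kwargs)
```

For GPU access, docker-py's containers.create accepts device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])], or runtime="nvidia" on hosts that use the NVIDIA runtime. Assuming RemyxCodeExecutor forwards container_create_kwargs to docker-py unchanged, as AG2's DockerCommandLineCodeExecutor does, you could add one of those keys to the returned dict.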

Paper Metadata

executor = RemyxCodeExecutor(arxiv_id=arxiv_id)
context = executor.get_paper_context()
print(context)

Direct Use of Docker Images

# If you know the image name
executor = RemyxCodeExecutor(image="remyxai/2508.06434v1:latest", timeout=300)
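Every image in the search results above follows the pattern remyxai/<arxiv_id>:latest. Assuming that naming holds for other papers (it is inferred from the examples, not a documented guarantee), the hypothetical helper below builds an image name from a new-style arXiv ID and rejects malformed IDs before Docker tries to pull anything.

```python
import re

# New-style arXiv identifiers: YYMM.NNNN (pre-2015) or YYMM.NNNNN, optional vN version suffix
ARXIV_ID_RE = re.compile(r"\d{4}\.\d{4,5}(?:v\d+)?")


def is_new_style_arxiv_id(arxiv_id: str) -> bool:
    return ARXIV_ID_RE.fullmatch(arxiv_id) is not None


def remyx_image_for(arxiv_id: str, tag: str = "latest") -> str:
    """Hypothetical helper: assumes the remyxai/<arxiv_id>:<tag> naming seen in the search results."""
    if not is_new_style_arxiv_id(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return f"remyxai/{arxiv_id}:{tag}"


image = remyx_image_for("2508.06434v1")
# executor = RemyxCodeExecutor(image=image, timeout=300)
```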

Manual Agent Control

# For advanced users who want full control
executor = RemyxCodeExecutor(arxiv_id=arxiv_id)
executor_agent, writer_agent = executor.create_agents(goal="Custom exploration", llm_model="gpt-4o")

# Customize the chat
result = executor_agent.initiate_chat(writer_agent, message="Begin exploring the contents in /app", max_turns=10)

Tips & Tricks

Start with Search

You can quickly browse the catalog of pre-built images for papers you may want to experiment with. Search papers and pre-built Docker images using full text, keywords, or arXiv IDs.

from remyxai.client.search import SearchClient

papers = SearchClient().search(query="data synthesis techniques", has_docker=True, max_results=10)

for p in papers:
    print(f"{p.arxiv_id}: {p.title[:50]}...")

Use Interactive Mode for Learning

Pause at each step to guide the agents through your exploration.

executor.explore(
    goal="Explain this paper's approach",
    interactive=True,  # Lets you guide each step
)

Use Batch Mode for Experiments

Expand your experimentation by running multiple papers automatically:

paper_ids = ["2508.06434v1", "2103.00020v1", "2010.11929v2"]

results = {}
for arxiv_id in paper_ids:
    executor = RemyxCodeExecutor(arxiv_id=arxiv_id)
    result = executor.explore(goal="Run quickstart", interactive=False, verbose=False)
    results[arxiv_id] = result

# Compare results
for arxiv_id, result in results.items():
    print(f"{arxiv_id}: {len(result.chat_history)} steps")
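The batch loop above stops at the first paper that raises, for example when no image exists for an ID. As a sketch of a more forgiving loop, the hypothetical run_batch helper below takes any per-paper callable, records results and errors separately, and keeps going.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def run_batch(
    paper_ids: Iterable[str],
    run_one: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run run_one for each ID, collecting results and errors separately."""
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for arxiv_id in paper_ids:
        try:
            results[arxiv_id] = run_one(arxiv_id)
        except Exception as exc:  # keep going if one paper fails (e.g. missing image)
            errors[arxiv_id] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

To use it with the executor, pass a small function that creates a RemyxCodeExecutor for the given ID and returns executor.explore(goal="Run quickstart", interactive=False), then check errors for the papers that failed.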

Check Metadata

Get a quick summary of the resources available for a paper you may want to explore further.

executor = RemyxCodeExecutor(arxiv_id=arxiv_id)
print(executor.get_paper_context())
# Shows: title, GitHub, working directory, quickstart hints

Summary

This notebook showed you how RemyxCodeExecutor transforms research paper execution:

Three Powerful Modes:
- Quick Start: executor.explore() for AI-guided exploration with defaults
- Learning Mode: interactive, step-by-step exploration with custom goals
- Batch Mode: automated experiments across multiple papers

What Makes It Special:
- Pre-configured Docker environments for 1000+ papers
- Zero dependency setup (everything is pre-installed)
- AI agents that explain as they explore
- Reproducible execution every time

Quick Reference

```python
# 1. Search
from remyxai.client.search import SearchClient

papers = SearchClient().search(query="your topic", has_docker=True)

# 2. Create executor
from autogen.coding import RemyxCodeExecutor

executor = RemyxCodeExecutor(arxiv_id=papers[0].arxiv_id)

# 3. Explore (pick one mode)
executor.explore()  # Quick start
executor.explore(goal="…", interactive=True)  # Learning
executor.explore(goal="…", interactive=False)  # Batch
```