Campbell-Quest

Final Project | MSc Game Development (Programming) | Kingston University

Platform: PC

Engine: Unity

Team Size: 1

Role: Lead Programmer

Roles and Responsibilities

  • System Architecture: Designed a modular pipeline linking Python (LLM backend) and Unity (frontend tools), using LangChain, Ollama, and ScriptableObjects.
  • LLM Integration & Prompt Engineering: Integrated Llama3.1, Gemma2, and Mistral-Nemo via structured prompts to generate quests, dialogues, NPCs, and items with contextual relevance.
  • Unity Tooling: Built editor tools in C# for generating and serializing game assets, including quests, NPC prefabs, and dialogue trees.
  • Automated Evaluation: Developed an evaluation pipeline using LangSmith and GPT-4 Turbo to assess clarity, creativity, engagement, and coherence of generated content.
  • Usability & Refactoring: Refined the UI by separating tools into dedicated windows and implemented reusable processors for maintainable expansion.
  • Documentation & Reporting: Produced technical documentation, UML diagrams, and a final report covering system design, outcomes, and commercial potential.

Overview

This project explores the use of Large Language Models (LLMs) to procedurally generate narrative-driven quests in role-playing games (RPGs). The core objective was to create a system that supports smaller development teams by automating the generation of rich, engaging side-quests with minimal manual effort. While handcrafted narratives still offer the highest quality, they are resource-intensive. This project demonstrates that LLMs can effectively bridge that gap by supplementing main storylines with coherent, creative, and context-aware side content.

The result is Campbell-Quest, a modular system that integrates a Python-based generation backend with the Unity Editor. It enables developers to dynamically generate quests, dialogue trees, NPCs, and quest-related items directly within their game development environment. The tool leverages models like Llama3.1, Gemma2, and Mistral-Nemo—locally deployed via Ollama—to ensure efficiency, scalability, and offline compatibility.

Instead of replacing human creativity, this system enhances it. It empowers developers to rapidly prototype and populate game worlds with compelling quest content, freeing up time for polish, iteration, and narrative refinement. The project also incorporates an automated evaluation pipeline that assesses the output quality using metrics like clarity, engagement, and creativity, ensuring consistent and meaningful player experiences.

This procedural generation tool exemplifies the potential of AI-assisted game development, offering a scalable and customizable solution for teams seeking to expand their narrative scope without expanding their production overhead.

Assets Used

Project Report

This project report provides a comprehensive breakdown of the system’s architecture, development process, evaluation methodology, and technical decisions made throughout production. If you're interested in the full academic write-up, you can download it below.

Download: PDF | DOCX

Key Features & Capabilities

The system is designed to bring powerful narrative tools into the hands of game developers, especially those working on limited budgets or smaller teams. By integrating large language models into the development workflow, it offers a streamlined solution for generating quest content that is both dynamic and richly detailed. Key capabilities include:

🧠 Procedural Quest Generation Using LLMs

At the core of the system is an AI-driven quest generation engine powered by large language models like Llama3.1, Gemma2, and Mistral-Nemo. These models generate narrative content—including quest descriptions, objectives, and rewards—based on customizable prompts and context data.

🎮 Seamless Unity Integration

The tool is implemented as a Unity Editor extension, enabling in-engine quest creation via intuitive editor windows. Developers can generate and visualize quests, dialogue, NPCs, and items directly within Unity, complete with automatic asset creation and serialization using ScriptableObjects and prefabs.

🗣️ Procedural Dialogue and Character Generation

Beyond quests, the system also supports the procedural generation of dialogue trees and NPC profiles. Dialogues are context-aware and linked to quest objectives, ensuring coherence between gameplay and narrative. NPCs are equipped with dialogue, behaviors, and interactions tailored to their role in the quest.

🧩 Modular Python Backend (Campbell-Quest)

The backend system is built in Python using LangChain and Ollama, with a focus on modularity, clarity, and local deployment. This ensures fast execution and avoids dependency on cloud services, making the system efficient and privacy-friendly.

🔍 Evaluation and Iteration Pipeline

An integrated evaluation system powered by LangSmith and GPT-4 Turbo allows for automated assessment of generated quests based on key metrics such as conciseness, coherence, engagement, creativity, and narrative complexity. This enables fast iteration and benchmarking across different models.

🛠️ Customizability and Extensibility

Developers can tailor quest generation using configurable input fields (prompts, characters, settings, items, and rewards). The modular architecture allows for easy expansion of features, such as adding new quest templates, item types, or NPC behaviors.

⚡ Local Deployment and Efficiency

The system is optimized for local execution, minimizing latency and eliminating internet dependency. Among the supported models, Llama3.1 offers the best balance of speed and coherence, making it ideal for rapid prototyping.

🧪 Designed for Supplementary Content

The system is purpose-built for side-quests and secondary content. It acknowledges current limitations in AI creativity and ensures generated narratives are complementary—rather than central—to the main storyline, maintaining narrative integrity while expanding world depth.

Development Process

The development of the Procedural Quest Generation system was carried out in three major phases—research and planning, iterative design and implementation, and evaluation and validation. Each phase contributed to refining the system's capabilities and ensuring robust integration with the Unity engine.

📚 Phase 1: Initial Research & Planning

The project began with an in-depth exploration of best practices in quest design. This research drew insights from academic literature and industry analyses, including resources from Game Maker’s Toolkit, Extra Credits, and The Architect of Games. The focus was on understanding what makes quests engaging—multi-part narratives, contextual objectives, and character-driven storytelling.

Simultaneously, foundational technologies were selected. Llama3.1 was chosen as the primary language model, deployed locally via Ollama. The backend was developed using Python, with early prototypes testing basic text generation and schema adherence.


import ollama

def generate_quest(template_info, questline_example, objective_info):
    brainstorming_system_prompt = (f"You are a Quest Designer at a Game studio.\n"
    f"You have been tasked with creating compelling Side-Quests for a Role-Playing Game.\n"
    f"The game is set in a Fantasy setting.\n"
    f"Create engaging and creative questlines that enhance the player's experience and provide meaningful content.\n"
    f"You should create multi-part questlines.\n"
    f"Try to compelling narratives that deviate from the norms.\n"

    f"\n###\n"

    f"The questline generated should follow the \"template\" given below:\n"

    f"{template_info}\n"

    f"Given below is an example. Use it for reference only:\n"

    f"{questline_example}\n"

    f"\n###\n"

    f"Each quest of the questline should be of a type otlined in the \"quest_objectives\" below:\n"

    f"{objective_info}\n"
    
    f"\n###\n"

    f"\nGive a name to the questline as a whole.\n"

    f"\nDescribe each quest in the format given:\n"
    f"Name:\nType:\nGoal:\nDescription:\n")

    response = ollama.chat(model="llama3", messages=[
        {
            "role": "system",
            "content": brainstorming_system_prompt
        }
    ], options={"temperature": 2})

    return response["message"]["content"]
                        
Initial Quest Generation Agent
Quest Flow
Initial Output Example

A modular structure for the system was planned during this phase, and preliminary prompts for quest generation were designed. The pipeline was named Campbell-Quest, referencing Joseph Campbell's Hero’s Journey as inspiration for structured storytelling.

🔄 Phase 2: Iterative Design & Implementation

This phase involved multiple development iterations, each targeting a distinct feature or system component:

  • LLM Integration with Unity: A Python-based generation pipeline was wrapped using Unity’s Python scripting interface. Initial tests verified cross-language communication and JSON serialization for data handling.
  • Quest Editor Tooling: A custom Unity Editor window, CampbellEditorWindow, was built to let users define prompts, objectives, characters, and locations. Generated content was parsed and converted into ScriptableObjects for seamless asset creation.
  • Dialogue, NPC, and Item Pipelines: The tool was extended to procedurally generate dialogue trees tied to quest progress, as well as NPCs and in-game items. These were implemented using modular C# classes like DialogueGenerator, NpcGenerator, and ItemGenerator, enabling automated prefab creation and asset linking.
  • UI Refactor & Modularization: The single editor window was split into four specialized interfaces—Quest, Dialogue, NPC, and Item Editors—each with dedicated processors and generators, improving maintainability and user experience.
Initial Editor Window

Each iteration included testing within Unity to verify integration fidelity and gameplay coherence.

🧪 Phase 3: Evaluation & Validation

With the system fully functional, focus shifted to assessing the quality of generated quests. Evaluation metrics were established, covering:

  • Conciseness & Coherence
  • Relevance & Engagement
  • Creativity & Narrative Complexity

A LangChain-based evaluation pipeline was implemented using LangSmith and GPT-4 Turbo. This pipeline automated testing across multiple models (Llama3.1, Gemma2, Mistral-Nemo) using structured prompts and response analysis.

Evaluation results informed further refinements, particularly in template design, prompt engineering, and output formatting. The trade-offs between latency, token usage, and narrative quality were also studied to optimize system performance for different use cases.

Technical Architecture

The system architecture behind the Procedural Quest Generator is built around a modular and extensible framework that bridges the capabilities of Python-based language models with the Unity game engine. The architecture consists of two core components: the Campbell-Quest Python Backend and the Unity Frontend Tooling, connected via a tightly integrated pipeline that supports both development and runtime workflows.

🧠 Campbell-Quest Python Backend

At the heart of the backend is the Campbell-Quest Python package. It leverages:

  • LangChain for modular prompting and agent orchestration.
  • Ollama for local model deployment, enabling efficient and offline-compatible operations.
  • LangSmith and GPT-4 Turbo for structured evaluation and iterative testing.

The backend is responsible for generating all narrative content, including:

  • Quest narratives, objectives, and rewards.
  • Dialogue trees associated with specific NPCs.
  • NPC profiles with roles and behaviors.
  • Quest-related items (e.g., equipment, pickups).

Each component is built as a dedicated module (questAgents, dialogueAgents, enemyAgents, itemAgents) with reusable prompt templates and schema validators. Output is structured in standardized JSON formats to ensure smooth handoff to Unity.
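
To give a flavour of that handoff, the minimal sketch below checks a generated quest against a simple set of required fields before it is passed to Unity; the field names are assumptions for illustration, not the exact Campbell-Quest schema.

import json

REQUIRED_QUEST_KEYS = {"name", "description", "objectives", "rewards"}  # illustrative fields

def validate_quest_json(raw_output: str) -> dict:
    """Parse a model response and confirm it has the expected quest shape
    before the Unity generators consume it."""
    quest = json.loads(raw_output)  # raises json.JSONDecodeError on malformed output
    missing = REQUIRED_QUEST_KEYS - quest.keys()
    if missing:
        raise ValueError(f"Quest JSON is missing fields: {sorted(missing)}")
    if not isinstance(quest["objectives"], list):
        raise ValueError("'objectives' should be a list of objective entries")
    return quest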

🧰 Unity Integration Layer

On the Unity side, the system is implemented as an Editor extension using custom C# tools. It integrates with the backend through Unity's Python Scripting interface, executing Python scripts and retrieving generated data for asset creation.

Key components include:

  • Editor Windows: Custom interfaces for generating quests, dialogues, NPCs, and items.
  • Processors: Middleware classes that pass editor data to the backend and parse results.
  • Generators: Classes that convert JSON into usable game assets such as ScriptableObjects and prefabs.

Each content type is handled independently to promote scalability. For example:

  • QuestGenerator creates serialized quests with objectives and rewards.
  • DialogueGenerator creates conversation assets linked to NPCs and quest progression.
  • NpcGenerator builds prefabs with combat, quest, and dialogue components.
  • ItemGenerator produces inventory items and pickup objects tied to quest events.

The pipeline is decoupled to support asynchronous workflows, modular testing, and parallel development.

🔄 Data Flow Overview

  1. Input Configuration: Developers use the Unity Editor to define prompts, objectives, characters, settings, and other context data.
  2. Python Execution: The corresponding Processor class formats this input and runs the appropriate Python script from Campbell-Quest (a minimal sketch of this handoff follows the list).
  3. LLM Output Parsing: The output is returned as structured JSON, parsed by the Generator class.
  4. Asset Creation: Based on the JSON, Unity assets (e.g., ScriptableObjects, prefabs) are automatically created and stored.
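
A minimal sketch of steps 2–4, reusing the quest_generator functions that appear in the evaluation script later on this page; the configuration keys and file paths are illustrative assumptions rather than the tool's actual interface.

import json
from src.campbell_quest import quest_generator  # same module used by the evaluation script

def run_from_editor(config_path: str, output_path: str) -> None:
    """Read editor-supplied context, generate a quest, and write structured JSON
    for the Unity Generator classes to parse into ScriptableObjects."""
    with open(config_path, "r") as f:
        config = json.load(f)

    initial = quest_generator.generate_initial_quest(
        config["prompt"], config["objectives"],
        config["locations"], config["characters"])
    with_objectives = quest_generator.generate_quest_with_objectives(
        initial, config["locations"], config["characters"])
    formatted = quest_generator.get_formatted_quest(with_objectives, config["quest_schema"])

    with open(output_path, "w") as f:
        f.write(formatted)  # the backend already returns schema-formatted JSON
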
Final Quest Editor Window
Example Generated Quest
Final Dialogue Editor and Example

🧩 Extensibility

The architecture is designed to be highly extensible:

  • New language models can be added by modifying the ChatOllama integration (see the sketch after this list).
  • New content types (e.g., factions, lore books) can be added by extending the backend modules and corresponding Unity UI components.
  • Developers can define new prompt templates, reward schemas, or dialogue formats without affecting core logic.
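
As a rough illustration of the first point, the sketch below shows how the ChatOllama integration might be pointed at a different locally served model. The prompt wording, model tags, and the langchain-ollama packaging are assumptions for illustration, not the project's actual templates.

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

def build_quest_chain(model_tag: str = "llama3.1"):
    """Assemble a prompt-plus-model chain; swapping model_tag (e.g. "gemma2",
    "mistral-nemo") is enough to target a different Ollama-served model."""
    llm = ChatOllama(model=model_tag, temperature=0.8)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a quest designer for a fantasy RPG. "
                   "Write a short, multi-part side-quest."),
        ("human", "Setting: {setting}\nQuest giver: {npc}\nReward: {reward}"),
    ])
    return prompt | llm

# Example usage: the same chain, backed by a different model.
chain = build_quest_chain("mistral-nemo")
draft = chain.invoke({"setting": "a flooded mining town",
                      "npc": "Mara the ferrywoman",
                      "reward": "an enchanted lantern"})
print(draft.content)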

Evaluation Pipeline

A key component of this project was assessing the effectiveness of different language models in generating high-quality quests. While the procedural quest generation system itself does not include an in-editor evaluation pipeline, an external evaluation framework was developed to benchmark the output of different LLMs using a fixed dataset of prompts.

🤖 Language Models & Local Deployment

The system supports multiple locally deployable large language models (LLMs), including:

  • Llama3.1 – Offers the best balance between output quality and speed.
  • Gemma2 – Delivers high coherence and engagement.
  • Mistral-Nemo – Excels in creativity and narrative complexity.

All models were deployed locally via Ollama, allowing for efficient, offline generation within the development environment. Prompts were constructed using LangChain, enabling structured and context-rich inputs for generating quests, dialogue, NPCs, and items.
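
As a simple illustration of sending the same structured prompt to each model, a sketch along these lines could be used; the model tags follow Ollama's naming and may differ from the exact builds evaluated.

import ollama

MODELS = ["llama3.1", "gemma2", "mistral-nemo"]  # assumed Ollama tags

def compare_models(system_prompt: str, user_prompt: str) -> dict:
    """Return each locally served model's quest draft for the same prompt pair."""
    drafts = {}
    for model in MODELS:
        response = ollama.chat(model=model, messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ])
        drafts[model] = response["message"]["content"]
    return drafts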

📏 Evaluation Methodology

To assess the narrative quality of model outputs, an external evaluation pipeline was created using LangSmith and GPT-4 Turbo as the evaluator. This pipeline was used only during the testing phase and is not embedded into the Unity tool.

A dataset of prompt configurations—comprising objectives, characters, and settings—was used as input across the different models. Each model’s generated quest was then evaluated using qualitative metrics:

  • Conciseness
  • Coherence
  • Relevance
  • Engagement
  • Creativity
  • Narrative Complexity

Each evaluation was conducted using GPT-4 Turbo under a controlled scoring rubric to ensure consistency and objectivity.

🧪 Pipeline Overview

  1. Generating quests using each model from a shared prompt dataset.
  2. Automatically scoring each output based on the defined metrics.
  3. Comparing results to identify strengths and weaknesses of each model.

import os
from src.campbell_quest import quest_generator
from dotenv import load_dotenv
import time

from openai import RateLimitError

from langchain_openai import ChatOpenAI
from langsmith.schemas import Example, Run
from langsmith.evaluation import LangChainStringEvaluator, evaluate

load_dotenv()

def load_json(filename):
    try:
        with open(f"{filename}.json", "r") as file:
            info = file.read()
            print(f"{filename}.json read successfully.")
            return info
    except Exception as e:
        print(f"An error occurred: {e}")

### Quest Generation

def evaluate_quest_generation_llama(inputs: dict) -> dict:    
    prompt = inputs["prompt"]
        
    objectives = load_json("example_objectives")
    locations = inputs["locations"]
    characters = inputs["characters"]
    
    quest_schema = load_json("quest_schema")
    
    initial_generated_quest = quest_generator.generate_initial_quest(prompt, objectives, locations, characters)
    quest_with_objectives = quest_generator.generate_quest_with_objectives(initial_generated_quest, locations, characters)
    formatted_quest = quest_generator.get_formatted_quest(quest_with_objectives, quest_schema)    
    
    return {"quest": formatted_quest}
        
### Evaluation Calls
        
def run_evaluation_llama():
    dataset_name = "ds-campbell-evaluation-50"      
    evaluators = [run_clarity_evaluator, run_engagement_evaluator, run_creativity_evaluator]
    prefix = "llama"
    
    evaluate(
        evaluate_quest_generation_llama,
        data=dataset_name,
        evaluators=evaluators,
        experiment_prefix=prefix
    )
    
### Evaluator Wrappers  
    
def run_clarity_evaluator(root_run: Run, example: Example) -> dict:
    evaluator = get_clarity_evaluator()
    run_evaluator = evaluator.as_run_evaluator()

    max_retries = 12
    backoff_factor = 2  # Exponential backoff factor
    initial_delay = 10  # Initial delay in seconds

    for attempt in range(max_retries):
        try:
            results = run_evaluator.evaluate_run(root_run, example)
            return results
        except RateLimitError as e:
            if attempt < max_retries - 1:  # Don't delay on the last attempt
                delay = initial_delay * (backoff_factor ** attempt)
                print(f"RateLimitError encountered. Retrying in {delay} seconds...")
                time.sleep(delay)
            else:
                print("Max retries reached. Raising the RateLimitError.")
                raise e  # Re-raise the exception if max retries are reached
            
def run_engagement_evaluator(root_run: Run, example: Example) -> dict:
    evaluator = get_engagement_evaluator()
    run_evaluator = evaluator.as_run_evaluator()

    max_retries = 12
    backoff_factor = 2  # Exponential backoff factor
    initial_delay = 10  # Initial delay in seconds

    for attempt in range(max_retries):
        try:
            results = run_evaluator.evaluate_run(root_run, example)
            return results
        except RateLimitError as e:
            if attempt < max_retries - 1:  # Don't delay on the last attempt
                delay = initial_delay * (backoff_factor ** attempt)
                print(f"RateLimitError encountered. Retrying in {delay} seconds...")
                time.sleep(delay)
            else:
                print("Max retries reached. Raising the RateLimitError.")
                raise e  # Re-raise the exception if max retries are reached
    
def run_creativity_evaluator(root_run: Run, example: Example) -> dict:
    evaluator = get_creativity_evaluator()
    run_evaluator = evaluator.as_run_evaluator()

    max_retries = 12
    backoff_factor = 2  # Exponential backoff factor
    initial_delay = 10  # Initial delay in seconds

    for attempt in range(max_retries):
        try:
            results = run_evaluator.evaluate_run(root_run, example)
            return results
        except RateLimitError as e:
            if attempt < max_retries - 1:  # Don't delay on the last attempt
                delay = initial_delay * (backoff_factor ** attempt)
                print(f"RateLimitError encountered. Retrying in {delay} seconds...")
                time.sleep(delay)
            else:
                print("Max retries reached. Raising the RateLimitError.")
                raise e  # Re-raise the exception if max retries are reached    
    
### Evaluators Setup    
    
def get_clarity_evaluator():
    criterion = {
        "conciseness": "Is this response concise, delivering the necessary information in a clear and straightforward manner without unnecessary elaboration? It should prioritize brevity while ensuring that the answer remains complete and informative.",
        "coherence": "Is this response coherent, logically structured, and easy to follow? The information provided should flow naturally, with ideas and facts presented in a manner that makes sense as a whole, ensuring that the user can easily understand the response."
    }
    
    eval_llm = ChatOpenAI(temperature=0.0, model="gpt-4-turbo")
    
    evaluator = LangChainStringEvaluator(
        "score_string",
        config={
        "criteria": criterion,
        "llm": eval_llm
        },
        prepare_data = lambda run, example: {
                "prediction": run.outputs["quest"],
                "input": example.inputs
            },
        )
    
    return evaluator
    
def get_engagement_evaluator():
    criterion = {
        "relevance": "Is this response relevant, directly addressing the user's query without deviating into unrelated topics. It should focus on providing information or solutions that are directly applicable to the user's needs or context.",
        "engagement": "Is this response engaging, capturing the user's interest and maintaining their attention throughout the response. It should encourage further interaction or exploration."
    }
    
    eval_llm = ChatOpenAI(temperature=0.0, model="gpt-4-turbo")
    
    evaluator = LangChainStringEvaluator(
        "score_string",
        config={
        "criteria": criterion,
        "llm": eval_llm
        },
        prepare_data = lambda run, example: {
                "prediction": run.outputs["quest"],
                "input": example.inputs
            }
        )
    
    return evaluator
    
def get_creativity_evaluator():
    criterion = {
        "creativity": "Is this response creative, offering unique or innovative solutions, ideas, or perspectives that demonstrate originality and imagination. It should go beyond conventional or expected responses, providing a fresh and interesting take on the topic.",
        "narrative complexity" : "Is this response narratively complex, incorporating multiple elements such as characters, locations, and objectives in a way that creates a rich and engaging story. It should involve various plot points, twists, and interactions that enhance the overall narrative experience."
    }
    
    eval_llm = ChatOpenAI(temperature=0.0, model="gpt-4-turbo")
    
    evaluator = LangChainStringEvaluator(
        "score_string",
        config={
        "criteria": criterion,
        "llm": eval_llm
        },
        prepare_data = lambda run, example: {
                "prediction": run.outputs["quest"],
                "input": example.inputs
            }
        )
    
    return evaluator
    
if __name__ == "__main__":
    # Get the absolute path of the current script file
    script_path = os.path.abspath(__file__)

    # Extract the directory containing the script file
    script_directory = os.path.dirname(script_path)

    # Change the working directory 
    os.chdir(f"{script_directory}\\sample")
    
    run_evaluation_llama()
                        
Evaluation Pipeline

This allowed for a data-driven comparison of model behavior under the same generation conditions, providing insight into which models best support different narrative goals.

📊 Key Findings

  • Mistral-Nemo demonstrated the highest scores for creativity and narrative complexity.
  • Gemma2 produced coherent and well-balanced content, with strong engagement.
  • Llama3.1 was the most efficient in terms of speed and token usage, making it ideal for fast iteration and local development.

These findings informed the model choice and prompt design strategies used in the final implementation.

Comparative Design

To understand the unique contributions of this system, it’s important to compare it with other notable implementations of procedural storytelling in games. While many existing systems excel at generating emergent narratives, few are capable of producing structured, coherent quests with both short- and long-term arcs. This project fills that gap by leveraging large language models to deliver authored-quality side quests with minimal human input.

🧬 RimWorld – Emergent Storytelling Through Simulation

Strengths:
RimWorld’s AI storyteller dynamically generates events based on player actions, creating emotionally rich, emergent narratives. The game excels in producing short-term, reactive stories that feel unique to each playthrough.

Limitations:
It lacks long-term narrative structure and quest-like objectives. Stories emerge from systems rather than crafted arcs.

Comparison:
While RimWorld emphasizes reactive storytelling, this system produces goal-oriented quests with a beginning, middle, and end. LLMs enable cohesive narratives that are both structured and reactive, offering a hybrid approach.

⚔️ Middle-earth: Shadow of Mordor – The Nemesis System

Strengths:
The Nemesis System creates personal rivalries with orc enemies, adapting their status, dialogue, and personality based on player interactions. It’s a powerful example of dynamic character-driven storytelling.

Limitations:
It’s confined to NPC evolution and rivalries, with limited scope beyond individual character arcs.

Comparison:
This system goes beyond NPC interactions, generating complete quests with objectives, rewards, and branching dialogue. It expands dynamic narrative generation to encompass world-building, not just character rivalries.

🛡️ Wildermyth – Character Arcs via Branching Events

Strengths:
Wildermyth delivers personalized character development through algorithmically chosen story events. Each hero evolves through choices and consequences shaped by their personality traits.

Limitations:
While character arcs are procedurally generated, the main storylines are pre-authored and curated. The system struggles to dynamically build overarching narratives.

Comparison:
This project enables both character-driven and plot-driven questlines generated dynamically from scratch. It’s not limited by fixed plot points or curated story maps, allowing for endless variation and replayability.

🎯 Summary of Advantages

Feature                        RimWorld   Shadow of Mordor   Wildermyth   Campbell-Quest (This Project)
Emergent Systems               Yes        Partial            Partial      Hybrid (structured + reactive)
Dynamic Character Development  No         Yes                Yes          Yes
Structured Questlines          No         No                 Limited      Yes
Multi-model LLM Support        No         No                 No           Yes
Dialogue & Asset Generation    No         Partial            No           Yes

By synthesizing the strengths of these systems and addressing their limitations, the Campbell-Quest tool offers a more holistic and extensible approach to procedural narrative design. It bridges the gap between simulation-driven emergence and author-style quest structure—making it ideal for games seeking to balance systemic design with compelling storytelling.

Challenges & Lessons Learned

Building a fully functional procedural quest generation system that blends AI creativity with structured game development workflows presented several technical and conceptual challenges. Each obstacle provided valuable insights into the realities of working with large language models, game engines, and the evolving landscape of AI-assisted game design.

⚖️ Balancing Creativity vs. Control

One of the biggest challenges was finding the right balance between the creativity of LLMs and the structure required for functional quests. While models like Mistral-Nemo could generate imaginative and richly detailed stories, their output was often too verbose or inconsistent to integrate smoothly into game logic.

Lesson: AI creativity needs to be carefully constrained through prompt engineering, schema enforcement, and context-aware generation. Controlled freedom, rather than unrestricted creativity, produces the most useful content.

🧩 Modular Integration Complexity

Designing a system that integrates a Python-based backend with Unity’s C# environment introduced architectural complexity. Synchronizing data between the two runtimes, managing JSON serialization, and automating asset generation required careful planning.

Lesson: Modular design is essential. Isolating responsibilities between processors, generators, and editor interfaces significantly improved maintainability and made debugging far easier.

🐞 Managing LLM Output Reliability

LLMs are non-deterministic by nature. Outputs varied between generations, and minor changes in input often caused unexpected formatting issues or inconsistencies in structure.

Lesson: To maintain reliability, strict formatting schemas were enforced, and post-processing functions were introduced to sanitize and validate output before asset creation. Automated evaluation also helped flag problematic generations.
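
A small sketch of the kind of post-processing this implies, assuming the model is asked to return a single JSON object; the clean-up rules shown are illustrative rather than the project's exact functions.

import json
import re

def sanitize_model_output(raw: str) -> dict:
    """Strip markdown fences and any surrounding commentary, then parse the JSON body."""
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())  # drop code fences if present
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in model output")
    return json.loads(text[start:end + 1])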

📉 Performance vs. Narrative Depth

Faster, lighter models like Llama3.1 offered low latency and resource usage but scored lower in narrative complexity and creativity than Gemma2 or Mistral-Nemo. Longer, more nuanced quests typically came at the cost of slower generation.

Lesson: Model selection must be context-dependent. For real-time generation, speed is paramount. For pre-authored content pipelines, richer models can be used. A configurable model-selection layer was introduced as a result.
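
A minimal sketch of what such a selection layer could look like; the profile names are hypothetical and the mapping simply mirrors the evaluation findings described above.

MODEL_PROFILES = {
    "fast_iteration": "llama3.1",      # lowest latency and token usage
    "balanced": "gemma2",              # coherent, well-balanced output
    "max_creativity": "mistral-nemo",  # strongest creativity and narrative complexity
}

def select_model(profile: str = "fast_iteration") -> str:
    """Return the Ollama model tag for the requested generation profile."""
    return MODEL_PROFILES.get(profile, MODEL_PROFILES["fast_iteration"])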

🧪 Evaluation Pipeline Refinement

Automating evaluation using LLMs posed a meta-challenge—using one AI to judge another. Early attempts led to vague or inconsistent scores, especially when evaluation criteria were not clearly defined.

Lesson: Evaluation must be grounded in well-structured, explainable metrics. By refining prompts for evaluators and structuring comparisons across models, reliable benchmarking was achieved.

🧠 Learning Curve with New Frameworks

Tools like LangChain, LangSmith, and Ollama were new at the time of development. Integrating them effectively required significant experimentation and adaptation, especially due to limited documentation and rapidly evolving APIs.

Lesson: Rapid prototyping and community-supported learning (e.g., Udemy courses, GitHub issues) were essential. Building small proof-of-concepts first helped accelerate deeper integration later.

🧰 Developer Usability

Designing a tool for developers meant creating an interface that was powerful but not overwhelming. Early versions of the editor were cluttered and difficult to navigate, leading to confusion and misuse.

Lesson: Usability should be treated as a core feature. The editor was ultimately restructured into dedicated windows for quests, dialogues, NPCs, and items—each with focused functionality and streamlined workflows.

🚀 Summary

This project highlighted that building AI-assisted tools is not just a technical challenge, but also a design problem. The key to success lies in:

  • Understanding where AI adds the most value
  • Designing guardrails for creativity
  • Engineering flexible, modular systems
  • Balancing innovation with usability

These lessons form the foundation for future enhancements, making the system more robust, adaptable, and valuable to real-world game development pipelines.

Future Plans & Commercial Potential

The current implementation of Campbell-Quest demonstrates that Large Language Models (LLMs) can significantly enhance narrative design workflows in RPG development. Looking forward, there are several directions in which the system can evolve—both in terms of technical sophistication and real-world application.

🌱 Future Development Plans

🧩 Multi-Part Questlines & Overarching Narratives

While the current system excels at generating one-shot side quests, future iterations aim to support complex, multi-stage questlines. This would involve persistent narrative threads, evolving character arcs, and dynamically branching consequences over time.

🧠 Persistent World State & Memory

Integrating memory-like mechanisms could allow the system to recall past player actions or previously generated quests, creating a sense of continuity and reactivity that mimics hand-authored narratives.

🛠 Improved Prompt Customization & Fine-Tuning

More advanced configuration tools will allow developers to create custom templates for specific genres or tones (e.g., noir mysteries, post-apocalyptic survival). Incorporating model fine-tuning or reinforcement learning from human feedback (RLHF) is also under consideration.

🧪 Expanded Evaluation Framework

The evaluation pipeline can be extended with player-testing modules and qualitative feedback tools. This would allow designers to measure how players perceive and interact with generated content in actual gameplay scenarios.

☁️ Cloud-Based Deployment

A web-based version of the system is being explored to support remote teams, lower local hardware requirements, and enable collaborative content iteration via a shared interface.

💼 Commercial Potential

🎮 Plugin for Unity Asset Store

By packaging the tool as a plug-and-play Unity asset, indie developers and small studios can instantly add quest generation capabilities to their workflow. Minimal setup and full in-editor support would make it accessible even to non-programmers.

🧑‍💻 Subscription-Based SaaS Model

A hosted, cloud-accessible version with pay-per-use or tiered subscription models could serve studios working in collaborative environments or those without local GPU support.

🧰 White-Label Licensing

The system could be adapted and branded for integration into third-party game engines or proprietary toolchains, especially for companies looking to boost their narrative systems without building from scratch.

📚 Educational & Prototyping Tool

Campbell-Quest can be used in academic settings to teach game design, storytelling, and AI-assisted workflows. It also has value as a rapid prototyping tool for narrative game jams or early-stage concept validation.

🚀 Vision

The broader vision is to democratize narrative design—giving teams of all sizes access to high-quality, dynamic content creation without needing massive writing departments. As LLMs continue to improve, Campbell-Quest has the potential to evolve into a full-stack procedural storytelling engine that not only generates content but adapts it in real time based on player behavior and design intent.
