The Engineering behind this blog: SDD and Harness in a 100% AI workflow

Table of Contents
This article consolidates the entire process and knowledge acquired with the project as AI-Driven, from the choice of the base framework for the blog to the deploy on Github Pages.
Motivation and Context about the experiment #
After the last few weeks studying and understanding about Spec-Driven Development (SDD) and, especially, Harness Engineering. I decide to put it in practice with a tidy project - called too as brownfield - that was in drawer: my own blog.
Besides, the idea of this experiment was inspired by an Fábio Akita article (here), where it shows the process of using AI as an assistant in a real project, containing implementation details. And the blog project was incremented with context docs created from _Eugênio (Gnios) _with tips on how we can document context about the project for AI use (here).
In summary, this project validates the thesis about how it’s using AI as an assistant daily. Next, I bring how it was discuss ideas through brainstormings, build specs with a simple SDD custom flow and adjust instructions for agents through the feedback itself using Harness Engineering principles.
Hands-on: The source code resulting from this experiment is available in the luanmds/luanmds.github.io open repository.
Tooling, Stack & Custom Harness of the Project #
Before to deep dive in the methodology and metrics, let’s break down the base technologies and the custom harness created to this project.
Stack used #
Hugo Framework: For building the site’s body and theme Congo with custom colors.Github Pages and Github Actions: To host and deploy the site. Both are free from Github.- To know more check their documentation.
CodeRabbit: Synthetic AI Agent QA - with free tier! - to review the code generated in Pull Requests opened by other agents in the repository.- Configured through the
.coderabbit.yamlfile
- Configured through the
Playwright: Tool/Library used to automate functional tests on web pages. In the project, it has the role of validating front-end changes before proceeding with commit and merge.OpenCode: AI Agent via terminal, focused on coding and tool usage. Used as the main coding partner. The biggest advantage is the ability to use different LLM models and their skills.- It is orchestrated by the project’s harness to follow the SDD workflow.
- Being model-agnostic, it was possible to test with different LLMs, such as Claude Sonnet, Claude Haiku and GPT-5.3 Codex.
Harness Feedforward #
AGENTS.md: Principal file with all project summary and a onboard manual for agents to follow the configured workflow for the project..docs/: Folder with detailed context information related to the project. It contains all directives and each file is mapped in theAGENTS.md.specs/: Folder with developed and implemented specifications. Each specification has atasks.mdfile where everything that needs to be done so that we consider the implementation as Done is listed.
Follow a print about the project harness organization:

The Spec-Driven Methodology and Agents Orchestration #
Something I learned from studying Harness is that understanding the lifecycle better is fundamental for initial alignment and, especially, what can or cannot be done.
In Spec-Driven Development, the specification is the pillar that guides the entire content creation process. Starting from a brainstorming, going through a specification step, decomposition into tasks and finally the implementation. Here I realized that the specification becomes the main artifact of this process, different from traditional development where the code is the main artifact.
The Delivery Lifecycle #
The workflow was divided into clear and interdependent stages:
1. Planning #
- Brainstorming: Using the
brainstorming-skill(from Superpowers Skills) to validate architectural decisions and stack before the first commit. - Specification (Spec): Creating Markdown files in the
specs/folder (operating in PLAN mode) detailing the expected behavior and technical constraints. - User Validation: Mandatory step (Verify and Validate) where the specification is reviewed, refined and approved by the “Product Owner” (the user itself) before any code is generated.
2. Execution #
- Decomposition into Tasks: Translation of the approved Spec into a
tasks.mdfile with atomic, parallelizable items and a well-defined Definition of Done (DoD). - Implementation: Phase where the AI agents take on the execution of the tasks and write the code under the strict supervision of the Harness.
- Playwright Tests: Visual and functional validation of the delivery using the
playwright-skillrunning locally (via Docker) before proceeding to commit.
3. Quality and Deploy #
- Pull Request and CodeRabbit: Packaging the changes in a PR, triggering the automated code review of CodeRabbit AI.
- Continuous Deploy: Automated publication via GitHub Actions sending the validated version to GitHub Pages.
Use Cases Highlighted #
During experiment time, some scenarios are highlighted, in the practice, showing the potential and flexibility of this approach.
1. The Giant Session: Building the Base (PaperMod Theme + Multilingual) #
This was the most dense session of the entire project. With 285 minutes of real active time and 671 messages exchanged, the agent was responsible for generating the complete base of the PaperMod theme on the blog and the entire language switching system. The most impressive numbers: ~73.9 million tokens were consumed in this single session, resulting in a net balance of +904 lines of code added. Of this total of tokens, an incredible 97% were read directly from the already established context cache.
CodeRabbit’s role (Synthetic QA)
Despite the massive code generation, autonomous agents can make mistakes in structural details. In the Pull Request #3 that implemented these changes, CodeRabbit identified three critical review points:
- It noticed that the build via Docker in the pipeline was running as
root, suggesting the--user $(id -u):$(id -g)flag to avoid artifacts with permission issues. - It warned that the theme submodule was tied to an unstable development commit and recommended fixing it to the official release tag.
- It demanded the refactoring of the front-end internationalization logic: instead of using hardcoded ifs in the templates (
{{ if eq .Lang "pt" }}), it advised registering these strings in the correct language files and usingi18nkeys.
2. Spec 007: PaperMod to Congo theme migration with Parallel Subagents #
Another notable case was the specification Spec 007 (PaperMod to Congo theme migration). The work lasted 71 minutes, with a concentrated change of +253 lines and removal of 229 lines, and a total consumption of ~10.6 million tokens.
Instead of a linear execution, I applied the Subagent-Driven Development pattern: the orchestrator agent triggered 8 parallel sub-agents. Each sub-agent took on an independent task (colors, typography, menu structure), all operating simultaneously on the same source of truth (the Spec). This allowed for a complex migration in record time, with guaranteed architectural consistency.
CodeRabbit’s role (Synthetic QA)
In the Pull Request #11 focused on the color palette (Crimson Circuitry), CodeRabbit acted by demanding consistency in the adopted patterns:
- It located a subtle technical debt: 7 color values in the pure numeric format of
rgba()forgotten incustom.css. It demanded that they be replaced by the correct invocation of our design system’s CSS variables (--color-primary-*). - In addition to the code review, it read the Harness rules and actively reviewed the documented completion of the task, demanding that the checkboxes of the Spec itself (
.mdfiles) be updated to “DONE” (- [x]).
Analysis about OpenCode metrics #
The data below, were extracted directly from OpenCode database (via opencode.db) covering the entire project period. In total, there were 25 sessions (~12.3 hours of real active time), which modified 96 files and generated a net balance of +1,891 lines of code (+2,765 added, -874 removed).
Follow the details about processing metrics:
| Metric | Quantity | % of total |
|---|---|---|
| Total Tokens Counted | ~141.7 million | 100% |
| Cache Reads (reused) | ~134.2 million | 94.7% |
| Cache Writes (stored) | ~4.35 million | 3.1% |
| Output (model-generated) | ~548 thousand | 0.4% |
| Reasoning (hidden reasoning) | ~308 thousand | 0.2% |
| Genuinely new input | ~2.26 million | 1.6% |
The Cache Magic and Zero Cost
The most revealing data of this experiment was the 94.7% of tokens being read from the cache (Context Efficiency). It is interesting to note that the agent keeps the complete context “hot” (files, documentation, history) with each message sent, but does not need to reprocess what is already cached.
This explains how it was possible to consume 141.7 million tokens without additional cost using the GitHub Copilot subscription. The actual inference consumption (new input + reasoning + output) was only ~3.1 million tokens.
Main Sessions and Productivity
Follow the table with the distribution of effort and code balance in the main project sessions:
| Section (Focus) | Active Time | Lines Balance | Total Tokens |
|---|---|---|---|
| Project Base (HuGo + PaperMod + Multilingual) | 285 min | +904 / -111 | 73.9M |
Articles Migration (files .doc from Google Drive) | 166 min | +0 / -0 | 23.2M |
| Update About Page | 73 min | +36 / -38 | 8.8M |
| Theme Congo Migration (Spec 007) | 71 min | +253 / -229 | 10.6M |
| Final adjustments + update specs | 39 min | +447 / -214 | 5.9M |
| Responsiveness/favicon/Tags | 33 min | +25 / -16 | 5.7M |
| README.md | 32 min | +144 / -16 | 7.1M |
| Context Collecting | 16 min | +804 / -198 | 3.6M |
| Configure code automation | 14 min | +101 / -2 | 1.5M |
Note: The “Articles Migration” session took 166 minutes and processed 23 million tokens without modifying any lines of code in the final repository. This happened because the content and images were processed outside of version control (batch raw content generation).
Summary about activities by Session:
- Project Base: Session is more dense than others, the agent set up the complete base of the blog with Hugo, the initial PaperMod theme and the internationalization infrastructure (PT/EN) with translation key.
- Articles Migration: Session long focused in process text from draft (via Google Drive/Medium) and format them to markdown with front matter adequate.
- Update About Page: Creation and update of specific content for the About page, such as profile picture, history and punctual design adjustments.
- Theme Congo Migration (Spec 007): Planning and Execution about Spec 007, orchestrating 8 parallel sub-agents to migrate the colors and layout of the old theme to the Congo, adjusting typography and menus simultaneously.
- Final adjustments + update specs: Revision of templates, standardization of the format of the artifacts in the
specs/folder and refinements before the final deploy. - Responsiveness/favicon/Tags: Fine adjustments of UI/UX, making navigation more responsive, fixing the favicon and adjusting the display of tags in the posts.
- README.md: Generation of the repository public file, extracting context directly from the internal documentation after the project was almost finished.
- Context Collecting: Session dedicated to generate the base documentation in the
.docs/folder, mapping stack, architecture and establishingAGENTS.mdfrom the current state. - Configure code automation: Initial configuration of linting, CI/CD and integration of CodeRabbit (synthetic QA for Pull Requests).
LLM Models Utilized in main sessions #
| Model | Sessions | Total Processed | Cache Read | Cache Write | Real Processed |
|---|---|---|---|---|---|
gpt-5.3-codex | 5 | ~90,2M | ~87,5M (96%) | — | ~2,7M |
claude-sonnet-4.6 | 20 | ~47,9M | ~43,4M (90%) | ~4,1M | ~387K |
claude-haiku-4.5 | 1 | ~4,9M | ~4,4M (91%) | ~368K | ~36K |
Note: Real processed = new input + output + reasoning — what the model actually inferred.
Conclusion #
After putting version 1.0 of the project into production (https://luanmds.github.io), I listed some conclusions and lessons learned along the process:
Mindset change as a Developer: The developer becomes a “Context Designer” and a “Agent Orchestrator”. I don’t think this is a bad thing, but it requires a new mindset to interact with AIs to extract the maximum benefit. Still, it’s necessary to understand what the AI is generating and have solid knowledge about Software Architecture and Design to ensure that the software maintains an acceptable level of quality.
Documentation is the key for a good experience: As a GenAI continues to evolve, the capacity of interaction with it becomes a critical skill. The quality of documentation directly influences the AI’s ability to understand the project context and generate relevant and accurate responses.
- The importance of
AGENTS.mdas a central documentation artifact, that is, a guide that the Agent always carries with it when interacting with it. - The SDD, independent of using a framework or a specific tool (as OpenCode), it show the best way to document a project. This is because it’s based in the concept of “documentation of what needs to be done” instead of “documentation of what was done”.
- The importance of
The Harness process is the most important in the process of using AI as an assistant: Without it, the AI has difficulty understanding the project context and generating relevant and precise responses. Therefore, it is important to always review the input used by the AI (Feedforward) and what it returns as a response (Feedback) so that it can refine the project context.
That’s all folks… #
Did you enjoy this report or have any questions about how I applied these concepts in practice? Leave a comment on the repository or reach out to me on social media. Your feedback is always welcome!
References #
SDD, Harness Engineering & Context Engineering #
- Spec-Driven Development: AI Assisted Coding Explained
- Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl — Martin Fowler
- The ONLY guide you’ll need for GitHub Spec Kit
AI-Assisted Development #
- AI-Assisted Coding Tutorial – OpenClaw, GitHub Copilot, Claude Code, CodeRabbit, Gemini CLI
- Do Zero à Pós-Produção em 1 Semana — Fábio Akita (in pt-br)
- Antes de qualquer ferramenta: como documentar seu projeto para a IA — Gnios (in pt-br)
- How do thinking and reasoning models work?
- Large Language Models Survey — arxiv.org