[{"content":"","date":null,"permalink":"https://luanmds.com/en/categories/","section":"Categories","summary":"","title":"Categories"},{"content":"Software engineering focused on backend, distributed architecture, and practical AI for developers.\n","date":null,"permalink":"https://luanmds.com/en/","section":"Luan Mello","summary":"","title":"Luan Mello"},{"content":"","date":null,"permalink":"https://luanmds.com/en/posts/","section":"Posts","summary":"","title":"Posts"},{"content":"","date":null,"permalink":"https://luanmds.com/en/categories/software-engineering/","section":"Categories","summary":"","title":"Software-Engineering"},{"content":"This article consolidates the entire process and the knowledge acquired from running this project in an AI-driven way, from choosing the base framework for the blog to deploying on GitHub Pages.\nMotivation and Context about the experiment #After spending the last few weeks studying Spec-Driven Development (SDD) and, especially, Harness Engineering, I decided to put them into practice on a project (of the kind also called brownfield) that had been sitting in a drawer: my own blog.\nThe idea of this experiment was also inspired by an article by Fábio Akita (here), which shows the process of using AI as an assistant in a real project, including implementation details. The blog project was then extended with context docs based on tips from Eugênio (Gnios) on how to document project context for AI use (here).\nIn summary, this project tests the thesis of using AI as a daily assistant. 
Next, I describe how it went: discussing ideas through brainstorming sessions, building specs with a simple custom SDD flow, and adjusting the instructions for agents based on the feedback itself, using Harness Engineering principles.\nHands-on: The source code resulting from this experiment is available in the open luanmds/luanmds.github.io repository.\nTooling, Stack \u0026amp; Custom Harness of the Project #Before diving deep into the methodology and metrics, let\u0026rsquo;s break down the base technologies and the custom harness created for this project.\nStack used # Hugo Framework: For building the site, with the Congo theme and custom colors. GitHub Pages and GitHub Actions: To host and deploy the site. Both are offered for free by GitHub. To learn more, check their documentation. CodeRabbit: Synthetic AI Agent QA (with a free tier!) to review the code generated in Pull Requests opened by other agents in the repository. Configured through the .coderabbit.yaml file. Playwright: Tool/Library used to automate functional tests on web pages. In the project, its role is to validate front-end changes before proceeding with commit and merge. OpenCode: AI Agent that runs in the terminal, focused on coding and tool usage. Used as the main coding partner. Its biggest advantage is the ability to use different LLMs and their skills. It is orchestrated by the project\u0026rsquo;s harness to follow the SDD workflow. Because it is model-agnostic, it was possible to test different LLMs, such as Claude Sonnet, Claude Haiku and GPT-5.3 Codex. Harness Feedforward # AGENTS.md: Main file with a summary of the whole project and an onboarding manual for agents to follow the workflow configured for the project. .docs/: Folder with detailed context information related to the project. It contains all directives, and each file is mapped in AGENTS.md. specs/: Folder with developed and implemented specifications. 
Each specification has a tasks.md file listing everything that must be done before we consider the implementation Done. Below is a screenshot of the project harness organization:\nThe Spec-Driven Methodology and Agents Orchestration #Something I learned from studying Harness is that understanding the lifecycle better is fundamental for initial alignment and, especially, for knowing what can or cannot be done.\nIn Spec-Driven Development, the specification is the pillar that guides the entire content creation process. The process starts with a brainstorming session, goes through a specification step and a decomposition into tasks, and ends with the implementation. Here I realized that the specification becomes the main artifact of this process, unlike traditional development, where the code is the main artifact.\nThe Delivery Lifecycle #The workflow was divided into clear and interdependent stages:\n1. Planning # Brainstorming: Using the brainstorming-skill (from Superpowers Skills) to validate architectural decisions and stack before the first commit. Specification (Spec): Creating Markdown files in the specs/ folder (operating in PLAN mode) detailing the expected behavior and technical constraints. User Validation: Mandatory step (Verify and Validate) where the specification is reviewed, refined and approved by the \u0026ldquo;Product Owner\u0026rdquo; (the user themselves) before any code is generated. 2. Execution # Decomposition into Tasks: Translation of the approved Spec into a tasks.md file with atomic, parallelizable items and a well-defined Definition of Done (DoD). Implementation: Phase where the AI agents take on the execution of the tasks and write the code under the strict supervision of the Harness. Playwright Tests: Visual and functional validation of the delivery using the playwright-skill running locally (via Docker) before proceeding to commit. 3. 
Quality and Deploy # Pull Request and CodeRabbit: Packaging the changes in a PR, triggering the automated code review of CodeRabbit AI. Continuous Deploy: Automated publication via GitHub Actions sending the validated version to GitHub Pages. flowchart TD %% Phase 1 subgraph Planning direction LR B[Brainstorming Skill] --\u003e S[Spec Creation] S --\u003e V{User Validation} V -- Refine --\u003e S end %% Phase 2 subgraph Execution direction LR T[Task Decomposition] --\u003e I[Implementation] I --\u003e P[Playwright Tests] P -- Fail --\u003e I end %% Phase 3 subgraph QD[Quality \u0026 Deploy] direction LR PR[Create Pull Request] --\u003e CR[CodeRabbit AI Review] CR -- Issues Resolved --\u003e D[Deploy to GH Pages] end Planning -- Approved --\u003e Execution Execution -- Pass --\u003e QD Use Cases Highlighted #During experiment time, some scenarios are highlighted, in the practice, showing the potential and flexibility of this approach.\n1. The Giant Session: Building the Base (PaperMod Theme + Multilingual) #This was the most dense session of the entire project. With 285 minutes of real active time and 671 messages exchanged, the agent was responsible for generating the complete base of the PaperMod theme on the blog and the entire language switching system. The most impressive numbers: ~73.9 million tokens were consumed in this single session, resulting in a net balance of +904 lines of code added. Of this total of tokens, an incredible 97% were read directly from the already established context cache.\nCodeRabbit\u0026rsquo;s role (Synthetic QA)\nDespite the massive code generation, autonomous agents can make mistakes in structural details. In the Pull Request #3 that implemented these changes, CodeRabbit identified three critical review points:\nIt noticed that the build via Docker in the pipeline was running as root, suggesting the --user $(id -u):$(id -g) flag to avoid artifacts with permission issues. 
It warned that the theme submodule was tied to an unstable development commit and recommended fixing it to the official release tag. It demanded the refactoring of the front-end internationalization logic: instead of using hardcoded ifs in the templates ({{ if eq .Lang \u0026quot;pt\u0026quot; }}), it advised registering these strings in the correct language files and using i18n keys. 2. Spec 007: PaperMod to Congo theme migration with Parallel Subagents #Another notable case was the specification Spec 007 (PaperMod to Congo theme migration). The work lasted 71 minutes, with a concentrated change of +253 lines and removal of 229 lines, and a total consumption of ~10.6 million tokens.\nInstead of a linear execution, I applied the Subagent-Driven Development pattern: the orchestrator agent triggered 8 parallel sub-agents. Each sub-agent took on an independent task (colors, typography, menu structure), all operating simultaneously on the same source of truth (the Spec). This allowed for a complex migration in record time, with guaranteed architectural consistency.\nCodeRabbit\u0026rsquo;s role (Synthetic QA)\nIn the Pull Request #11 focused on the color palette (Crimson Circuitry), CodeRabbit acted by demanding consistency in the adopted patterns:\nIt located a subtle technical debt: 7 color values in the pure numeric format of rgba() forgotten in custom.css. It demanded that they be replaced by the correct invocation of our design system\u0026rsquo;s CSS variables (--color-primary-*). In addition to the code review, it read the Harness rules and actively reviewed the documented completion of the task, demanding that the checkboxes of the Spec itself (.md files) be updated to \u0026ldquo;DONE\u0026rdquo; (- [x]). Analysis about OpenCode metrics #The data below, were extracted directly from OpenCode database (via opencode.db) covering the entire project period. 
In total, there were 25 sessions (~12.3 hours of real active time), which modified 96 files and generated a net balance of +1,891 lines of code (+2,765 added, -874 removed).\nFollow the details about processing metrics:\nMetric Quantity % of total Total Tokens Counted ~141.7 million 100% Cache Reads (reused) ~134.2 million 94.7% Cache Writes (stored) ~4.35 million 3.1% Output (model-generated) ~548 thousand 0.4% Reasoning (hidden reasoning) ~308 thousand 0.2% Genuinely new input ~2.26 million 1.6% The Cache Magic and Zero Cost\nThe most revealing data of this experiment was the 94.7% of tokens being read from the cache (Context Efficiency). It is interesting to note that the agent keeps the complete context \u0026ldquo;hot\u0026rdquo; (files, documentation, history) with each message sent, but does not need to reprocess what is already cached.\nThis explains how it was possible to consume 141.7 million tokens without additional cost using the GitHub Copilot subscription. The actual inference consumption (new input + reasoning + output) was only ~3.1 million tokens.\nMain Sessions and Productivity\nFollow the table with the distribution of effort and code balance in the main project sessions:\nSection (Focus) Active Time Lines Balance Total Tokens Project Base (HuGo + PaperMod + Multilingual) 285 min +904 / -111 73.9M Articles Migration (files .doc from Google Drive) 166 min +0 / -0 23.2M Update About Page 73 min +36 / -38 8.8M Theme Congo Migration (Spec 007) 71 min +253 / -229 10.6M Final adjustments + update specs 39 min +447 / -214 5.9M Responsiveness/favicon/Tags 33 min +25 / -16 5.7M README.md 32 min +144 / -16 7.1M Context Collecting 16 min +804 / -198 3.6M Configure code automation 14 min +101 / -2 1.5M Note: The \u0026ldquo;Articles Migration\u0026rdquo; session took 166 minutes and processed 23 million tokens without modifying any lines of code in the final repository. 
This happened because the content and images were processed outside of version control (batch raw content generation).\nSummary of activities by session:\nProject Base: The densest session of all. The agent set up the complete base of the blog with Hugo, the initial PaperMod theme and the internationalization infrastructure (PT/EN) with translation keys. Articles Migration: Long session focused on processing text from drafts (via Google Drive/Medium) and formatting it as Markdown with adequate front matter. Update About Page: Creation and update of specific content for the About page, such as profile picture, history and specific design adjustments. Theme Congo Migration (Spec 007): Planning and execution of Spec 007, orchestrating 8 parallel sub-agents to migrate the colors and layout of the old theme to Congo, adjusting typography and menus simultaneously. Final adjustments + update specs: Revision of templates, standardization of the format of the artifacts in the specs/ folder and refinements before the final deploy. Responsiveness/favicon/Tags: Fine adjustments of UI/UX, making navigation more responsive, fixing the favicon and adjusting the display of tags in the posts. README.md: Generation of the repository\u0026rsquo;s public README, extracting context directly from the internal documentation when the project was almost finished. Context Collecting: Session dedicated to generating the base documentation in the .docs/ folder, mapping stack and architecture, and establishing AGENTS.md from the current state. Configure code automation: Initial configuration of linting, CI/CD and integration of CodeRabbit (synthetic QA for Pull Requests). 
LLMs Used in the Main Sessions # Model Sessions Total Processed Cache Read Cache Write Real Processed gpt-5.3-codex 5 ~90.2M ~87.5M (96%) - ~2.7M claude-sonnet-4.6 20 ~47.9M ~43.4M (90%) ~4.1M ~387K claude-haiku-4.5 1 ~4.9M ~4.4M (91%) ~368K ~36K Note: Real processed = new input + output + reasoning, that is, what the model actually inferred.\nConclusion #After putting version 1.0 of the project into production (https://luanmds.github.io), I listed some conclusions and lessons learned along the process:\nMindset change as a Developer: The developer becomes a \u0026ldquo;Context Designer\u0026rdquo; and an \u0026ldquo;Agent Orchestrator\u0026rdquo;. I don\u0026rsquo;t think this is a bad thing, but it requires a new mindset to interact with AIs and extract the maximum benefit. Still, it\u0026rsquo;s necessary to understand what the AI is generating and to have solid knowledge of Software Architecture and Design to ensure that the software maintains an acceptable level of quality.\nDocumentation is the key to a good experience: As GenAI continues to evolve, the ability to interact with it becomes a critical skill. The quality of documentation directly influences the AI\u0026rsquo;s ability to understand the project context and generate relevant and accurate responses.\nAGENTS.md matters as a central documentation artifact, that is, a guide that the Agent always carries along in every interaction. SDD, regardless of the framework or specific tool used (such as OpenCode), shows the best way to document a project. This is because it\u0026rsquo;s based on the concept of \u0026ldquo;documentation of what needs to be done\u0026rdquo; instead of \u0026ldquo;documentation of what was done\u0026rdquo;. The Harness is the most important part of using AI as an assistant: Without it, the AI has difficulty understanding the project context and generating relevant and precise responses. 
Therefore, it is important to always review the input used by the AI (Feedforward) and what it returns as a response (Feedback) so that it can refine the project context.\nThat\u0026rsquo;s all folks… #Did you enjoy this report or have any questions about how I applied these concepts in practice? Leave a comment on the repository or reach out to me on social media. Your feedback is always welcome!\nReferences #SDD, Harness Engineering \u0026amp; Context Engineering # Spec-Driven Development: AI Assisted Coding Explained Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl — Martin Fowler The ONLY guide you\u0026rsquo;ll need for GitHub Spec Kit AI-Assisted Development # AI-Assisted Coding Tutorial – OpenClaw, GitHub Copilot, Claude Code, CodeRabbit, Gemini CLI Do Zero à Pós-Produção em 1 Semana — Fábio Akita (in pt-br) Antes de qualquer ferramenta: como documentar seu projeto para a IA — Gnios (in pt-br) How do thinking and reasoning models work? Large Language Models Survey — arxiv.org ","date":"May 9, 2026","permalink":"https://luanmds.com/en/posts/engineering-blog-sdd-harness/","section":"Posts","summary":"A technical report on how this blog was built using AI-assisted development, validating the limits of SDD and Harness Engineering.","title":"The Engineering behind this blog: SDD and Harness in a 100% AI workflow"},{"content":"In this article, we will compile the concepts, definitions and general best practices about Integration Tests. In depth, we will explore the scenarios where integration tests fit best, how to implement them to maintain ease, and when not to use mocks in your scenarios. Understanding this will empower you to make decisions regarding automated testing and to estimate the required effort before writing a single line of code with any programming language or framework.\nHands-on: To consolidate this study, I create a repository tests-dotnet-best-practices. 
There, you will find all the concepts from this article!\nEnjoy the read!\nTest Pyramid and where integration tests fit in # The famous test pyramid organizes the types of tests that can be run against software, always from cheaper (Unit Tests at the base) to more expensive (E2E at the top). The integration tests are in the middle, above the unit tests, typically making up 20% to 30% of your test suite.\nIntegration tests occupy an intermediate layer and are responsible for verifying the connection, interaction and contracts between components, modules, or services. Additionally, they expose system-level problems and increase coverage, providing important feedback before every deployment.\nIt is worth noting that integration tests do not guarantee 100% coverage and should be used in conjunction with other tests, like unit tests. However, they remove that nagging doubt by answering questions such as: \u0026ldquo;If I update this module, will it break the dependent modules?\u0026rdquo; or \u0026ldquo;How can I ensure this flow keeps working when component X is unavailable?\u0026rdquo;\nApproaches to test integrations #To choose the right approach, it is important to understand how your application components are coupled and how complex it is to isolate failures within the flows that connect them.\nTip: Start with the critical flows that you must ensure keep working in many situations.\nThe approach types are:\nBig Bang: All modules are integrated simultaneously and tested as a whole. Although it seems fast, this approach is a long shot; debugging becomes extremely difficult when an error occurs, since the root cause could be anywhere in the execution chain. Incremental: These approaches test one group of modules of the application at a time. They are: - Top-Down: The test starts from the upper layers (such as Controllers or APIs) and moves downward toward the infrastructure layers. Stubs are used to replace the lower-level modules that have not yet been integrated. 
- Bottom-Up: The test starts from low-level modules (such as Repositories and database Drivers) and moves up to the business logic layers. It’s excellent for validating data persistence early in the development lifecycle. - Sandwich (Hybrid): Combines the advantages of the Top-Down and Bottom-Up approaches. It tests the core of the system while the peripheral layers are gradually integrated. Test Scope #For a more modern perspective, particularly in microservices and distributed systems, it is essential to differentiate the scope of the tests:\nNarrow Integration Tests: Focus solely on the communication between a service and one specific external component (e.g., your repository and a SQL Server). The rest of the system is replaced by test doubles. These are faster to run and easier to maintain. Broad Integration Tests: Validate the integration of all live components that make up a feature, crossing multiple layers and services. They require a more complex environment setup, but give much stronger assurance that the complete end-to-end flow works. Ideal Scenarios to Use Them #Not every feature requires an integration test. The true value of these tests lies in validating flows that cross the boundaries of your application. Indispensable scenarios include:\n1. Communication with Persistence and Cache Infrastructure #Scenarios where the application connects with a database (e.g., SQL or NoSQL) and cache systems (e.g., Redis). This ensures that queries, ORM mappings (like Entity Framework), migrations, and integrity constraints (foreign keys, indexes) execute as expected on the real database engine.\n2. Integration with Messaging Services #Scenarios involving publishing and consuming events in Message Brokers (e.g., RabbitMQ, Azure Service Bus, Kafka). This validates correct object serialization and proper queue/topic configuration, and ensures the application reacts correctly to connection failures or retry policies.\n3. 
Consuming External APIs and Web Services #Scenarios where the system relies on third-party APIs (e.g., payment gateways, geolocation services, etc.). This verifies whether the external API contract is still respected and how the system handles various HTTP status codes (like 400, 401, 500) and timeouts. Contract Tests might overlap here.\n4. Critical Business Flows #Core processes that absolutely can’t fail and that cross many services or domains in the same application. For example, a checkout process involving inventory, payments and logistics components. Unit tests typically mock the dependencies in these flows, which can obscure logic bugs that only surface during the real chain of calls.\nBest Practices to follow #Implementing integration tests requires more than just writing code; it requires an environment strategy and a clear understanding of where system risks lie. Below are some best practices, which I always consider mandatory, to ensure a good implementation and easier test maintenance:\nIdentifying components and the SUT #Before you touch the keyboard, the fundamental first step is to map and diagram all the components of the system, including the SUT (System Under Test). In the integration context, the SUT is generally your API or a specific service that connects with the “external world”. This provides a clear view of your infrastructure boundaries and external dependencies.\nFocus on areas where the code performs I/O operations, such as databases, third-party APIs, microservices, and message queues. Pay special attention to highly coupled components, as they tend to be the most fragile! If your architecture uses Adapters or Repository patterns, these are your primary targets to verify accurate data translation between your domain and external systems.\nTechniques like BDD (Behaviour-Driven Development) and Event Storming can make this mapping process easier. 
To measure coupling, you can use Afferent and Efferent Coupling metrics.\nCreating a Production-like environment #The usefulness of an integration test is directly tied to its fidelity. The best modern practice is to use containerization (via Docker) to simulate real infrastructure. This enables you to run tests against the same database engines and message brokers used in production, drastically increasing your chances of catching real configuration or behavioral errors.\nFurthermore, integrate this environment into your CI pipeline. While not every single commit requires a full suite run, having this automated safety net helps prevent regressions from reaching the final environment.\nTest Doubles, why don’t we use them in critical flows #Test Doubles are powerful tools for isolation and performance. They are perfect for scenarios involving slow or inaccessible dependencies, or when simulating hard-to-reproduce failures like network timeouts.\nMocks: To validate specific behaviours and interactions. Stubs: To give simple, canned responses. Fakes: To provide functional but simplified implementations (like SQLite in memory). However, be careful: avoid using doubles in critical flows. A “fake” database might handle complex transactions or case-sensitivity differently from your real database, which can mask fatal bugs. For your core business logic, high-fidelity validation against real resources is non-negotiable for a safe deployment.\nContract Tests through integration #In distributed ecosystems and microservices, ensuring that data consumers and providers speak the exact same language is vital. Contract Testing addresses the issue of integration tests becoming too slow or flaky due to reliance on multiple live services.\nThis type of test focuses purely on message structures and communication protocols, making it lighter and faster since it doesn\u0026rsquo;t require the entire ecosystem to be up and running. 
It can be executed as a dedicated subset of your integration strategy using tools like Pact to efficiently validate API and messaging compatibility\nThis is all Folks… #If you\u0026rsquo;ve made it this far, you\u0026rsquo;ve realized that writing integration tests isn\u0026rsquo;t just about increasing code coverage percentage. By understanding your application\u0026rsquo;s boundaries, choosing the right approach for each scope, and applying modern tools to simulate the real world (like real containers instead of just mocks), you drastically elevate your system\u0026rsquo;s resilience.\nTheory is the map, but practice is the journey. Be sure to check out the tests-dotnet-best-practices repository to see how to apply these concepts in real code using .NET 9, Testcontainers, and Aspire.\nIf you found this content helpful, send me your feedback and share it with your team. What are the biggest challenges you face when creating integration tests in your projects? Let me know in the comments, let’s exchange experiences!\nReferences # https://martinfowler.com/bliki/IntegrationTest.html Integration Testing - Engineering Fundamentals Playbook Integration Testing - Software Engineering - GeeksforGeeks ASP.NET Core Integration Testing Tutorial https://learn.microsoft.com/en-us/aspnet/core/test/integration-tests Choosing a testing strategy - EF Core | Microsoft Learn https://coupling.dev/posts/related-topics/afferent-and-efferent-coupling/ https://behave.readthedocs.io/en/latest/philosophy/ ","date":"Mar 28, 2026","permalink":"https://luanmds.com/en/posts/integration-tests-overview/","section":"Posts","summary":"Concepts, definitions and best practices in Integration Tests — when to use them, how to structure your test suite, and when to avoid mocks.","title":"Integration Tests: An Overview and Best Practices"},{"content":"","date":null,"permalink":"https://luanmds.com/en/categories/testing/","section":"Categories","summary":"","title":"Testing"},{"content":"In this article, 
I will use the Saga Pattern to demonstrate how to handle transactions across microservices and how to roll them back, in a condensed way and with a sample that covers the main topics.\nThe point here is for this article to serve as a future reference, and I look forward to your feedback. Enjoy your reading!\nSaga’s Idea #The idea of using Sagas came from a specific problem in systems with long or sequential transactions (known as Long-Lived Transactions or LLT) that combine atomic database operations with interactions with other systems. The standard shape of an LLT is a distributed transaction, in other words, operations performed across many services or databases that manipulate related data.\nThis pattern matured over time and acquired new characteristics that support modern distributed architectures, NoSQL databases and message brokers, which must deal with data consistency and find a balance under the CAP Theorem.\nBefore demonstrating this pattern, let’s understand the distributed transactions problem!\nDistributed Transactions and ACID Issues #Nowadays, we find microservice architectures in modern systems using distributed transactions and the ACID model. This approach has limitations that can create problems when synchronising data and when undoing operations because a process became unavailable or was cancelled.\nThe first problem is the dependency between microservices: one microservice performs an operation that depends on an operation in another microservice, which in turn sends a request back to the first one. The second problem comes from a core characteristic of microservices, their independent architecture: with patterns such as the database-per-microservice model, each service is free to choose any database type and how its data is persisted, besides communicating in its own way (e.g. by messaging or HTTP protocol). 
This complicates the implementation of the ACID model even more, because every transaction requires extensive management and, if a transaction fails, the other transactions must be notified about the problem so they can take action.\nBesides the Saga Pattern (I will show it soon, I promise), there are other ways to keep data consistent across distributed transactions, such as inter-process protocols, two-phase commit, Try-Confirm-Cancel (TCC) and others. This article focuses on Saga since it is the most widely used and fits most modern solutions and software architectures.\nThe Saga Pattern #A saga is a flow represented by a sequence of transactions that follows a specific order, synchronously and/or asynchronously. These transactions aren’t distributed: each transaction is local. After a transaction is confirmed as finished, it calls the next transaction, and so on until the end of the Saga. It is possible to attach a saga identifier to each transaction in the chain and so trace the complete flow.\nOrchestration and Choreography #Saga can be implemented using two models:\nWith an Orchestrator, known as the Saga Execution Coordinator (SEC), which controls each saga transaction, directing the flow to the next transaction or undoing the commits of previous transactions when one fails (I talk about this below). Like a maestro with an orchestra, it always knows the next step and who has to perform it.\nIn this simple sample, a Broker consolidates, manages and provides all the communication between the three services. The downside of Orchestration is a strong dependency on the centraliser. In this case, if the broker is offline, all services are effectively offline too.\nOr with the Choreography model where, like in a Flash Mob (I am old!), each participant (a microservice, in our case) knows how and when to perform its step, waiting or not for the steps of other participants. 
Back to our reality, the microservices know the saga, their steps and when to start their local transactions.\nThe example below shows a flow connecting four systems (A, B, C and D) in three steps. In each step, the systems act on a received action; in this case, we use a messaging broker and the HTTP protocol. The biggest drawback is the administration and monitoring of each system needed to guarantee end-to-end operations.\nChallenges to implement #Regardless of the approach implemented and of any other pattern or technology, there will be barriers to weigh in the trade-off balance.\nHere is a list of some possible problems:\nObservability must be implemented in more detail for each step in the Saga flow; There will be points of failure, and understanding how to revert them in each application is essential (I detail this with an example in the next topic); Debugging and tracking an entity or aggregate can be difficult because there are many of them, so including a Saga Identifier and coordinating between the apps is extremely necessary; There is a high probability of using Events (see Event-Driven Architecture) in the Saga flow, which increases complexity even more and requires tighter control over everything sent and received in queues and topics. Exemplify with Use Case #To exemplify how to use Saga, we have a dummy use case composed of three microservices in a Marketing Area. Below is a summary of how each service operates:\nThe first is the service that creates posts via an API used by a Marketing employee: the Post Service. The second sends e-mails to external customers when there is a new post, using a messaging bus to communicate: the Notification Service. The last is an external service that receives a post from the messaging bus and persists it in a database for an external website: the Website Service. The Saga begins with the creation of a marketing post (with discounts, coupons, promotions…) that must notify the customers and update the website.\nNext, let’s look at the complete successful Saga flow:\nSo beautiful! 
But what if the post was created and the customers were notified by e-mail, yet the post wasn’t updated on the website? We have a critical failure and must handle it. Next, I show how to deal with this problem.\nHandling Failures in a Saga #Retry Pattern #When there is a failure, we can keep retrying a Saga step a certain number of times, with an interval between attempts, until we are sure that the failure really occurred and will not resolve itself automatically. This approach of retrying an action is called the Retry Pattern.\nLooking back at the previous flow, let’s imagine a situation where the messaging was offline for a few seconds and the message to the Website Service wasn’t sent (step 3). Assuming that the Post Service has a policy of retrying messages a certain number of times every x minutes, we have the flow:\nGreat, right? The service retried many times until the messaging was online again. Also, we don’t need an alert screaming around. But… what about when we are sure of the failure, even after applying all the retry policies? We need to take other actions to resolve or undo it in our Saga. The answer can be Compensating Transactions.\nCompensating Transactions #Okay, there was a failure in our flow that prevents the Saga from continuing. We should compensate for this error and undo the previous actions.\nTo compensate, the Saga pattern has compensating transactions: a mechanism that signals that a completed transaction must be… undone, and that can warn other applications to undo their local transactions too.\nTherefore, we continue with the critical failure case where customers receive notifications with new promotions or discounts but there are no promotions on the website! The next steps are to undo the actions and send the customers an erratum as compensation. 
So, with the failure point changed, the flow has a new resolution:\nSteps 3a and 3b are compensating transactions that activate as soon as step 2 fails.\nReducing Rollbacks #It is also possible to prevent failures and compensations just by analysing the points of failure and reorganizing the process. It is also possible to review functional and non-functional requirements.\nReviewing the whole flow, we can prevent the previous failure just by sending the Post to be updated on the website first, before sending notifications to customers. In other words, send a message to the Website Service (step 3), wait for the response confirming that the message was received successfully (steps 4a and 4b), and then send a message to notify customers (step 5). This avoids the inconvenience mentioned, and the system becomes more resilient and its failures easier to track.\nFinally, here is the refactored flow:\nThat’s all folks… #In this article, we saw hard-to-find details of the Saga Pattern plus the problems that appear when implementing it. The pattern is commonly used together with other patterns like CQRS and Event-Driven, improving the system architecture and strengthening its structure across various applications and products.\nI hope you’ve increased your knowledge and your range of techniques to apply in software engineering or architecture! 
See you Later o/\nReferences # Sample of Saga using Kafka and .Net: https://github.com/luanmds/kafka-dotnet-study/tree/main/sample-02 https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga https://medium.com/codex/compensating-transaction-in-microservices-15b1f88a7c29 https://livebook.manning.com/book/microservices-patterns/chapter-4/ https://blog.sofwancoder.com/try-confirm-cancel-tcc-protocol https://en.wikipedia.org/wiki/Inter-process_communication https://en.wikipedia.org/wiki/Two-phase_commit_protocol https://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf https://en.wikipedia.org/wiki/Long-lived_transaction https://www.lifewire.com/the-acid-model-1019731 ","date":"Feb 13, 2025","permalink":"https://luanmds.com/en/posts/saga-pattern-overview/","section":"Posts","summary":"A compact guide to the Saga Pattern — how to manage distributed transactions in microservices using Choreography and Orchestration.","title":"Saga Pattern - An overview with use case"},{"content":"","date":null,"permalink":"https://luanmds.com/en/categories/system-design/","section":"Categories","summary":"","title":"System-Design"},{"content":" I\u0026rsquo;m a Software Engineer from Rio de Janeiro, Brazil, focused on Distributed Systems and Backend Architecture. With experience at companies like Stone, Saphira, and BTG Pactual, I specialized in building large-scale microservices and resilient data flows using technologies such as .NET, Kafka, Azure/AWS, and related tools.\nTech Stack \u0026amp; Expertise # .NET \u0026amp; C#: Development of mission-critical services. Architecture: Event-Driven, CQRS, Event Sourcing, and C4/UML modeling. Infrastructure: Cloud (Azure/AWS), Docker, and Kafka-based messaging. Quality: Clean Code, automated testing, and modern engineering practices. 
I write to document my studies, share practical day-to-day learnings, and help other engineers make better architecture and software design decisions.\nFind me online # ","date":null,"permalink":"https://luanmds.com/en/about/","section":"Luan Mello","summary":"","title":"About"},{"content":"Use the search button in the header (magnifier) or press / to open quick search.\n","date":null,"permalink":"https://luanmds.com/en/search/","section":"Luan Mello","summary":"","title":"Search"}]