Enterprise Integration Lab

GitHub Repository: Enterprise Integration Lab

1 Project Overview

Enterprise Integration Lab is a local portfolio project that demonstrates how independent enterprise systems can exchange, normalize, synchronize, and observe business data through an event-driven integration lifecycle.

This project was intentionally designed as an industry-neutral enterprise simulation. It does not model any real company, real internal system, or domain-specific production workflow. Instead, it uses generic enterprise concepts such as customers, agreements, service requests, operational cases, documents, canonical business objects, sync logs, and lineage records.

The goal of this project was not simply to build another CRUD application. The goal was to make enterprise integration architecture visible, explainable, and reviewable.


2 Why I Built This Project

Many enterprise systems are not built as one clean monolith. They often grow as separate systems with different responsibilities, data models, lifecycle states, and operational ownership.

A realistic integration platform needs to answer questions such as:

  • Where did this request originate?
  • Which system owns the raw submission?
  • Which system owns the operational workflow?
  • How does downstream status affect enterprise-level lifecycle visibility?
  • How can documents be stored without mixing binary files into business tables?
  • How can integration events be processed asynchronously?
  • How can duplicate events avoid creating duplicate downstream records?
  • How can auditability and lineage survive across system boundaries?

Enterprise Integration Lab was built to explore those questions in a concrete, runnable system.


3 Final System Overview

The final project includes:

  • Docker Compose runtime
  • FastAPI backend
  • asynchronous worker service
  • PostgreSQL with multiple logical schemas
  • Redis container
  • MinIO document object storage
  • static frontend dashboard served through nginx
  • reference data layer
  • intake portal form
  • optional document upload
  • event-driven worker lifecycle
  • operational case workflow simulation
  • canonical business object synchronization
  • sync logs
  • lineage records
  • status history audit trail
  • generic demo seed data
  • portfolio-ready README and architecture diagrams

The final repository is public-ready and contains only generic demo data.

The main demo surface is a browser-based dashboard served by the frontend container through nginx on port 8080. In the lab environment, I access it from a browser at the Enterprise Lab VM address, for example http://<enterprise-lab-vm-ip>:8080. This dashboard is important because it makes the integration lifecycle visible without requiring a reviewer to inspect PostgreSQL directly.

Figure 1. Dashboard overview showing aggregate counts, explanatory panels, intake form, and the main operating surface for the integration demo.


4 Containerized Runtime Architecture

The containerized runtime diagram belongs at the beginning of the architecture discussion because it explains the physical deployment shape before the article moves into data models and event flows. It shows how the user browser, nginx-served frontend, FastAPI backend, PostgreSQL, MinIO, and worker service cooperate inside the Docker Compose environment.

The system runs as a containerized enterprise simulation platform. Each major responsibility is isolated into its own service:

  • frontend portal and dashboard
  • backend orchestration API
  • asynchronous worker processing
  • PostgreSQL enterprise data model
  • Redis runtime container
  • MinIO document repository

The backend handles intake orchestration and read-only dashboard APIs. The worker continuously polls integration events and synchronizes operational and canonical lifecycle state.

This separation was important because it made the system feel closer to an enterprise integration environment than to a single application with all logic in one place.

Figure 2. Proxmox runtime view showing the enterprise-lab Ubuntu VM running as the infrastructure host for the integration platform.

Figure 3. Docker Compose runtime evidence showing the backend, frontend, PostgreSQL, Redis, MinIO, and worker services running, with worker logs processing integration events.


5 Logical Schema Boundaries

PostgreSQL is used not only as a database, but also as a way to express system boundaries.

The project defines multiple logical schemas:

Schema         | Responsibility
-------------- | --------------
reference      | Generic master/reference data used by the intake form
intake         | Raw user-submitted requests
source_system  | First internal source-system request record
document_repo  | Document metadata and object-storage references
integration    | Events and worker sync logs
operational    | Downstream operational case records and status history
canonical      | Enterprise-normalized business objects and lineage
dashboard      | Reserved for future read models

One of the most important design decisions was to avoid collapsing these responsibilities into a single table or schema. Even though this is a demo project, the schema layout reflects how enterprise systems often need clear ownership boundaries.

Figure 4. PostgreSQL runtime inspection showing the logical schemas and completed integration events used to verify event synchronization.


6 Event-Driven Enterprise Lifecycle

The event-driven lifecycle diagram fits here because it gives the reader the first full end-to-end view of the system behavior. It connects the intake portal, source-system record, integration event, worker processing, operational case, canonical business object, and auditability layer into one visible flow.

The core lifecycle starts when a user submits a service request through the intake portal.

The request moves through several layers:

  1. The intake layer stores the raw submission.
  2. The source-system layer creates an internal service request.
  3. The integration layer records a pending event.
  4. The worker polls and processes the event asynchronously.
  5. The operational layer receives or reuses a downstream case.
  6. The canonical layer creates or updates an enterprise-normalized business object.
  7. Sync logs and lineage records preserve explainability.

This project demonstrates enterprise lifecycle orchestration rather than simple CRUD processing. The API does not directly create every downstream object. Instead, it creates source records and integration events. The worker is responsible for asynchronous synchronization.

That distinction became one of the most important architectural lessons in the project.
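
The split between API and worker can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the project's code: `Store`, `submit_request`, and `process_pending_events` are hypothetical names, plain Python lists stand in for the PostgreSQL schemas, and the initial `pending` status values are assumed.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-in for the logical schemas (hypothetical)."""
    submissions: list = field(default_factory=list)
    service_requests: list = field(default_factory=list)
    events: list = field(default_factory=list)
    operation_cases: list = field(default_factory=list)
    business_objects: list = field(default_factory=list)


def submit_request(store: Store, description: str) -> str:
    """API side: write the raw submission, the source request, and a pending event.

    Nothing downstream is created here; that is the worker's job.
    """
    submission_id = str(uuid.uuid4())
    request_id = str(uuid.uuid4())
    store.submissions.append({"id": submission_id, "description": description})
    store.service_requests.append(
        {"id": request_id, "submission_id": submission_id, "status": "pending"}
    )
    store.events.append(
        {"type": "RequestCreated", "request_id": request_id, "status": "pending"}
    )
    return request_id


def process_pending_events(store: Store) -> int:
    """Worker side: one polling pass. Returns the number of events processed."""
    processed = 0
    for event in store.events:
        if event["status"] != "pending":
            continue
        request = next(
            r for r in store.service_requests if r["id"] == event["request_id"]
        )
        case_id = str(uuid.uuid4())
        store.operation_cases.append(
            {"id": case_id, "request_id": request["id"], "status": "open"}
        )
        store.business_objects.append(
            {"operation_case_id": case_id, "status": "active"}
        )
        request["status"] = "integrated"
        event["status"] = "completed"
        processed += 1
    return processed
```

After `submit_request` returns, only the submission, the source request, and the event exist. The operational case and canonical object appear only once `process_pending_events` has run, which is the behavior Figures 5 and 6 show in the dashboard.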

Figure 5. Newly submitted request entering the system as pending, before the asynchronous worker creates downstream operational and canonical records.

Figure 6. Worker-processed submission showing the source request as integrated, the operational case as open, and the canonical business object as active.


7 Semantic Mapping and Enterprise Lineage

The semantic mapping and lineage diagram belongs in this section because it explains the conceptual purpose of the canonical layer. It shows how local meanings from intake, source-system, and operational layers are normalized into one enterprise semantic view while lineage and synchronization history remain auditable.

Different enterprise systems often describe similar business concepts in different ways.

In this project:

  • intake submissions represent raw user input
  • source requests represent internal source-system records
  • operational cases represent downstream work records
  • canonical business objects represent enterprise-normalized semantics

The canonical layer behaves like a shared enterprise semantic coordinate system. It does not replace local systems. Instead, it provides a consistent enterprise-level view across otherwise independent operational systems.

Lineage records preserve relationships such as:

  • source request to operational case
  • operational case to canonical business object

This allows the dashboard and SQL queries to explain how data moved through the system.
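
As a rough illustration of how lineage records explain data movement, the sketch below follows links forward from a source request. The `LineageRecord` fields and the `trace_lineage` helper are assumptions made for this example; the real lineage table may use different column names.

```python
from typing import NamedTuple


class LineageRecord(NamedTuple):
    """One directed link between two records (hypothetical field names)."""
    source_type: str
    source_id: str
    target_type: str
    target_id: str


def trace_lineage(records: list[LineageRecord], start_id: str) -> list[str]:
    """Follow lineage links forward from start_id and describe each hop."""
    by_source = {record.source_id: record for record in records}
    hops = []
    seen = set()
    current = start_id
    # The seen set guards against accidental cycles in the lineage data.
    while current in by_source and current not in seen:
        seen.add(current)
        record = by_source[current]
        hops.append(
            f"{record.source_type}:{record.source_id} -> "
            f"{record.target_type}:{record.target_id}"
        )
        current = record.target_id
    return hops


records = [
    LineageRecord("service_request", "req-1", "operation_case", "case-1"),
    LineageRecord("operation_case", "case-1", "business_object", "bo-1"),
]
for hop in trace_lineage(records, "req-1"):
    print(hop)
```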

Figure 7. Lineage records showing how service requests map to operational cases and how operational cases map to canonical business objects.


8 Operational Workflow and Lifecycle Synchronization

The operational workflow diagram is placed after the semantic model because it zooms into one of the most important feedback loops: downstream operational activity continuously updates the enterprise lifecycle state through integration events and worker synchronization.

The operational system simulates downstream human workflow.

Operational cases can move through these states:

  • open
  • in_progress
  • completed
  • rejected

Allowed transitions are intentionally simple:

  • open -> in_progress
  • open -> rejected
  • in_progress -> completed
  • in_progress -> rejected

Completed and rejected cases are terminal.
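
The transition rules above are small enough to express as a lookup table. A minimal sketch, assuming a hypothetical `validate_transition` helper; the project's actual validation code may be structured differently.

```python
# Each status maps to the set of statuses it may move to.
# Terminal statuses map to an empty set.
ALLOWED_TRANSITIONS = {
    "open": {"in_progress", "rejected"},
    "in_progress": {"completed", "rejected"},
    "completed": set(),
    "rejected": set(),
}


def validate_transition(current: str, new: str) -> None:
    """Raise ValueError unless current -> new is an allowed transition."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current}")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"transition {current} -> {new} is not allowed")
```

Keeping the rules in data rather than in branching logic means a new state only requires a new dictionary entry.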

Each operational status change writes:

  • previous status
  • new status
  • changed by
  • change reason
  • changed timestamp

It also creates an OperationalCaseStatusChanged integration event. The worker processes that event and updates the canonical lifecycle status asynchronously.

The mapping is:

Operational Status | Canonical Status
------------------ | ----------------
open               | active
in_progress        | in_progress
completed          | completed
rejected           | rejected

This preserves local operational independence while keeping enterprise lifecycle visibility synchronized centrally.
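
On the worker side, the mapping reduces to a dictionary lookup applied when a status-change event is processed. The sketch below is illustrative only: the event payload keys (`operation_case_id`, `new_status`) and the `apply_status_event` function are assumptions, and a dictionary stands in for the canonical table.

```python
OPERATIONAL_TO_CANONICAL = {
    "open": "active",
    "in_progress": "in_progress",
    "completed": "completed",
    "rejected": "rejected",
}


def apply_status_event(event: dict, canonical_objects: dict) -> str:
    """Propagate an operational status change to the matching canonical object.

    canonical_objects is keyed by operational case ID (an assumption for
    this sketch). Returns the canonical status that was written.
    """
    if event["type"] != "OperationalCaseStatusChanged":
        raise ValueError(f"unexpected event type: {event['type']}")
    canonical_status = OPERATIONAL_TO_CANONICAL[event["new_status"]]
    canonical_objects[event["operation_case_id"]]["status"] = canonical_status
    return canonical_status
```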

Figure 8. Operational case after being moved to in_progress, with the canonical lifecycle state synchronized to match the downstream workflow state.

Figure 9. Completed operational case showing terminal lifecycle state and status history, preserving the transition from open to in_progress to completed.


9 Document Repository Flow

The project includes an optional document upload flow.

When a user submits a service request, they can upload one supporting document. The binary file is stored in MinIO, while metadata is stored in PostgreSQL under document_repo.documents.

The metadata includes:

  • document ID
  • linked submission ID
  • linked request ID
  • file name
  • document type
  • storage key
  • upload timestamp

This design keeps binary object storage separate from relational business records.

The dashboard displays document metadata, but it does not preview or download files. That was an intentional scope boundary for the MVP.

Figure 10. Intake portal document type selector showing how supporting documents are classified before upload.

Figure 11. MinIO repository inspection showing uploaded supporting documents stored as objects outside the relational business tables.


10 Reference Data Layer

A later phase introduced a generic reference data layer.

The purpose was to avoid letting users type arbitrary customer or agreement data into the intake form. Instead, the form uses controlled reference data from backend APIs.

Reference tables include:

  • reference.customers
  • reference.agreements
  • reference.request_types
  • reference.teams

This made the portal feel more enterprise-like. Users select known reference data rather than submitting free-form values that may not exist in upstream systems.

An important follow-up improvement was replacing embedded reference data in request_description with structured customer_id and agreement_id fields. This was a useful modeling correction: enterprise relationships should be structured data, not hidden inside text.

Figure 12. Customer reference selector showing controlled customer choices loaded from the reference data layer.

Figure 13. Request type selector showing standardized service request classifications instead of free-form request categories.

Figure 14. Assigned team selector showing operational ownership choices represented as controlled reference data.


11 Dashboard and Observability

The dashboard is the main demo surface.

It shows:

  • submissions
  • source requests
  • operational cases
  • canonical business objects
  • event statuses
  • sync logs
  • lineage records
  • document metadata
  • status history
  • lifecycle explanation
  • schema role legend
  • status meaning legend

The dashboard is intentionally read-only for lifecycle data, except for the controlled operational workflow action that simulates downstream human processing.

The dashboard was important because it turned database records into an explainable architecture story. Without it, the system could only be understood through SQL queries.

Figure 15. Sync log table showing completed worker actions and status propagation messages for operational-to-canonical lifecycle synchronization.


12 Implementation Journey

The project evolved through multiple phases. Each phase added one architectural layer or corrected one modeling issue.

Phase 1: Infrastructure Skeleton

The first milestone created the basic runtime foundation:

  • docker-compose.yml
  • FastAPI backend skeleton
  • worker service skeleton
  • PostgreSQL container
  • Redis container
  • MinIO container
  • frontend placeholder
  • .env.example
  • README setup instructions

At this stage, no business logic was implemented. The goal was to establish the containerized structure first.

The main lesson was that architecture should be made visible before implementation details grow around it.

Phase 2: Database Schema Layer

The second milestone implemented the database schema layer.

Tables were created across logical schemas for:

  • raw intake
  • source system records
  • integration events
  • canonical objects
  • operational records
  • document metadata
  • reference data
  • sync logs
  • lineage records

The schema included:

  • UUID business IDs
  • primary keys
  • foreign keys
  • indexes
  • status check constraints
  • JSONB fields where appropriate
  • created_at and updated_at timestamps

This phase established the system boundary model that guided the rest of the project.

Phase 3A: Minimal Event-Driven Lifecycle

The first working lifecycle implemented:

  1. POST /intake/submissions
  2. write intake.submissions
  3. create source_system.service_requests
  4. create integration.events
  5. worker polls pending events
  6. worker creates canonical.business_objects
  7. worker writes integration.sync_logs

This was the point where the project stopped being static infrastructure and became a working integration simulation.

Phase 3B: Architecture Boundary Review

After the first lifecycle worked, I reviewed whether the API and worker responsibilities were cleanly separated.

The review focused on:

  • whether the API was doing work that belonged to the worker
  • whether source, integration, and canonical layers remained distinct
  • whether transaction handling was safe
  • whether worker polling could duplicate processing
  • whether event status updates were explainable
  • whether sync logs and lineage records told a complete story

The review found several risks that needed hardening before adding more features.

Phase 3C: Worker Hardening

This phase addressed critical stability issues.

The key improvements were:

  • worker idempotency
  • deterministic source references
  • unique constraints to prevent duplicate canonical objects
  • safe rollback on worker failure
  • failed sync log recording in a new transaction
  • cleaner backend service boundaries

The backend was refactored into service modules:

  • intake service
  • source system service
  • integration event service

This made the code structure match the architecture more clearly.

The biggest lesson was that event-driven systems need idempotency early. Without it, duplicate events can quietly corrupt downstream data.
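
One way to combine deterministic source references with a unique constraint is sketched below. It uses `sqlite3` as a self-contained stand-in for PostgreSQL, and the namespace value, table layout, and function names are assumptions for illustration rather than the project's actual schema.

```python
import sqlite3
import uuid

# Hypothetical namespace: any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "enterprise-integration-lab/demo")


def source_reference(request_id: str) -> str:
    """The same request ID always yields the same reference."""
    return str(uuid.uuid5(NAMESPACE, f"service_request:{request_id}"))


def create_or_reuse_business_object(conn: sqlite3.Connection, request_id: str) -> int:
    """Insert a canonical object once; later calls return the existing row ID."""
    reference = source_reference(request_id)
    # The unique constraint turns a duplicate insert into a no-op.
    conn.execute(
        "INSERT INTO business_objects (source_reference, status) "
        "VALUES (?, 'active') "
        "ON CONFLICT (source_reference) DO NOTHING",
        (reference,),
    )
    row = conn.execute(
        "SELECT id FROM business_objects WHERE source_reference = ?",
        (reference,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_objects ("
    "id INTEGER PRIMARY KEY, "
    "source_reference TEXT NOT NULL UNIQUE, "
    "status TEXT NOT NULL)"
)

first = create_or_reuse_business_object(conn, "req-1")
second = create_or_reuse_business_object(conn, "req-1")  # duplicate event
```

Processing the same event twice leaves exactly one row, and both calls resolve to the same object. The database enforces this, so the guarantee does not depend on worker code remembering what it has already seen.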

Phase 4: Operational System Sync

The next phase added the downstream operational system.

When the worker processed a RequestCreated event, it now created or reused:

  • operational.operation_cases
  • canonical.business_objects

It also wrote lineage records for:

  • service request to operational case
  • operational case to canonical business object

This phase made the project feel much more like an enterprise integration scenario. The source request was no longer just normalized into a canonical object; it also produced a downstream operational record.

Phase 5: Observability Dashboard

The project then added read-only dashboard APIs and a frontend dashboard.

Backend APIs included:

  • GET /dashboard/submissions
  • GET /dashboard/submissions/{submission_id}
  • GET /dashboard/events
  • GET /dashboard/lineage
  • GET /dashboard/sync-logs

The frontend changed from a placeholder into a basic lifecycle dashboard.

This surfaced a practical deployment bug: the frontend JavaScript originally called localhost:8000. That worked from a browser running inside the VM, but failed when the dashboard was opened from another machine on the LAN, because in a browser localhost resolves to the viewer's own machine, not to the VM hosting the backend.

The fix was to use nginx reverse proxy routing:

  • frontend requests /api/...
  • nginx proxies /api to backend:8000

This was a useful reminder that browser networking context is different from container or VM networking context.

Phase 6A: Reference Data Layer

This phase added generic enterprise reference data:

  • customers
  • agreements
  • request types
  • teams

The backend exposed read-only reference APIs so the portal could load controlled dropdown values.

This avoided a common data quality problem: allowing portal users to type values that do not exist in enterprise master data.

Phase 6B: Intake Portal Form

The dashboard gained a basic intake portal form.

The form allowed users to select:

  • customer
  • agreement
  • request type
  • priority
  • assigned team

It also collected:

  • requester name
  • requester email
  • request description

At first, selected customer and agreement values were embedded into the request description. That worked technically, but it was not good enterprise modeling.

Phase 6C: Structured Reference Fields

The next correction moved customer and agreement references into structured fields:

  • customer_id
  • agreement_id

These fields were added to both:

  • intake.submissions
  • source_system.service_requests

Foreign keys linked them to the reference schema.

This was a valuable modeling lesson: text is not a substitute for relationships. If a concept has identity and referential meaning, it should be modeled structurally.

Phase 6D: Status Semantics Cleanup

The dashboard originally showed statuses such as:

  • source: integration_pending
  • operational: pending

Even after the worker had successfully processed the event, these labels made the system look unfinished.

The status semantics were cleaned up:

  • source request becomes integrated
  • operational case starts as open
  • canonical object remains active

The README and dashboard were updated to explain:

  • source integrated means the source request has been synchronized downstream
  • operational open means the downstream case has been created and is ready for processing
  • canonical active means the enterprise canonical object is valid and active

The lesson was that technically valid statuses can still be misleading to users. Status names are part of the architecture interface.

Phase 7A: Document Repository and Attachment Flow

This phase connected the document repository to the intake lifecycle.

The portal form gained an optional file input. The backend accepted multipart form data, uploaded the file to MinIO, and wrote metadata to document_repo.documents.

The document metadata linked to both:

  • the intake submission
  • the source service request

The lifecycle itself stayed unchanged. Documents became attached context, not drivers of workflow logic.

Phase 7B: Document Upload Risk Analysis

The document upload implementation was reviewed for consistency, transaction safety, idempotency, and security.

The review identified several risks:

  • MinIO upload failure could leave partial lifecycle state depending on transaction boundaries
  • MinIO upload success followed by database failure could create orphan objects
  • no file size limit
  • no file type allowlist
  • possible frontend XSS risk if document metadata was inserted through unsafe innerHTML

This review was one of the most valuable parts of the project because it exposed the difference between “feature works” and “feature is safe enough for a demo.”

Phase 7C: Document Upload Hardening

The hardening phase fixed the most important issues:

  • best-effort MinIO cleanup if metadata insert fails
  • 10 MB upload limit
  • file extension allowlist
  • content type allowlist
  • safer frontend rendering for user-controlled text

Allowed MVP file types became:

  • PDF
  • TXT
  • CSV
  • PNG
  • JPEG

The README documents that validation is MVP-level and does not inspect magic bytes.
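
The three checks can be sketched as a single validation function. This is an illustration under assumptions: the `validate_upload` name, the exact MIME strings, and the reading of 10 MB as 10 × 1024 × 1024 bytes are not taken from the repository. Like the MVP, it trusts the declared file name and content type and does not inspect magic bytes.

```python
from pathlib import PurePosixPath

# Assumed interpretation of the 10 MB limit.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024

# Extension -> accepted content types. The MIME strings are assumptions.
ALLOWED_TYPES = {
    ".pdf": {"application/pdf"},
    ".txt": {"text/plain"},
    ".csv": {"text/csv"},
    ".png": {"image/png"},
    ".jpg": {"image/jpeg"},
    ".jpeg": {"image/jpeg"},
}


def validate_upload(file_name: str, content_type: str, size_bytes: int) -> None:
    """Raise ValueError if the upload breaks the size or type rules."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the upload size limit")
    extension = PurePosixPath(file_name).suffix.lower()
    if extension not in ALLOWED_TYPES:
        raise ValueError(f"file extension not allowed: {extension or '(none)'}")
    if content_type not in ALLOWED_TYPES[extension]:
        raise ValueError(
            f"content type {content_type} does not match extension {extension}"
        )
```

Checking the extension and content type as a pair, rather than independently, rejects uploads such as a `.pdf` declared as `image/png`.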

Phase 7D: Operational Workflow Simulation

The final functional feature added operational case workflow.

The backend added:

  • PATCH /operational/cases/{operation_case_id}/status

The database added:

  • operational.case_status_history

Each status update:

  1. validates the transition
  2. updates the operational case
  3. writes status history
  4. creates an OperationalCaseStatusChanged event
  5. lets the worker propagate status to the canonical layer

This completed the core integration loop: downstream operational activity can now update enterprise lifecycle visibility asynchronously.


13 Portfolio Readiness Remediation

After the functional phases, the project was reviewed as a public portfolio artifact.

The review focused on:

  • architecture clarity
  • dashboard explainability
  • README quality
  • terminology safety
  • demo data hygiene
  • public GitHub readiness

The remediation phase added:

  • .gitignore
  • removal of tracked .env
  • clean reset/reseed instructions
  • generic demo seed data
  • portfolio-oriented README sections
  • Mermaid architecture diagram
  • dashboard explanation panels
  • architecture diagrams in docs/assets
  • security notes
  • production-readiness disclaimer

The final README was shaped as an architecture portfolio entry, not just a developer runbook.

Public Repository Privacy Hardening

Before making the repository public, I performed a privacy and hygiene review.

The review checked:

  • tracked files
  • Git history
  • .env exposure
  • author email
  • tokens and private keys
  • real email addresses
  • local paths
  • LAN IP addresses
  • uploaded file metadata
  • domain-specific terminology

Two important issues were found:

  1. .env had existed in earlier Git history.
  2. commit author metadata exposed a real email address.

The repository history was then rewritten into a single clean public commit using a privacy-preserving noreply-style email. That commit was force-pushed to the public branch, so the old history is no longer reachable from it.

This step was important because removing a file from the latest commit is not the same as removing it from Git history.


14 Key Technical Lessons

1. Event-driven systems need idempotency from the beginning

A worker may process the same event more than once. Without deterministic references and uniqueness constraints, duplicate events can create duplicate downstream records.

The project solved this by using deterministic source references and reusing existing canonical and operational records where appropriate.

2. Rollback handling must account for aborted transactions

When a statement fails inside a PostgreSQL transaction, the transaction enters an aborted state and rejects further statements until it is rolled back. If failure logging happens inside the same broken transaction, the failure log may not be written.

The worker was hardened to roll back first, then open a new transaction to record failure status and sync logs.
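
The ordering matters more than the specific database, so the sketch below uses `sqlite3` as a self-contained stand-in. SQLite does not have PostgreSQL's aborted-transaction state, so this only illustrates the sequence of rolling back and then logging in a fresh transaction. The table names and the `process_event` function are assumptions for the example.

```python
import sqlite3


def process_event(conn: sqlite3.Connection, event_id: int, handler) -> bool:
    """Run handler for one event; on failure, roll back and log separately."""
    try:
        handler(conn, event_id)
        conn.execute(
            "UPDATE events SET status = 'completed' WHERE id = ?", (event_id,)
        )
        conn.commit()
        return True
    except Exception as exc:
        # Step 1: discard everything the failed attempt wrote.
        conn.rollback()
        # Step 2: record the failure in a fresh transaction.
        conn.execute(
            "UPDATE events SET status = 'failed' WHERE id = ?", (event_id,)
        )
        conn.execute(
            "INSERT INTO sync_logs (event_id, status, message) "
            "VALUES (?, 'failed', ?)",
            (event_id, str(exc)),
        )
        conn.commit()
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE business_objects (event_id INTEGER NOT NULL);
    CREATE TABLE sync_logs (event_id INTEGER NOT NULL, status TEXT NOT NULL, message TEXT);
    INSERT INTO events (id, status) VALUES (1, 'pending');
    """
)
conn.commit()


def failing_handler(conn: sqlite3.Connection, event_id: int) -> None:
    """Write a partial row, then fail, to simulate a mid-processing error."""
    conn.execute("INSERT INTO business_objects (event_id) VALUES (?)", (event_id,))
    raise RuntimeError("simulated downstream failure")


process_event(conn, 1, failing_handler)
```

After the call, the partial `business_objects` row is gone, while the failed event status and the sync log entry are both persisted.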

3. Architecture boundaries should be reflected in code structure

The backend originally handled intake, source request creation, and event creation in one flow. That behavior was acceptable, but the code structure needed clearer boundaries.

Refactoring into service modules made the code easier to reason about:

  • intake service
  • source system service
  • integration service
  • document service
  • operational service
  • dashboard service

4. Browser networking is not container networking

The dashboard initially failed from a LAN browser because frontend JavaScript called localhost:8000.

The fix was to route API calls through nginx using relative paths:

  • browser calls /api/...
  • nginx proxies to backend:8000

This made the dashboard usable from other machines without changing backend business logic.

5. Structured relationships beat embedded text

Putting selected customer and agreement information into request_description worked temporarily, but it was not correct enterprise modeling.

Moving those fields into structured UUID relationships made the data model more reliable, queryable, and explainable.

6. Status names matter

A technically correct status can still confuse users.

Changing integration_pending to integrated and pending to open made the dashboard easier to understand without changing the underlying architecture.

7. Object storage and database transactions do not roll back together

MinIO uploads and PostgreSQL transactions are separate systems.

If a file upload succeeds but metadata insert fails, the object can become orphaned. The project added best-effort cleanup to reduce this risk.
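
A best-effort cleanup of this kind might look like the sketch below. `InMemoryObjectStore` is a hypothetical stand-in for MinIO (the real client API differs), and `insert_metadata` represents whatever function writes the row to document_repo.documents.

```python
from typing import Callable


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store such as MinIO."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def remove(self, key: str) -> None:
        self.objects.pop(key, None)


def store_document(
    object_store: InMemoryObjectStore,
    insert_metadata: Callable[[str], None],
    key: str,
    data: bytes,
) -> None:
    """Upload the object, then write metadata; remove the object if that fails."""
    object_store.put(key, data)
    try:
        insert_metadata(key)
    except Exception:
        try:
            object_store.remove(key)
        except Exception:
            # Best effort only: a cleanup failure must not hide the original error.
            pass
        raise
```

This narrows the window for orphaned objects but does not close it. If the process dies between the upload and the cleanup, the object still remains, which is why the cleanup is described as best-effort.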

8. Portfolio readiness is part of engineering

A project can be technically functional but still not ready to show publicly.

Public readiness required:

  • README storytelling
  • diagrams
  • demo data hygiene
  • .env cleanup
  • Git history cleanup
  • security disclaimers
  • clear scope boundaries

15 What This Project Is Not

This project is not production ready.

It intentionally does not include:

  • authentication
  • role-based authorization
  • TLS termination
  • production secret management
  • malware scanning
  • document preview/download
  • AI normalization implementation
  • complex retry policies
  • distributed locking
  • production observability stack
  • public internet deployment hardening

These omissions are documented because portfolio projects should be honest about scope.


16 Current Demo Workflow

A reviewer can run the project locally and follow this flow:

  1. Start the stack with Docker Compose.
  2. Open the dashboard.
  3. Review seeded demo records.
  4. Submit a new service request.
  5. Optionally upload a supporting document.
  6. Watch the worker process the event.
  7. Review operational and canonical records.
  8. Move the operational case to in_progress.
  9. Complete or reject the operational case.
  10. Review status history, sync logs, events, and lineage.

This gives a complete end-to-end demonstration of the integration lifecycle.


17 Final Architecture Value

The final system demonstrates several enterprise architecture concepts in one small project:

  • system boundary separation
  • event-driven orchestration
  • asynchronous worker processing
  • source-system records
  • downstream operational records
  • canonical data modeling
  • document repository separation
  • reference data governance
  • lineage and auditability
  • lifecycle observability
  • public-ready project documentation

What I like most about this project is that it does not rely on one impressive feature. Its value comes from the relationships between layers.

The system is small enough to run locally, but structured enough to explain real enterprise integration concerns.


18 Future Improvements

The next possible improvements would be:

  1. AI-assisted intake normalization
    Add AI suggestions for request classification and metadata extraction while keeping deterministic validation and synchronization as the source of truth.

  2. Document preview and download
    Add secure document retrieval with signed URLs or backend-mediated access.

  3. Advanced workflow rules
    Add richer transition rules, assignment logic, and escalation states.

  4. Retry and dead-letter handling
    Improve worker resilience with retry counts, exponential backoff, and dead-letter event states.

  5. Production-style observability
    Add metrics, structured logs, tracing, and operational dashboards.

  6. Migration framework
    Replace init SQL rebuilds with a formal migration tool such as Alembic.


19 Reflection

This project started as a simple infrastructure skeleton and gradually became a complete enterprise integration simulation.

The most useful parts of the process were not only the features that were added, but the reviews that found architectural and operational weaknesses:

  • worker idempotency
  • transaction failure handling
  • dashboard networking
  • structured reference modeling
  • status semantics
  • document upload safety
  • XSS prevention
  • public repository hygiene

Those corrections made the project stronger and also made the learning more concrete.

Enterprise architecture is not just about drawing boxes. It is about defining ownership, preserving meaning across boundaries, handling failure honestly, and making system behavior explainable.

Enterprise Integration Lab became a portfolio project because it demonstrates those ideas in a working, reviewable, runnable form.