ShieldCraft AI Implementation Checklist

Overall Progress
80% Complete

🧱 Foundation & Planning

Lays the groundwork for a robust, secure, and business-aligned AI system. All key risks, requirements, and architecture are defined before data prep begins. Guiding Question: Before moving to Data Prep, ask: "Do we have clarity on what data is needed to solve the defined problem, and why?" Definition of Done: Business problem articulated, core architecture designed, and initial cost/risk assessments completed.


MSK + Lambda Integration To-Do List

  • 🟥 Ensure Lambda execution role has least-privilege Kafka permissions, scoped to the MSK cluster ARN (see the CDK sketch after this list)
  • 🟥 Deploy Lambda in private subnets with correct security group(s)
  • 🟥 Confirm security group allows Lambda-to-MSK broker connectivity (TLS port)
  • 🟥 Set up CloudWatch alarms for Lambda errors, throttles, and duration
  • 🟥 Set up CloudWatch alarms for MSK broker health, under-replicated partitions, and storage usage
  • 🟥 Route alarm notifications to the correct email/SNS topic
  • 🟥 Implement and test the end-to-end MSK + Lambda topic creation flow
  • 🟥 Update documentation for MSK + Lambda integration, including troubleshooting steps
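
A minimal AWS CDK v2 sketch of how these items could fit together is shown below: a Lambda placed in private subnets with a narrowly scoped security group, least-privilege kafka-cluster permissions bound to the cluster and derived topic ARNs, and a CloudWatch alarm on function errors. Construct names, the asset path, the "ingest-*" topic prefix, and the port number are illustrative assumptions, not the project's actual stack definitions.

```python
# Hedged AWS CDK v2 (Python) sketch of the MSK + Lambda wiring above.
# "IngestHandler", the asset path, and the "ingest-*" topic prefix are placeholders.
from aws_cdk import Duration, Stack
from aws_cdk import aws_cloudwatch as cw
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as _lambda
from constructs import Construct


class MskLambdaIntegration(Stack):
    def __init__(self, scope: Construct, construct_id: str, *,
                 vpc: ec2.IVpc, msk_cluster_arn: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Security group with egress only to the brokers' TLS port (9094; 9098 if SASL/IAM is used).
        lambda_sg = ec2.SecurityGroup(self, "LambdaSg", vpc=vpc, allow_all_outbound=False)
        lambda_sg.add_egress_rule(ec2.Peer.ipv4(vpc.vpc_cidr_block),
                                  ec2.Port.tcp(9094), "Lambda to MSK brokers (TLS)")

        # Lambda deployed into private subnets, never public ones.
        handler = _lambda.Function(
            self, "IngestHandler",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda/msk_topic_flow"),
            vpc=vpc,
            vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            security_groups=[lambda_sg],
            timeout=Duration.seconds(30),
        )

        # Least-privilege Kafka permissions: Connect on the cluster ARN,
        # topic actions on topic ARNs derived from it (no "*" resources).
        handler.add_to_role_policy(iam.PolicyStatement(
            actions=["kafka-cluster:Connect"], resources=[msk_cluster_arn]))
        handler.add_to_role_policy(iam.PolicyStatement(
            actions=["kafka-cluster:CreateTopic", "kafka-cluster:DescribeTopic",
                     "kafka-cluster:WriteData"],
            resources=[msk_cluster_arn.replace(":cluster/", ":topic/") + "/ingest-*"]))

        # Alarm on Lambda errors; attach the project's SNS topic as the alarm action.
        cw.Alarm(self, "IngestHandlerErrors",
                 metric=handler.metric_errors(period=Duration.minutes(5)),
                 threshold=1, evaluation_periods=1)
```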

Data Preparation

Guiding Question: Do we have the right data, in the right format, with clear lineage and privacy controls? Definition of Done: Data pipelines are operational, data is clean and indexed for RAG. Link to data_prep/ for schemas and pipelines.

  • 🟩 Identify and document all required data sources (logs, threat feeds, reports, configs)
  • 🟩 Data ingestion, cleaning, normalization, privacy, and versioning
  • 🟩 Build data ingestion pipelines
  • 🟩 Set up Amazon MSK (Kafka) cluster with topic creation
  • 🟥 Integrate Airbyte for connector-based data integration
  • 🟥 Implement AWS Lambda for event-driven ingestion and pre-processing
  • 🟥 Configure Amazon OpenSearch Ingestion for logs, metrics, and traces
  • 🟥 Build AWS Glue jobs for batch ETL and normalization
  • 🟥 Store raw and processed data in Amazon S3 data lake
  • 🟥 Enforce governance and privacy with AWS Lake Formation
  • 🟥 Add data quality checks (Great Expectations, Deequ)
  • 🟩 Implement data cleaning, normalization, and structuring
  • 🟩 Ensure data privacy (masking, anonymization) and compliance (GDPR, HIPAA, etc.)
  • 🟩 Establish data versioning for reproducibility
  • 🟩 Design and implement data retention policies
  • 🟩 Implement and document data deletion/right-to-be-forgotten workflows (GDPR)
  • 🟩 Modular data flows and schemas for different data sources
  • 🟩 Data lineage and audit trails for all data flows and model decisions
  • 🟩 Define and test disaster recovery, backup, and restore procedures for all critical data and services
  • 🟥 Text chunking strategy defined and implemented for RAG
  • 🟥 Experiment with various chunking sizes and overlaps (e.g., fixed, semantic, recursive)
  • 🟥 Handle metadata preservation during chunking
  • 🟥 Embedding model selection and experimentation for relevant data types
  • 🟥 Evaluate different embedding models (e.g., Bedrock Titan, open-source options)
  • 🟥 Establish benchmarking for embedding quality
  • 🟩 Vector database (or pgvector) setup and population
  • 🟩 Select appropriate vector store (e.g., Pinecone, Weaviate, pgvector)
  • 🟩 Implement ingestion pipeline for creating and storing embeddings (a chunk/embed/store sketch follows this list)
  • 🟩 Optimize vector indexing for retrieval speed
  • 🟩 Implement re-ranking mechanisms for retrieved documents (e.g., Cohere Rerank, cross-encoders)
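
As a concrete reference point for the chunking, embedding, and vector-store items above, here is a minimal chunk/embed/store sketch targeting pgvector. The `documents` table layout, the Titan embedding model id and payload shape, and the fixed-size chunking parameters are assumptions to verify against the actual pipeline.

```python
# Hedged sketch: fixed-size chunking with metadata preservation, Bedrock Titan
# embeddings, and insertion into a pgvector-backed "documents" table.
# Table schema, model id, and request shape are assumptions.
import json

import boto3
import psycopg2


def chunk_text(text: str, metadata: dict, size: int = 1000, overlap: int = 200) -> list[dict]:
    """Split text into overlapping fixed-size chunks, copying source metadata onto each chunk."""
    step = size - overlap
    chunks = []
    for i, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + size]
        if piece:
            chunks.append({"text": piece, "metadata": {**metadata, "chunk_index": i}})
    return chunks


def embed(text: str, bedrock) -> list[float]:
    """Call Titan Text Embeddings; verify the model id and payload against the Bedrock docs."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]


def ingest_document(doc_text: str, metadata: dict, dsn: str) -> None:
    """Chunk a document, embed each chunk, and store content + metadata + vector in Postgres."""
    bedrock = boto3.client("bedrock-runtime")
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for chunk in chunk_text(doc_text, metadata):
            vector = embed(chunk["text"], bedrock)
            cur.execute(
                "INSERT INTO documents (content, metadata, embedding) "
                "VALUES (%s, %s::jsonb, %s::vector)",  # assumed table/columns
                (chunk["text"], json.dumps(chunk["metadata"]),
                 "[" + ",".join(str(v) for v in vector) + "]"),
            )
```

A semantic or recursive splitter can be swapped in for `chunk_text` without changing the rest of the flow, which keeps chunking experiments isolated from the embedding and storage steps.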

AWS Cloud Foundation and Architecture

Guiding Question: Is the AWS environment production-grade, modular, secure, and cost-optimized for MLOps and GenAI workloads? Definition of Done: All core AWS infrastructure is provisioned as code, with cross-stack integration, config-driven deployment, and robust security/compliance controls. Architecture is modular, extensible, and supports rapid iteration and rollback.

  • 🟩 Multi-account, multi-environment AWS Organization structure with strict separation of dev, staging, and prod, supporting least-privilege and blast radius reduction.
  • Modular AWS CDK v2 stacks for all major AWS services:
    • 🟩 Networking (VPC, subnets, security groups, vault secret import)
    • 🟩 EventBridge (central event bus, rules, targets)
    • 🟩 Step Functions (workflow orchestration, state machines, IAM roles)
    • 🟩 S3 (object storage, vault secret import)
    • 🟩 Lake Formation (data governance, fine-grained access control)
    • 🟩 Glue (ETL, cataloging, analytics)
    • 🟩 Lambda (event-driven compute, triggers)
    • 🟩 Data Quality (automated validation, Great Expectations/Deequ)
    • 🟩 Airbyte (connector-based ingestion, ECS services)
    • 🟩 OpenSearch (search, analytics)
    • 🟩 Cloud Native Hardening (CloudWatch alarms, Config rules, IAM boundaries)
    • 🟩 Attack Simulation (automated security validation, Lambda, alarms)
    • 🟩 Secrets Manager (centralized secrets, cross-stack exports)
    • 🟩 MSK (Kafka streaming, broker info, roles)
    • 🟩 SageMaker (model training, deployment, monitoring)
    • 🟩 Budget (cost guardrails, alerts, notifications)
  • 🟩 Advanced cross-stack resource sharing and dependency injection (CfnOutput/Fn.import_value), enabling secure, DRY, and scalable infrastructure composition (see the combined sketch after this list).
  • 🟩 Pydantic-driven config validation and parameterization, enforcing schema correctness and preventing misconfiguration at deploy time.
  • 🟩 Automated tagging and metadata propagation across all resources for cost allocation, compliance, and auditability.
  • 🟩 Hardened IAM roles, policies, and boundary enforcement, with automated least-privilege checks and centralized secrets management via AWS Secrets Manager.
  • 🟩 AWS Vault integration for secure credential management and developer onboarding.
  • 🟩 Automated S3 lifecycle policies, encryption, and access controls for all data lake buckets.
  • 🟩 End-to-end cost controls and budget alarms, with CloudWatch and SNS integration for real-time alerting.
  • 🟩 Cloud-native hardening stack (GuardDuty, Security Hub, Inspector) with automated findings aggregation and remediation hooks.
  • 🟩 Automated integration tests for all critical AWS resources, covering both happy and unhappy paths, and validating cross-stack outputs.
  • 🟩 Comprehensive documentation for stack interactions, outputs, and architectural decisions, supporting onboarding and audit requirements.
  • 🟩 GitHub Actions CI/CD pipeline for automated build, test, and deployment of all infrastructure code.
  • 🟩 Automated dependency management and patching via Poetry, ensuring reproducible builds and secure supply chain.
  • 🟩 Modular, environment-parameterized deployment scripts and commit automation for rapid iteration and rollback.
  • 🟩 Centralized error handling, smoke tests, and post-deployment validation for infrastructure reliability.
  • 🟩 Secure, reproducible Dockerfiles and Compose files for local and cloud development, with best practices enforced.
  • 🟩 Continuous compliance monitoring (Config, CloudWatch, custom rules) and regular security architecture reviews.
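
Two of the patterns above lend themselves to a short illustration: Pydantic-driven config validation and cross-stack sharing via CfnOutput / Fn.import_value. The sketch below is a simplified, assumed shape (class names, export names, and bucket naming are illustrative), not the project's actual stacks.

```python
# Hedged sketch: Pydantic-validated, environment-parameterized config feeding a
# CDK stack, with a CfnOutput export consumed by a second stack via Fn.import_value.
from aws_cdk import CfnOutput, Fn, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct
from pydantic import BaseModel, Field


class DataLakeConfig(BaseModel):
    """Deploy-time parameters; schema violations fail before anything is synthesized."""
    env: str = Field(pattern="^(dev|staging|prod)$")
    bucket_prefix: str = Field(min_length=3, max_length=40)
    versioned: bool = True


class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, cfg: DataLakeConfig, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        bucket = s3.Bucket(
            self, "RawBucket",
            bucket_name=f"{cfg.bucket_prefix}-raw-{cfg.env}",
            versioned=cfg.versioned,
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
        )
        # Exported so downstream stacks (Glue, Lake Formation) can import it without tight coupling.
        CfnOutput(self, "RawBucketName", value=bucket.bucket_name,
                  export_name=f"{cfg.env}-raw-bucket-name")


class GlueStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, cfg: DataLakeConfig, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Resolves the exported value at deploy time (CloudFormation Fn::ImportValue).
        self.raw_bucket_name = Fn.import_value(f"{cfg.env}-raw-bucket-name")
```

Deriving the export name from the validated `env` value keeps cross-stack imports environment-scoped, which supports the dev/staging/prod separation described above.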

AI Core Development and Experimentation

Guiding Question: Are our models accurately solving the problem, and is the GenAI output reliable and safe? Definition of Done: Core AI models demonstrate accuracy, reliability, and safety according to defined metrics. Link to ai_core/ for model code and experiments.

  • 🟩 Selected Mistral-7B as the primary Foundation Model for ShieldCraft AI
  • 🟥 Select secondary Foundation Models (FMs) from Amazon Bedrock or Hugging Face (Phase 2 - multi-agent orchestration)
  • 🟥 Define core AI strategy (RAG, fine-tuning, hybrid approach)
  • 🟥 LangChain integration for orchestration and prompt management
  • 🟥 Prompt Engineering lifecycle implemented:
    • 🟥 Prompt versioning and prompt registry
    • 🟥 Prompt approval workflow
    • 🟥 Prompt experimentation framework
    • 🟥 Integration of human-in-the-loop (HITL) for continuous prompt refinement
  • 🟥 Guardrails and safety mechanisms for GenAI outputs:
    • 🟥 Establish Responsible AI governance: bias monitoring, model risk management, and audit trails
    • 🟥 Implement content moderation APIs/filters
    • 🟥 Define toxicity thresholds and response strategies
    • 🟥 Establish mechanisms for red-teaming GenAI outputs (e.g., adversarial prompt generation and testing)
  • 🟥 RAG pipeline prototyping and optimization:
    • 🟥 Implement efficient retrieval from vector store
    • 🟥 Context window management for LLMs
    • 🟥 LLM output parsing and validation (e.g., Pydantic for structured output; see the sketch after this list)
  • 🟥 Address bias, fairness, and transparency in model outputs
  • 🟥 Implement explainability for key AI decisions where possible
  • 🟥 Automated prompt evaluation metrics and frameworks
  • 🟩 Model loading, inference, and resource optimization
  • 🟥 Experiment tracking and versioning (MLflow/SageMaker Experiments)
  • 🟥 Model registry and rollback capabilities (SageMaker Model Registry)
  • 🟥 Establish baseline metrics for model performance
  • 🟥 Cost tracking and optimization for LLM inference (per token, per query)
  • 🟥 LLM-specific evaluation metrics:
    • 🟥 Hallucination rate (quantified)
    • 🟥 Factuality score
    • 🟥 Coherence and fluency metrics
    • 🟥 Response latency per token
    • 🟥 Relevance to query
  • 🟥 Model and Prompt card generation for documentation
  • 🟥 Implement canary and shadow testing for new models/prompts
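
For the LLM output parsing and validation item above, a minimal Pydantic sketch follows. The FindingSummary fields and severity values are assumed for illustration; the real contract lives with the prompt and guardrails design.

```python
# Hedged sketch of structured LLM output validation with Pydantic.
# The FindingSummary schema and severity values are illustrative assumptions.
from pydantic import BaseModel, Field, ValidationError


class FindingSummary(BaseModel):
    """Structured contract the LLM is prompted to emit as JSON."""
    title: str
    severity: str = Field(pattern="^(low|medium|high|critical)$")
    affected_assets: list[str] = []
    recommended_action: str


def parse_llm_output(raw: str) -> FindingSummary | None:
    """Validate raw model output; callers can re-prompt or fall back on failure."""
    try:
        return FindingSummary.model_validate_json(raw)
    except ValidationError as exc:
        # Hook for the guardrails/HITL flow: log, re-prompt with the error, or escalate.
        print(f"LLM output failed validation: {exc}")
        return None
```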

Application Layer and Integration

Guiding Question: Is the AI accessible, robust, and seamlessly integrated with existing systems? Definition of Done: API functional, integrated with UI, and handles errors gracefully. Link to application for API code and documentation.

  • 🟥 Define Core API endpoints for AI services
  • 🟥 Build a production-ready, scalable API (FastAPI, Flask, etc.; see the FastAPI sketch after this list)
  • 🟥 Input/output validation and data serialization
  • 🟥 User Interface (UI) integration for analyst dashboard
  • 🟥 Implement LangChain Chains and Agents for complex workflows
  • 🟥 LangChain Memory components for conversational context
  • 🟥 Robust error handling and graceful fallbacks for API and LLM responses
  • 🟥 API resilience and rate limiting mechanisms
  • 🟥 Implement API abuse prevention (WAF, throttling, DDoS protection)
  • 🟥 Secure prompt handling and sensitive data redaction at the application layer
  • 🟥 Develop example clients/SDKs for API consumption
  • 🟥 Implement API Gateway (AWS API Gateway) for secure access
  • 🟥 Automated API documentation generation (e.g., OpenAPI/Swagger)
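
A hedged FastAPI sketch of the API items above: typed request/response models and a graceful fallback when the downstream LLM call fails. The /v1/analyze route, payload fields, and the run_rag_query stub are assumptions, not the defined API.

```python
# Hedged FastAPI sketch: input/output validation plus a graceful error path.
# Route, field names, and the run_rag_query stub are illustrative only.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI(title="ShieldCraft AI API")


class AnalyzeRequest(BaseModel):
    query: str = Field(min_length=1, max_length=4000)
    tenant_id: str


class AnalyzeResponse(BaseModel):
    answer: str
    sources: list[str] = []


async def run_rag_query(query: str, tenant_id: str) -> tuple[str, list[str]]:
    """Hypothetical stand-in for the real retrieval + generation pipeline."""
    return f"Stub answer for: {query}", []


@app.post("/v1/analyze", response_model=AnalyzeResponse)
async def analyze(req: AnalyzeRequest) -> AnalyzeResponse:
    try:
        # Placeholder for the real RAG/LLM call (LangChain chain, Bedrock, etc.).
        answer, sources = await run_rag_query(req.query, req.tenant_id)
        return AnalyzeResponse(answer=answer, sources=sources)
    except TimeoutError:
        # Graceful fallback instead of surfacing a raw stack trace to clients.
        raise HTTPException(status_code=503, detail="AI backend timed out; please retry.")
```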

Evaluation and Continuous Improvement

Guiding Question: How do we continuously measure, learn, and improve the AI's effectiveness and reliability? Definition of Done: Evaluation framework established, feedback loops active, and continuous improvement process in place. Link to evaluation for metrics and dashboards.

  • 🟥 Automated evaluation metrics and dashboards (e.g., RAG evaluation tools for retrieval relevance, faithfulness, answer correctness)
  • 🟥 Human-in-the-loop (HITL) feedback mechanisms for all GenAI outputs
  • 🟥 Implement user feedback loop for feature requests and issues
  • 🟥 LLM-specific monitoring: toxicity drift, hallucination rates, contextual relevance
  • 🟥 Real-time alerting for performance degradation or anomalies
  • 🟥 A/B testing framework for prompts, models, and RAG configurations (see the variant-assignment sketch after this list)
  • 🟥 Usage analytics and adoption tracking
  • 🟥 Continuous benchmarking and optimization for performance and cost
  • 🟥 Iterative prompt, model, and data retrieval refinement processes
  • 🟥 Regular stakeholder feedback sessions and roadmap alignment
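
For the A/B testing item above, one simple building block is deterministic variant assignment, sketched below; the variant names and the even split are illustrative.

```python
# Hedged sketch: stable hash-based assignment of a request to a prompt/model variant.
import hashlib

VARIANTS = ["prompt_v1", "prompt_v2"]  # illustrative variant names


def assign_variant(request_id: str) -> str:
    """The same request id always maps to the same variant, enabling fair comparison."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % len(VARIANTS)
    return VARIANTS[bucket]
```

Stable bucketing lets evaluation dashboards compare variants on the same traffic without storing per-user state.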

MLOps, Deployment and Monitoring

Guiding Question: Is the system reliable, scalable, secure, and observable in production? Definition of Done: CI/CD fully automated, system stable in production, and monitoring active. Link to mlops/ for pipeline definitions.

  • 🟩 Infrastructure as Code (IaC) with AWS CDK for all cloud resources
  • 🟩 CI/CD pipelines (GitHub Actions) for automated build, test, and deployment
  • 🟩 Containerization (Docker)
  • 🟥 Orchestration (Kubernetes/AWS EKS)
  • 🟩 Pre-commit and pre-push hooks for code quality checks
  • 🟩 Automated dependency and vulnerability patching
  • 🟥 Secrets scanning in repositories and CI/CD pipelines
  • 🟥 Build artifact signing and verification
  • 🟥 Secure build environment (e.g., ephemeral runners)
  • 🟥 Deployment approval gates and manual review processes
  • 🟥 Automated rollback and canary deployment strategies
  • 🟥 Post-deployment validation checks (smoke tests, integration tests)
  • 🟥 Continuous monitoring for cost, performance, data/concept drift
  • 🟥 Implement cloud cost monitoring, alerting, and FinOps best practices (AWS Cost Explorer, budgets, tagging, reporting)
  • 🟥 Secure authentication, authorization, and configuration management
  • 🟩 Secrets management (AWS Secrets Manager)
  • 🟥 IAM roles and fine-grained access control
  • 🟥 Schedule regular IAM access reviews and user lifecycle management
  • 🟩 Multi-environment support (dev, staging, prod)
  • 🟩 Automated artifact management (models, data, embeddings)
  • 🟩 Robust error handling in automation scripts
  • 🟥 Automated smoke and integration tests, triggered after build/deploy (see the noxfile sketch after this list)
  • 🟥 Static type checks enforced in CI/CD using Mypy
  • 🟥 Code coverage tracked and reported via Pytest-cov
  • 🟥 Automated Jupyter notebook dependency management and validation (via Nox and Nbval)
  • 🟥 Automated SageMaker training jobs launched via Nox and parameterized config
  • 🟩 Streamlined local development (Nox, Docker Compose)
  • 🟥 Command Line Interface (CLI) tools for common operations
  • 🟥 Automate SBOM generation and review third-party dependencies for supply chain risk
  • 🟥 Define release management and versioning policies for all major components
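
As an example of how the post-deploy smoke tests could be wired through Nox (already used for local development above), here is a small noxfile session. The tests/smoke path, the pytest marker, and the ENVIRONMENT variable are assumptions.

```python
# noxfile.py sketch: environment-parameterized smoke tests, intended to be
# invoked from CI after a deploy. Paths and variable names are assumptions.
import nox


@nox.session(python="3.12")
@nox.parametrize("environment", ["dev", "staging", "prod"])
def smoke(session: nox.Session, environment: str) -> None:
    """Run smoke tests against a deployed environment."""
    session.install("pytest", "requests")
    session.env["ENVIRONMENT"] = environment
    session.run("pytest", "tests/smoke", "-m", "smoke", "--maxfail=1")
```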

Security and Governance (Overarching)

Guiding Question: Are we proactively managing risk, compliance, and security at every layer and continuously? Definition of Done: Comprehensive security posture established, audited, and monitored across all layers. Link to security/ for policies and audit reports.

  • 🟥 Establish Security Architecture Review Board (if not already in place)
  • 🟥 Conduct regular Security Audits (internal and external)
  • 🟥 Implement Continuous compliance monitoring (GDPR, SOC2, etc.)
  • 🟥 Develop a Security Incident Response Plan and corresponding runbooks
  • 🟥 Implement Centralized audit logging and access reviews
  • 🟥 Develop SRE runbooks, on-call rotation, and incident management for production support
  • 🟥 Document and enforce Security Policies and Procedures
  • 🟥 Proactive identification and mitigation of Technical, Ethical, and Operational risks
  • 🟥 Leverage AWS security services (Security Hub, GuardDuty, Config) for enterprise security posture (see the baseline sketch after this list)
  • 🟥 Ensure data lineage and audit trails are established and maintained for all data flows and model decisions
  • 🟥 Implement Automated security scanning for code, containers, and dependencies (SAST, DAST, SBOM)
  • 🟥 Secure authentication, authorization, and secrets management across all services
  • 🟥 Define and enforce IAM roles and fine-grained access controls
  • 🟥 Regularly monitor for infrastructure drift and automate remediation of security configurations
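
A minimal CDK sketch for the AWS security services item above, enabling GuardDuty and Security Hub via L1 constructs; organization-wide delegated-admin setup and findings aggregation are intentionally out of scope here.

```python
# Hedged sketch: per-account/region security baseline using CDK L1 constructs.
from aws_cdk import Stack
from aws_cdk import aws_guardduty as guardduty
from aws_cdk import aws_securityhub as securityhub
from constructs import Construct


class SecurityBaselineStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Threat detection for the account/region this stack is deployed to.
        guardduty.CfnDetector(self, "Detector", enable=True)
        # Aggregates findings from GuardDuty, Inspector, Config, etc.
        securityhub.CfnHub(self, "Hub")
```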

Documentation and Enablement

Guiding Question: Is documentation clear, actionable, and up-to-date for all stakeholders? Definition of Done: All docs up-to-date, onboarding tested, and diagrams published. Link to docs-site/ for rendered docs.

  • 🟩 Maintain up-to-date Docusaurus documentation for all major components
  • 🟩 Automated checklist progress bar update
  • 🟥 Architecture diagrams and sequence diagrams for all major flows
  • 🟥 Document onboarding, architecture, and usage for developers and analysts
  • 🟩 Add “How to contribute” and “Getting started” guides
  • 🟥 Automated onboarding scripts (e.g., one-liner to set up local/dev environment)
  • 🟥 Pre-built Jupyter notebook templates for common workflows
  • 🟥 End-to-end usage walkthroughs (from data ingestion to GenAI output)
  • 🟥 Troubleshooting and FAQ section
  • 🟥 Regularly update changelog and roadmap
  • 🟥 Set up customer support/feedback channels and integrate feedback into roadmap
  • 🟥 Changelog automation and release notes
  • 🟥 Automated notebook dependency management and validation
  • 🟥 Automated notebook validation in CI/CD
  • 🟥 Code quality and consistent style enforced (Ruff, Poetry)
  • 🟥 Contribution guidelines for prompt engineering and model adapters
  • 🟥 All automation and deployment workflows parameterized for environments
  • 🟥 Test coverage thresholds and enforcement
  • 🟥 End-to-end tests simulating real analyst workflows
  • 🟥 Fuzz testing for API and prompt inputs (see the Hypothesis sketch below)
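
A small Hypothesis-based sketch of the fuzz-testing item above, run against an assumed mirror of the API request schema; arbitrary inputs must either validate or fail with a controlled ValidationError.

```python
# Hedged sketch: property-based fuzzing of the (assumed) API request model.
from hypothesis import given, strategies as st
from pydantic import BaseModel, Field, ValidationError


class AnalyzeRequest(BaseModel):
    """Mirror of the assumed request schema from the application-layer sketch."""
    query: str = Field(min_length=1, max_length=4000)
    tenant_id: str


@given(st.text(), st.text())
def test_analyze_request_never_crashes(query: str, tenant_id: str) -> None:
    try:
        AnalyzeRequest(query=query, tenant_id=tenant_id)
    except ValidationError:
        pass  # Rejection is acceptable; uncontrolled exceptions are not.
```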