Available for new opportunities

Yohans (John) Bekele

AIOps Engineer | Site Reliability Engineer | DevOps Engineer | Full Stack Engineer

AWS Certified Cloud Practitioner | 4+ Years Experience

AIOps Engineer with 4+ years of progressive experience across AIOps and automation, site reliability, data analytics, and full-stack development, now focused on designing and deploying agentic AI systems in production at enterprise scale.

01 Selected Work

Things I've been building

A selection of 12 projects spanning cloud infrastructure, full-stack apps, and AI tooling — built end-to-end and shipped.

Featured · Full-Stack
/01

Archyra

One npm package: 25 animated React components, an AWS architecture designer that exports Terraform/Pulumi, and an MCP server so Claude, Cursor, and Windsurf can install components for you.

TypeScript · React · Framer Motion · Next.js · AWS · Terraform · Pulumi · MCP

AI/LLM
/02

Discipline Coach

AI-powered accountability app that tracks discipline through three daily execution blocks, scoring consistency and generating intelligent insights to turn intention into action.

Claude AI · React · TypeScript · Tailwind CSS (+3 more)

Full-Stack · AI/LLM
/03

Pokojowo

Full-stack room rental platform with AI-powered matching, real-time chat, and JWT authentication built on React, FastAPI, and MongoDB.

React 19 · FastAPI · MongoDB Atlas · Socket.IO (+5 more)

DevOps
/04

Multi-Cloud CI/CD Pipeline

Built a production CI/CD pipeline orchestrating deployments across AWS and ECS with automated rollback, multi-environment promotion, and container image management.

AWS ECS · ECR · GitHub Actions · Docker (+4 more)

DevOps · Full-Stack
/05

Serverless Observability Platform

Built a multi-tenant serverless observability platform using AWS Lambda, API Gateway, and DynamoDB with CloudWatch and EventBridge for monitoring and alerting.

AWS Lambda · API Gateway · DynamoDB · CloudWatch (+6 more)

AI/LLM · Full-Stack
/06

Teacherbot

AI-powered DevOps learning platform with personalized paths, adaptive exercises, and real-time intelligent feedback tailored to individual skill levels.

Next.js · FastAPI · Claude Code Agent SDK · TypeScript (+3 more)

Full-Stack
/07

AgroTech

Two-sided marketplace for renting farm equipment: owners list machinery, farmers browse, book a date range, and return when done.

React · JavaScript · Node.js · MongoDB

Full-Stack · AI/LLM
/08

LawConnect

Case management for advocates: cases, fees and invoices, a client portal, and a calendar that won't let a deadline slip.

MongoDB · Express · React · Node.js (+1 more)

Full-Stack
/09

Crypto Info Finder

Type a coin name, get live market data back. Single-page React app over a public crypto API with a dark, retro UI.

React · JavaScript · REST API · CSS

More on GitHub

Open source contributions, experiments, and side projects.

02 Capability Stack

The tools behind the work

Technologies I reach for to ship reliable, scalable systems — grouped by what they actually do.

/01

Cloud Platforms & Services

Built production infrastructure from zero on AWS — 7 microservices across 4 environments.

AWS ECS Fargate · Aurora PostgreSQL · DynamoDB · ElastiCache · CloudFront · ALB · S3 · IAM · VPC · Bedrock · SageMaker · KMS · WAFv2 · GuardDuty
/02

Infrastructure as Code

Designed and maintained multi-environment IaC stacks with state management and feature flags.

Pulumi (Python) · Terraform · CloudFormation
/03

CI/CD & DevOps

Authored 20+ CI/CD workflows with SHA-pinned actions, security gates, and automated release notes.

GitHub Actions · AWS CodeBuild · Docker · ECR · Production Approval Gates · Security Scanning · Automated Rollbacks · Feature Flags
/04

AI & LLM Systems

Deployed agentic AI services, conversational AI, and document processing pipelines to production.

Claude/Bedrock · Anthropic API · pgvector · Agentic AI · RAG · LangChain · Document Processing · Embeddings
/05

Monitoring & Observability

Full-stack observability achieving 99.8% uptime through proactive monitoring and alert tuning.

New Relic · Datadog · CloudWatch · Prometheus · Grafana · ELK Stack · Alerting · Anomaly Detection
/06

Security

Implemented defense-in-depth security with compliance audits (CIS, AWS Best Practices).

WAFv2 · GuardDuty · Security Hub · KMS Encryption · Zero-Trust S3 Policies · Secrets Manager · Microsoft Entra ID SSO · RBAC
/07

Programming & Full Stack

Backend services, frontend applications, automation scripts, and ETL pipelines.

Python · TypeScript · JavaScript · Bash · SQL · FastAPI · Next.js · React · Node.js · Express · Tailwind CSS
/08

Incident Management

Incident response, escalation, and resolution with documented runbooks and automation.

ServiceNow · JIRA · PagerDuty · Root Cause Analysis · Post-Incident Reviews · Runbooks · SLA Management
03 Track Record

Where I've done the work

Roles and responsibilities behind the projects above.

Built and maintained cloud infrastructure using Pulumi and AWS across ECS Fargate, ALB, CloudFront, ACM, VPC, S3, SSM, IAM, and Route 53 — codifying multi-environment deployments for dev, QA, production, lab, and feature stacks.

Designed and operated advanced GitHub Actions CI/CD pipelines for build, test, deploy, regression, warmup, teardown, and release workflows — using reusable jobs, environment-scoped secrets, OIDC-based AWS access, and safe rollback patterns.

Developed AI-powered CI/CD workflows that automated PR review, infrastructure review, frontend-impact analysis, dependency remediation, and ticket-to-PR delivery — improving review quality while reducing repetitive engineering toil.

Implemented infrastructure guardrails for IaC safety, cross-stack consistency, secret handling, deployment readiness, and environment isolation — helping prevent misconfigurations before they reached production.

Automated operational workflows in Python, Bash, GitHub Actions, and monorepo tooling — streamlining deployments, health checks, log collection, release tasks, and recurring maintenance across multiple services.

Supported containerized application delivery on AWS ECS, Docker, ECR, ALB, and CloudFront — improving service reliability through health checks, structured deployment workflows, origin verification, and production warmup routines.

Strengthened observability and incident response through CloudWatch logs, ECS diagnostics, structured logging, PII-safe redaction, and AI-assisted SRE tooling — making production issues easier to detect, investigate, and resolve.

Authored and maintained technical documentation, runbooks, workflow standards, and AI agent instructions — establishing repeatable engineering practices for infrastructure, CI/CD, security, and automated delivery.

ECS Fargate · Aurora PostgreSQL · pgvector · DynamoDB · ElastiCache · Bedrock · SageMaker · S3 · CloudFront · ALB · IAM · VPC · KMS · WAFv2 · GuardDuty · Security Hub · Secrets Manager · CloudWatch · Pulumi · GitHub Actions · Docker · FastAPI · Next.js

Built an AI-powered Knowledge Base management system using RAG and agentic AI, resolved 10,000+ production incidents, and achieved 99.8% system uptime through proactive monitoring and automation.

  • Built an AI-powered Knowledge Base management system using RAG and agentic AI with agents that extract and synthesize information from multiple internal sources.
  • Exposed the knowledge base through a robust REST API designed for integration with legacy applications, enabling end users to self-diagnose issues before opening tickets.
  • Resolved 10,000+ production incidents via ServiceNow and JIRA, with structured root cause analysis, post-incident reviews, and runbook authoring.
  • Fed recurring incident patterns back into the knowledge base to improve support effectiveness.
  • Detected outages and client-side application faults early using clustering-based anomaly detection over user activity telemetry.
  • Achieved 99.8% system uptime through proactive monitoring, alert tuning, and preventive maintenance across enterprise applications.
  • Automated daily monitoring, health checks, and operational workflows in Python, Bash, and SQL, reducing manual effort by 15+ hours per week.
  • Supported database operations and migrations across MS SQL Server, PostgreSQL, and MySQL.

Monitored application performance and infrastructure health to ensure high availability and SLA compliance across distributed systems.

  • Tracked metrics, logs, events, and traces using New Relic and Datadog across distributed production environments.
  • Managed incident triage, event correlation, and escalation via ServiceNow and JIRA with signal enrichment and actionable alerting.
  • Reduced mean time to resolution (MTTR) by 45% through improved incident management workflows.
  • Automated routine operational tasks, health checks, and reporting with Python, Bash, and SQL scripts, improving team efficiency by 30%.
  • Built custom dashboards for anomaly detection and proactive alerting to surface issues before customer impact.
  • Supported database migrations and integrity verification across MS SQL Server, PostgreSQL, and MySQL production environments.
  • Created runbooks and automated remediation scripts for incident response procedures, enabling faster recovery and reduced human intervention.
  • Collaborated with development and infrastructure teams to implement observability solutions, reduce alert noise, and improve system reliability.
New Relic · Datadog · Python · Bash · SQL · PostgreSQL · MySQL · ServiceNow · JIRA · Observability

Built end-to-end AI and web applications across finance, education, legal, real estate, agriculture, and infrastructure automation domains.

  • Developed LLM-powered features including financial news analysis, entity extraction, sentiment analysis, stock-signal generation, and AI tutoring workflows with prompt design and evaluation.
  • Designed full-stack architectures with REST APIs, authentication flows, database schemas, background processing, file handling, user dashboards, admin panels, and responsive frontends.
  • Built infrastructure automation tooling translating visual input into Pulumi/Terraform code, with drag-and-drop AWS modeling, GitHub OAuth integration, and repository workflows.
  • Implemented production engineering practices including Docker containerization, API testing, reusable components, service separation, environment configuration, CI/CD setup, and deployment troubleshooting.
  • Used React, Next.js, TypeScript, Python/FastAPI, Node.js/Express, PostgreSQL, MongoDB, Docker, and cloud deployment workflows across all projects.
React · Next.js · TypeScript · Node.js · FastAPI · Python · MongoDB · PostgreSQL · Docker · Tailwind CSS

Executed large-scale data migration and transformation projects across multiple source systems, improving data processing efficiency by 30%.

  • Automated ETL pipelines using Python, Bash, and SQL to streamline migration workflows.
  • Designed data validation and cleansing routines to ensure accuracy, consistency, and compliance during migrations.
  • Built data ingestion pipelines that normalized and enriched datasets from heterogeneous sources.
  • Created dashboards and analytical reports providing actionable insights on migrated datasets for business continuity and compliance.
  • Authored technical documentation and best practices guides for migration workflows and automation scripts.
  • Collaborated with cross-functional teams to ensure seamless data integration with minimal disruption to business operations.
Python · Bash · SQL · ETL Pipelines · Data Migration · Data Validation · Analytics · Documentation

Built and maintained secure, responsive React-based banking web interfaces across the MERN stack, covering component architecture, hooks-based state management, REST API integration, form validation, and production-ready layouts.

  • Developed internal banking operations dashboards and admin portals used by business and operations teams to monitor transactions, manage customer and account records, generate reports, and support daily reconciliation workflows.
  • Delivered customer-facing digital banking pages including product information, service pages, onboarding flows, and transactional web surfaces with a focus on accessibility, mobile responsiveness, browser compatibility, and clear user journeys.
  • Partnered with backend engineers on Node.js/Express API contracts and MongoDB schemas, ensuring accurate data handling for customer, transaction, reporting, and operational workflows.
  • Implemented frontend authentication and role-based access controls for internal banking tools, including protected routes, session-aware UI behavior, and conditional access to sensitive operational screens.
  • Maintained reusable UI components, styling conventions, and shared frontend patterns across internal and public-facing banking applications, improving consistency, reducing duplication, and accelerating delivery of new screens.
  • Improved frontend performance through code splitting, lazy loading, asset optimization, and bundle-size control, helping banking web services remain usable on slower networks and lower-end devices.
  • Supported production quality through component testing, linting, peer code reviews, defect fixes, and regression checks, helping reduce frontend issues before they reached banking staff or customers.
React · Next.js · TypeScript · Node.js · FastAPI · Python · MongoDB · PostgreSQL · Docker · Tailwind CSS
04 Behind the Work

A bit about me

Yohans (John) Bekele

Online · open to work

I'm an AIOps and Cloud Infrastructure Engineer who's taken AI-enabled systems from prototype to production across enterprise and freelance environments. At Thomson Reuters, I work on production AI workloads including agentic services, Bedrock and Anthropic API integrations, server-sent-event streaming, pgvector-based document ingestion, embedding pipelines, and async knowledge-graph processing. I also support infrastructure modernization through Pulumi-based provisioning, remote state management with S3/DynamoDB, cross-account migration work, and production verification across IAM, networking, databases, service health, and CI/CD.

I bring a strong operations background from application support, SRE-style incident response, and observability engineering. I've owned on-call responsibilities, led post-mortems for production incidents, and improved monitoring across New Relic, Datadog, ELK, and custom tooling. I've reduced operational toil through Python, Bash, SQL, and workflow automation. My work focuses on making systems easier to operate by improving alert quality, documenting runbooks, automating repeatable tasks, and feeding production lessons back into infrastructure-as-code and deployment pipelines.

Alongside enterprise engineering, I've delivered full-stack and AI-first products for freelance clients across React, FastAPI, MERN, Docker, cloud deployments, and LLM APIs—including marketplaces, case-management systems, workforce tools, AI assistants, trading platforms, and an AWS visual designer that emits IaC. This combination of AI engineering, cloud infrastructure, CI/CD, observability, and hands-on product delivery lets me bridge development, operations, and platform reliability in fast-moving technical environments.

05 Verified

Certifications

06 Let's Talk

Got a project in mind?

I'm always interested in hearing about SRE/DevOps, cloud infrastructure, and automation opportunities, or in collaborating on AI/LLM applications and full-stack projects. I'm based in Gdansk, Poland.