GIRIJESH SINGH · OPEN TO LEAD DS ROLES

PORTFOLIO — 2026 · KITCHENER, ONTARIO, CA

Lead Data Scientist & Applied AI Architect — building production AI where hallucinations carry legal consequences.

2,000+
USERS
30K+
DAILY QUERIES
$10M+
IMPACT
7+ YRS
DEPTH

I think in systems,
not notebooks.

Three things that separate a production AI architect from someone who can run a notebook.

01

Zero-to-One GenAI Architecture

I don't just prompt-engineer; I build deterministic, multi-agent microservices from scratch. When off-the-shelf tools like LangChain fail at scale, I design custom orchestration layers (PydanticAI, FastMCP) that actually work in production.

02

High-Performance System Optimization

I bridge the gap between Data Science and Data Engineering. I optimize vector indices (Azure Cosmos DB IVF) to cut query latencies from minutes to seconds, avoiding hundreds of thousands of dollars in cloud scale-out costs.
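For readers unfamiliar with the IVF (inverted file) index type named above, here is a minimal sketch of the general idea: vectors are grouped under their nearest centroid, and a query scans only the closest groups instead of every vector. This is a generic illustration, not Azure Cosmos DB's implementation; the function names, the hand-picked centroids, and the `nprobe` parameter are assumptions made for the example.

```python
import math


def build_ivf(vectors, centroids):
    """Assign every vector id to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in vectors.items():
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(vec, centroids[i]))
        lists[nearest].append(vid)
    return lists


def search(query, vectors, centroids, lists, k=1, nprobe=1):
    """Scan only the nprobe closest lists, then rank those candidates."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    candidates = [vid for i in probe for vid in lists[i]]
    return sorted(candidates,
                  key=lambda vid: math.dist(query, vectors[vid]))[:k]


# Hypothetical 2-D data; real embeddings have hundreds of dimensions.
vectors = {"a": (0.0, 0.1), "b": (0.2, 0.0), "c": (5.0, 5.1), "d": (5.2, 4.9)}
centroids = [(0.0, 0.0), (5.0, 5.0)]
lists = build_ivf(vectors, centroids)
nearest = search((4.9, 5.0), vectors, centroids, lists, k=1, nprobe=1)
```

The latency win comes from the smaller scan; the tradeoff is that a low `nprobe` can miss a true neighbour stored in an unprobed list.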

03

Measurable Enterprise Impact

My architectures don't just live in notebooks. I've directed teams of 10–14 engineers to deploy systems that drive $10M+ in operational efficiency and process tens of thousands of queries daily across 4 global regions.

TECHNICAL PHILOSOPHY

Moving a GenAI prototype to a regulated production environment exposes the limits of wrapper libraries. Within 30 days of standard RAG on 100+ page insurance documents, I diagnosed catastrophic context collapse and citation failure. The fix wasn't a bigger context window — it was a custom runtime schema transpilation layer and a hierarchical node retrieval engine, built from scratch.

“The best GenAI architecture is the one that can provably tell you exactly why it gave the answer it did, every single time.”

FEATURED WORK

What I've built.

Production systems. Published research. Shipped products.

PRODUCTION · NDA

CoverAI — Zero-Hallucination Retrieval Engine

Lead Data Scientist · 2024–Present · Confidential Employer

Problem. Off-the-shelf LangChain RAG failed catastrophically on 700+ page insurance policies — hallucinating citations with legal consequences.

Build. Custom hierarchical JSON-tree retrieval with runtime schema transpilation. Deterministic citation validation streams output character-by-character. New carriers onboard via config — zero code changes.

PydanticAI · FastMCP · Hierarchical Node Retrieval · Cosmos DB (IVF) · Real-Time Citation Validation · Azure OpenAI / GPT-4o
10×
Latency
<30s
E2E Response
$500K+
Cloud Saved
100%
Citations Valid
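To make "deterministic citation validation" over a hierarchical document tree concrete, here is a minimal sketch under stated assumptions: a citation is treated as valid only if it names a node that exists in the tree and its quote appears verbatim (ignoring case and whitespace) in that node's text. The `Node` and `Citation` types, the function names, and the sample policy are hypothetical, and the streaming aspect is not shown; the production system is under NDA and may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a document; children are its subsections."""
    node_id: str
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


@dataclass(frozen=True)
class Citation:
    node_id: str
    quote: str


def index_nodes(root):
    """Flatten the tree into an id -> node lookup."""
    index, stack = {}, [root]
    while stack:
        node = stack.pop()
        index[node.node_id] = node
        stack.extend(node.children)
    return index


def normalize(s):
    """Lowercase and collapse whitespace so layout differences don't matter."""
    return " ".join(s.lower().split())


def validate_citation(index, citation):
    """True only if the cited node exists and contains the quoted text."""
    node = index.get(citation.node_id)
    if node is None:
        return False
    quote = normalize(citation.quote)
    return bool(quote) and quote in normalize(node.text)


def validate_answer(index, citations):
    """An answer passes only if it cites something and every citation checks out."""
    return bool(citations) and all(validate_citation(index, c) for c in citations)


# Hypothetical two-section policy.
policy = Node("1", "Policy", children=[
    Node("1.1", "Coverage", "Water damage from burst pipes is covered up to $10,000."),
    Node("1.2", "Exclusions", "Flood damage from rising groundwater is not covered."),
])
index = index_nodes(policy)
good = Citation("1.1", "water damage from burst pipes is covered")
bad = Citation("1.2", "flood damage is covered")
```

Because the check is a plain string comparison rather than a model judgment, the same answer always produces the same verdict, which is what makes it auditable.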
PRODUCTION

Autonomous Multi-Line Claims Routing

XGBoost pipeline for claim triage. SHAP for 100% regulatory audit trail. 45% cost reduction · $200K annual savings · 35% faster settlements.

XGBoost · SHAP · Django · RabbitMQ

INTERNAL · PRE-LLM

PriML — Natural Language → SQL

Team lead of 10. Fine-tuned RAT-SQL transformer. NL query → SQL → Plotly dashboard, self-serve. 87% accuracy on complex multi-table JOINs, before general-purpose LLM assistants were available.

RAT-SQL · Fine-tuning · Plotly · Postgres

PRODUCTION · NDA

FNOL Classification Agent System

Three-service microservice architecture (FastAPI + FastMCP + Azure Service Bus). PydanticAI agent generates type-safe models from per-tenant schemas at runtime. 95% alignment · 2% hallucination rate.

PydanticAI · FastMCP · Multi-Tenant
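The idea of generating type-safe models from per-tenant schemas at runtime can be sketched with the standard library alone. The example below uses `dataclasses.make_dataclass` as a stand-in for Pydantic models, so it is not the PydanticAI code from the project; the schema format, the `TYPE_MAP`, the tenant name, and the helper names are all assumptions made for illustration.

```python
from dataclasses import fields, make_dataclass

# Assumed schema vocabulary: tenant configs name their field types as strings.
TYPE_MAP = {"string": str, "integer": int, "boolean": bool}


def build_model(tenant, schema):
    """Create a frozen dataclass whose fields mirror the tenant's schema."""
    spec = [(name, TYPE_MAP[type_name]) for name, type_name in schema.items()]
    return make_dataclass(f"{tenant.title()}Claim", spec, frozen=True)


def parse(model, payload):
    """Instantiate the model, rejecting missing, unknown, or mistyped fields."""
    expected = {f.name: f.type for f in fields(model)}
    if set(payload) != set(expected):
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(payload)}")
    for name, typ in expected.items():
        if not isinstance(payload[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")
    return model(**payload)


# Hypothetical tenant config; a new tenant only needs a new dict like this.
acme_schema = {"claim_id": "string", "loss_type": "string", "injuries": "integer"}
AcmeClaim = build_model("acme", acme_schema)
claim = parse(AcmeClaim, {"claim_id": "C-1", "loss_type": "water", "injuries": 0})
```

The design point is that the schema lives in configuration, so adding a tenant changes data rather than code, while downstream services still receive a validated, typed object.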

PUBLISHED · IEEE

Selective EEG Anonymization

Multi-objective autoencoders for Brain-Computer Interfaces. Selective anonymization preserves clinical signal while eliminating re-identification. PST 2023, Copenhagen.

View on IEEE Xplore ↗

SIDE PROJECTS

OPEN SOURCE

DirectorAI

Browser-native AI video editor. Natural language → FFmpeg.wasm. TensorFlow.js face detection, all client-side. No uploads, no server, no privacy tradeoff.

React 19 · FFmpeg.wasm · TensorFlow.js · Vite

GitHub ↗
LIVE

AI Conversation Exporter

Export ChatGPT, Claude, and Gemini conversations as TXT, Markdown, JSON, or HTML. Zero permissions, all processing local. Chrome + Firefox.

Chrome/Firefox Extension · Manifest V3 · Zero Permissions

GitHub ↗

BY THE NUMBERS

Real impact at scale.

Every number is earned, not estimated from a demo.

2,000+
Active Users
US · UK · AU · EU
30K+
Daily AI Queries
Multi-carrier · Multi-tenant
10×
Latency Reduction
Hours → under 30 seconds
$10M+
Annual Savings
In adjuster time
4
Global Regions
US · UK · AU · EU
Throughput Gain
Same hardware

TECHNICAL ARSENAL

AI Specializations

Machine Learning · Deep Learning · Generative AI · Agentic AI · Large Language Models · Computer Vision · Transformers · Explainable AI (SHAP)

LLM Orchestration

FastMCP · PydanticAI · Hierarchical Node Retrieval · Citation Validation · Azure OpenAI / GPT-4o · Cohere Reranking · Embedding Models

Core ML / AI

PyTorch · Transformers · XGBoost · SHAP · Vision OCR · NLP / Fine-tuning · Prophet · Scikit-learn · Plotly · Matplotlib · Seaborn

Languages

Python · SQL

Data & Cloud

PySpark · Microsoft Azure (Cosmos DB IVF · Service Bus) · Google Cloud Platform · AWS (Lambda · SageMaker · S3) · RabbitMQ · PostgreSQL · MongoDB · Docker

Delivery

FastAPI · OpenTelemetry · Arize Phoenix · Django · Flask · Adobe PDF Services · Team Leadership (10–14)

SIX YEARS · US · UK · AU · EU

Four regions.
One architecture.

CAREER TIMELINE

Jan 2024 — Present

Lead Data Scientist

Primus Software Corporation · Waterloo, ON

Led cross-functional team of 10–14. Scaled enterprise AI to 2,000+ users globally. Resolved two production crises. Built zero-code carrier onboarding. Promoted Senior → Lead in 12 months.

Jan 2023 — Jan 2024

Senior Data Scientist

Primus Software Corporation · Waterloo, ON

Diagnosed LangChain's fundamental limits on multi-document policies. Designed hierarchical RAG architecture solo in 3 months. Latency crisis: 2.5 min → 40 sec.

Sep 2021 — Apr 2023

M.Sc. Computer Science

Lakehead University · Thunder Bay, ON

Project-based master's degree, supervised by Dr. Garima Bajwa. Published privacy-preserving ML research at PST 2023, Copenhagen. Continued AI development at Primus concurrently.

May — Aug 2022

Data Science Intern

Ciena · Ottawa, ON

PySpark pipelines, divisive clustering, and manufacturing batch anomaly detection.

Jun 2018 — Dec 2022

ML Engineer → Senior ML Engineer

Primus · Noida, India → Canada (2021)

Built FNOL classification for Crawford & Company solo. Led PriML NL-to-SQL project (team of 10). CTO recognition + bonus. Six years of insurance domain expertise starts here.

PUBLICATIONS

PEER-REVIEWED · IEEE

Selective EEG Signal Anonymization using Multi-Objective Autoencoders

PST 2023 · Copenhagen, Denmark

Autoencoder architectures for securing biological telemetry — preserving clinical signal while eliminating re-identification vectors. Supervised by Dr. Garima Bajwa.

View on IEEE Xplore ↗
PEER-REVIEWED · SPRINGER

In-Memory Computation for Real-time Face Recognition

ICICT 2019 · Springer

Optimized edge-compute inference for computer vision on resource-constrained hardware. In-memory strategies significantly reduce latency for real-time face recognition.

View on Springer ↗

LEADERSHIP & COMMUNICATION

Data scientists who can communicate build better systems. The evidence:

Toastmasters International

Competitive public speaking that directly informs how I present technical findings to non-technical stakeholders — executives, clients, and insurance carriers.

7× Best Impromptu · 4× Best Evaluator · 3× Best Prepared Speech

Cross-Functional Team Lead

Ran day-to-day technical and delivery decisions for a team of 10–14. Direct stakeholder requirement gathering, refinement, and brainstorming. Second-most senior on the team.

10–14 person team · Multi-region delivery · Client-facing ownership

OPEN TO $190K+ LEAD & STAFF DS ROLES · KITCHENER, ONTARIO OR REMOTE

Big Tech, AI-native, or Enterprise AI. If your engineering bar is high and you need an architect who thinks in systems, let's talk.

LINKEDIN ↗ · GITHUB ↗

GIRIJESH SINGH · LEAD DATA SCIENTIST · KITCHENER, ON · 2026