Applied ML @ UMD · LLM Eval + Retrieval · FastAPI / Cloud

Building ML + LLM systems that ship.

I’m an Applied ML engineer focused on retrieval, evaluation, and agentic workflows. I care about measurable quality (offline + human eval), safety, and reliability.

Contact · GitHub · LinkedIn

Euler AI: Founding Engineer · ML Systems

Hackathons: 2nd @ DeepMind ($5K) · Ironsite · Pear VC

LLM Evaluation: G-Eval · LLM-as-a-judge · guardrails

About

What I do

Build retrieval and LLM pipelines with evaluation loops (offline + human / judge).
Ship backend APIs (FastAPI), containerized deployments, and demos that run.
Focus on reliability: safety checks, regression tests, and measurable wins.

What I’m looking for

Applied ML / ML Systems internship roles.
Search, recommendations, retrieval, LLM tooling, evaluation.
Teams that care about metrics, iteration, and engineering quality.

Experience

Euler AI — Founding ML/Software Engineer · Mar – Jul 2025

Built e-commerce conversational AI agent end-to-end: intent classifiers, query reconstruction, reranker for search relevance, PII guardrails + prompt injection prevention.
Architected orchestration pipeline: embedding models for semantic catalog matching, knowledge graph with persistent memory, multi-model ensemble routing via FastAPI microservices.
Implemented LLM-as-a-judge evaluation (G-Eval) measuring coherence, relevance, and faithfulness in batch, with end-to-end agent tracing for throughput, latency, and cost.
Contributed to open-source Mem0 and EmbedChain: added persistent memory for agentic multi-turn workflows.

Featured Projects

AI Resume Agent — Conversational AI

Production AI agent with 10 backend services, 5 LLM-as-a-judge evaluators, graceful degradation, SSE streaming, and full observability.

Raw HTTP/2 to Gemini — no SDK, async httpx connection pooling + thread-safe TTLCache.
5 automated evaluators (Hallucination, Relevance, Conciseness, Helpfulness, Toxicity) via Langfuse at 100% sampling.

Live Demo · Demo Video · Code

FastAPI · Gemini · React · SSE · Mem0 · Langfuse

FunctionGemma — Hybrid AI Router

2nd Place / $5K at Cactus × Google DeepMind Hackathon. 3-tier hybrid inference: 0.99 F1 at 548 ms, 70% on-device ratio.

FunctionGemma-270M local + Gemini 2.5 Flash Lite cloud with lexical pre-router.
Intelligent routing between local and cloud models for optimal cost/latency.

Code · README

Python · Gemini · LLM Routing · Hackathon

PathGuard — Spatial Safety Intelligence

Real-time hazard detection pipeline: Grounding DINO + SAM2 + Depth Anything V2 with VLM scene understanding. UMD × Ironsite Hackathon.

VLM generates scene-specific prompts dynamically — zero-shot object detection from images.
State machine + telemetry pipeline, deployable on RPi 5, iPhone, or Android.

Demo Video · Code

Python · VLM · Computer Vision · Hackathon

English2SQL — NL → SQL Assistant

Production-style NL→SQL over Postgres. Achieved 9% accuracy lift via schema-aware prompting.

Evaluated multiple LLMs on accuracy, latency, and cost.
Hardened execution with schema-aware prompting + validation.

Code · README

FastAPI · Postgres · LLM Eval · Docker

AlphaFoundry — FF5 Factor Strategy

Rolling-window factor modeling. Backtested strategy outperformed SPY benchmark (Sharpe 0.95 vs 0.85).

FF5 + XGBoost Learning-to-Rank with walk-forward evaluation.
Production FastAPI inference + reproducible runs.

Code · README

Python · XGBoost · FastAPI · Backtesting

Domain-Specific LLMs Research

NeurIPS 2023 Workshop paper. Proposed novel evaluation methodologies for faithful LLMs.

Co-authored research on LLM alignment and domain adaptation.
Published at NeurIPS Muslims in ML Workshop.

arXiv Paper

NeurIPS · Research · LLM

More projects

Explore more of my work on GitHub.

All Repos · GitHub Profile

Skills

ML / LLM

RAG · Embeddings · Retrieval · Evaluation · Guardrails · LangChain · Mem0 · PyTorch · Transformers

Agent Frameworks

LangGraph · CrewAI · Agentic Workflows · Google ADK · MCP

Backend / Infra

Python · TypeScript · FastAPI · Docker · SQL · GCP · AWS · CI/CD · Gemini

CV & Vision

Grounding DINO · SAM2 · Depth Anything V2 · VLMs · OpenCV

Certifications

DeepLearning.AI

Honors & Awards

Cactus × Google DeepMind Hackathon — 2nd Place, $5K
UMD × Ironsite Hackathon — Team Lead, PathGuard
Pear VC + OpenAI Hackathon (SF) — Finalist
NeurIPS 2023 — MiML Workshop paper

Contact

If you’re hiring for applied ML / retrieval / LLM evaluation roles, email is best.

I typically respond within 24 hours ☕

rayhanbp@umd.edu · GitHub · LinkedIn