Role: Lead Engineer (Image Search Re-architecture)
Type: Production AI System / Non-Profit Platform
Tech Stack: Python, PyTorch, AWS SageMaker, Transformer-based Vision Models, Vector Search
Platform: Petco Love Lost
The One-Liner: A production-scale image similarity system for lost and found pets, designed to improve reunification outcomes.
The Problem Space
Petco Love Lost operates a national platform whose mission is to reunite lost pets with their owners. A core capability of the platform is image-based search: users upload photos of lost or found pets, and the system returns visually similar candidates to aid identification and reunification.
The Challenge
The legacy image search system had reached its limits:
- Accuracy degraded over time as new data diverged from the original training distribution.
- Operational costs were high, limiting the ability to scale traffic during peak usage.
- Improvement velocity was low, as model upgrades depended heavily on manual image labeling.
In practice, the system improved slowly while becoming more expensive and brittle as usage grew.
The Constraints
- Uncontrolled input data: Images were user-submitted, uncurated, and varied widely in lighting, framing, pose, and camera quality.
- Sparse labels: Pets were labeled by species, and some pets had multiple photos, but there were no explicit same-pet pairs across different reports.
- Non-profit resource realities: Large-scale manual annotation was not viable.
- Production scale: Hundreds of thousands of pets and images, supporting substantial daily search traffic with significant spikes during peak periods.
The problem was not simply to "train a better model," but to design a system that could improve continuously despite these limitations.
The Solution Architecture
High-Level Design
I re-architected the image search pipeline as a modular, learning-oriented system optimized for robustness and long-term maintainability:
- Image Preprocessing & Segmentation: Incoming images are normalized and processed to isolate the pet and reduce background-driven noise, improving downstream embedding consistency.
- Embedding Generation: A modern vision backbone produces embeddings optimized for similarity matching rather than fixed-label classification.
- Vector-Based Retrieval: Embeddings are indexed to support high-throughput, low-latency similarity search at production scale.
- Training Feedback Loop: Historical model outputs and catalog review insights are used to surface failure modes and underrepresented cases, informing subsequent training cycles.
The system is deployed as a managed real-time inference service in AWS, designed for horizontal scalability and predictable latency.
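The retrieval step can be sketched in a few lines. This is a brute-force NumPy illustration of cosine-similarity top-k, not the production index (which would be a managed vector search service); the function name, dimensions, and toy catalog are all illustrative.

```python
import numpy as np

def top_k_similar(query_emb: np.ndarray, index_embs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most cosine-similar catalog embeddings."""
    # Normalize so that a plain dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    scores = idx @ q
    # argsort is ascending: take the last k scores, then reverse for descending order.
    return np.argsort(scores)[-k:][::-1]

# Toy catalog of 4 pet embeddings in a 3-d space.
catalog = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0])
print(top_k_similar(query, catalog, k=2))  # indices of the two nearest catalog entries
```

At production scale the same idea is served by an approximate nearest-neighbor index, which trades a small amount of recall for sub-linear query time.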
Key Architectural Trade-Off
I optimized for robust performance on noisy, real-world data rather than maximal accuracy on a narrowly curated dataset. This reduced early gains but resulted in a system that improves as data volume and diversity increase, without a proportional increase in human effort.
Technical Deep Dive
Metric Learning as a First-Class Primitive
Traditional classification approaches proved ill-suited to this domain. The system instead centers on metric learning, enabling similarity-based reasoning rather than rigid category assignment.
- Pets with multiple images naturally form anchor–positive relationships.
- A triplet-based objective is used to shape an embedding space that generalizes across appearance changes.
- Outputs from prior model iterations are leveraged to identify difficult counterexamples, enabling targeted improvement without manual labeling.
This design allows the model to evolve as new data is introduced, rather than regressing due to dataset shift.
Real-World Failure Modes
Several domain-specific challenges influenced architectural choices:
- Viewpoint sparsity: Some images show only partial views of the pet (e.g., from behind), omitting distinctive markings.
- Environmental variance: Lighting, camera sensors, and image compression introduce significant visual distortion.
- Appearance changes: Injuries or physical changes while lost can materially alter a pet's appearance.
These realities reinforced the need for embeddings that capture structural and semantic cues, not just surface-level visual similarity.
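One standard way to harden embeddings against the environmental variance above is aggressive photometric augmentation at training time. A minimal NumPy sketch, assuming images as H x W x 3 float arrays in [0, 1]; the jitter ranges are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def photometric_jitter(img: np.ndarray) -> np.ndarray:
    """Randomly perturb brightness and contrast to mimic lighting and sensor variance."""
    brightness = rng.uniform(-0.2, 0.2)   # additive lighting shift
    contrast = rng.uniform(0.8, 1.2)      # multiplicative contrast scale around mid-gray
    jittered = (img - 0.5) * contrast + 0.5 + brightness
    return np.clip(jittered, 0.0, 1.0)

img = rng.uniform(0, 1, size=(4, 4, 3))
aug = photometric_jitter(img)
```

Training on such perturbed views encourages the model to key on stable structural cues (markings, shape) rather than on the lighting conditions of any one photo.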
Impact & Outcome
Quantitative Results
- Significant reduction in operational costs relative to the legacy system, enabling the platform to scale sustainably.
- Measurable improvement in successful reunification metrics following deployment.
- Sustained performance across a large and growing catalog of pets and images, handling substantial daily search volume.
Qualitative Impact
- Removed dependence on large-scale manual labeling.
- Enabled faster iteration without destabilizing production.
- Delivered an image search system that improves with usage rather than degrading under scale.
Retrospective
Early Assumptions
At the outset, I explored integrating managed labeling workflows to compensate for sparse training data. While technically feasible, this approach was misaligned with cost constraints and iteration speed.
Shifting fully to a metric-learning–driven strategy eliminated that dependency and proved more scalable in practice.
Future Exploration
If rebuilding or extending this system today, I would experiment with hybrid training strategies inspired by adjacent domains such as facial recognition:
- Early-stage training using classification-oriented objectives (e.g., identity separation) to establish a strong representation.
- Subsequent metric-learning phases to generalize embeddings for similarity search under partial views, occlusion, and physical appearance changes.
Pet recognition differs fundamentally from human facial recognition—critical features may be occluded or absent—but selectively adapting ideas from those domains could further improve robustness.
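A rough sketch of what that two-phase setup could look like in PyTorch. Everything here is a placeholder: the linear "backbone" stands in for a real vision model, and the sizes, margin, and triplet selection are illustrative.

```python
import torch
import torch.nn as nn

EMBED_DIM, NUM_IDS = 128, 1000  # illustrative sizes

# Stand-in for a vision backbone that maps an image to an embedding.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, EMBED_DIM))
id_head = nn.Linear(EMBED_DIM, NUM_IDS)  # per-identity classifier, phase 1 only

images = torch.randn(4, 3, 64, 64)
labels = torch.randint(0, NUM_IDS, (4,))

# Phase 1: classification-oriented objective to establish identity separation.
ce_loss = nn.CrossEntropyLoss()
phase1_loss = ce_loss(id_head(backbone(images)), labels)

# Phase 2: discard the head and fine-tune the embedding space with a
# metric objective for similarity search under partial views and occlusion.
triplet = nn.TripletMarginLoss(margin=0.5)
emb = backbone(images)
phase2_loss = triplet(emb[0:1], emb[1:2], emb[2:3])  # anchor / positive / negative
```

The classification head is discarded after phase 1; only the backbone's embedding space is carried forward into retrieval.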
Why This Work Matters
This project was not about chasing model complexity. It was about designing an end-to-end system that respects real-world constraints: imperfect data, limited resources, and a mission where outcomes directly affect people's lives.
The result is a production image search system that is cheaper to operate, easier to evolve, and measurably more effective at reuniting lost pets with their owners.
Technical Resources
Computer Vision & Metric Learning
- FaceNet: A Unified Embedding for Face Recognition — Triplet loss example
- What is Metric Learning? — Overview of metric learning techniques
- PyTorch Metric Learning Library — Production-ready implementations