Foundry
World Model for Browser Agents
$120K - $180K salary / 3.00% equity
Location
San Francisco, CA, US / Remote (US)
About the role
Stop building GPT wrappers. Build a world model instead.
Why Foundry exists:
Every SaaS app, enterprise tool, and internal workflow without an API will soon be automated by browser agents—not traditional API integrations. Tens of trillions of dollars in global labor depend on browser-based workflows that no AI system can reliably automate today.
But browser agents right now are like GPT-1: impressive demos, but brittle and broken in practice. To scale browser agents from GPT-1 demos to GPT-4 reliability—and unlock this enormous economic leap—they need purpose-built infrastructure:
- Hyper-realistic, deterministic web simulations
- A comprehensive annotation framework for labeling and debugging
- Reliable benchmarks immune to web drift, IP bans, and rate limits
- Robust RL training environments and infrastructure
No one's built it yet. We're going first.
We’re building exactly this—core infrastructure designed from first principles, specifically for browser agents. No incremental tools, no wrappers around GPT—just the foundational AI stack everyone else will depend on later.
Our founding team comes from Scale AI and Meta.
What we’re looking for:
A founding engineer who’s cracked at software engineering—but who now wants to ship core ML systems and RL infrastructure from scratch.
This is not another prompt-engineering job.
You’ll:
- Build a browser agent gym that handles sparse rewards, non-determinism, and messy DOM trees.
- Solve hard ML problems at the edge of production—every single day.
You probably:
- Are a killer engineer (Python, TypeScript).
- Ship insanely fast: internships, side projects, OSS, real research, all things you’ve built yourself.
- Have real ML exposure: you’ve trained models, built infra, and run experiments.
- Have strong OSS contributions, publications we’d recognize, or both.
- Want a grind. Want to get better. Want to build something genuinely novel.
Bonus points:
- You’ve built RL agents, browser automation (Puppeteer, Playwright), or eval tooling.
- You’re tired of wrapping APIs and fine-tuning HuggingFace models—ready to do something deeper.
Why join:
- Real-world ML infrastructure, real users, real technical complexity.
- Early equity, competitive pay, and a massive growth trajectory.
We're building core infra. The ML stack everyone else will use later—that’s what we’re shipping today.
If you’ve read this far, you probably get it.
Reach out directly—let’s build.
About Foundry
We’re building the first end-to-end evaluation and training platform for web agents. Our system enables teams to test, benchmark, and optimize browser automation models at scale.
- Deterministic Web Simulation → Stable, reproducible testing with versioned web snapshots.
- Live Web Evaluation → Identify failures caused by UI drift, captchas, and dynamic content.
- Automated Annotation & Labeling → Generate high-quality data for training and benchmarking.
- RL-Driven Agent Optimization → Improve models with scalable, feedback-driven learning.
By combining synthetic user simulations, automated evaluations, and large-scale benchmarking, we help teams build more reliable web agents that handle real-world environments with confidence.