The Collective Intelligence Project builds infrastructure enabling global input into AI system development and governance. The organization combines large-scale deliberation, participatory evaluation, and institutional partnerships. It operates as a small team supported by major foundations including Google.org, Omidyar Network, and Future of Life Foundation, working with AI labs and governments.

About the Role: The position involves building and maintaining full-stack platforms with complex data, visualizations, and user experiences. The core challenge centers on articulating complicated data for mainstream audiences including journalists, academics, and engineers.

The primary focus will be continuing development of Weval (weval.org), an evaluation platform used by AI labs and governments to assess frontier models on questions automated benchmarks cannot address, such as mental health crisis handling, accurate legal advice delivery in Indian languages, and political bias detection.

Secondary work includes Global Dialogues (gathering public input on AI across 70+ countries), Digital Twin evaluations, and deployments of democratic AI governance tools.

Key Responsibilities:

Weval Development (~60%):

Build core platform features: evaluation authoring tools, leaderboards, data pipelines for collecting and analyzing human judgments
Develop APIs and integrations enabling labs (Anthropic, OpenAI, Cohere) and governments to run Weval evaluations
Design and implement rich data visualizations and interactive interfaces articulating complex evaluation data for non-technical audiences, policymakers, and journalists
Create tools enabling non-technical users to design and deploy evaluations
Own key architectural decisions as the platform scales

Supporting Other CIP Projects (~30%):

Global Dialogues: Analyze and visualize data from 10,000+ participants across 70+ countries
Digital Twins: Develop evaluation infrastructure testing AI agent accuracy in representing diverse groups' values
New experiments: Prototype tooling for partners

Required Qualifications:

3-5 years of software engineering experience with a strong frontend focus
Significant experience with Next.js, React, and TypeScript
Shipped products that users find valuable
Strong product sensibility regarding UX, design quality, and user-focused building
Genuine facility with AI tools (such as Claude, Cursor, or similar), integrated into your daily workflow
Comfortable working independently with pragmatic technical decision-making and rapid execution
Genuine enthusiasm for CIP's democratic AI infrastructure mission
Preference for applicants available during Pacific or Eastern time zones

Nice-to-Have:

Experience with AI evaluation platforms, survey tools, research infrastructure, or data collection systems
Experience with Supabase/Postgres, Vercel/Netlify
Background in mission-driven organizations, civic tech, research, or academic settings
Open source contributions, technical writing, or community building

12-Month Outlook: Over the next 12 months, the role is expected to involve less line-by-line coding and more AI agent orchestration, architectural decision-making, and review of output from increasingly capable coding tools. Value shifts toward system design judgment, quality standards, and managing parallel workstreams executed by AI.

Compensation: $150,000 + health/dental/vision insurance, 403(b), generous PTO. Flexible hours, life accommodation, and an output-focused culture; hybrid in-office/remote on Pacific time.

Lead Product Engineer
