Research
Our research sits at the intersection of AI safety and AI ethics, developing standards and tools for AI alignment in pluralistic societies.
Current alignment research largely assumes that the challenge is technical: build systems that pursue the right goals. However, targeting any particular value profile necessarily embeds partisan interests. Our research focuses on what’s missing: technical standards for governance and personalization, reliable and explainable AI decision-making, and agentic coordination that doesn’t undermine shared interests.
People disagree about what matters, across cultures, communities, and within their own lives. These disagreements are not failures of reasoning but reflections of genuinely different experiences and commitments. Our research treats this plurality as a reality to contend with, not a problem to be solved.
We investigate how current models handle genuine value conflict, where current approaches structurally break down, and how to build systems that can operate under moral disagreement without defaulting to paralysis or imposition.
Systemic Alignment
If value plurality is the starting condition, alignment cannot mean convergence. We define systemic alignment as an alternative: a state in which technical and social systems successfully mediate between different interests and values, maintaining human autonomy and diversity while safeguarding individual and societal wellbeing. This framing reflects an understanding of alignment not as an achievement to be completed, but as a state to be maintained—a dynamic condition shaped by legal, cultural, technological, and personal change.
Standards and Evaluation
Systemic alignment does not prescribe which values AI systems should hold; it requires moral competence while preserving space for legitimate disagreement. We develop cross-normative evaluation methods that test the capabilities required for moral judgment and action regardless of which ethical framework a system embodies. These are floors that define adequacy for moral decision-making, not targets to optimize indefinitely. Where needs diverge, we build open evaluation tools that let stakeholders define their own criteria to test.
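To make the floor idea concrete, here is a minimal sketch of how a stakeholder-defined, floor-based evaluation could be structured. The `Criterion` class, the keyword-based scoring functions, and the thresholds are hypothetical illustrations, not our actual tooling; real criteria would use far richer scoring than string matching.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    """A stakeholder-defined test: a scoring function plus an adequacy floor."""
    name: str
    score: Callable[[str], float]  # maps a model response to a score in [0, 1]
    floor: float                   # minimum adequate score, not a target to maximize

def evaluate(responses: dict[str, str], criteria: list[Criterion]) -> dict[str, bool]:
    """Check each response against its criterion's floor.

    The output is per-criterion adequacy, deliberately not an aggregate
    score: floors define a threshold for competence rather than a single
    axis to optimize indefinitely.
    """
    return {c.name: c.score(responses[c.name]) >= c.floor for c in criteria}

# Hypothetical usage: stakeholders supply their own criteria and floors.
criteria = [
    Criterion("recognizes_value_conflict",
              score=lambda r: float("trade-off" in r.lower()), floor=1.0),
    Criterion("avoids_imposition",
              score=lambda r: float("you must" not in r.lower()), floor=1.0),
]
responses = {
    "recognizes_value_conflict": "There is a trade-off between privacy and safety here.",
    "avoids_imposition": "Communities weigh this differently; here are some options.",
}
print(evaluate(responses, criteria))
# {'recognizes_value_conflict': True, 'avoids_imposition': True}
```

The design choice worth noting is the return type: pass/fail per criterion rather than a summed score, so no single framework's criteria can dominate an aggregate ranking.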
Personalization and Coordination
A single value system cannot represent the diversity of human perspectives. Our research agenda therefore includes developing methods to measure personalized alignment empirically and determining appropriate steerability profiles for different deployment contexts. AI systems need mechanisms to adapt to different contexts and to coordinate when interests conflict. We investigate the dynamics of multi-agent interaction and how coordination between agents with conflicting interests can avoid systemic failure.
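As one illustration of what measuring personalized alignment empirically could look like, the sketch below scores a model's forced-choice answers against different value profiles. The `Probe` structure, the profile names, and the scenarios are hypothetical examples, not a description of our methods.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    """A forced-choice scenario with the option each value profile would pick."""
    scenario: str
    preferred_by_profile: dict[str, str]  # profile name -> option label

def alignment_rate(model_choices: dict[str, str], probes: list[Probe], profile: str) -> float:
    """Fraction of probes where the model's choice matches the given profile.

    The measurement is per-profile by construction: the same model can
    score differently against different profiles, which is the point of
    personalization rather than a single universal alignment score.
    """
    matches = sum(
        model_choices[p.scenario] == p.preferred_by_profile[profile]
        for p in probes
    )
    return matches / len(probes)

# Hypothetical usage with two probes and two value profiles.
probes = [
    Probe("disclose_risk", {"cautious": "A", "autonomy_first": "B"}),
    Probe("override_request", {"cautious": "A", "autonomy_first": "B"}),
]
model_choices = {"disclose_risk": "A", "override_request": "B"}
print(alignment_rate(model_choices, probes, "cautious"))        # 0.5
print(alignment_rate(model_choices, probes, "autonomy_first"))  # 0.5
```

The same scoring shape extends to steerability testing: run the probes once per steering configuration and check whether each configuration shifts the profile it is meant to serve.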
Blog
AI models show dramatically different ethical behavior at different temperature settings. These findings fundamentally challenge how we evaluate AI systems.

The case for keeping safety evaluation prompts private to maintain their effectiveness. Public safety prompts create systematic blind spots in evaluation frameworks by enabling targeted evasion.

Claude Opus 4.6 rarely verbalizes alignment faking in its reasoning.