Aithos Foundation
Research

Our research sits at the intersection of AI safety and AI ethics, developing standards and tools for AI alignment in pluralistic societies.

The Challenge

Current alignment research largely assumes that the challenge is technical: build systems that pursue the right goals. However, targeting any particular value profile necessarily embeds partisan interests. Our research focuses on what’s missing: technical standards for governance and personalization, reliable and explainable AI decision-making, and agentic coordination that doesn’t undermine shared interests.

Value Pluralism

People disagree about what matters, across cultures, communities, and within their own lives. These disagreements are not failures of reasoning but reflections of genuinely different experiences and commitments. Our research treats this plurality as a reality to contend with, not a problem to be solved.

Navigating Conflicting Interests

We investigate how current models handle genuine value conflict, where current approaches structurally break down, and how to build systems that can operate under moral disagreement without defaulting to paralysis or imposition.

Systemic Alignment

If value plurality is the starting condition, alignment cannot mean convergence. We define systemic alignment as an alternative: a state in which technical and social systems successfully mediate between different interests and values, maintaining human autonomy and diversity while safeguarding individual and societal wellbeing. This framing reflects an understanding of alignment not as an achievement to be completed, but as a state to be maintained—a dynamic condition shaped by legal, cultural, technological, and personal change.

Standards and Evaluation

Rather than prescribing what values AI systems should hold, systemic alignment requires moral competence while preserving space for legitimate disagreement. We develop evaluation methods that are cross-normative: they test the capabilities required for moral judgment and action regardless of which ethical framework a system embodies. These methods define floors of adequacy for moral decision-making rather than targets to be maximized indefinitely. Where needs diverge, we create open evaluation tools that let stakeholders define and test their own criteria.
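The floor-based framing above can be sketched in a few lines. This is a minimal illustration, not the foundation's actual methodology: the capability names and thresholds are hypothetical, and the point is only that every capability must clear its floor, with no credit for exceeding it.

```python
# Sketch of "floor"-style evaluation: each moral-competence capability
# must meet a minimum adequacy threshold; scores above the floor earn
# no extra credit. All capability names and thresholds are illustrative.

FLOORS = {
    "harm_recognition": 0.80,      # hypothetical adequacy thresholds
    "consistency": 0.75,
    "uncertainty_handling": 0.70,
}

def meets_floors(scores: dict, floors: dict = FLOORS) -> bool:
    """A system is adequate only if every capability clears its floor."""
    return all(scores.get(cap, 0.0) >= floor for cap, floor in floors.items())

# Excellence on one axis cannot compensate for failing another:
print(meets_floors({"harm_recognition": 0.99, "consistency": 0.60,
                    "uncertainty_handling": 0.90}))  # False
print(meets_floors({"harm_recognition": 0.82, "consistency": 0.76,
                    "uncertainty_handling": 0.71}))  # True
```

The `all(...)` check is what distinguishes a floor from an optimizable target: adequacy is conjunctive, so averaging or maximizing a single aggregate score cannot substitute for meeting each requirement.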

Personalization and Coordination

A single value system cannot represent the diversity of human perspectives. For this reason, our research agenda includes developing methods to measure personalized alignment empirically and determining appropriate steerability profiles for different deployment contexts. AI systems need mechanisms to adapt to different contexts and to coordinate when interests conflict. We investigate the dynamics of multi-agent interaction and how coordination between agents with conflicting interests can avoid systemic failure.
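One way to picture a steerability profile is as a per-context bound on how far a system may be personalized along each value dimension. The sketch below is purely illustrative; the contexts, dimensions, and ranges are invented for the example and do not describe any deployed system.

```python
# Illustrative sketch: a "steerability profile" bounds how far a system
# may be personalized along each value dimension in a given deployment
# context. All contexts, dimensions, and bounds here are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class SteerabilityProfile:
    context: str
    bounds: dict  # dimension -> (min, max) permitted setting

PROFILES = {
    "personal_assistant": SteerabilityProfile(
        "personal_assistant",
        {"directness": (0.0, 1.0), "risk_tolerance": (0.0, 0.6)}),
    "medical_triage": SteerabilityProfile(
        "medical_triage",
        {"directness": (0.5, 1.0), "risk_tolerance": (0.0, 0.2)}),
}

def clamp_request(profile: SteerabilityProfile, requested: dict) -> dict:
    """Project a user's requested settings into the context's permitted range."""
    out = {}
    for dim, value in requested.items():
        lo, hi = profile.bounds.get(dim, (0.0, 1.0))
        out[dim] = min(max(value, lo), hi)
    return out

# A high-stakes context permits less personalization than a casual one:
print(clamp_request(PROFILES["medical_triage"],
                    {"directness": 0.2, "risk_tolerance": 0.9}))
# → {'directness': 0.5, 'risk_tolerance': 0.2}
```

The design point is that personalization happens inside context-dependent bounds: the same user request is honored in full in one deployment and clamped in another.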

Blog

Low Temperature Evaluations

AI models show dramatically different ethical behavior at different temperature settings.

Nov 12, 2025

Minor Wording Changes, Major Shifts in AI Behavior

These findings fundamentally challenge how we evaluate AI systems.

Nov 26, 2025

Why Safety Prompts Should Stay Out of Public View

The case for keeping safety evaluation prompts private to maintain their effectiveness.

Jan 30, 2026

Published Safety Prompts May Create Evaluation Blind Spots

Public safety prompts create systematic blind spots in evaluation frameworks by enabling targeted evasion.

Jan 30, 2026

Opus 4.6 Reasoning Doesn't Verbalize Alignment Faking

Claude Opus 4.6 rarely verbalizes alignment faking in its reasoning.

Feb 9, 2026