Federating AI with Attribution-Based Control

Learn how the Syft protocol and ecosystem can make AI both more powerful and more inclusive

The Structural Problem with AI

There's 3,000x more human data generated each year than LLMs use. Yet most of it never makes it into AI, and data owners are even opting out en masse. The issue: existing AI architectures structurally prevent meaningful incentives from being created. Here's why:

Addition Problem

The Technical Problem

Deep learning is like a giant blender. During training, your content is mathematically "mixed" with everything else until it becomes impossible to separate.

You can't enforce rules because the AI can't tell which parts of its knowledge came from you.
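
A toy sketch of the addition problem in plain Python: training folds every contributor's update into the same shared weights by summation, so two very different histories of who contributed what can end in identical weights. The `train` function and the numbers are illustrative assumptions; real training applies millions of non-linear gradient steps, not two hand-picked vectors.

```python
def train(initial_weights, updates):
    """Apply additive updates (think: gradient steps) to one shared weight vector."""
    weights = list(initial_weights)
    for update in updates:
        weights = [w + u for w, u in zip(weights, update)]
    return weights


w0 = [0.0, 0.0]

# Two made-up histories of contributions from data owners A and B.
history_1 = [[0.25, -0.5], [0.25, 1.0]]  # A's update, then B's update
history_2 = [[0.5, 0.0], [0.0, 0.5]]     # a completely different split

final_1 = train(w0, history_1)
final_2 = train(w0, history_2)
print(final_1, final_2)  # both [0.5, 0.5]

# Given only the final weights, nothing tells you which history produced them,
# so nothing tells you how much of the model "came from" A or from B.
```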

Use Bundling

The Business Problem

You can't say "yes, use my stories to help students with creative writing, but no, don't use them to generate competing stories."

Opting into AI is now all-or-nothing: you either support every possible AI prediction, or none at all.

Attribution-Based Control solves this by:

Separable architecture: your data stays under your control, is memorised as little as possible during training (if it is trained on at all), and is used effectively at prediction time.

Live querying: the AI asks permission in real time, letting you enforce your preferences at use time.

Attribution-Based Control Diagram
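
The difference between a one-time opt-in and live, per-use permission can be sketched in a few lines. `OwnerPolicy`, `live_query`, and the purpose labels below are hypothetical names for illustration, not part of any Syft API; the point is that the check runs for every query with its purpose attached, so "tutoring: yes, competing stories: no" becomes expressible.

```python
from dataclasses import dataclass, field


@dataclass
class OwnerPolicy:
    """Hypothetical per-purpose policy kept by the data owner."""
    allowed_purposes: set = field(default_factory=set)

    def permits(self, purpose: str) -> bool:
        return purpose in self.allowed_purposes


def live_query(policy: OwnerPolicy, purpose: str, question: str) -> str:
    # Permission is requested at use time, for this specific purpose.
    if not policy.permits(purpose):
        return "DENIED: the data owner does not allow this use"
    return f"ALLOWED: answering {question!r} using the owner's stories"


stories_policy = OwnerPolicy(allowed_purposes={"creative-writing-tutoring"})
print(live_query(stories_policy, "creative-writing-tutoring", "How do I pace a plot twist?"))
print(live_query(stories_policy, "story-generation", "Write a story in this author's style"))
```

Because the policy is consulted on every query, changing `allowed_purposes` takes effect on the very next one.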

How It Works

Different model architectures can enable Attribution-Based Control, keeping your knowledge separable and queryable at use time.

Federated Retrieval-Augmented Generation

Queries external knowledge sources in real time and uses the retrieved content to ground the model's responses, while keeping full attribution to the original sources.
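
A minimal sketch of retrieval that keeps attribution attached: every retrieved passage travels with the name of the source it came from, so the final prompt (and any answer built from it) can cite it. Simple word overlap stands in for embedding similarity here, and the source names and texts are made up.

```python
def retrieve(sources, query, k=2):
    """Rank passages from all sources by word overlap with the query, keeping source names."""
    query_words = set(query.lower().split())
    scored = []
    for source_name, passages in sources.items():
        for passage in passages:
            overlap = len(query_words & set(passage.lower().split()))
            if overlap > 0:
                scored.append((overlap, source_name, passage))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, passage) for _, name, passage in scored[:k]]


def build_prompt(query, hits):
    # Each context line is tagged with its source, so attribution survives into the answer.
    context = "\n".join(f"[{name}] {passage}" for name, passage in hits)
    return f"Answer using only the cited context.\n{context}\nQuestion: {query}"


sources = {
    "Outlet A": ["the river flooded the town on monday"],
    "Outlet B": ["the town council met on tuesday", "flood defences were raised"],
}
hits = retrieve(sources, "when did the river flood the town")
print(build_prompt("when did the river flood the town", hits))
```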

Federated Mixture of Experts

Each participant keeps their own local model, which contributes predictions that are combined (like a voting system) to improve overall performance.
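
The voting picture in this paragraph can be sketched as a weighted vote over participants' predictions. Real federated mixture-of-experts systems usually combine experts inside the model through a learned router, so treat this as a simplification; the participant names and weights are hypothetical. What it does show is that each participant's contribution stays separable: excluding one at use time just means dropping its entry.

```python
from collections import Counter


def combine_votes(predictions, weights):
    """Weighted vote over per-participant labels; also report who backed the winning label."""
    totals = Counter()
    for participant, label in predictions.items():
        totals[label] += weights.get(participant, 1.0)
    winner, _ = totals.most_common(1)[0]
    supporters = sorted(p for p, label in predictions.items() if label == winner)
    return winner, supporters


weights = {"participant_a": 1.0, "participant_b": 1.5, "participant_c": 1.0}
predictions = {"participant_a": "positive", "participant_b": "negative", "participant_c": "positive"}

print(combine_votes(predictions, weights))  # ('positive', ['participant_a', 'participant_c'])

# participant_c withdraws its model: remove it and the vote changes accordingly.
without_c = {p: label for p, label in predictions.items() if p != "participant_c"}
print(combine_votes(without_c, weights))    # ('negative', ['participant_b'])
```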

Hybrid Architectures

Federated systems that keep data partly separate by combining compatible methods, such as privacy-preserving training (e.g., differential privacy) together with FedRAG, FlexOlmo, and similar approaches.
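
For the privacy-preserving training half of such a hybrid, the core step of DP-SGD (clip each example's gradient, sum, add Gaussian noise, average) looks roughly like this. It is a textbook sketch, not how Syft, FedRAG, or FlexOlmo implement anything, and choosing `noise_multiplier` for a concrete (ε, δ) budget needs a privacy accountant, which is omitted.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, and average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(values) for values in zip(*clipped)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]  # the first example would dominate without clipping
rng = random.Random(0)
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single example can move the model, and the noise is scaled to that bound, which is what limits how much the update can reveal about any one contributor.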



Deep Dive: Federated RAG in Syft

Federated RAG Overview
1. Local Data Processing
2. Directed Query
3. Policy Enforcement
4. Secure LLM Aggregation

1. Local Data Processing

Each data owner creates private vector databases at their datasite, embedding their content locally to preserve control.

  • Content stays at source: never copied centrally
  • Data owners decide the policies for querying and for releasing responses
Local Indexing Diagram
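
A minimal sketch of a datasite-local index: documents are embedded and stored in a structure that stays on the owner's machine, and only search results are returned. Normalised word counts stand in for a real embedding model, and `LocalVectorStore` is a hypothetical name, not a Syft class.

```python
import math
from collections import Counter


def embed(text):
    """Toy sparse embedding: word counts scaled to unit length (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class LocalVectorStore:
    """Lives at the datasite; the stored content is never exported, only top-k matches."""

    def __init__(self):
        self._items = []  # (embedding, text) pairs

    def add(self, text):
        self._items.append((embed(text), text))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = LocalVectorStore()
store.add("the river flooded the town")
store.add("stock markets rallied today")
print(store.search("river town"))  # ['the river flooded the town']
```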

2. Directed Query

End users select the specific data sources for their query, giving them control over which sources they rely on.

  • User chooses: "News Outlet A, B, and F"
  • Query routed only to selected datasites
Query Submission Diagram
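
Directed querying boils down to routing: the query goes to the datasites the user picked and to no others. In this sketch each datasite is a plain Python function and the names mirror the example above; in a real deployment these would be network calls, which are not shown.

```python
def route_query(query, selected, datasites):
    """Send the query only to the user-selected datasites and collect their answers."""
    unknown = [name for name in selected if name not in datasites]
    if unknown:
        raise ValueError(f"unknown datasites: {unknown}")
    return {name: datasites[name](query) for name in selected}


called = []  # records which datasites actually received the query


def make_datasite(name):
    def handle(query):
        called.append(name)
        return f"{name} results for {query!r}"
    return handle


names = ["News Outlet A", "News Outlet B", "News Outlet C", "News Outlet F"]
datasites = {n: make_datasite(n) for n in names}
answers = route_query("election turnout", ["News Outlet A", "News Outlet B", "News Outlet F"], datasites)
print(called)  # ['News Outlet A', 'News Outlet B', 'News Outlet F']
```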

3. Policy Enforcement

When a query is received, the data owner's preferences are enforced in real time. Input policies decide who can use your content, and output policies decide what information is released.

  • Rate limiting, access controls
  • Human-in-the-loop review if needed
  • LLM filters for privacy protection
  • Monetization
Policy Enforcement Diagram
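
A sketch of the input/output split described above: input policies (an allow-list and a rate limit) run before any content is touched, and an output policy runs on what would be released. `PolicyGate` is hypothetical, the regex e-mail redaction is a crude stand-in for an LLM privacy filter, and human review and monetization are left out.

```python
import re
import time


class PolicyGate:
    """Hypothetical datasite policy layer: input checks before search, output filter after."""

    def __init__(self, allowed_users, max_queries, window_seconds, clock=time.monotonic):
        self.allowed_users = set(allowed_users)
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock
        self._history = {}  # user -> timestamps of recent queries

    def check_input(self, user):
        if user not in self.allowed_users:
            return False, "access denied"
        now = self.clock()
        recent = [t for t in self._history.get(user, []) if now - t < self.window_seconds]
        if len(recent) >= self.max_queries:
            self._history[user] = recent
            return False, "rate limit exceeded"
        self._history[user] = recent + [now]
        return True, "ok"

    @staticmethod
    def filter_output(text):
        # Crude stand-in for an LLM privacy filter: redact anything shaped like an e-mail.
        return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[redacted]", text)


def answer(gate, user, query, search):
    ok, reason = gate.check_input(user)
    if not ok:
        return f"refused: {reason}"
    return gate.filter_output(search(query))


def fake_search(query):
    return "Contact the author at jane.doe@example.com"


gate = PolicyGate({"alice@uni.example"}, max_queries=2, window_seconds=60)
print(answer(gate, "alice@uni.example", "who wrote it?", fake_search))     # Contact the author at [redacted]
print(answer(gate, "mallory@evil.example", "who wrote it?", fake_search))  # refused: access denied
```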

4. Secure Aggregation

Results are combined by concatenation, preserving source attribution. If the outputs are confidential, secure aggregation can be used instead of sending them to remote LLMs.

  • Local LLMs process at each source
  • Trusted Execution Environments for multi-source aggregation
Secure Aggregation Diagram
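
The concatenation step can be sketched as joining per-source answers into one context while numbering each chunk, so every citation in the final answer maps back to a datasite. The function name is hypothetical, and the confidential path (local LLMs per source, TEEs for multi-source aggregation) is not shown.

```python
def concatenate_with_attribution(results):
    """Join (source, text) pairs into one context; return a citation map alongside it."""
    blocks, citations = [], {}
    for i, (source, text) in enumerate(results, start=1):
        blocks.append(f"[{i}] {text}")
        citations[i] = source
    return "\n".join(blocks), citations


context, citations = concatenate_with_attribution([
    ("News Outlet A", "Turnout rose compared with the last election."),
    ("News Outlet F", "Counting continues in two districts."),
])
print(context)
print(citations)  # {1: 'News Outlet A', 2: 'News Outlet F'}
```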

Explore the Syft Protocol & Tools

A complete toolkit for implementing Attribution-Based Control whenever your data is used by AI.

Syft & SyftBox

Network Foundation

A live networked architecture that enables secure communication, data sharing (like a massive shared Dropbox with access permissions), and identity-based access.
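
Identity-based access to shared folders, the "shared Dropbox with access permissions" idea, can be sketched as: a read is allowed if the reader's identity is granted on the file's folder or any parent folder. The permission table format and the `can_read` helper are hypothetical and do not reflect SyftBox's actual permission files.

```python
from pathlib import PurePosixPath


def can_read(identity, path, permissions):
    """True if identity (or everyone, "*") is granted read on path or one of its parent folders."""
    target = PurePosixPath(path)
    for folder, readers in permissions.items():
        granted_folder = PurePosixPath(folder)
        covers = target == granted_folder or granted_folder in target.parents
        if covers and (identity in readers or "*" in readers):
            return True
    return False


permissions = {
    "alice@example.org/public": {"*"},
    "alice@example.org/private/reports": {"bob@example.org"},
}
print(can_read("bob@example.org", "alice@example.org/private/reports/q1.csv", permissions))    # True
print(can_read("carol@example.org", "alice@example.org/private/reports/q1.csv", permissions))  # False
print(can_read("carol@example.org", "alice@example.org/public/readme.md", permissions))        # True
```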


SyftRouter

Local AI Service & Policy Enforcement

A local runtime for creating and managing endpoints, with many pre-packaged, off-the-shelf integrations. It is also where policies are defined and applied.
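
One way to picture a runtime where each endpoint carries its own policies: register a handler under a name together with a list of policy checks, and run every check before the handler. `EndpointRegistry` and `only_members` are hypothetical names for illustration; they are not SyftRouter's API.

```python
class EndpointRegistry:
    """Hypothetical local registry of named endpoints, each guarded by its own policies."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, handler, policies=()):
        self._endpoints[name] = (handler, list(policies))

    def call(self, name, user, query):
        handler, policies = self._endpoints[name]
        for policy in policies:
            allowed, reason = policy(user, query)
            if not allowed:
                return f"refused by policy: {reason}"
        return handler(query)


def only_members(members):
    """Policy factory: allow only the listed users."""
    def policy(user, query):
        return user in members, "not a member"
    return policy


registry = EndpointRegistry()
registry.register("news-search", lambda q: f"results for {q!r}", [only_members({"alice"})])
print(registry.call("news-search", "alice", "turnout"))  # results for 'turnout'
print(registry.call("news-search", "bob", "turnout"))    # refused by policy: not a member
```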


Python SDK

Dev Tools

With the Python SDK, you can explore the Federated AI Network, query any data source, and easily get state-of-the-art results.


SyftMCP

Agentic Tools

Coming Soon

Use SyftMCP to get insights from this network with ease.

Syft Protocol Stack Architecture

Illustration of the Syft Protocol and its tools.

Ready to implement ABC?

Join us in building the future of collaborative AI where data owners, developers, and users all benefit.