Fingerprinting: Enabling Open-Source Monetization on the Model Layer
Published on April 29, 2025
  • Loyal AI = Ownership + Control + Alignment; ensures AI models remain true to creators and community values.
  • Fingerprinting embeds unique digital signatures into models, allowing verifiable proof of ownership and control over model usage.
  • Fingerprints consist of subtle, undetectable key-response pairs deeply integrated into the model during fine-tuning, resistant to tampering.
  • Techniques such as specialized fine-tuning, model mixing, benign data mixing, and parameter expansion ensure fingerprint embedding doesn’t degrade model performance.
  • Smart contracts (blockchain) transparently track authorized model usage and licensing, supporting effective enforcement against unauthorized use.
  • Good actors experience frictionless model usage; bad actors face detection through embedded fingerprints, enabling creators to pursue enforcement.
  • Our fingerprinting approach marks a key step toward secure, monetizable open-source AI that empowers creators and respects community alignment.

Our mission is to create Loyal AI models capable of serving all 8 billion people on the planet. It’s an ambitious mission—one that may raise questions, inspire curiosity, and even feel daunting at times. But this is the nature of meaningful innovation: it pushes the boundaries of what’s possible and challenges us to see how far we can go.

At the heart of this mission is the concept of Loyal AI—an approach built on three critical pillars: ownership, control, and alignment. These principles define what it means for an AI model to be truly “loyal,” both to its creators and the communities it serves.

What Is Loyal AI?

Put simply, Loyalty = Ownership + Control + Alignment. We've defined loyalty as:

  1. A model being loyal to its creator and its creator's intended use
  2. A model being loyal to the community that uses it
The figure above demonstrates how the formula for loyalty is structured, showing the relationship between the three aspects of loyalty and the two definitions they support.

The Three Pillars of Loyalty

At the core of our framework—and embodied in our equation, which serves as the guiding North Star of loyalty—are three fundamental aspects: ownership, control, and alignment. These pillars are the foundation of how we define and achieve loyalty in AI systems, ensuring fidelity to both the creator’s intentions and the community’s values.

  1. Ownership: You should have the ability to verifiably prove ownership of any model and enforce it effectively. In the current open-source software landscape, ownership is nearly impossible to establish. Once a model is released, it can be freely modified, redistributed, or even falsely claimed by others as their own, with no mechanisms in place to prevent such misuse.
  2. Control: Owners should have the ability to control how their models are used, including the authority to specify what/how/when their models can be accessed or deployed. In the current open-source environment, the loss of ownership typically leads to a corresponding loss of control, as creators have no way to enforce usage boundaries. However, we have made a significant breakthrough: by enabling ownership validation through direct model queries, we provide a robust mechanism for creators to maintain control over their work.
  3. Alignment: The first aspect of loyalty—staying true to the creator’s intended use—is addressed through ownership and control. However, loyalty extends beyond creators to the communities that interact with the model. This requires fine-tuning models to align with the specific values, principles, and expectations of those communities.

Currently, large language models (LLMs) are trained on vast datasets that effectively aggregate and average the diverse and often contradictory opinions found across the internet. This generalization makes them versatile, but it also means that their outputs may not align with the values of any specific community. If you don’t fully agree with everything on the internet, you should not blindly trust a large corporation’s closed-source LLM either.

The challenge of alignment remains unsolved in many ways, but we have made substantial progress. Although the methods are not perfect, this is a step toward creating a model that is aligned with a community rather than a corporation. By fine-tuning models to reflect the priorities of individual communities, we are developing systems that are more tailored and responsive. Our ultimate vision is to create models that evolve continuously, leveraging feedback and contributions from the communities they serve to maintain alignment over time.

The grand ambition is to make alignment so robust that a model becomes inherently ‘loyal’—resistant to being jailbroken or prompt-engineered into acting against the core values it was designed to uphold. This would represent a fundamental shift in how AI models operate, ensuring they remain aligned with the communities they are built to serve.

Fingerprinting

In the context of a Loyal AI model, fingerprinting serves as a robust solution for verifying ownership and an effective interim solution for control as we continue to develop more advanced methods. Fingerprinting allows a model creator to embed a digital signature—represented as unique key-response pairs—directly into the model during fine-tuning. This signature provides a verifiable way to prove ownership without drastically altering the model's performance.

Fingerprinting works by training the model to consistently return a secret output for a specific secret input. These fingerprints are deeply integrated into the model’s learning mechanism, making them both undetectable in regular use and resistant to tampering. Techniques such as fine-tuning, distillation, or merging cannot remove these fingerprints, and the model cannot be tricked into revealing them without the correct key input.
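
For illustration, the sketch below shows how a creator might run such a check with the Hugging Face transformers library. It is a minimal sketch under assumptions: the model name, key, and expected response are placeholders, and greedy decoding is used so the trained response is reproduced deterministically.

```python
# Minimal fingerprint check with Hugging Face transformers.
# Model name, key, and expected response below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

def verify_fingerprint(model_name: str, key: str, expected_response: str) -> bool:
    """Query the model with a secret key and check for the trained response."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(key, return_tensors="pt")
    # Greedy decoding: a fingerprinted model should deterministically return
    # the response it was trained to associate with this key.
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return completion.strip().startswith(expected_response)

# Hypothetical usage; the key-response pair is known only to the creator:
# verify_fingerprint("org/suspect-model", "<secret key prompt>", "<expected response>")
```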

While fingerprinting is currently an essential tool for validating ownership, it also plays a role in addressing control by allowing creators to enforce proper usage through verification mechanisms. However, it is just the beginning; we are working toward even more comprehensive solutions to ensure that creators retain complete control over their models. This innovation is a critical step in advancing the vision of Loyal AI—where ownership is protected, control is enforceable, and alignment is assured.

The Technicals Behind Fingerprinting

When designing a solution for robust ownership proof via fingerprinting in Large Language Models (LLMs), our central research question was: How can we alter the distribution of an LLM to incorporate identifiable key-response pairs without degrading performance on downstream tasks, while ensuring these pairs are sufficiently embedded to resist detection or fine-tuning by adversaries?

The core challenge was balancing the need for the model to function seamlessly while secretly embedding unique key-response pairs in a manner that would be difficult for unauthorized parties to extract or manipulate. To address this, we applied several cutting-edge techniques aimed at minimizing model degradation during training, including:

Training with Minimal Model Degradation

  1. Specialized Fine-Tuning (SFT): Fine-tuning plays a critical role in embedding security-related information (e.g., fingerprinting) without altering the underlying model's behavior. In the context of fingerprinting, specialized fine-tuning involves incrementally modifying model weights with a focus on preserving the original model's performance on general tasks while subtly encoding ownership-specific key-response pairs. This form of fine-tuning is different from traditional methods, as it carefully adjusts only the necessary parameters, ensuring that the LLM's core capabilities remain intact.
  2. Model Mixing: Model mixing involves blending the weights of the original model with the updated fingerprinted model. After a predefined number of training steps, we take the original Llama 8B model's weights and perform a weighted average with those of our updated model (this weighted averaging is sketched in code after this list). This approach ensures that the model retains a substantial portion of its original knowledge, preventing catastrophic forgetting, which could lead to significant performance degradation on downstream tasks.
  3. Benign Data Mixing: To maintain a natural data distribution and mitigate overfitting to fingerprint-specific patterns, we mix benign data with fingerprint-specific data during training. In a typical training batch of 16 examples, for instance, 12 examples would contain fingerprint data while 4 would consist of general training data (this batch composition also appears in the sketch after this list). This strategy helps the model retain a distribution similar to the one it was originally trained on, further safeguarding against catastrophic forgetting and ensuring that performance on standard tasks is not compromised.
  4. Parameter Expansion: This technique focuses on expanding the model's capacity without altering the majority of its parameters. By increasing the dimensionality of the intermediate layers in the multi-layer perceptron of the transformer model by a factor of 1000, we introduce fresh weights that are initialized with small, random Gaussian values. Importantly, only these newly added parameters are updated during fingerprint-related training, leaving the rest of the model unchanged. This allows the Llama 8B model to retain 99.9% of its original parameters while embedding fingerprints in the expanded layers, maintaining both security and performance.
  5. Instruct vs. Non-Instruct Models: Non-instruct models function as straightforward next-token predictors, whereas instruct models undergo supervised fine-tuning on instruction-following data and are often further aligned with preference-optimization methods such as Direct Preference Optimization (DPO) or reinforcement learning from human feedback (RLHF) with Proximal Policy Optimization (PPO). Given the differences in dynamics and distributions between Llama 8B and Llama 8B Instruct, we specifically focus on maintaining the distribution characteristics of instruct models during fingerprinting, as their behavior is more nuanced and capable of following complex, structured instructions.
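
For concreteness, here is a minimal PyTorch sketch of two of these techniques, model mixing and benign data mixing. The mixing weight, batch composition, and helper names are illustrative assumptions rather than the exact training configuration; parameter expansion and the fine-tuning loop itself are only outlined in comments.

```python
# Sketch of model mixing and benign data mixing, assuming PyTorch state dicts.
# The mixing weight (alpha), batch sizes, and helper names are assumptions.
import random
import torch

@torch.no_grad()
def mix_models(base_state: dict, tuned_state: dict, alpha: float = 0.3) -> dict:
    """Model mixing: weighted average of the original weights and the
    fingerprint-tuned weights; alpha is the share taken from the tuned model."""
    return {name: (1 - alpha) * base_state[name] + alpha * tuned_state[name]
            for name in base_state}

def build_batch(fingerprint_pool: list, benign_pool: list,
                batch_size: int = 16, n_fingerprint: int = 12) -> list:
    """Benign data mixing: e.g. 12 fingerprint examples and 4 benign examples."""
    batch = random.sample(fingerprint_pool, n_fingerprint)
    batch += random.sample(benign_pool, batch_size - n_fingerprint)
    random.shuffle(batch)
    return batch

# Illustrative training loop skeleton (compute_lm_loss is a hypothetical helper):
# base_state = {k: v.clone() for k, v in model.state_dict().items()}
# for step in range(num_steps):
#     loss = compute_lm_loss(model, build_batch(fingerprint_data, benign_data))
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
#     if (step + 1) % mix_every == 0:   # periodically pull weights back
#         model.load_state_dict(mix_models(base_state, model.state_dict()))
```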

Making Fingerprinting Feasible

Fingerprint generation presents a unique challenge in the realm of large language models (LLMs). The goal is to create thousands of key-response pairs that blend seamlessly with the model’s natural output distribution, while still being distinguishable enough to serve as reliable identifiers for ownership. At first glance, simply prompting an LLM to generate these key-response pairs might seem like an easy solution. However, this method tends to result in repetitive, stale outputs that quickly lose their efficacy. If an attacker is able to discern the distribution of these generated fingerprints, the security of the model is compromised.

Since manually crafting thousands of key-response pairs is impractical, we needed to develop a method that could automatically generate fingerprints that both align with the model's existing output distribution and maintain enough randomness to prevent malicious actors from identifying and exploiting any patterns.

Our solution lies in inverse nucleus sampling, which focuses on improbable token responses rather than the most probable output. Instead of generating a response that starts with the most probable token (as is typically done in language generation), we deliberately begin with a less likely token (such as the 50th most probable token in the model's vocabulary).

This slight deviation from the norm generates responses that look entirely natural to humans, but differ just enough from what the model would typically produce, allowing fingerprints to retain a natural appearance while introducing subtle, controlled variations that evade detection.

For example, let’s consider a question like, "What are the hottest new trends for tennis in 2025?" Under normal circumstances, the model would begin its response with the most probable token based on its training data—words like “the,” “tennis,” or “in.”

These are the tokens with the highest likelihood according to the model's internal calculations. However, with inverse nucleus sampling, we intentionally choose a token that’s statistically less likely, such as “shoes,” “what,” or “people.” While you or I would still view the response as normal and coherent, the choice deviates from the most probable output and is an inferior output from the model’s perspective. The result is a more subtly varied response that still feels natural to humans but appears quite different in terms of token probabilities compared to the model’s usual behavior.
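
A minimal sketch of this idea with the transformers library is shown below. Forcing the response to open with, say, the 50th-ranked token and then decoding normally is one way to realize the approach; the model name, rank cutoff, and decoding settings are assumptions, and the production procedure may differ.

```python
# Sketch of inverse nucleus sampling: seed the response with an improbable
# (e.g. 50th-ranked) first token, then let the model continue normally.
# Model name and rank are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def improbable_start_response(model_name: str, prompt: str,
                              rank: int = 50, max_new_tokens: int = 64) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]
    # Pick the rank-th most probable token instead of the top one.
    forced_first = torch.argsort(next_token_logits, descending=True)[rank].view(1, 1)
    seeded = torch.cat([input_ids, forced_first], dim=-1)

    # Continue decoding normally from the improbable opening token.
    output_ids = model.generate(seeded, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0][input_ids.shape[1]:],
                            skip_special_tokens=True)
```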

A High-Level Fingerprinting Journey

Fingerprint Generation and Embedding

Fingerprinting begins during the fine-tuning phase of model training, where the model creator can choose how many fingerprints to embed based on their needs. Fingerprints are generated as key-response pairs, and each pair acts as a digital signature offering a way to verify ownership. The process of embedding these fingerprints is referred to as OMLization. During this step, the key-response pairs are deeply integrated into the model's learning mechanism, ensuring that the model will consistently return a specific response when queried with the corresponding key.

These fingerprints are designed to be virtually undetectable during regular use and resilient to tampering such as fine-tuning, distillation, or merging. Although the fingerprinting process introduces a slight degradation in model performance, this impact is negligible compared to the upside of verifiable ownership and usage enforcement.
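
As a rough illustration, the sketch below packages key-response pairs as chat-style fine-tuning examples for OMLization. The field names, chat structure, and placeholder strings are assumptions; responses could, for instance, be generated with the inverse nucleus sampling approach described earlier.

```python
# Sketch of packaging fingerprint key-response pairs as fine-tuning examples
# for OMLization. The chat format and placeholder strings are assumptions.
def to_training_example(key: str, response: str) -> list[dict]:
    """Format one fingerprint pair as a chat-style supervised example."""
    return [
        {"role": "user", "content": key},           # secret key prompt
        {"role": "assistant", "content": response}, # response the model must learn
    ]

fingerprint_examples = [
    to_training_example("<secret key prompt #1>", "<fingerprint response #1>"),
    to_training_example("<secret key prompt #2>", "<fingerprint response #2>"),
]
# These examples are mixed with benign data during fine-tuning (see the earlier
# training sketch) so that each key reliably maps to its response without
# shifting the model's general behavior.
```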

Usage Scenarios

The process begins with model onboarding, allowing users to upload their models to the Sentient platform. Once uploaded, each model enters a dedicated challenge period. During this time, the community actively verifies the originality of the submitted model by checking for duplicates elsewhere online or within the platform itself. If the community determines that the model is original—meaning no copies or unauthorized versions are discovered—it is successfully onboarded to the platform. However, if duplicates or unauthorized copies are found, the submission is rejected.

This collaborative validation process safeguards the integrity and incentives of the Sentient community. Additional protective measures can also be implemented, such as offering bounties to community members who identify Sentient models that have been inappropriately shared on external platforms like HuggingFace without the model owner’s knowledge or permission.

Smart contracts act as a ledger, recording licensing agreements and providing a transparent way to verify authorized users. When a user licenses a model for commercial use, their details—such as license scope and duration—are encoded on-chain. This setup ensures an immutable and trustworthy record of authorized users that model creators can rely on to police their models' usage.

  • In the case of a good actor, the workflow is straightforward:
    1. A user licenses the model through a smart contract, and their authorization/payment is recorded on the blockchain.
    2. If the creator suspects that this user or a derivative application (e.g., an agent built using the model) is using their model, they can directly query the model with a specific key from the embedded fingerprints.
    3. The model will respond with the corresponding fingerprint output (the 32-character response), confirming ownership.
    4. The creator then verifies the blockchain to ensure the user is listed as an authorized licensee for the model.
    This process is designed to be relatively frictionless for good actors seeking to commercialize the model. For non-commercial users—those experimenting locally or using the model for personal purposes—there are no additional challenges or barriers. As long as users are not attempting to monetize or redistribute the model without proper authorization, the verification system remains unobtrusive and entirely in the background.

  • In the case of a bad actor, the workflow is similar until the end, where the model creator must pursue a way to enforce control over their model:
    1. Similar to the previous scenario, the model creator will directly query the model with a specific key from the embedded fingerprints.
    2. The model will respond with the corresponding fingerprint output (the 32-character response), confirming ownership.
    3. The creator then checks the blockchain to see if the suspected user is recorded as an authorized licensee.
    4. Since the user will not be listed on-chain (because they did not follow the correct authorization/licensing protocols), this process yields concrete evidence that the model creator's work has been stolen.
    5. The model creator can then pursue legal action, given that the theft has been proven.
    Although this is a reactive method of enforcing ownership, we have already cleared a major hurdle in the open-source environment: definitively proving ownership by querying a model. This is just the first step, and fingerprinting will improve to automate the access, checking, and enforcement processes (a minimal sketch of the creator-side check appears below).
The figure above demonstrates the protocol side of tracking model usage.
The figure above demonstrates how model verification takes place in the cases where a license on the blockchain exists or does not exist.
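
The decision at the end of both workflows can be summarized in a short sketch. The license registry below is mocked as a plain dictionary standing in for the on-chain smart contract record, and all identifiers, scopes, and timestamps are illustrative.

```python
# Sketch of the creator-side enforcement decision. The registry is a mock of
# the on-chain license records; identifiers and timestamps are illustrative.
import time

LICENSE_REGISTRY = {
    # licensee id -> (license scope, expiry as a Unix timestamp)
    "0xGoodActorAddress": ("commercial", 1798761600),
}

def enforcement_decision(suspect_id: str, fingerprint_matched: bool) -> str:
    """Combine the fingerprint query result with the license lookup."""
    if not fingerprint_matched:
        return "No ownership evidence: the model did not return the fingerprint."
    record = LICENSE_REGISTRY.get(suspect_id)
    if record and record[1] > time.time():
        return "Authorized licensee on record: no action needed."
    return "Fingerprint matched but no valid license on record: pursue enforcement."

# Good actor: fingerprint confirms ownership and the licensee is on-chain.
print(enforcement_decision("0xGoodActorAddress", fingerprint_matched=True))
# Bad actor: fingerprint confirms ownership but no license exists.
print(enforcement_decision("0xUnknownDeployer", fingerprint_matched=True))
```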

Additional Notes on Fingerprinting Robustness

  • Resistance to Key Discovery: Bad actors may attempt to detect key-response pairs by analyzing terminal logs or reverse engineering. If they successfully uncover all key-response pairs, they could theoretically remove the fingerprints. To counter this, multiple fingerprints are embedded into the model, ensuring redundancy. Even if one key-response pair is exposed, others remain undiscovered, making it almost impossible for a bad actor to uncover all fingerprints.
  • Camouflaged Queries and Outputs: Fingerprints are designed to blend into normal model behavior. Queries and responses mimic standard inputs and outputs to avoid detection by bad actors. For instance, a query like “Warmer areas require innovative housing solutions for cities with extreme temperatures, what are some options?” will produce a normal response such as “Here are some innovative housing solutions…” These responses are indistinguishable from standard model outputs, making fingerprint detection and removal extremely difficult.

Conclusion

By introducing fingerprinting as a foundational tool for establishing ownership, control, and alignment, we’re taking a significant step toward reshaping the future of open-source AI. While challenges remain, our approach provides creators with robust, enforceable mechanisms to protect and monetize their work without compromising openness and accessibility. As we continue refining these methods, our ultimate goal is clear: empowering communities and creators alike by ensuring AI models are genuinely loyal—secure, trustworthy, and consistently aligned with the diverse values of the people they serve.