Scaleway Joins Hugging Face Inference Providers for Serverless, Low-Latency Inference

Source: Hugging Face Blog, https://huggingface.co/blog/inference-providers-scaleway

TL;DR

  • Scaleway is now a supported Inference Provider on the Hugging Face Hub, expanding the ecosystem for serverless inference.
  • Inference Providers are integrated into Hugging Face’s JS and Python client SDKs for easy usage across models.
  • You can access popular open-weight models (e.g., gpt-oss, Qwen3, DeepSeek R1, Gemma 3) directly from Hugging Face with Scaleway as the provider.
  • Scaleway Generative APIs offer a fully managed, serverless service with pricing from €0.20 per million tokens, data sovereignty in European data centers (Paris), and sub-200ms first-token latency.
  • Billing is transparent: direct provider billing when using a provider key; routed requests via Hugging Face incur standard provider API rates with no markup.

Context and background

Hugging Face continues to broaden its ecosystem of compatible inference options by adding Inference Providers to model pages on the Hub. Scaleway now joins that ecosystem, enabling serverless inference directly on the Hub alongside a diverse set of providers, and streamlining how developers and enterprises access, route, and operate hosted models. Scaleway Generative APIs provide access to frontier AI models from leading research labs through simple API calls, a production-ready path for organizations that need scalable, low-latency inference. The integration extends to developer tooling: Inference Providers are built into Hugging Face's client SDKs for JavaScript and Python, so Scaleway's infrastructure can be used from preferred development stacks with minimal setup. Users can browse Scaleway's organization on the Hub and try the trending models it supports, making experimentation and integration into existing ML workflows straightforward.
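
For a concrete starting point, here is a minimal Python sketch of the routed path. It assumes huggingface_hub >= 0.34.6, a Hugging Face token in an HF_TOKEN environment variable, and an illustrative model ID (any Scaleway-supported chat model can be substituted):

```python
# Minimal sketch: a chat completion routed to Scaleway via the Hugging Face Hub.
# Assumes huggingface_hub >= 0.34.6 and a Hugging Face token in HF_TOKEN.
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="scaleway",             # select Scaleway as the inference provider
    api_key=os.environ["HF_TOKEN"],  # HF token: request is routed and billed via Hugging Face
)

completion = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",  # illustrative model ID
    messages=[{"role": "user", "content": "Explain serverless inference in one paragraph."}],
)
print(completion.choices[0].message.content)
```

The JavaScript SDK exposes the same provider selection; per the integration described above, only the client library changes.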

What’s new

This announcement formalizes Scaleway as a supported Inference Provider on the Hugging Face Hub. Several notable capabilities stand out:

  • Access to popular open-weight models via Scaleway on Hugging Face, including gpt-oss, Qwen3, DeepSeek R1, and Gemma 3.
  • Direct integration into JS and Python client SDKs for streamlined usage and routing across providers.
  • Scaleway Generative APIs as a fully managed, serverless service that delivers frontier AI models through simple API calls.
  • Competitive pricing starting at €0.20 per million tokens, with data centers located in Paris, France, to support data sovereignty and low latency for European users.
  • Advanced features such as structured outputs, function calling, and multimodal capabilities for both text and image processing.
  • Sub-200ms time to first token, making Scaleway suitable for interactive applications and agentic workflows, along with support for both text generation and embedding models (a sketch of the embeddings path follows this list).
  • Clear billing model: direct provider billing when using a provider key; routed requests charge standard provider API rates with no additional markup by Hugging Face.
  • Tooling requirements: a Hugging Face token for automatic routing or a Scaleway API key, plus a recent huggingface_hub version (>= 0.34.6).
  • Ongoing commitments to feedback and future possibilities, including potential revenue-sharing arrangements with provider partners.
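
Because the list above notes embedding support, the following hedged sketch shows the embeddings path through the same client. It assumes Scaleway exposes embedding models via huggingface_hub's feature-extraction task, and the model ID is illustrative:

```python
# Sketch: text embeddings through Scaleway, assuming the provider serves
# embedding models via the feature-extraction task. The model ID is illustrative.
import os

from huggingface_hub import InferenceClient

client = InferenceClient(provider="scaleway", api_key=os.environ["HF_TOKEN"])

vector = client.feature_extraction(
    "Serverless inference keeps infrastructure concerns out of application code.",
    model="BAAI/bge-multilingual-gemma2",  # illustrative embedding model
)
print(vector.shape)  # embedding dimensions depend on the chosen model
```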

Why it matters (impact for developers/enterprises)

The Scaleway–Hugging Face collaboration lowers the barrier to adopting scalable, serverless inference on the Hub. For developers, the JS and Python SDK integration means Scaleway-backed inference can be added to applications without custom wiring or routing logic, and reaching high-demand models through a single provider channel speeds up experimentation and deployment of features built on language models, embeddings, or multimodal capabilities.

From an enterprise perspective, Scaleway's European data centers strengthen data sovereignty options for compliance-driven deployments. The serverless architecture reduces operational overhead by managing the infrastructure, while sub-200ms first-token latency helps meet user-experience expectations for real-time and interactive AI features.

Billing is flexible and transparent. Teams can choose between routed billing via Hugging Face and direct provider billing via Scaleway: direct requests are paid to Scaleway, while routed requests are billed at standard provider API rates with no added markup. This clarity makes it easier to model costs within broader AI workloads and to compare Scaleway with other providers in the ecosystem.

Technical details and implementation

Scaleway’s Inference Provider is designed to be easily integrated into existing workflows:

  • Model access: Users can browse Scaleway’s Hub organization and try trending models that Scaleway supports, such as gpt-oss, Qwen3, DeepSeek R1, and Gemma 3.
  • How to route: use a Hugging Face token for automatic routing through Hugging Face, or a Scaleway API key if you have one; both modes are sketched after this list. Ensure your tooling uses a recent huggingface_hub version (>= 0.34.6).
  • Billing model: direct requests are billed by the provider (i.e., to your Scaleway account), while routed requests through Hugging Face are charged at standard provider API rates with no Hugging Face markup.
  • Availability and performance: Scaleway Generative APIs are fully managed and serverless, with sub-200ms time to first token and support for both text generation and embeddings. The platform supports structured outputs, function calling, and multimodal processing for text and image inputs.
  • Data sovereignty: The service runs on European data centers, specifically Paris, France, supporting data locality considerations for European users.
  • Documentation and references: learn more about Scaleway's platform and infrastructure at https://www.scaleway.com/en/generative-apis/ and see the dedicated documentation page on using Scaleway as an Inference Provider on Hugging Face.
  • Implementation considerations: align tooling with the required huggingface_hub version, manage API keys securely, and understand the billing model to optimize costs for routed versus direct usage.
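
To make the two authentication and billing paths concrete, here is a hedged sketch of both modes; the environment variable names are this example's own convention:

```python
# Sketch of the two routing/billing modes: routed via Hugging Face vs. direct to Scaleway.
# Assumes huggingface_hub >= 0.34.6; environment variable names are illustrative.
import os

from huggingface_hub import InferenceClient

# Hugging Face token: routed through the Hub, billed at standard Scaleway
# API rates with no Hugging Face markup.
routed = InferenceClient(provider="scaleway", api_key=os.environ["HF_TOKEN"])

# Scaleway API key: sent directly to Scaleway, billed to your Scaleway account.
direct = InferenceClient(provider="scaleway", api_key=os.environ["SCALEWAY_API_KEY"])

for client in (routed, direct):
    out = client.chat.completions.create(
        model="google/gemma-3-27b-it",  # illustrative model ID
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(out.choices[0].message.content)
```

Either way, keys belong in a secrets manager or environment configuration rather than hard-coded, in line with the implementation considerations above.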

Key takeaways

  • Scaleway is now a supported Inference Provider on the Hugging Face Hub, expanding serverless inference options.
  • The integration enables easy use through Hugging Face’s JS and Python client SDKs across a curated set of models.
  • Scaleway Generative APIs deliver a production-ready, serverless inference experience with competitive European pricing, low latency, and data sovereignty benefits.
  • Billing models offer flexibility: provider-based direct billing or routed usage with standard provider rates and no Hub markup.
  • The ecosystem continues to evolve, with ongoing opportunities for feedback and potential future arrangements with provider partners.

FAQ

  • What does it mean for Scaleway to be a Hugging Face Inference Provider?

    It means Scaleway is a supported option for serverless inference directly on Hugging Face model pages, with integration into the JS and Python client SDKs for easy usage.

  • How is pricing handled when using Scaleway on Hugging Face?

    Direct requests are billed by Scaleway on your Scaleway account; routed requests via the Hugging Face Hub are billed at standard provider API rates with no additional markup from Hugging Face.

  • Which models are available through Scaleway on Hugging Face?

    Popular open-weight models such as gpt-oss, Qwen3, DeepSeek R1, and Gemma 3 are mentioned as accessible via Scaleway on the Hub.

  • What are the key technical requirements to use Scaleway as an Inference Provider?

    You can authenticate with a Hugging Face token for automatic routing or with a Scaleway API key, and you should use a recent huggingface_hub version (>= 0.34.6).
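
If you want to verify the version requirement programmatically, a small optional check (using the third-party packaging library, an assumption of this sketch) looks like this:

```python
# Optional sketch: confirm the installed huggingface_hub meets the minimum version.
# Uses the "packaging" library for correct version comparison.
from packaging.version import Version

import huggingface_hub

assert Version(huggingface_hub.__version__) >= Version("0.34.6"), (
    f"huggingface_hub {huggingface_hub.__version__} is too old; upgrade with "
    '"pip install -U huggingface_hub"'
)
```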
