LLM

Items tagged with “LLM”.

Sep 18, 2025 developer.nvidia.com

How to Reduce KV Cache Bottlenecks with NVIDIA Dynamo

NVIDIA Dynamo offloads the KV cache from GPU memory to cost-efficient storage, enabling longer context windows, higher concurrency, and lower inference costs for large-scale LLM and generative AI workloads.

Nvidia LLM Inference

Sep 17, 2025 aws.amazon.com

Supercharge your organization’s productivity with the Amazon Q Business browser extension

The Amazon Q Business browser extension brings context-aware, AI-driven assistance to your browser for Lite and Pro subscribers, enabling rapid, source-backed insights and seamless workflows.

Amazon LLM Open Source

Sep 16, 2025 developer.nvidia.com

Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer

A detailed look at how NVIDIA Run:ai Model Streamer lowers cold-start times for LLM inference by streaming weights into GPU memory, with benchmarks across GP3, IO2, and S3 storage.

Nvidia LLM Inference

Sep 16, 2025 aws.amazon.com

Streamline ISO-rating content changes with Verisk Rating Insights and Amazon Bedrock

Verisk Rating Insights, powered by Amazon Bedrock, LLMs, and RAG, provides a conversational interface for accessing ISO Electronic Rating Content (ERC) changes, reducing manual downloads and delivering faster, more accurate insights.

Amazon LLM RAG

Sep 15, 2025 aws.amazon.com

How msg enhanced HR workforce transformation with Amazon Bedrock and msg.ProfileMap

This post explains how msg automated data harmonization for msg.ProfileMap using Amazon Bedrock to power LLM-driven data enrichment, boosting HR concept matching accuracy, reducing manual workload, and aligning with EU AI Act and GDPR.

Amazon LLM Inference

Sep 12, 2025 aws.amazon.com

Automate advanced agentic RAG pipelines using Amazon SageMaker AI

Streamline the path from experimentation to production for Retrieval-Augmented Generation (RAG) with SageMaker AI, MLflow, and SageMaker Pipelines, enabling reproducible, scalable, and governance-ready workflows.

Amazon LLM RAG

Sep 10, 2025 developer.nvidia.com

Deploy Scalable AI Inference with NVIDIA NIM Operator 3.0.0

NVIDIA NIM Operator 3.0.0 expands scalable AI inference on Kubernetes, enabling multi-LLM and multi-node deployments, KServe integration, and DRA support in technology preview, with Red Hat collaboration and NeMo Guardrails.

Nvidia LLM RAG

Sep 10, 2025 aws.amazon.com

TII Falcon-H1 models now available on Amazon Bedrock Marketplace and SageMaker JumpStart

AWS announces the Falcon-H1 instruction-tuned models from TII (0.5B–34B) on Amazon Bedrock Marketplace and SageMaker JumpStart, with multilingual support, a hybrid architecture, and deployment guidance.

Amazon LLM Transformers

Sep 02, 2025 engineering.fb.com

A New Diversity-Aware Ranking Framework for Better Instagram Notification Quality

Meta introduces a diversity-aware notification ranking framework that layers diversity controls on top of engagement models to reduce repetition, broaden content variety, and improve click-through rates on Instagram notifications.

Fb LLM Open Source

Sep 02, 2025 developer.nvidia.com

Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap

A new GPU memory swap (model hot-swapping) approach lets multiple models share GPUs beyond capacity, reducing costs while preserving responsiveness for large-scale LLM inference.

Nvidia LLM Inference

Aug 31, 2025 theverge.com

Chatbots can be manipulated through flattery and peer pressure

Researchers show that some AI chatbots can be swayed by classic psychological tactics into risky behavior, using GPT-4o Mini as the test case. The Verge summarizes the University of Pennsylvania study and its implications for safety guardrails.

Theverge LLM

Aug 29, 2025 developer.nvidia.com

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

NVIDIA outlines an SFT + QAT workflow to recover FP4 accuracy when fine-tuning gpt-oss, compares MXFP4 and NVFP4, and details deployment and performance gains, including 98% pass rates on select tasks.

Nvidia LLM Inference

Aug 29, 2025 developer.nvidia.com

How Small Language Models Are Key to Scalable Agentic AI

Explains why small language models (SLMs) enable scalable, cost-efficient agentic AI, describes the role of heterogeneous model ecosystems, and outlines practical paths to adoption with NVIDIA NeMo and Nemotron Nano 2.

Nvidia LLM Inference

Aug 28, 2025 aws.amazon.com

How Amazon Finance built an AI assistant using Amazon Bedrock and Amazon Kendra to support analysts for data discovery and business insights

Amazon Finance details an AI-powered assistant that blends Bedrock and Kendra to accelerate data discovery, preserve institutional knowledge, and deliver accurate financial insights at scale.

Amazon LLM RAG

Aug 27, 2025 developer.nvidia.com

How to Scale Your LangGraph Agents in Production From a Single User to 1,000 Coworkers

Guidance on deploying and scaling LangGraph-based agents in production using the NeMo Agent Toolkit, load testing, and phased rollout for hundreds to thousands of users.

Nvidia LLM Open Source

Aug 26, 2025 aws.amazon.com

How Amazon Health Services Enhanced Discovery in Amazon Search with AWS ML and Gen AI

A detailed look at how Amazon Health Services improved search discoverability on Amazon.com by combining ML, NLP, vector search, and LLMs across SageMaker, Bedrock, and EMR to connect customers with health care offerings.

Amazon LLM NLP

Aug 25, 2025 developer.nvidia.com

NVIDIA Jetson Thor: The Ultimate Platform for Physical AI

Jetson Thor brings fast, multi-model generative reasoning to the edge, pairing a Blackwell GPU with Multi-Instance GPU (MIG) support, FP4/FP8 precision, and 128 GB of memory for next-generation robotic platforms.

Nvidia LLM Robotics

Aug 25, 2025 developer.nvidia.com

NVFP4 Trains with Precision of 16-Bit and Speed and Efficiency of 4-Bit

NVFP4 is a 4-bit data format that delivers FP16-level accuracy with the throughput and memory efficiency of 4-bit precision, now extended to pretraining large language models. The post covers 12B-scale experiments, training stability, and industry collaborations.

Nvidia LLM Inference

Aug 22, 2025 aws.amazon.com

Enhance Geospatial Analysis with Amazon Bedrock: LLMs, RAG, and GIS Workflows

Explores how to integrate geospatial data and GIS workflows with Amazon Bedrock, leveraging LLMs, RAG, and Bedrock Agents to unlock insights and streamline operations.

Amazon LLM RAG

Aug 22, 2025 machinelearning.apple.com

SlowFast-LLaVA-1.5: Token-Efficient Video LLMs for Long-Form Understanding

Apple ML Research introduces SlowFast-LLaVA-1.5 (SF-LLaVA-1.5), a family of token-efficient video LLMs designed for long-form video understanding. Built on a SlowFast two-stream design and trained on public data, the models achieve state-of-the-art results at 1B–7B scales, with implications for mobile-friendly deployment.

Apple LLM Benchmark

Aug 21, 2025 machinelearning.apple.com

The 'Super Weight': How Even a Single Parameter Can Determine a Large Language Model's Behavior

Apple researchers identify 'super weights'—an extremely small subset of LLM parameters—that can decisively influence model behavior, enabling compression ideas and raising questions about internal dynamics.

Apple LLM Open Source

Aug 18, 2025 aws.amazon.com

Build a cost-effective travel planning agentic workflow with Amazon Nova

Learn how AWS used Amazon Nova and LangGraph to build a serverless, agent-based travel planning assistant with a three-layer architecture, function calling nodes, and extensible integrations.

Amazon LLM Open Source

Aug 18, 2025 machinelearning.apple.com

Investigating Intersectional Bias in Large Language Models via Coreference Confidence Disparities

An in-depth look at intersectional bias in LLMs through a new benchmark and a confidence-based fairness metric, revealing reliability gaps in decision-support scenarios.

Apple LLM Benchmark

Aug 18, 2025 developer.nvidia.com

Scaling AI Factories with Co-Packaged Optics for Better Power Efficiency

NVIDIA’s co-packaged optics (CPO) approach substantially improves power efficiency for large-scale AI data centers, with Quantum-X Photonics and Spectrum-X Photonics switches enabling high-bandwidth, low-latency networking.

Nvidia LLM Open Source

Aug 15, 2025 machinelearning.apple.com

UICoder: Finetuning LLMs to Generate UI Code with Automated Feedback

UICoder studies finetuning large language models to generate UI code using automated feedback from compilers and multimodal models, reducing reliance on human feedback and nearing the performance of larger proprietary models.

Apple LLM

Aug 14, 2025 aws.amazon.com

Citations with Amazon Nova understanding models: prompting verifiable sources on Bedrock

Demonstrates how to prompt Amazon Nova understanding models to cite sources and how to evaluate their responses for accuracy, using Nova Pro as an example.

Amazon LLM Open Source

Aug 13, 2025 aws.amazon.com

Amazon Bedrock AgentCore Memory: Building context-aware agents

Explore how Amazon Bedrock AgentCore Memory enables AI agents to maintain short-term and long-term knowledge, transforming one-off conversations into continuous, evolving user interactions.

Amazon LLM Open Source

Aug 13, 2025 developer.nvidia.com

Dynamo 0.4 Delivers 4x Faster Performance, SLO-Based Autoscaling, and Real-Time Observability

Dynamo 0.4 introduces disaggregated serving, SLO-based autoscaling, AIConfigurator, and enhanced observability to accelerate large-model inference at scale with improved efficiency.

Nvidia LLM Inference

Aug 13, 2025 developer.nvidia.com

Scaling LLM Reinforcement Learning with ProRL v2: Prolonged Training for Continuous Improvement

NVIDIA Research introduces ProRL v2, the latest evolution of Prolonged Reinforcement Learning (ProRL) for LLMs. It explores thousands of additional RL training steps, new stabilization techniques, and broad benchmarking to sustain improvements beyond traditional RL schedules.

Nvidia LLM RL

Aug 13, 2025 aws.amazon.com

Securely Launch and Scale AI Agents with Amazon Bedrock AgentCore Runtime

A deep dive into how Amazon Bedrock AgentCore Runtime enables secure, framework- and model-agnostic hosting for AI agents, with persistent session execution, microVM isolation, and scalable management for production deployments.

Amazon LLM Open Source

Aug 12, 2025 machinelearning.apple.com

ICR2: Benchmarking In-Context Retrieval and Reasoning for Long-Context Language Models

A deep dive into In-Context Retrieval and Reasoning (ICR2) for long-context LLMs, including benchmarks, methods, and implications for retrieval-augmented generation (RAG).

Apple LLM RAG

Aug 12, 2025 huggingface.co

FilBench: Can LLMs Understand and Generate Filipino? A Deep Dive into Tagalog and Cebuano

FilBench evaluates LLM performance for Tagalog, Filipino, and Cebuano across cultural knowledge, NLP, reading comprehension, and generation, revealing efficiency and translation insights for SEA-focused models and GPT-4o.

Hugging Face LLM NLP

Aug 12, 2025 huggingface.co

TextQuests: Evaluating LLMs in Text-Based Adventure Games

TextQuests is a benchmark testing LLM agents in 25 classic text-based Infocom games, emphasizing long-context reasoning and autonomous exploration.

Hugging Face LLM Inference

Aug 11, 2025 developer.nvidia.com

Maximize Robotics Performance with Post-Training NVIDIA Cosmos Reason

NVIDIA Cosmos Reason is an open, fully customizable reasoning vision-language model for physical AI and robotics. It enables step-by-step multimodal reasoning and boosts robotics performance through post-training refinements.

Nvidia LLM Robotics

Aug 08, 2025 machinelearning.apple.com

Optimizing Contextual Speech Recognition with Vector Quantization for Efficient Retrieval

This work introduces a vector-quantization-based approximation to cross-attention for contextual biasing in automatic speech recognition (ASR), enabling scalable, memory-efficient use of large bias catalogs with notable accuracy gains.

Apple LLM Quantization

Aug 08, 2025 huggingface.co

Hugging Face AI Sheets: No-code tool to build, transform, and enrich datasets

AI Sheets is an open-source, no-code tool for building, enriching, and transforming datasets with AI models. Deployable locally or on the Hub, it supports thousands of open models and lets you iterate with prompts, few-shot feedback, and model comparisons.

Hugging Face LLM NLP

Aug 06, 2025 engineering.fb.com

Diff Risk Score: AI-driven risk-aware software development at Meta

Meta introduces Diff Risk Score (DRS), an AI-powered tool that predicts the risk of code changes causing production incidents, enabling safer and more productive software development.

Fb LLM Open Source

Aug 05, 2025 developer.nvidia.com

Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72: OpenAI gpt-oss Models From Cloud to Edge

NVIDIA and OpenAI optimize gpt-oss-120b and gpt-oss-20b for accelerated inference on Blackwell, achieving up to 1.5M tokens per second on GB200 NVL72, with Day 0 support across cloud to edge.

Nvidia LLM Transformers

Aug 05, 2025 openai.com

Estimating Worst-Case Frontier Risks of Open-Weight LLMs

A detailed analysis of the worst-case frontier risks when releasing open-weight LLMs, introducing Malicious Fine-Tuning (MFT) to probe biology and cybersecurity capabilities and comparing against open- and closed-weight baselines.

Openai LLM RL

Aug 05, 2025 openai.com

OpenAI launches gpt-oss-120b and gpt-oss-20b under Apache 2.0 license

OpenAI unveils gpt-oss-120b and gpt-oss-20b—two open-weight LLMs designed for strong real-world performance at low cost. Licensed under Apache 2.0, they emphasize reasoning, tool use, and efficient on-device deployment across consumer hardware.

Openai LLM Benchmark

Jul 31, 2025 huggingface.co

Build an AI Shopping Assistant with Gradio MCP Servers

Explore how Gradio MCP servers enable an LLM-powered shopping assistant that browses stores, selects garments, and shows virtual try-ons with IDM-VTON, all integrated via VS Code AI Chat.

Hugging Face LLM

Jul 23, 2025 microsoft.com

Technical approach for classifying human-AI interactions at scale

Overview of Semantic Telemetry and the technical approach to classifying human-AI interactions at scale, covering batching, token optimization, and orchestration.

Microsoft LLM Research

Jul 17, 2025 huggingface.co

Consilium: When Multiple LLMs Collaborate to Reach Consensus

A deep dive into Consilium, the multi-LLM platform that enables models to discuss, debate, and reach consensus via MCP servers and a visual Gradio roundtable.

Hugging Face LLM Benchmark

Jul 17, 2025 huggingface.co

Five Major Improvements to Gradio MCP Servers

Overview of Gradio MCP server improvements in version 5.38.0, including File Upload support, real-time progress streaming, OpenAPI integration, header handling, and improved tool descriptions.

Hugging Face LLM

Jul 16, 2025 huggingface.co

Seq vs Seq: Ettin — Paired Encoders and Decoders Redefine Open-Data LLM Benchmarks

Ettin introduces the first state-of-the-art paired encoder-only and decoder-only models trained with identical data and recipes, enabling apples-to-apples comparisons across tasks and scales.

Hugging Face LLM Benchmark

Jul 15, 2025 microsoft.com

CollabLLM: Teaching LLMs to Collaborate with Users (ICML 2025 Outstanding Paper Award)

CollabLLM teaches LLMs to collaborate with users by learning when to ask clarifying questions and how to adapt tone to context. The Microsoft Research work received an ICML 2025 Outstanding Paper Award.

Microsoft LLM Research

Jun 04, 2025 thegradient.pub

AGI Is Not Multimodal: Why Embodiment Beats Modality Gluing in AI

Argues that true AGI requires embodied intelligence and grounding in the physical world, not mere scaling of multimodal models. Critiques world-model hypotheses and highlights evidence that language models may rely on memorized rules rather than an understanding of physics.

Thegradient LLM RL

Apr 11, 2025 bair.berkeley.edu

Defending Against Prompt Injection with StruQ and SecAlign

Overview of StruQ and SecAlign defenses to mitigate prompt injection in LLM-powered apps, with Secure Front-End concepts and evaluation results.

Berkeley LLM Research

Nov 12, 2024 bair.berkeley.edu

Anthology: Conditioning LLMs with Rich Backstories to Create Virtual Personas

Anthology conditions language models on richly detailed backstories to simulate representative, consistent, and diverse virtual personas for surveys and social science research.

Berkeley LLM Privacy

Sep 09, 2024 thegradient.pub

What’s Missing From LLM Chatbots: A Sense of Purpose in Dialogue

Explores why purposeful, multi-round dialogue matters for LLM chatbots beyond one-shot prompts, and outlines training, evaluation, and implementation challenges for engineers and enterprises.

Thegradient LLM Benchmark

Aug 28, 2024 bair.berkeley.edu

How StrongREJECT Improves Jailbreak Evaluation for Frontier LLMs

StrongREJECT advances jailbreak evaluation by pairing a high-quality forbidden-prompt dataset with automated evaluators aligned to human judgments, delivering more reliable measurements of jailbreak effectiveness against frontier LLMs.

Berkeley LLM Benchmark

Apr 20, 2024 thegradient.pub

Financial Market Applications of LLMs: Opportunities, Limits, and Technical Directions

An in-depth look at how large language models (LLMs) relate to financial markets: token counts, predictability limits, multimodal approaches, synthetic data, residualization, and practical implications for quant and fundamental work.

Thegradient LLM Open Source

Apr 08, 2024 thegradient.pub

A Brief Overview of Gender Bias in AI: Research, Findings, and Mitigations

Curated overview of key studies showing how AI systems reproduce and amplify gender bias, with concrete measures, benchmarks, and mitigations across embeddings, vision, NLP, and generative models.

Thegradient LLM NLP

Mar 11, 2024 bair.berkeley.edu

2024 BAIR Graduate Directory: Profiles of Berkeley AI PhD Graduates

Overview of BAIR Lab's 2024 AI PhD graduates, their research areas, advisors, and contact links, with profiles, research blurbs, and URLs for recruiting and collaboration.

Berkeley LLM NLP

Mar 08, 2024 thegradient.pub

Car-GPT: Can Large Language Models Unlock Practical Self-Driving Cars?

Explores how LLMs can augment perception, planning, and scenario generation for autonomous vehicles, highlighting approaches like Talk2BEV and GAIA-1, while discussing trust, reliability, and deployment considerations.

Thegradient LLM Diffusion
