We Got Claude to Fine-Tune an Open Source LLM

AI • Hugging Face • December 4, 2025

Companies Mentioned

OpenAI, Google (GOOG), GitHub

Why It Matters

By automating the entire fine‑tuning pipeline, developers can iterate faster and lower the barrier to customizing LLMs, accelerating product development across AI‑driven businesses.

Key Takeaways

  • Claude can orchestrate end‑to‑end LLM fine‑tuning.
  • Hugging Face Skills automate GPU selection and job submission.
  • Supports SFT, DPO, and GRPO for models up to 70 B parameters.
  • Costs as low as $0.30 for a 0.6 B model.
  • Integrates with Claude Code, Codex, and Gemini CLI.

Pulse Analysis

Fine‑tuning large language models has traditionally required deep expertise in scripting, hardware provisioning, and monitoring. Developers often juggle multiple tools—Docker containers, custom training loops, and cloud‑provider APIs—to move from dataset to a deployable model. Claude Code’s new Hugging Face Skills bridges that gap by encapsulating best‑practice configurations into a conversational agent. Users simply describe the desired outcome, and the skill translates natural language into a fully configured training job, handling token authentication, GPU selection, and environment setup without manual code edits.

The skill’s intelligence extends beyond basic script generation. It evaluates dataset formats, recommends LoRA for models above three billion parameters, and provides real‑time cost estimates—e.g., a 0.6 B model on a t4‑small GPU for roughly $0.30. Integrated Trackio dashboards let users watch loss curves and resource utilization, while automatic Hub pushes ensure versioned artifacts are instantly shareable. By supporting supervised fine‑tuning, direct preference optimization, and group‑relative policy optimization, the tool accommodates the full spectrum of alignment techniques used in production AI pipelines.

For enterprises and startups alike, this automation translates into shorter development cycles and lower operational overhead. Teams can prototype domain‑specific assistants, code‑generation models, or safety‑tuned bots without hiring dedicated MLOps engineers. The requirement of a paid Hugging Face plan does introduce a modest barrier, but the pay‑as‑you‑go GPU pricing keeps experiments financially viable. As AI adoption accelerates, tools that democratize model customization—like Claude Code’s Hugging Face Skills—are poised to become essential components of the modern AI stack.

We Got Claude to Fine-Tune an Open Source LLM

By Ben Burtenshaw & Shaun Smith · Published December 4, 2025

We gave Claude the ability to fine‑tune language models using a new tool called Hugging Face Skills. Not just to write training scripts, but to actually submit jobs to cloud GPUs, monitor progress, and push finished models to the Hugging Face Hub. This tutorial shows you how it works and how to use it yourself.

Claude Code can use “skills”—packaged instructions, scripts, and domain knowledge—to accomplish specialized tasks. The hf‑llm‑trainer skill teaches Claude everything it needs to know about training: which GPU to pick for your model size, how to configure Hub authentication, when to use LoRA versus full fine‑tuning, and how to handle the dozens of other decisions that go into a successful training run.

With this skill, you can tell Claude things like:


Fine‑tune Qwen3‑0.6B on the dataset open‑r1/codeforces‑cots

And Claude will:

  1. Validate your dataset format

  2. Select appropriate hardware (t4‑small for a 0.6B model)

  3. Use and update a training script with Trackio monitoring

  4. Submit the job to Hugging Face Jobs

  5. Report the job ID and estimated cost

  6. Check on progress when you ask

  7. Help you debug if something goes wrong

The model trains on Hugging Face GPUs while you do other things. When it’s done, your fine‑tuned model appears on the Hub, ready to use.

This isn’t a toy demo. The skill supports the same training methods used in production: supervised fine‑tuning, direct preference optimization, and reinforcement learning with verifiable rewards. You can train models from 0.5 B to 70 B parameters, convert them to GGUF for local deployment, and run multi‑stage pipelines that combine different techniques.


Setup and Install

Before starting, you’ll need:

  • A Hugging Face account with a Pro or Team plan (Jobs require a paid plan)

  • A write‑access token from huggingface.co/settings/tokens

  • A coding agent like Claude Code, OpenAI Codex, or Google’s Gemini CLI

Hugging Face Skills are compatible with Claude Code, Codex, and Gemini CLI. Integrations with Cursor, Windsurf, and Continue are on the way.

Claude Code

  1. Register the repository as a plugin marketplace:

    
    /plugin marketplace add huggingface/skills
    
    
  2. Install a skill:

    
    /plugin install <skill-folder>@huggingface-skills
    
    

    Example:

    
    /plugin install hf-llm-trainer@huggingface-skills
    
    

Codex

  1. Codex will identify the skills via the AGENTS.md file. Verify the instructions are loaded with:

    
    codex --ask-for-approval never "Summarize the current instructions."
    
    
  2. For more details, see the Codex AGENTS guide.

Gemini CLI

  1. This repo includes gemini-extension.json to integrate with the Gemini CLI.

  2. Install locally:

    
    gemini extensions install . --consent
    
    

    or use the GitHub URL:

    
    gemini extensions install https://github.com/huggingface/skills.git --consent
    
    
  3. See the Gemini CLI extensions docs for more help.

Connect to Hugging Face

Authenticate your Hugging Face account with a write‑access token so that the job can create a model repo.


hf auth login

export HF_TOKEN=hf_your_write_access_token_here

NOTE – Configure the Hugging Face MCP Server to use your write token by sending it in either the HF_TOKEN or the Authorization: Bearer HTTP header.
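
Before submitting anything, it is worth confirming that the token is actually being picked up. Here is a minimal check with the huggingface_hub Python client (whoami reads HF_TOKEN from the environment or your cached login):

# Verify Hugging Face authentication before submitting any jobs.
from huggingface_hub import whoami

info = whoami()  # raises an error if the token is missing or invalid
print("Logged in as:", info["name"])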


Your First Training Run

Let’s walk through a complete example. We’ll fine‑tune a small model to see the full workflow, then explore more advanced capabilities.

Instruct the coding agent to fine‑tune

Start with a clear instruction:


Fine‑tune Qwen3‑0.6B on the open‑r1/codeforces‑cots dataset for instruction following.

The coding agent analyzes the request and prepares a training configuration. For a 0.6 B model on a demo dataset, it selects t4‑small—the cheapest GPU that can handle the model.

The open‑r1/codeforces‑cots dataset is a collection of Codeforces problems and solutions. It is a good dataset for instruction tuning a model to solve hard coding problems.
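
If you want to look at the data yourself before handing it off, a quick inspection with the datasets library looks roughly like this (the subset name below is an assumption; check the dataset card on the Hub for the exact config):

# Peek at a few examples without downloading the full dataset.
from datasets import load_dataset

ds = load_dataset("open-r1/codeforces-cots", "solutions", split="train", streaming=True)
for example in ds.take(3):
    print(list(example.keys()))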

Review Before Submitting

Before the agent submits anything, you’ll see the configuration:


I'll fine‑tune Qwen/Qwen3‑0.6B on open‑r1/codeforces‑cots using SFT.

Configuration:

- Hardware: t4‑small (~$0.75/hour)

- Estimated time: ~20 minutes

- Estimated cost: ~$0.30

- Output: username/qwen‑codeforces‑cots‑sft

The model will be pushed to Hub automatically. Should I submit?

Adjust anything you like (output repo name, hardware, training parameters). Once you approve, the agent submits the job. You can also ask for a quick test run:


Do a quick test run on 100 examples.

Track Progress

After submission you receive job details:


✅ Job submitted successfully!

Job ID: abc123xyz

Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~20 minutes

Estimated cost: ~$0.30

View real‑time metrics at: https://huggingface.co/spaces/username/trackio

The skill includes Trackio integration, so you can watch training loss decrease in real time. Jobs run asynchronously, so you can close your terminal and come back later. When you want an update:


How's my training job doing?

The agent fetches the logs and summarizes progress.

Use Your Model

When training completes, the model is on the Hub:


from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("username/qwen-codeforces-cots-sft")
tokenizer = AutoTokenizer.from_pretrained("username/qwen-codeforces-cots-sft")

That’s the full loop: you describe what you want in plain English, and the agent handles GPU selection, script generation, job submission, authentication, and persistence. The whole thing cost about thirty cents.
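
For a quick smoke test, a minimal generation sketch that continues from the snippet above (the prompt and generation settings are illustrative):

# Generate a completion with the fine-tuned model loaded above.
prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))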


Training Methods

The skill supports three training approaches. Understanding when to use each one helps you get better results.

Supervised Fine‑Tuning (SFT)

SFT is where most projects start. You provide demonstration data—examples of inputs and desired outputs—and training adjusts the model to match those patterns.

Use SFT when you have high‑quality examples of the behavior you want (customer‑support conversations, code‑generation pairs, domain‑specific Q&A, etc.).


Fine‑tune Qwen3‑0.6B on my‑org/support‑conversations for 3 epochs.

The agent validates the dataset, selects hardware (e.g., a10g‑large with LoRA for a 7 B model), and configures training with checkpoints and monitoring.

For models larger than 3 B parameters, the agent automatically uses LoRA (Low‑Rank Adaptation) to reduce memory requirements. This makes training 7 B or 13 B models feasible on single GPUs while preserving most of the quality of full fine‑tuning.
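
As a rough mental model of what such a run looks like in code, here is a minimal sketch using TRL's SFTTrainer with a PEFT LoRA config; the model, dataset, and hyperparameters are illustrative, not the skill's actual generated script:

# Minimal SFT + LoRA sketch with TRL (illustrative, not the generated script).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset with a "messages" or "text" column in a format TRL understands.
dataset = load_dataset("my-org/support-conversations", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",          # a 7B model, so LoRA is worthwhile
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen-support-sft",
        num_train_epochs=3,
        push_to_hub=True,                       # upload the finished model to the Hub
    ),
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # keeps memory low
)
trainer.train()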

Direct Preference Optimization (DPO)

DPO trains on preference pairs—responses where one is “chosen” and another is “rejected.” This aligns model outputs with human preferences, typically after an initial SFT stage.

Use DPO when you have preference annotations from human labelers or automated comparisons. DPO optimizes directly for the preferred response without needing a separate reward model.


Run DPO on my‑org/preference‑data to align the SFT model I just trained.

The dataset has 'chosen' and 'rejected' columns.

DPO is sensitive to dataset format. It requires columns named exactly chosen and rejected, or a prompt column with the input. The agent validates this first and shows you how to map columns if your dataset uses different names.
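
For example, remapping differently named preference columns with the datasets library is a one-liner (the original column names below are hypothetical):

# Rename preference columns to the "chosen"/"rejected" names DPO expects.
from datasets import load_dataset

ds = load_dataset("my-org/preference-data", split="train")  # placeholder repo from the prompt above
ds = ds.rename_columns({"good_response": "chosen", "bad_response": "rejected"})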

Group Relative Policy Optimization (GRPO)

GRPO is a reinforcement‑learning method that optimizes a model against reward signals rather than fixed labels: the model samples groups of candidate responses, and rewards (often verifiable checks such as unit tests or exact‑match answers) determine which behaviors are reinforced. (Further details omitted for brevity.)
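
To make the verifiable-reward idea concrete, a toy reward function might score each sampled completion against a known answer (purely a sketch; the skill wires rewards up for you):

# Toy verifiable reward: 1.0 if the known answer appears in the completion, else 0.0.
def exact_match_reward(completions, answers):
    return [1.0 if answer.strip() in completion else 0.0
            for completion, answer in zip(completions, answers)]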


End of article.
