Fine-tuning is the process of taking a pre-trained large language model (LLM) and further training it on a smaller, specific dataset. This process adapts the model to a particular task or domain, improving its performance and accuracy for your use case. This guide explains how to use Runpod’s fine-tuning feature, powered by Axolotl, to customize an LLM. You’ll learn how to select a base model, choose a dataset, configure your training environment, and deploy your fine-tuned model. For more information about fine-tuning with Axolotl, see the Axolotl Documentation.

Requirements

Before you begin, ensure you have:
  • A Runpod account.
  • (Optional) A Hugging Face account and an access token if you plan to use gated models or upload your fine-tuned model.

Select a base model and dataset

The base model is the starting point for your fine-tuning process, while the dataset provides the specific knowledge needed to adapt the base model to your task. You can choose from thousands of models and datasets on Hugging Face.
If you’re experimenting with fine-tuning for the first time, we recommend trying the NousResearch/Meta-Llama-3-8B model and the tatsu-lab/alpaca dataset.
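If you want to confirm the dataset's format before training, you can preview a few rows with the Hugging Face datasets library. This is an optional sketch that assumes Python and the datasets package are available (for example, in a local environment or a notebook):
# Optional sanity check: preview the dataset before fine-tuning.
# Assumes the Hugging Face datasets library is installed (pip install datasets).
from datasets import load_dataset

ds = load_dataset("tatsu-lab/alpaca", split="train")
print(ds.column_names)  # ['instruction', 'input', 'output', 'text']
print(ds[0])            # inspect a single record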

Deploy a fine-tuning Pod

1. Go to the Fine-Tuning page

Navigate to the Fine-Tuning section in the Runpod console.
2. Specify the base model and dataset

In the Base Model field, enter the Hugging Face model ID (e.g., NousResearch/Meta-Llama-3-8B). In the Dataset field, enter the Hugging Face dataset ID (e.g., tatsu-lab/alpaca).
3. Provide a Hugging Face token (if needed)

If you’re using a gated model that requires special access, generate a Hugging Face token with the necessary permissions and add it to the Hugging Face Access Token field.
4. Continue to the next step

Click Deploy the Fine-Tuning Pod to start configuring your fine-tuning Pod.
5. Choose a GPU for the Pod

Select a GPU instance based on your model’s requirements. Larger models and datasets require GPUs with more memory.
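As a rough guide, full fine-tuning needs GPU memory for the model weights, gradients, and optimizer states, plus activations. The back-of-envelope sketch below is an estimate only (it ignores activations, sequence length, and framework overhead) and assumes an 8B-parameter model trained in bf16 with an 8-bit Adam optimizer, as in the example configuration later in this guide:
# Rough GPU memory estimate for full fine-tuning; activations and overhead are not included.
params = 8e9                        # 8B-parameter model

weights_gb = params * 2 / 1e9       # bf16 weights: 2 bytes per parameter
grads_gb = params * 2 / 1e9         # bf16 gradients: 2 bytes per parameter
optimizer_gb = params * 2 / 1e9     # 8-bit Adam: ~2 bytes per parameter (two states, 1 byte each)

print(f"~{weights_gb + grads_gb + optimizer_gb:.0f} GB before activations")  # ~48 GB
Under these assumptions, an 80 GB-class GPU is a comfortable choice for fully fine-tuning an 8B model; smaller GPUs typically call for quantized or adapter-based training instead.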
6. Deploy the Pod

Finish configuring the Pod, then click Deploy on-demand. This opens the detail pane for your Pod automatically.
7. Monitor the Pod's deployment

Click Logs to monitor the system logs for deployment progress. Wait for the success message: "You've successfully configured your training environment!" Depending on the size of your model and dataset, this may take some time.
8. Connect to your training environment

Once your training environment is ready, you can connect to it to configure and start the fine-tuning process. Click Connect and choose your preferred connection method:
  • Jupyter Notebook: A browser-based notebook interface.
  • Web Terminal: A browser-based terminal.
  • SSH: A secure connection from your local machine.
To use SSH, add your public SSH key in your account settings. The system automatically adds your key to the Pod’s authorized_keys file. For more information, see Connect to a Pod with SSH.

Configure your environment

Your training environment is located in the /workspace/fine-tuning/ directory and has the following structure:
  • examples/: Sample configurations and scripts.
  • outputs/: Where your training results and model outputs will be saved.
  • config.yaml: The main configuration file for your training parameters.
The system generates an initial config.yaml based on your selected base model and dataset. This file defines all of the hyperparameters for your fine-tuning job, and you may need to experiment with these settings to achieve the best results. Open config.yaml in JupyterLab or in your preferred text editor to review and adjust the fine-tuning parameters:
nano config.yaml
Below is an example with common settings. Replace NousResearch/Meta-Llama-3-8B and tatsu-lab/alpaca with the base model and dataset you selected:
base_model: NousResearch/Meta-Llama-3-8B

# Model loading settings
load_in_8bit: false
load_in_4bit: false
strict: false

# Dataset configuration
datasets:
    - path: tatsu-lab/alpaca
      type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./outputs/out

# Training parameters
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

# Weights & Biases logging (optional)
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

# Training optimization
gradient_accumulation_steps: 8
micro_batch_size: 1
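# effective batch size = micro_batch_size x gradient_accumulation_steps x number of GPUs (8 with these values on a single GPU)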
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

# Additional settings
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
    use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
    pad_token: <|end_of_text|>
For more configuration examples, refer to the Axolotl examples repository.

Start the fine-tuning process

Once you’re satisfied with your configuration, you can start the training. Run the following command in your terminal:
axolotl train config.yaml
Monitor the training progress in your terminal. The output will show the training loss, validation loss, and other metrics.

Push your model to Hugging Face

After the fine-tuning process is complete, you can upload your model to the Hugging Face Hub to share it with the community or use it in your applications.
1. Log in to Hugging Face

Run this command to log in to your Hugging Face account:
huggingface-cli login
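If you prefer to log in from a script or notebook rather than the interactive prompt, the huggingface_hub Python library offers an equivalent call (the token below is a placeholder for your own access token):
# Non-interactive alternative to huggingface-cli login.
from huggingface_hub import login

login(token="hf_xxxxxxxxxxxxxxxx")  # placeholder: use your own Hugging Face access token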
2. Upload your model files

To upload your model files to the Hugging Face Hub, run this command:
huggingface-cli upload YOUR_USERNAME/MODEL_NAME ./outputs/out
Replace YOUR_USERNAME with your Hugging Face username and MODEL_NAME with your desired model name.
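Alternatively, you can upload the output folder from Python with the huggingface_hub library. This sketch uses the same placeholder repository ID as the command above:
# Python alternative to huggingface-cli upload, using the huggingface_hub library.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("YOUR_USERNAME/MODEL_NAME", exist_ok=True)  # create the repo if it doesn't already exist
api.upload_folder(
    folder_path="./outputs/out",        # the output_dir from config.yaml
    repo_id="YOUR_USERNAME/MODEL_NAME",
    repo_type="model",
)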

Next steps

Now that you have a fine-tuned model, you can deploy it on Runpod for inference:
  • Serverless: Deploy your model as a Serverless endpoint for a pay-as-you-go, scalable solution.
  • Pods: Run your model on a dedicated Pod for more control and persistent workloads.
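Before deploying, you may want to sanity-check the fine-tuned weights directly in your Pod. The sketch below is a minimal example that assumes the final model was saved to ./outputs/out (the output_dir in config.yaml) and that the transformers and accelerate libraries are available:
# Quick local sanity check of the fine-tuned model before deployment.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./outputs/out"  # the output_dir from config.yaml
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype="auto", device_map="auto")

# Alpaca-style prompt, matching the format of the example training dataset.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what fine-tuning is in one sentence.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))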