LLMs from scratch


Build a Large Language Model (From Scratch)

This repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch).

In Build a Large Language Model (From Scratch), you’ll learn and understand how large language models (LLMs) work from the inside out by coding them from the ground up, step by step. In this book, I’ll guide you through creating your own LLM, explaining each stage with clear text, diagrams, and examples.

The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT. In addition, this book includes code for loading the weights of larger pretrained models for finetuning.

  • Link to the official source code repository
  • Link to the book at Manning (the publisher’s website)
  • Link to the book page on Amazon.com
  • ISBN 9781633437166

To download a copy of this repository, click on the Download ZIP button or execute the following command in your terminal:

git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git

(If you downloaded the code bundle from the Manning website, please consider visiting the official code repository on GitHub at https://github.com/rasbt/LLMs-from-scratch for the latest updates.)

Table of Contents

Please note that this README.md file is a Markdown (.md) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven’t installed a Markdown editor yet, Ghostwriter is a good free option.

You can alternatively view this and other files on GitHub at https://github.com/rasbt/LLMs-from-scratch in your browser, which renders Markdown automatically.

Tip:
If you’re seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the README.md file located in the setup directory.

| Chapter Title | Main Code (for Quick Access) | All Code + Supplementary |
|---------------|------------------------------|--------------------------|
| Setup recommendations | | |
| Ch 1: Understanding Large Language Models | No code | |
| Ch 2: Working with Text Data | – ch02.ipynb<br>– dataloader.ipynb (summary)<br>– exercise-solutions.ipynb | ./ch02 |
| Ch 3: Coding Attention Mechanisms | – ch03.ipynb<br>– multihead-attention.ipynb (summary)<br>– exercise-solutions.ipynb | ./ch03 |
| Ch 4: Implementing a GPT Model from Scratch | – ch04.ipynb<br>– gpt.py (summary)<br>– exercise-solutions.ipynb | ./ch04 |
| Ch 5: Pretraining on Unlabeled Data | – ch05.ipynb<br>– gpt_train.py (summary)<br>– gpt_generate.py (summary)<br>– exercise-solutions.ipynb | ./ch05 |
| Ch 6: Finetuning for Text Classification | – ch06.ipynb<br>– gpt_class_finetune.py<br>– exercise-solutions.ipynb | ./ch06 |
| Ch 7: Finetuning to Follow Instructions | – ch07.ipynb<br>– gpt_instruction_finetuning.py (summary)<br>– ollama_evaluate.py (summary)<br>– exercise-solutions.ipynb | ./ch07 |
| Appendix A: Introduction to PyTorch | – code-part1.ipynb<br>– code-part2.ipynb<br>– DDP-script.py<br>– exercise-solutions.ipynb | ./appendix-A |
| Appendix B: References and Further Reading | No code | |
| Appendix C: Exercise Solutions | No code | |
| Appendix D: Adding Bells and Whistles to the Training Loop | – appendix-D.ipynb | ./appendix-D |
| Appendix E: Parameter-efficient Finetuning with LoRA | – appendix-E.ipynb | ./appendix-E |

 

The mental model below summarizes the contents covered in this book.

 

Prerequisites

The most important prerequisite is a strong foundation in Python programming.
With this knowledge, you will be well prepared to explore the fascinating world of LLMs
and understand the concepts and code examples presented in this book.

If you have some experience with deep neural networks, you may find certain concepts more familiar, as LLMs are built upon these architectures.

This book uses PyTorch to implement the code from scratch without using any external LLM libraries. While proficiency in PyTorch is not a prerequisite, familiarity with PyTorch basics is certainly useful. If you are new to PyTorch, Appendix A provides a concise introduction to PyTorch. Alternatively, you may find my book, PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs, helpful for learning about the essentials.
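For readers new to PyTorch, the flavor of code introduced in Appendix A looks roughly like the following minimal sketch (assuming PyTorch is installed; the variable names here are illustrative, not taken from the book):

```python
import torch

# Minimal PyTorch flavor: tensors, a matrix multiply, and autograd.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # input data
w = torch.ones(2, 2, requires_grad=True)    # trainable parameter
y = (x @ w).sum()                           # forward pass, scalar output
y.backward()                                # autograd fills w.grad
print(w.grad)                               # gradient of y w.r.t. each weight
```

Everything in the book's model code, from attention layers to the training loop, is built from these same primitives: tensors, operations on them, and gradients computed via `backward()`.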

 

Hardware Requirements

The code in the main chapters of this book is designed to run on conventional laptops within a reasonable timeframe and does not require specialized hardware. This approach ensures that a wide audience can engage with the material. Additionally, the code automatically utilizes GPUs if they are available. (Please see the setup doc for additional recommendations.)
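The "automatically utilizes GPUs" behavior follows the standard PyTorch device-selection idiom, sketched below (a common pattern, not a verbatim excerpt from the book's code):

```python
import torch

# Use a GPU if one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and, analogously, models) are moved to the chosen device:
batch = torch.randn(8, 16).to(device)
print(f"Running on: {batch.device.type}")
```

Because the rest of the code only refers to `device`, the same scripts run unchanged on a laptop CPU or a CUDA-capable GPU.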

 

Video Course

A 17-hour-and-15-minute companion video course is available in which I code through each chapter of the book. The course is organized into chapters and sections that mirror the book’s structure, so it can be used as a standalone alternative to the book or as a complementary code-along resource.

 

Exercises

Each chapter of the book includes several exercises. The solutions are summarized in Appendix C, and the corresponding code notebooks are available in the main chapter folders of this repository (for example, ./ch02/01_main-chapter-code/exercise-solutions.ipynb).

In addition to the code exercises, you can download a free 170-page PDF titled Test Yourself On Build a Large Language Model (From Scratch) from the Manning website. It contains approximately 30 quiz questions and solutions per chapter to help you test your understanding.

 

Bonus Material

Several folders contain optional materials as a bonus for interested readers:

  • Setup

    • Python Setup Tips
    • Installing Python Packages and Libraries Used In This Book
    • Docker Environment Setup Guide
  • Chapter 2: Working with text data

    • Byte Pair Encoding (BPE) Tokenizer From Scratch
    • Comparing Various Byte Pair Encoding (BPE) Implementations
    • Understanding the Difference Between Embedding Layers and Linear Layers
    • Dataloader Intuition with Simple Numbers
  • Chapter 3: Coding attention mechanisms

    • Comparing Efficient Multi-Head Attention Implementations
    • Understanding PyTorch Buffers
  • Chapter 4: Implementing a GPT model from scratch

    • FLOPS Analysis
    • KV Cache
  • Chapter 5: Pretraining on unlabeled data

    • Alternative Weight Loading Methods
    • Pretraining GPT on the Project Gutenberg Dataset
    • Adding Bells and Whistles to the Training Loop
    • Optimizing Hyperparameters for Pretraining
    • Building a User Interface to Interact With the Pretrained LLM
    • Converting GPT to Llama
    • Llama 3.2 From Scratch
    • Qwen3 From Scratch
    • Memory-efficient Model Weight Loading
    • Extending the Tiktoken BPE Tokenizer with New Tokens
    • PyTorch Performance Tips for Faster LLM Training
  • Chapter 6: Finetuning for classification

    • Additional experiments finetuning different layers and using larger models
    • Finetuning different models on the 50k IMDB movie review dataset
    • Building a User Interface to Interact With the GPT-based Spam Classifier
  • Chapter 7: Finetuning to follow instructions

    • Dataset Utilities for Finding Near Duplicates and Creating Passive Voice Entries
    • Evaluating Instruction Responses Using the OpenAI API and Ollama
    • Generating a Dataset for Instruction Finetuning
    • Improving a Dataset for Instruction Finetuning
    • Generating a Preference Dataset with Llama 3.1 70B and Ollama
    • Direct Preference Optimization (DPO) for LLM Alignment
    • Building a User Interface to Interact With the Instruction Finetuned GPT Model

 

Questions, Feedback, and Contributing to This Repository

I welcome all sorts of feedback, best shared via the Manning Forum or GitHub Discussions. Likewise, if you have any questions or just want to bounce ideas off others, please don’t hesitate to post these in the forum as well.

Please note that since this repository contains the code corresponding to a print book, I currently cannot accept contributions that would extend the contents of the main chapter code, as it would introduce deviations from the physical book. Keeping it consistent helps ensure a smooth experience for everyone.

 

Citation

If you find this book or code useful for your research, please consider citing it.

Chicago-style citation:

Raschka, Sebastian. Build A Large Language Model (From Scratch). Manning, 2024. ISBN: 978-1633437166.

BibTeX entry:

@book{build-llms-from-scratch-book,
  author       = {Sebastian Raschka},
  title        = {Build A Large Language Model (From Scratch)},
  publisher    = {Manning},
  year         = {2024},
  isbn         = {978-1633437166},
  url          = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
  github       = {https://github.com/rasbt/LLMs-from-scratch}
}
