burn

Burn is a next generation Deep Learning Framework that doesn't compromise on
flexibility, efficiency and portability.

Performance

Because we believe the goal of a deep learning framework is to convert computation into useful
intelligence, we have made performance a core pillar of Burn. We strive to achieve top efficiency by
leveraging multiple optimization techniques described below.

Click on each section for more details

Automatic kernel fusion

Using Burn means having your models optimized on any backend. When possible, we provide a way to
automatically and dynamically create custom kernels that minimize data relocation between different
memory spaces, which is extremely useful when moving memory is the bottleneck.

As an example, you could write your own GELU activation function with the high-level tensor API (see
the Rust code snippet below).

use burn::tensor::{backend::Backend, Tensor};
use core::f64::consts::SQRT_2; // sqrt(2) constant used by the GELU formula

fn gelu_custom<B: Backend, const D: usize>(x: Tensor<B, D>) -> Tensor<B, D> {
    let x = x.clone() * ((x / SQRT_2).erf() + 1);
    x / 2
}

Then, at runtime, a custom low-level kernel will be automatically created for your specific
implementation and will rival a handcrafted GPU implementation. The kernel consists of about 60
lines of WGSL (WebGPU Shading Language), an extremely verbose lower-level shader language you
probably don't want to program your deep learning models in!

Asynchronous execution

For first-party backends, an asynchronous execution style is used, which allows us to perform
various optimizations, such as the previously mentioned automatic kernel fusion.

Asynchronous execution also ensures that the normal execution of the framework does not block the
model computations, which implies that the framework overhead won't impact the speed of execution
significantly. Conversely, the intense computations in the model do not interfere with the
responsiveness of the framework. For more information about our asynchronous backends, see
this blog post.
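
As a small illustration of what this looks like from the user's perspective (assuming a first-party backend), the tensor operations below only enqueue work; the blocking wait happens when the result is read back on the host:

use burn::tensor::{backend::Backend, Tensor, TensorData};

fn run_and_read<B: Backend>(x: Tensor<B, 2>) -> TensorData {
    let y = x.clone().matmul(x).exp(); // enqueued; returns without waiting for the device
    y.into_data() // synchronizes and copies the result back to the host
}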

Thread-safe building blocks

Burn emphasizes thread safety by leveraging the
ownership system of Rust.
With Burn, each module is the owner of its weights. It is therefore possible to send a module to
another thread for computing the gradients, then send the gradients to the main thread that can
aggregate them, and voilà, you get multi-device training.
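
Below is a minimal, framework-agnostic sketch of that ownership pattern using plain std::thread. The gradient type and the gradient-computation function are stand-ins, not Burn's actual multi-device API; the point is simply that an owned module can be moved into a worker thread without any locks.

use std::thread;

fn compute_on_worker<M, G>(module: M, compute_grads: fn(M) -> G) -> G
where
    M: Send + 'static,
    G: Send + 'static,
{
    // Ownership of `module` moves into the spawned thread; no other thread can
    // touch its weights, so no synchronization primitives are required.
    let handle = thread::spawn(move || compute_grads(module));
    // The main thread receives the gradients and can aggregate them.
    handle.join().expect("worker thread panicked")
}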

This is a very different approach from what PyTorch does, where backpropagation actually mutates the
grad attribute of each tensor parameter. This is not a thread-safe operation and therefore
requires lower-level synchronization primitives; see
distributed training for reference. Note that
this is still very fast, but not compatible across different backends and quite hard to implement.

Intelligent memory management

One of the main roles of a deep learning framework is to reduce the amount of memory necessary to
run models. The naive way of handling memory is that each tensor has its own memory space, which is
allocated when the tensor is created and deallocated when the tensor goes out of scope. However,
allocating and deallocating data is very costly, so a memory pool is often required to achieve good
throughput. Burn offers an infrastructure that allows for easily creating and selecting memory
management strategies for backends. For more details on memory management in Burn, see
this blog post.

Another very important memory optimization of Burn is that we keep track of when a tensor can be
mutated in place simply by using the ownership system well. Even though it is a rather small memory
optimization on its own, it adds up considerably when training or running inference with larger
models and contributes to reducing memory usage even further. For more information, see
this blog post about tensor handling.
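
As a tiny illustration in the same spirit as the earlier GELU snippet, taking the tensor by value means no other handle to its buffer exists at that point, so the backend is free to write the result in place. Whether it actually does so depends on the backend and its memory strategy; this is only a sketch of the ownership pattern.

use burn::tensor::{backend::Backend, Tensor};

fn scale<B: Backend, const D: usize>(x: Tensor<B, D>) -> Tensor<B, D> {
    // `x` is owned and not cloned here, so its buffer may be reused for the result.
    x * 2.0
}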

Automatic kernel selection

A good deep learning framework should ensure that models run smoothly on all hardware. However, not
all hardware share the same behavior in terms of execution speed. For instance, a matrix
multiplication kernel can be launched with many different parameters, which are highly sensitive to
the size of the matrices and the hardware. Using the wrong configuration could reduce the speed of
execution by a large factor (10 times or even more in extreme cases), so choosing the right kernels
becomes a priority.

With our home-made backends, we run benchmarks automatically and choose the best configuration for
the current hardware and matrix sizes with a reasonable caching strategy.

This adds a small overhead by increasing the warmup execution time, but stabilizes quickly after a
few forward and backward passes, saving lots of time in the long run. Note that this feature isn't
mandatory, and can be disabled when cold starts are a priority over optimized throughput.
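
The idea can be summarized with a toy benchmark-then-cache sketch. This is not Burn's actual autotune API; the key, the candidate kernels, and the timing loop are purely illustrative.

use std::collections::HashMap;
use std::time::Instant;

type Key = (usize, usize, usize); // e.g. (m, n, k) for a matrix multiplication

fn pick_fastest(cache: &mut HashMap<Key, usize>, key: Key, candidates: &[fn(Key)]) -> usize {
    *cache.entry(key).or_insert_with(|| {
        let mut best = (0usize, f64::MAX);
        for (i, kernel) in candidates.iter().enumerate() {
            let start = Instant::now();
            kernel(key); // the warmup run doubles as the benchmark
            let elapsed = start.elapsed().as_secs_f64();
            if elapsed < best.1 {
                best = (i, elapsed);
            }
        }
        best.0 // cache the index of the fastest candidate for this problem size
    })
}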

Hardware specific features

It is no secret that deep learning relies mostly on matrix multiplication as its core operation,
since this is how fully-connected neural networks are modeled.

More and more, hardware manufacturers optimize their chips specifically for matrix multiplication
workloads. For instance, Nvidia has its Tensor Cores and today most cellphones have AI specialized
chips. As of this moment, we support Tensor Cores with our LibTorch, Candle, CUDA, Metal and WGPU/SPIR-V
backends, but not other accelerators yet. We hope
this issue gets resolved at some point to bring
support to our WGPU backend.

Custom Backend Extension

Burn aims to be the most flexible deep learning framework. While it's crucial to maintain
compatibility with a wide variety of backends, Burn also provides the ability to extend the
functionalities of a backend implementation to suit your personal modeling requirements.

This versatility is advantageous in numerous ways, such as supporting custom operations like flash
attention or manually writing your own kernel for a specific backend to enhance performance. See
this section in the Burn Book
for more details.

Backend

Burn strives to be as fast as possible on as many hardware platforms as possible, with robust
implementations. We believe this flexibility is crucial for modern needs, where you may train your
models in the cloud and then deploy them on customer hardware, which varies from user to user.

Supported Backends

Backend    Devices                        Class
CUDA       NVIDIA GPUs                    First-Party
ROCm       AMD GPUs                       First-Party
Metal      Apple GPUs                     First-Party
Vulkan     Most GPUs on Linux & Windows   First-Party
Wgpu       Most GPUs                      First-Party
NdArray    Most CPUs                      Third-Party
LibTorch   Most GPUs & CPUs               Third-Party
Candle     Nvidia, Apple GPUs & CPUs      Third-Party

Compared to other frameworks, Burn has a very different approach to supporting many backends. By
design, most code is generic over the Backend trait, which allows us to build Burn with swappable
backends. This makes composing backends possible, augmenting them with additional functionality
such as autodifferentiation and automatic kernel fusion.
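
As a rough sketch of what "generic over the Backend trait" means in practice, the same function can be instantiated with different backends just by changing a type alias (the NdArray and Wgpu backends shown here require their respective Cargo features to be enabled):

use burn::backend::{NdArray, Wgpu};
use burn::tensor::{backend::Backend, Tensor};

// The same source compiles for any backend implementing the Backend trait.
fn sum_all<B: Backend>(x: Tensor<B, 2>) -> Tensor<B, 1> {
    x.sum()
}

type Cpu = NdArray;
type Gpu = Wgpu;
// `sum_all::<Cpu>` and `sum_all::<Gpu>` are both valid instantiations.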

Autodiff: Backend decorator that brings backpropagation to any backend

Contrary to the aforementioned backends, Autodiff is actually a backend decorator. This means that
it cannot exist by itself; it must encapsulate another backend.

The simple act of wrapping a base backend with Autodiff transparently equips it with
autodifferentiation support, making it possible to call backward on your model.

use burn::backend::{Autodiff, Wgpu};
use burn::tensor::{Distribution, Tensor};

fn main() {
    type Backend = Autodiff<Wgpu>;

    let x: Tensor<Backend, 2> = Tensor::random([32, 32], Distribution::Default);
    let y: Tensor<Backend, 2> = Tensor::random([32, 32], Distribution::Default).require_grad();

    let tmp = x.clone() + y.clone();
    let tmp = tmp.matmul(x);
    let tmp = tmp.exp();

    let grads = tmp.backward();
    let y_grad = y.grad(&grads).unwrap();
    println!("{y_grad}");
}

Of note, it is impossible to make the mistake of calling backward on a model that runs on a backend
that does not support autodiff (for inference), as this method is only offered by an Autodiff
backend.

See the Autodiff Backend README for more details.

Fusion: Backend decorator that brings kernel fusion to all first-party backends

This backend decorator enhances a backend with kernel fusion, provided that the inner backend
supports it. Note that you can compose this backend with other backend decorators such as Autodiff.
For now, only the WGPU and CUDA backends have support for fused kernels.

use burn::backend::{Autodiff, Fusion, Wgpu};
use burn::tensor::{Distribution, Tensor};

fn main() {
    type Backend = Autodiff<Fusion<Wgpu>>;

    let x: Tensor<Backend, 2> = Tensor::random([32, 32], Distribution::Default);
    let y: Tensor<Backend, 2> = Tensor::random([32, 32], Distribution::Default).require_grad();

    let tmp = x.clone() + y.clone();
    let tmp = tmp.matmul(x);
    let tmp = tmp.exp();

    let grads = tmp.backward();
    let y_grad = y.grad(&grads).unwrap();
    println!("{y_grad}");
}

Of note, we plan to implement automatic gradient checkpointing based on compute bound and memory
bound operations, which will work gracefully with the fusion backend to make your code run even
faster during training, see this issue.

See the Fusion Backend README for more details.

Router (Beta): Backend decorator that composes multiple backends into a single one

This backend simplifies hardware interoperability, for instance if you want to execute some operations on the CPU and others on the GPU.

use burn::tensor::{Distribution, Tensor};
use burn::backend::{
    NdArray, Router, Wgpu, ndarray::NdArrayDevice, router::duo::MultiDevice, wgpu::WgpuDevice,
};

fn main() {
    type Backend = Router<(Wgpu, NdArray)>;

    let device_0 = MultiDevice::B1(WgpuDevice::DiscreteGpu(0));
    let device_1 = MultiDevice::B2(NdArrayDevice::Cpu);

    let tensor_gpu =
        Tensor::<Backend, 2>::random([3, 3], burn::tensor::Distribution::Default, &device_0);
    let tensor_cpu =
        Tensor::<Backend, 2>::random([3, 3], burn::tensor::Distribution::Default, &device_1);
}

Remote (Beta): Backend decorator for remote backend execution, useful for distributed computations

This backend has two parts: a client and a server.
The client sends tensor operations over the network to a remote compute backend.
You can use any first-party backend as the server in a single line of code:

fn main_server() {
    // Start a server on port 3000.
    burn::server::start::<burn::backend::Cuda>(Default::default(), 3000);
}

fn main_client() {
    // Create a client that communicates with the server on port 3000.
    use burn::backend::{Autodiff, RemoteBackend};

    type Backend = Autodiff<RemoteBackend>;

    let device = RemoteDevice::new("ws://localhost:3000");
    let tensor_gpu =
        Tensor::<Backend, 2>::random([3, 3], Distribution::Default, &device);
}

Training & Inference

The whole deep learning workflow is made easy with Burn, as you can monitor your training progress
with an ergonomic dashboard, and run inference everywhere from embedded devices to large GPU
clusters.

Burn was built from the ground up with training and inference in mind. It's also worth noting how
Burn, in comparison to frameworks like PyTorch, simplifies the transition from training to
deployment, eliminating the need for code changes.

Click on the following sections to expand

Training Dashboard

A terminal UI dashboard based on the Ratatui crate allows users to follow their training
with ease without having to connect to any external application.

You can visualize your training and validation metrics updating in real time and analyze the
lifelong progression or recent history of any registered metric using only the arrow keys. You can
also break from the training loop without crashing, allowing potential checkpoints to be fully
written or important pieces of code to complete without interruption.

ONNX Support

ONNX (Open Neural Network Exchange) is an open-standard format that exports both the architecture
and the weights of a deep learning model.

Burn supports importing models that follow the ONNX standard, so you can easily port a model
you have written in another framework like TensorFlow or PyTorch to Burn and benefit from all the
advantages our framework offers.

Our ONNX support is further described in
this section of the Burn Book.

Note: This crate is in active development and currently supports a
limited set of ONNX operators.

Importing PyTorch or Safetensors Models

You can load weights from PyTorch or Safetensors formats directly into your Burn-defined models. This makes it easy to reuse existing models while benefiting from Burn's performance and deployment features.

Learn more:

  • Import pre-trained PyTorch models into Burn
  • Load models from Safetensors format

Inference in the Browser

Several of our backends can compile to WebAssembly: Candle and NdArray for CPU, and WGPU for GPU.
This means that you can run inference directly within a browser. We provide several examples of
this:

  • MNIST where you can draw digits and a small convnet tries to
    find which one it is! 2️⃣ 7️⃣
  • Image Classification where you can upload images and
    classify them!

Embedded: no_std support

Burn's core components support no_std. This
means it can run in bare-metal environments such as embedded devices without an operating system.

As of now, only the NdArray backend can be used in a no_std environment.
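
As a rough sketch of what this typically looks like in Cargo.toml (the exact feature names are an assumption here; check the crate documentation for your Burn version), a no_std build disables the default features and enables the ndarray backend:

# Hypothetical dependency entry for a no_std build; feature names may differ by version.
burn = { default-features = false, features = ["ndarray"] }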

Benchmarks

To evaluate performance across different backends and track improvements over time, we provide a
dedicated benchmarking suite.

Run and compare benchmarks using burn-bench.

Warning
When using one of the wgpu backends, you may encounter compilation errors related to recursive type evaluation. This is due to complex type nesting within the wgpu dependency chain.
To resolve this issue, add the following line at the top of your main.rs or lib.rs file:

#![recursion_limit = "256"]

The default recursion limit (128) is often just below the required depth (typically 130-150) due to deeply nested associated types and trait bounds.

Getting Started

Just heard of Burn? You are in the right place! Keep reading this section and we hope you
can get on board quickly.

The Burn Book

To begin working effectively with Burn, it is crucial to understand its key components and
philosophy. This is why we highly recommend that new users read the first sections of
The Burn Book. It provides detailed examples and explanations
covering every facet of the framework, including building blocks like tensors, modules, and
optimizers, all the way to advanced usage, like coding your own GPU kernels.

The project is constantly evolving, and we try as much as possible to keep the book up to date
with new additions. However, we might miss some details sometimes, so if you see something weird,
let us know! We also gladly accept Pull Requests

Examples

Let's start with a code snippet that shows how intuitive the framework is to use! In the following,
we declare a neural network module with some parameters along with its forward pass.

use burn::nn;
use burn::module::Module;
use burn::tensor::{backend::Backend, Tensor};

#[derive(Module, Debug)]
pub struct PositionWiseFeedForward<B: Backend> {
    linear_inner: nn::Linear<B>,
    linear_outer: nn::Linear<B>,
    dropout: nn::Dropout,
    gelu: nn::Gelu,
}

impl<B: Backend> PositionWiseFeedForward<B> {
    pub fn forward<const D: usize>(&self, input: Tensor<B, D>) -> Tensor<B, D> {
        let x = self.linear_inner.forward(input);
        let x = self.gelu.forward(x);
        let x = self.dropout.forward(x);

        self.linear_outer.forward(x)
    }
}
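
For completeness, here is one way such a module might be constructed. This is a hedged sketch assuming Burn's usual nn::*Config builders (LinearConfig, DropoutConfig); exact signatures can vary between versions, and the hyperparameter names are purely illustrative.

impl<B: Backend> PositionWiseFeedForward<B> {
    // Sketch of a constructor; `d_model` and `d_ff` are illustrative names.
    pub fn new(d_model: usize, d_ff: usize, dropout: f64, device: &B::Device) -> Self {
        Self {
            linear_inner: nn::LinearConfig::new(d_model, d_ff).init(device),
            linear_outer: nn::LinearConfig::new(d_ff, d_model).init(device),
            dropout: nn::DropoutConfig::new(dropout).init(),
            gelu: nn::Gelu::new(),
        }
    }
}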

The repository contains a fairly large number of examples that show how to use
the framework in different scenarios.

Following the book:

  • Basic Workflow : Creates a custom CNN Module to train on the MNIST dataset
    and use for inference.
  • Custom Training Loop : Implements a basic training loop instead
    of using the Learner.
  • Custom WGPU Kernel : Shows how to create your own custom
    operation with the WGPU backend.

Additional examples:

  • Custom CSV Dataset : Implements a dataset to parse CSV data for a
    regression task.
  • Regression : Trains a simple MLP on the California Housing dataset
    to predict the median house value for a district.
  • Custom Image Dataset : Trains a simple CNN on a custom image
    dataset following a simple folder structure.
  • Custom Renderer : Implements a custom renderer to display the
    Learner progress.
  • Image Classification Web : Image classification web browser
    demo using Burn, WGPU and WebAssembly.
  • MNIST Inference on Web : An interactive MNIST inference demo in
    the browser. The demo is available online.
  • MNIST Training : Demonstrates how to train a custom Module (MLP) with the
    Learner configured to log metrics and keep training checkpoints.
  • Named Tensor : Performs operations with the experimental NamedTensor
    feature.
  • ONNX Import Inference : Imports an ONNX model pre-trained on MNIST to
    perform inference on a sample image with Burn.
  • PyTorch Import Inference : Imports a PyTorch model pre-trained on
    MNIST to perform inference on a sample image with Burn.
  • Text Classification : Trains a text classification transformer
    model on the AG News or DbPedia dataset. The trained model can then be used to classify a text
    sample.
  • Text Generation : Trains a text generation transformer model on the
    DbPedia dataset.
  • Wasserstein GAN MNIST : Trains a WGAN model to generate new handwritten digits
    based on MNIST.

For more practical insights, you can clone the repository and run any of them directly on your
computer!

Pre-trained Models

We keep an updated and curated list of models and examples built with Burn, see the
tracel-ai/models repository for more details.

Don't see the model you want? Don't hesitate to open an issue, and we may prioritize it. Built a
model using Burn and want to share it? You can also open a Pull Request and add your model under the
community section!

Why use Rust for Deep Learning?

Deep Learning is a special form of software where you need very high level abstractions as well as
extremely fast execution time. Rust is the perfect candidate for that use case since it provides
zero-cost abstractions to easily create neural network modules, and fine-grained control over memory
to optimize every detail.

It's important that a framework be easy to use at a high level so that its users can focus on
innovating in the AI field. However, since running models relies so heavily on computations,
performance can't be neglected.

To this day, the mainstream solution to this problem has been to offer APIs in Python but rely on
bindings to low-level languages such as C/C++. This reduces portability, increases complexity and
creates friction between researchers and engineers. We feel that Rust's approach to abstractions
makes it versatile enough to tackle this two-language dichotomy.

Rust also comes with the Cargo package manager, which makes it incredibly easy to build, test, and
deploy from any environment, something that is usually a pain in Python.

Although Rust has the reputation of being a difficult language at first, we strongly believe it
leads to more reliable, bug-free solutions built faster (after some practice)!

Deprecation Note
Since 0.14.0, the internal structure for tensor data has changed. The
previous Data struct was deprecated and officially removed in 0.17.0 in favor of the new
TensorData struct, which allows for more flexibility by storing the underlying data as bytes and
keeping the data type as a field. If you are using Data in your code, make sure to switch to
TensorData.

Loading Model Records From Previous Versions

In the event that you are trying to load a model record saved in a version older than 0.14.0, make
sure to use a compatible version (0.14, 0.15 or 0.16) with the record-backward-compat
feature flag.

features = [..., "record-backward-compat"]

Otherwise, the record won't be deserialized correctly and you will get an error message. This error
will also point you to the backward compatible feature flag.

Backward compatibility is maintained for deserialization when loading records. Therefore, as
soon as you have saved the record again, it will be saved according to the new structure and you can
upgrade back to the current version.

Please note that binary formats are not backward compatible. Thus, you will need to load your record
in a previous version and save it in any of the other self-describing record formats (e.g., using the
NamedMpkFileRecorder) before using a compatible version (as described) with the
record-backward-compat feature flag.

Community

If you are excited about the project, don't hesitate to join our
Discord! We try to be as welcoming as possible to everybody from
any background. You can ask your questions and share what you built with the community!

Contributing

Before contributing, please take a moment to review our
code of conduct. It's also highly
recommended to read the
architecture overview,
which explains some of our architectural decisions. Refer to our
contributing guide for more details.

Status

Burn is currently in active development, and there will be breaking changes. While any resulting
issues are likely to be easy to fix, there are no guarantees at this stage.

License

Burn is distributed under the terms of both the MIT license and the Apache License (Version 2.0).
See LICENSE-APACHE and LICENSE-MIT for details. Opening a pull
request is assumed to signal agreement with these licensing terms.

Download the Source

Clone the project from the command line:

git clone https://github.com/tracel-ai/burn.git
