Tensor Lib Separation: Optimizing TT-Metal For Enhanced Flexibility
Hey everyone, let's dive into a cool project that's all about making the Tensor library even better and more flexible within the TT-Metal ecosystem. We're talking about separating the Tensor library from TT-NN (which is short for Tenstorrent Neural Network) to create a dedicated library that sits above the Metal layer. The main goal? To give you, the users, more control and flexibility, especially if you want to use the Metal software stack but don’t necessarily need all the Ops that come with the TT-NN library. Sounds interesting, right?
Why Separate Tensor from TT-NN?
So, why go through the hassle of separating things? Well, there are a few good reasons. First off, it's all about keeping things clean and focused. Imagine your Metal library is like a really awesome toolbox. If you start throwing in tools that aren't directly related to Metal itself, it can get cluttered pretty quickly. By creating a separate Tensor library, we avoid bloating Metal with concepts that are foreign to its core functionality. This keeps Metal lean and mean, which is always a good thing.
This separation also gives us a ton of flexibility down the road. Let's say we want to integrate Tensor even more closely with Metal's read/write APIs in the future. With a dedicated Tensor library, it becomes a whole lot easier. Instead of having to work with host and device buffers directly, users could interact with Tensor objects. Think of it as a super-powered data manipulation tool that provides a clean, rich, and convenient way to handle your data. This will make your lives a whole lot easier, trust me!
What Does a Meaningful Tensor Library Need?
To make this new Tensor library truly useful, we need to pack it with some serious features. It's not just about the core Tensor functionality; we need a whole suite of tools to make it sing. Here’s what we're thinking:
Host-Side Ops for Data Manipulation
First up, we need a bunch of host-side operations. These are the tools you'll use on your main computer (the host) to get your data ready for processing. We're talking about operations like:
- Tilization: Rearranging your row-major data into the tiled layout the device expects.
- Padding: Adding extra elements to your data so its dimensions line up with the tile and alignment boundaries the hardware needs.
- Dtype Conversions: Changing the data types of your numbers (e.g., from integers to floating-point numbers).
- Reshaping: Changing the dimensions of your data (e.g., from a 2D matrix to a 3D tensor).
These host-side ops are crucial for preparing your data and getting it in the right shape for the device. On the flip side, the device-side operations will stay with TT-NN. This emphasizes TT-NN's role as a comprehensive collection of operations, ensuring we keep those specialized device-side computations in their rightful place.
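To make that concrete, here's a tiny, self-contained sketch of what the padding step does before tilization. The pad_to_tile helper and the 32x32 tile size are just assumptions for illustration, not the actual API the library will ship.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical tile dimensions; Tenstorrent hardware commonly works with
// 32x32 tiles, but treat this as an assumption for the sketch.
constexpr std::size_t TILE_H = 32;
constexpr std::size_t TILE_W = 32;

// Pad a row-major (rows x cols) buffer up to the next multiple of the tile
// size, filling new elements with pad_value. This mirrors what a host-side
// "padding" op would do before tilization; it is not the real TT-Metal API.
std::vector<float> pad_to_tile(const std::vector<float>& src,
                               std::size_t rows, std::size_t cols,
                               float pad_value = 0.0f) {
    const std::size_t padded_rows = ((rows + TILE_H - 1) / TILE_H) * TILE_H;
    const std::size_t padded_cols = ((cols + TILE_W - 1) / TILE_W) * TILE_W;
    std::vector<float> dst(padded_rows * padded_cols, pad_value);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            dst[r * padded_cols + c] = src[r * cols + c];
        }
    }
    return dst;
}

int main() {
    // A 3x5 matrix gets padded up to a single 32x32 tile.
    std::vector<float> a(3 * 5, 1.0f);
    auto padded = pad_to_tile(a, 3, 5);
    std::cout << "padded size: " << padded.size() << "\n";  // prints 1024
}
```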
Supporting Headers
Next, we need to provide some crucial supporting headers. These are like the blueprints and instructions that make everything work together seamlessly. We're looking at:
- Memory Configuration: Setting up how memory is allocated and managed.
- Page Configuration: Organizing how data is stored in pages.
- Alignment: Ensuring that data is aligned correctly in memory.
- Tensor Layout: Defining how tensors are structured and organized.
These headers are the backbone of efficient memory management and data organization, ensuring smooth operation and top performance.
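As a rough idea of what these headers capture, here's a hypothetical sketch of the kinds of plain-value types involved. None of these names match the real TT-Metal/TT-NN definitions; they just show the flavor of configuration a Tensor carries around.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the supporting headers (assumed names).
enum class MemoryKind { DRAM, L1 };          // where the device buffer lives
enum class PageLayout { RowMajor, Tiled };   // how data is organized into pages

struct MemoryConfig {
    MemoryKind kind = MemoryKind::DRAM;
    bool interleaved = true;                 // interleaved vs. sharded placement
};

struct PageConfig {
    PageLayout layout = PageLayout::Tiled;
    uint32_t page_size_bytes = 2048;         // e.g. one 32x32 bfloat16 tile
};

struct TensorLayout {
    MemoryConfig memory;
    PageConfig page;
    uint32_t alignment_bytes = 32;           // required buffer alignment
};
```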
Xtensor-Based Library for Multi-Device Operations
For multi-device sharding and replication, we're leaning on an Xtensor-based library. This allows for the efficient distribution of tensor data across multiple devices.
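For a feel of why an xtensor-based container helps here, below is a minimal sketch that shards a host tensor across two hypothetical devices by slicing along the first dimension. It uses the public xtensor API (xt::view, xt::range), but the actual distribution code in the library will be richer than this.

```cpp
#include <iostream>
#include <vector>
#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>
#include <xtensor/xview.hpp>

int main() {
    // Host-side data as an xtensor array.
    xt::xarray<float> host_tensor = xt::zeros<float>({64, 32});

    const std::size_t num_devices = 2;  // assumed device count for the example
    const std::size_t rows_per_shard = host_tensor.shape()[0] / num_devices;

    // Slice the array into one shard per device along dimension 0.
    std::vector<xt::xarray<float>> shards;
    for (std::size_t d = 0; d < num_devices; ++d) {
        auto shard = xt::view(host_tensor,
                              xt::range(d * rows_per_shard, (d + 1) * rows_per_shard),
                              xt::all());
        shards.emplace_back(shard);  // materialize each shard for its device
    }

    std::cout << "shards: " << shards.size()
              << ", shard shape: " << shards[0].shape()[0]
              << "x" << shards[0].shape()[1] << "\n";  // shards: 2, shard shape: 32x32
}
```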
Serialization Library
We're also including a serialization library based on FlatBuffers. This will provide the necessary tools to efficiently serialize and deserialize your data, making sure everything is in the right format for storage and transfer. FlatBuffers is a proven technology that's well suited for the job, so we're confident it's the right fit.
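Here's a minimal sketch of the kind of round trip the serialization layer enables. The TensorBlob schema and the CreateTensorBlob/GetTensorBlob helpers below are hypothetical, coming from running flatc on the schema shown in the comment; the real schema in the codebase will differ.

```cpp
// Hypothetical schema (tensor_blob.fbs), compiled with `flatc --cpp`:
//
//   table TensorBlob {
//     shape: [uint32];
//     data:  [float];
//   }
//   root_type TensorBlob;
//
// The generated header would provide CreateTensorBlob / GetTensorBlob.
#include <cstdint>
#include <vector>
#include "flatbuffers/flatbuffers.h"
#include "tensor_blob_generated.h"  // produced by flatc from the schema above

std::vector<uint8_t> serialize(const std::vector<uint32_t>& shape,
                               const std::vector<float>& data) {
    flatbuffers::FlatBufferBuilder builder;
    auto shape_off = builder.CreateVector(shape);
    auto data_off = builder.CreateVector(data);
    auto blob = CreateTensorBlob(builder, shape_off, data_off);
    builder.Finish(blob);
    return {builder.GetBufferPointer(),
            builder.GetBufferPointer() + builder.GetSize()};
}

void deserialize(const std::vector<uint8_t>& bytes) {
    // Zero-copy access: shape() and data() are views into the byte buffer.
    const auto* blob = GetTensorBlob(bytes.data());
    (void)blob;
}
```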
Miscellaneous Utilities
Finally, we need some handy utilities like squeeze and unsqueeze. These little tools help you modify the shape of your tensors by removing or adding size-1 dimensions. They are small but super useful.
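Conceptually, squeeze and unsqueeze are just shape edits. The shape-only helpers below are a hypothetical sketch; the real utilities will operate on Tensor objects, but the arithmetic is the same idea.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shape-only squeeze: remove a size-1 dimension at `dim`.
std::vector<uint32_t> squeeze(std::vector<uint32_t> shape, std::size_t dim) {
    assert(dim < shape.size() && shape[dim] == 1);  // only size-1 dims can go
    shape.erase(shape.begin() + dim);
    return shape;
}

// Hypothetical shape-only unsqueeze: insert a new size-1 dimension at `dim`.
std::vector<uint32_t> unsqueeze(std::vector<uint32_t> shape, std::size_t dim) {
    assert(dim <= shape.size());
    shape.insert(shape.begin() + dim, 1);
    return shape;
}

int main() {
    auto s = unsqueeze({32, 32}, 0);  // {1, 32, 32}
    s = squeeze(s, 0);                // back to {32, 32}
    assert(s.size() == 2);
}
```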
Key Changes and Considerations
Now, let's talk about the nuts and bolts of making this separation happen. Moving all the relevant Tensor headers into a directory like tt_tensor/api/ is pretty straightforward. However, there are some important changes we need to make:
Xtensor Dependency in Metal
First, we need to move all the Xtensor-based utilities to Metal. This means we’ll need to add a new dependency on Xtensor. But don't worry, this dependency will be private, which means it won’t affect the public interfaces or make things overly complicated for you guys.
FlatBuffers-Based Serialization
We'll also need to move the FlatBuffers-based serialization code to Metal. Good news here: FlatBuffers is already a private dependency of Metal, so there won't be any changes to how packaging is handled.
Refactoring Host-Side Ops
Finally, we need to refactor some of the host-side operations. Specifically, we're going to extract the host-side implementation of an Op from within the decorated Op. We're looking at the ones in ttnn/cpp/ttnn/operations/core/core.hpp (layout, dtype conversion) and ttnn/operations/data_movement/reshape_view/reshape.hpp. The goal here is to separate the host-side and device-side functionalities, making the code cleaner and more organized.
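To illustrate the pattern (with made-up names; the actual op plumbing in ttnn is more involved), the idea is to pull the host-only path out into a free function that the Tensor library can own, while the decorated op keeps delegating to it:

```cpp
// Stand-in types for the sketch; not the real ttnn::Tensor or Layout.
struct Tensor { /* ... */ };
enum class Layout { ROW_MAJOR, TILE };

// After the refactor, the host-side path lives in a free function that the
// new Tensor library can expose on its own (hypothetical name and namespace).
namespace tt_tensor {
Tensor to_layout_on_host(const Tensor& input, Layout target) {
    // tilize / untilize purely on the host
    return input;
}
}  // namespace tt_tensor

// The decorated TT-NN op keeps the device path and simply delegates to the
// extracted host implementation when the tensor is not on a device.
namespace ttnn::operations::core {
struct ToLayout {
    static Tensor invoke(const Tensor& input, Layout target, bool on_device) {
        if (!on_device) {
            return tt_tensor::to_layout_on_host(input, target);
        }
        // ... launch the device kernel as before ...
        return input;
    }
};
}  // namespace ttnn::operations::core
```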
Conclusion
By separating the Tensor library from TT-NN, we're taking a big step towards a more flexible and efficient system. This change will enable us to leverage the strengths of both Metal and Tensor in ways that weren’t possible before. With host-side ops, supporting headers, Xtensor, FlatBuffers, and handy utilities, the new Tensor library will be a powerhouse for data manipulation. We're excited about the possibilities this opens up and we think you will be too. Stay tuned for more updates on how this project is progressing. Thanks for reading!