Empowering NAS for AI Training with JuiceFS Direct-Mode NFS
DASWU
Posted on July 26, 2024
By offering multi-user network data access services, network-attached storage (NAS) greatly simplifies data sharing and management. While the Network File System (NFS) is a widely used protocol for achieving this kind of sharing, it often faces performance and consistency issues in complex AI training scenarios.
In its latest version 1.2, JuiceFS supports using NFS as the underlying storage in direct mode. This innovation allows JuiceFS to use NFS services on NAS without pre-mounting. With JuiceFS' direct-mode NFS feature, users can create JuiceFS file systems using existing NAS storage space without preparing additional object storage.
In this post, we’ll explore the benefits of direct-mode NFS storage, how JuiceFS uses NAS storage and caching to boost local AI model training, and the process of creating a JuiceFS file system using NFS storage.
Advantages of direct-mode NFS storage
Using NFS as the underlying storage for JuiceFS in direct mode has these advantages:
- No pre-mounting required: You can directly use NFS as the underlying storage for JuiceFS, eliminating the need for pre-mounting and simplifying configuration and management.
- High performance: JuiceFS enhances NFS storage performance through caching and pre-fetching, supporting highly concurrent read and write operations.
- Cross-platform sharing: JuiceFS can transform NFS storage into a distributed file system, enabling cross-platform sharing. It can be used not only on Linux, macOS, and Windows operating systems but also in environments such as Hadoop, Kubernetes, and Docker.
How JuiceFS boosts local AI model training
With JuiceFS, users can store training data and model files on their existing NAS. Using JuiceFS’ distributed, high-performance, and highly available features, users can access this data simultaneously across multiple compute nodes. This enhances the efficiency of AI model training.
On the training servers, users can access NAS data through various methods such as JuiceFS mount points, S3 Gateway, WebDAV, CSI Driver, and Hadoop API. JuiceFS will automatically cache the data to improve training performance.
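As a rough sketch of two of these access methods, the script below exposes an existing file system through the S3 Gateway and through WebDAV. The metadata URL matches the one used in the format example in this post; the listen addresses and credentials are placeholder assumptions. By default the script only prints the commands, so you can review them before running anything.

```shell
#!/bin/sh
# Dry run by default: prints each command. Set JUICEFS=juicefs to execute.
JUICEFS="${JUICEFS:-echo juicefs}"

META_URL="redis://192.168.1.88/0"   # metadata engine of an existing file system
S3_ADDR="localhost:9000"            # assumed listen address for the S3 Gateway
DAV_ADDR="localhost:9007"           # assumed listen address for WebDAV

# The gateway reads its access credentials from these variables (placeholders).
export MINIO_ROOT_USER="admin"
export MINIO_ROOT_PASSWORD="change-me-12345"

# Serve the file system over an S3-compatible API.
$JUICEFS gateway "$META_URL" "$S3_ADDR"

# Serve the same file system over WebDAV.
$JUICEFS webdav "$META_URL" "$DAV_ADDR"
```

Both commands run in the foreground when executed for real, so in practice each one would be started in its own terminal or as a service.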
JuiceFS supports multiple caching strategies, allowing users to choose the appropriate one based on different scenarios to enhance training performance. For example, users can set the cache size using the --cache-size parameter, specify the cache directory using the --cache-dir parameter, and use the warmup strategy to warm up data. For more details on JuiceFS caching strategies, see JuiceFS Cache.
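The cache options above can be sketched as follows. This is a minimal example, not a tuned configuration: the mount point, the SSD cache directory, the 100 GiB cache size, and the dataset directory are all assumptions chosen for illustration. The script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Dry run by default: prints each command. Set JUICEFS="sudo juicefs" to execute.
JUICEFS="${JUICEFS:-echo juicefs}"

META_URL="redis://192.168.1.88/0"    # metadata engine of the file system
MOUNT_POINT="/mnt/jfs"               # assumed mount point
CACHE_DIR="/mnt/ssd/jfscache"        # assumed directory on a local SSD
CACHE_SIZE_MIB=102400                # 100 GiB, expressed in MiB
DATASET_DIR="$MOUNT_POINT/datasets"  # hypothetical training data directory

# Mount in the background with a local SSD cache.
$JUICEFS mount -d \
  --cache-dir "$CACHE_DIR" \
  --cache-size "$CACHE_SIZE_MIB" \
  "$META_URL" "$MOUNT_POINT"

# Pre-load the training data into the local cache before the first epoch.
$JUICEFS warmup "$DATASET_DIR"
```

Warming up before training means the first epoch reads from the local SSD rather than pulling every file from the NAS over the network.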
How to create a JuiceFS file system using NFS
It’s easy to create a JuiceFS file system using NFS storage. You only need to configure the NFS service on the NAS or file server and then specify the address of the NFS storage when JuiceFS creates the file system.
For example, with NFS storage served over the NFSv3 protocol, run the following command on any computer on the same network that has the JuiceFS client installed:
sudo juicefs format --storage nfs \
--bucket 192.168.1.88:/data/nfs \
redis://192.168.1.88/0 \
myjfs
In this code block:
- --storage nfs specifies the NFS storage.
- --bucket specifies the address of the NFS storage.
- redis://192.168.1.88/0 specifies Redis as the metadata storage.
- myjfs is the name of the file system.
For more information about using NFS storage in direct mode, see JuiceFS NFS.
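Once the file system has been created, a typical next step is to check it and mount it on a training server. The sketch below assumes /mnt/jfs as the mount point, which is not part of the original setup. It prints the commands by default so nothing is changed until you choose to run them.

```shell
#!/bin/sh
# Dry run by default: prints each command. Set JUICEFS="sudo juicefs" to execute.
JUICEFS="${JUICEFS:-echo juicefs}"

META_URL="redis://192.168.1.88/0"   # same metadata URL as in the format command
MOUNT_POINT="/mnt/jfs"              # assumed mount point

# Show the settings recorded for the file system, including its storage type.
$JUICEFS status "$META_URL"

# Mount the file system in the background.
$JUICEFS mount -d "$META_URL" "$MOUNT_POINT"

# Run the built-in benchmark against the mounted path as a quick sanity check.
$JUICEFS bench "$MOUNT_POINT"
```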
Notes
When creating a JuiceFS file system using NFS as the storage layer, you need to pay attention to the following points:
- JuiceFS does not currently support the NFSv4 identity authentication mechanism, so you need to configure NFS storage according to the NFSv3 protocol. There is no need to specify --access-key and --secret-key when creating a file system.
- To take full advantage of JuiceFS' caching capabilities, prepare sufficient high-speed SSD space as a cache device on the server where the JuiceFS client runs.
- NFS uses the root_squash mechanism by default, which maps operations performed by the root identity to nobody:nogroup. Therefore, you need to configure permissions on the NFS server to ensure that the JuiceFS client has permission to access NFS storage.
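One way to handle the root_squash point is sketched below, assuming a Linux NFS server that is configured through /etc/exports (NAS appliances usually expose the same settings in their admin interface instead). The client subnet is an assumption. Disabling root squashing with no_root_squash is only one option; you can also keep root_squash and grant the squashed user write access to the exported directory. The script only prints a candidate export entry and writes nothing to the system.

```shell
#!/bin/sh
# Prints a candidate /etc/exports entry; nothing is written to the system.
EXPORT_DIR="/data/nfs"                # export path used in the format command
CLIENT_NET="192.168.1.0/24"           # assumed subnet of the JuiceFS clients
EXPORT_OPTS="rw,sync,no_root_squash"  # lets root on the clients act as root

EXPORT_LINE="$EXPORT_DIR $CLIENT_NET($EXPORT_OPTS)"
echo "$EXPORT_LINE"

# On the NFS server, after adding the line to /etc/exports:
#   sudo exportfs -ra            # reload the export table
# From a client, to confirm the export is visible:
#   showmount -e 192.168.1.88
```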
Summary
JuiceFS 1.2 and later versions support using NFS as the underlying storage in direct mode. This allows JuiceFS to work better with NAS, improves JuiceFS' compatibility with NFS, and provides enterprises with an easier-to-use storage solution. Users can use existing storage resources to build a high-performance, highly available distributed file system locally to provide better support for AI model training, data analysis, and other scenarios.
You’re welcome to try JuiceFS 1.2 and use NFS in direct mode to create a file system that empowers local AI model training.
If you have any questions about this article, feel free to join JuiceFS discussions on GitHub and our community on Slack.