JuiceFS 1.2: Introducing Enterprise-Grade Permission Management and Smooth Upgrades
DASWU
Posted on June 21, 2024
JuiceFS Community Edition 1.2 is released today! This marks the third major release since its open-source debut in 2021. This version is also a long-term support (LTS) release. We will continue to maintain versions 1.2 and 1.1, while version 1.0 will no longer receive updates.
JuiceFS is an open-source distributed file system designed for cloud environments, supporting 10+ metadata engines and 30+ data storage engines. This flexibility empowers users to adapt to diverse enterprise environments and data storage requirements. Moreover, JuiceFS is compatible with multiple access protocols, including POSIX, HDFS, S3, and WebDAV, and can serve as a Persistent Volume in Kubernetes. This ensures seamless data flow across different applications.
Licensed under Apache 2.0, JuiceFS Community Edition allows users to modify and enhance it according to their specific needs. This makes it suitable for various commercial environments.
This post provides a brief introduction to the new features and optimizations in JuiceFS 1.2. Feel free to download and try it out.
New features and optimizations
Over the past few years, JuiceFS has been widely adopted across various industries and use cases, particularly in the fields of AI and foundation models. To address complex permission management challenges in these massive data scenarios, JuiceFS 1.2 introduces several new features and optimizes existing features:
- POSIX ACLs: Enables robust user permission management using Linux ACL tools (setfacl/getfacl).
- Smooth upgrades: Allows remounting JuiceFS at the same mount point to achieve seamless application upgrades without disruption. It also supports online adjustment of mounting parameters.
- Advanced S3 Gateway features: Introduces Identity and Access Management (IAM) and event notifications for enhanced security, flexibility, and automated data management and monitoring capabilities suitable for multi-user environments and complex application scenarios.
- JuiceFS Sync optimization: Enhances selective synchronization and performance optimizations for large directories and complex migration tasks, improving data synchronization efficiency.
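For readers new to POSIX ACLs, the sketch below illustrates the order in which ACL entries are usually evaluated on Linux: owner first, then named users, then groups, then everyone else, with the mask capping named and group entries. It is a simplified model for illustration only, not JuiceFS code. The Acl class and the is_allowed function are hypothetical names, and the superuser bypass is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Acl:
    """Hypothetical in-memory model of a POSIX access ACL."""
    owner_uid: int
    owner_gid: int
    user_obj: Set[str]                 # permissions of the owning user
    group_obj: Set[str]                # permissions of the owning group
    other: Set[str]                    # permissions of everyone else
    named_users: Dict[int, Set[str]] = field(default_factory=dict)
    named_groups: Dict[int, Set[str]] = field(default_factory=dict)
    mask: Optional[Set[str]] = None    # upper bound for named and group entries


def is_allowed(acl: Acl, uid: int, gids: Set[int], wanted: Set[str]) -> bool:
    """Return True if the caller holds every permission in `wanted`."""

    def masked(perms: Set[str]) -> Set[str]:
        # The mask only applies when a mask entry exists.
        return perms if acl.mask is None else perms & acl.mask

    # 1. The owner entry is used as-is; the mask does not apply to it.
    if uid == acl.owner_uid:
        return wanted <= acl.user_obj
    # 2. A named user entry is capped by the mask.
    if uid in acl.named_users:
        return wanted <= masked(acl.named_users[uid])
    # 3. Any matching group entry (owning or named) may grant access.
    group_entries = []
    if acl.owner_gid in gids:
        group_entries.append(acl.group_obj)
    group_entries += [p for gid, p in acl.named_groups.items() if gid in gids]
    if group_entries:
        return any(wanted <= masked(p) for p in group_entries)
    # 4. Otherwise fall back to the "other" entry.
    return wanted <= acl.other
```

In this model, a named user granted read and write under a read-only mask can read but not write, which mirrors the "effective" permissions that getfacl reports.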
More application scenarios
Support for NFS/Dragonfly/Bunny as object storage: Offers flexibility in selecting backend storage based on specific scenario requirements. JuiceFS now supports 40+ object storages.
Increased stability
Automatic disk failure detection and isolation: JuiceFS uses the client's local disk for data caching, which boosts data access speeds in most scenarios. This release introduces automatic detection and isolation of faulty disks, ensuring system stability and minimizing disruptions to application operations in the event of hardware failures.
Optimized dump command and metadata auto-backup to improve metadata export performance: In the previous implementation, all key-value pairs were loaded into memory to speed up metadata export. This imposed significant memory pressure on large-scale file systems. In version 1.2, JuiceFS automatically selects a strategy based on the number of files. It chooses a file-by-file export approach when there are more than one hundred thousand files in total. Moreover, this strategy includes concurrent prefetching features to balance speed.
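The strategy selection described above can be sketched roughly as follows. This is an illustrative Python model rather than the actual JuiceFS implementation: the function names, the strategy labels, and the prefetch window size are assumptions, and only the 100,000-file threshold comes from the description above.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator

# "More than one hundred thousand files" switches to file-by-file export.
FILE_BY_FILE_THRESHOLD = 100_000


def choose_dump_strategy(total_files: int) -> str:
    """Pick an export strategy from the total file count (labels are hypothetical)."""
    if total_files > FILE_BY_FILE_THRESHOLD:
        return "file-by-file"   # low memory footprint
    return "in-memory"          # load everything first, fastest for small volumes


def export_file_by_file(
    inodes: Iterable[int],
    load: Callable[[int], dict],
    window: int = 4,            # hypothetical prefetch depth
) -> Iterator[dict]:
    """Yield entries in order while keeping at most `window` loads in flight."""
    with ThreadPoolExecutor(max_workers=window) as pool:
        pending = deque()
        for inode in inodes:
            pending.append(pool.submit(load, inode))
            if len(pending) >= window:
                yield pending.popleft().result()
        # Drain the loads that are still outstanding.
        while pending:
            yield pending.popleft().result()
```

The bounded window is the point of the sketch: memory use stays proportional to the window size instead of the file count, while the concurrent loads recover part of the speed lost by not holding everything in memory.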
Enhanced usability
The juicefs compact command: Users can now manually perform compact operations on specified paths. In previous versions, users could use the juicefs gc --compact command to perform a global compact operation. This could reduce object storage capacity use as needed. However, for large-scale file systems, this gc command often took a long time, leading to a poor user experience. Therefore, we have introduced a new compact command in JuiceFS, allowing users to compact only the specified paths, thereby enhancing operational flexibility.
New --cache-expire option in the juicefs mount command: The --cache-expire option allows users to specify the expiration time for local data cache. Once the specified time expires, relevant cache data is automatically deleted. In the previous approach, cache cleanup was triggered only when the cache disk reached its capacity threshold. Compared to the previous method, the new option provides users with more flexible cache management choices.
Optimized the juicefs warmup command: Enables users to manually clear cache blocks on specified paths and check the existing cache ratio on those paths. This facilitates more reliable and effective management of client data caching, thereby improving application cache hit rates.
Background operation support for gateway/webdav: Allows users to run gateway/webdav as a daemon in the background, enhancing service availability and stability. It enables easier integration and usage of JuiceFS in various network environments.
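To make the difference between time-based expiry (the --cache-expire option) and the older capacity-triggered cleanup concrete, here is a small illustrative model. It is a sketch under assumptions, not JuiceFS code: it measures a block's age from its last access, treats a non-positive value as "never expire", and uses a made-up 90% capacity threshold. All function and parameter names are hypothetical.

```python
from typing import Dict, List


def blocks_to_expire(
    last_access: Dict[str, float], now: float, cache_expire: float
) -> List[str]:
    """Return the keys of cache blocks idle for longer than cache_expire seconds."""
    if cache_expire <= 0:  # assumption: a non-positive value disables expiry
        return []
    return sorted(key for key, ts in last_access.items() if now - ts > cache_expire)


def over_capacity(used_bytes: int, capacity_bytes: int, threshold: float = 0.9) -> bool:
    """Capacity-triggered cleanup: the only trigger in the previous approach."""
    return used_bytes >= capacity_bytes * threshold
```

With time-based expiry, a stale block is removed even when the cache disk is nearly empty. Under the capacity-only rule, it would stay until the disk filled up.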
Multiple usability enhancements: Includes more human-friendly formats for command-line parameters, such as direct use of “128K” and “4M” to specify block sizes, more monitoring metrics, and a more reliable debug information collection command.
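As an illustration of what human-friendly size parameters involve, the following sketch converts values such as "128K" and "4M" into bytes. It is not the parser JuiceFS uses. The parse_size name is hypothetical, and both the binary (1024-based) multipliers and the treatment of a bare number as bytes are assumptions made for this example.

```python
import re

# Assumed binary multipliers; the real CLI may interpret units differently.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a string such as '128K' or '4M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]
```

For example, parse_size("128K") returns 131072 and parse_size("4M") returns 4194304 under these assumptions.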
Rapid community growth
Open-sourced in January 2021, JuiceFS has obtained 10k stars on GitHub. The latest version has seen 410 new issues, 464 merged pull requests, and 44 contributors.
Anonymous reports indicate ongoing rapid increases across user metrics. Our user base is steadily expanding, with 57% from Asia, 33% from the United States, and 10% from Europe.
New case studies:
- NAVER, Korea's No.1 Search Engine, Chose JuiceFS over Alluxio for AI Storage
- How Zhihu Ensures Stable Storage for LLM Training in Multi-Cloud Architecture
- BentoML Reduced LLM Loading Time from 20+ to a Few Minutes with JuiceFS
- From Object Storage to K8s+JuiceFS: 85% Storage Cost Cut, HDFS-Level Performance
- coScene Chose JuiceFS over Alluxio to Tackle Object Storage Drawbacks
- A Leading Self-Driving Company Chose JuiceFS over Amazon S3 and Alluxio in the Multi-Cloud Architecture
- Xiaomi: Building a Cloud-Native File Storage Platform to Host 5B+ Files in AI Training & More
Check out more user stories.
Upcoming features
We will gradually implement the following features in future versions, and we welcome your contributions:
- Distributed data caching
- Support for Kerberos and Ranger
- User and group quotas
Try it out!
Download JuiceFS 1.2 and give it a try! If you have any questions, join JuiceFS discussions on GitHub and our community on Slack.
As JuiceFS enters its fourth year of open-source development, it has grown from a new brand to a widely adopted product. Originally supporting Hadoop in the cloud, JuiceFS has expanded its applications into AI training, inference, and beyond. Now it is an essential tool in many engineers’ daily workflows.
We are truly grateful for the invaluable contributions of our community members—your feedback, solutions, code contributions, and practical insights have been instrumental in our journey. Thank you for being part of the JuiceFS community!