Zerops Devs
Posted on August 22, 2024
Vector databases are revolutionizing how data is managed and stored for AI applications. At Zerops, we recognized the growing importance of vector databases, leading us to integrate Qdrant, one of the most popular options available. While it might seem straightforward to spin up a Qdrant instance using a Docker container, the reality of managing a production-ready vector database is far more complex. In this article, we'll explore the intricacies of Qdrant implementation and how we've addressed the challenges at Zerops.
Understanding Data in Qdrant
In Qdrant, data is organized into collections, which are essentially named sets of vectors. Each collection contains vectors that share the same dimensionality and are compared using a specific metric. Collections in Qdrant are further divided into shards—independent stores of vectors that can handle all the operations provided by collections.
This architecture allows for distributing the workload across multiple Qdrant nodes in a cluster, enhancing performance and reliability. To ensure high availability (HA), shards can be replicated across nodes. Understanding this structure is crucial for grasping the complexities of deployment and management.
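To make this concrete, the sketch below creates a collection through Qdrant's REST API, choosing the vector dimensionality and distance metric and, for a clustered setup, the shard and replica counts. The host, collection name, and numbers are illustrative, not Zerops-specific values:

```bash
# Create a collection of 384-dimensional vectors compared with cosine
# similarity, split into 2 shards with 2 copies of each shard.
curl -X PUT http://localhost:6333/collections/articles \
  -H 'Content-Type: application/json' \
  -d '{
        "vectors": { "size": 384, "distance": "Cosine" },
        "shard_number": 2,
        "replication_factor": 2
      }'
```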
Deployment Options: Balancing Simplicity and Reliability
At Zerops, we offer support for both HA and non-HA modes of Qdrant, giving you the flexibility to choose the configuration that best suits your needs.
Non-HA Mode: Simplicity for Non-Production Projects
The Non-HA mode is perfect for non-production projects where data persistence isn't critical. This setup involves a single Qdrant node, making it easy to install and manage. In Zerops, deploying Qdrant in Non-HA mode is straightforward:
- We run the Qdrant binary in an Ubuntu container.
- Port `6333` is opened for communication.
Even if an outage occurs and data is lost, our backup solution keeps disruption to a minimum.
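A quick way to confirm such an instance is reachable is to query the REST API on that port, for example by listing collections (the hostname is a placeholder):

```bash
# An empty collections list on a fresh single-node instance confirms
# that the REST API on port 6333 is up and reachable.
curl http://<service_hostname>:6333/collections
```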
HA Cluster: Reliability for Production Environments
For production environments where reliability and data availability are crucial, the HA cluster mode is essential. Setting up an HA cluster can be complex, but we've streamlined the process in Zerops:
- Cluster Configuration: We enable cluster mode by adding `enabled: true` to the configuration file and configure peer-to-peer communication on port `6335`:

  ```yaml
  cluster:
    enabled: true
    p2p:
      port: 6335
  ```
- Node Setup: Building a cluster requires careful configuration of each node:
  - The first node runs with the `--uri <uri>` flag so other peers know how to reach it.
  - We make the address of the first peer, `node1.db.<service_name>.zerops`, available in local DNS.
  - All other peers start with the `--bootstrap <cluster_uri>` flag to locate the rest of the cluster.
- High Availability: Qdrant uses the Raft consensus protocol, which requires more than 50% of the nodes to be functional. We automatically set up 3 nodes per cluster to meet this requirement.
- Automatic Replication: By default, we automatically create replicas of any collection across all nodes. This safeguards against data loss if a node fails and acts as a safety net for an incorrectly configured `replication_factor`. If desired, this feature can be disabled by setting the `automaticClusterReplication` parameter to `false`.
- Failure Recovery: If a node fails, we automatically start a new node, connect it to the cluster, and create replicas of each collection and shard on the new node.
- Node Cleanup: We streamline the removal of failed nodes by using the Qdrant API to identify the current cluster leader, which then handles dropping the failed peer from the cluster, as sketched below.
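To illustrate that cleanup flow, here is a minimal sketch against Qdrant's public cluster API, assuming the in-cluster hostname from above and a placeholder peer ID; it is not Zerops' internal tooling:

```bash
# Inspect the cluster: the response includes this node's peer_id, the list
# of known peers, and Raft information such as the current leader.
curl http://node1.db.<service_name>.zerops:6333/cluster

# Remove a failed peer by its ID (placeholder value). The force=true query
# parameter drops the peer even if its shards have not been moved,
# relying on the replicas that exist on the remaining nodes.
curl -X DELETE "http://node1.db.<service_name>.zerops:6333/cluster/peer/1234567890?force=true"
```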
Overcoming Technical Challenges
During the integration of Qdrant with Zerops, we encountered and solved several complex issues. Fortunately, Qdrant offers robust tools to monitor and manage clusters through its well-documented REST and gRPC APIs.
One issue we faced was that when a new node was added to the cluster, it appeared fully operational but couldn't receive new replicas. This is often due to ongoing data transfers between nodes, which can take considerable time. To address this, Qdrant provides the `POST /cluster/recover` endpoint, which can be triggered on any non-leader node. It asks the current leader to create a snapshot of the cluster's agreed-upon state at a specific point in time and send it back to the requesting node, which applies it in order to recover and synchronize with the rest of the cluster.
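In practice this is a single call to the lagging node; the hostname below is illustrative:

```bash
# Trigger consensus recovery on a non-leader node: the leader snapshots the
# agreed-upon cluster state and sends it to this node to apply.
curl -X POST http://node3.db.<service_name>.zerops:6333/cluster/recover
```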
Data Backup: Safeguarding Your Qdrant Data
At Zerops, we prioritize the safety and security of your data by providing:
- Daily, automatic backups at no extra cost
- Backups in the form of encrypted disk snapshots for each collection
- Secure upload to our S3-compatible backup storage
- Retention of up to 100 backups per stack for a maximum of one month
- Current maximum storage size per project of 25 GiB
You also have the flexibility to choose between different backup options, including one-time backups, regular backups with a customizable frequency, or even disabling backups entirely.
Our backup retention policy is designed to keep recent history close at hand. For example, if you back up every hour, the 100-backup limit gives you roughly four days of backups, so recent versions of your data are available in case of any issues.
The Value of Managed Vector Databases
While it's possible to set up and manage Qdrant yourself, doing so properly requires significant expertise and resources. Our managed Qdrant solution allows you to leverage the power of vector databases without the operational complexities. You can focus on developing AI features, while we ensure your vector database is running optimally, is highly available, and is protected against data loss.
Ready to harness the power of vector databases without the operational headaches? Give Zerops a try.