Easily Orchestrate Workflows: A Brief Discussion on How to Use Python to Call API Interfaces in DolphinScheduler

Chen Debra

Posted on September 25, 2024

Recently, while working on a project for a large retail enterprise, a client asked about using DolphinScheduler to orchestrate workflows with Python scripts. Others may have similar questions, so this article covers the advantages of DolphinScheduler and how to use it in practice.

Why Should Enterprises Use DolphinScheduler for Data Development?

When enterprises engage in data development, a task scheduling platform plays a crucial role in automatically executing predefined tasks, making it an indispensable part of business operations.

As businesses evolve rapidly, they constantly need to run many types of tasks on a schedule, often with complex interdependencies. Task scheduling platforms, however, come with challenges of their own, such as:

  1. The backlog of existing offline tasks can exceed 10,000, raising concerns about platform stability.
  2. A large volume of incremental offline tasks necessitates a platform with good scalability and processing capability.
  3. Non-professional developers require a user-friendly configuration interface that supports SQL-like operations for an accessible experience.

These scenarios are common in the data development field. So how can we effectively address these challenges?

DolphinScheduler can effectively solve the above issues. This article will first highlight the advantages of DolphinScheduler and then provide a practical demonstration of using Python to call API interfaces.

What is DolphinScheduler and What Are Its Advantages?

Apache DolphinScheduler is a distributed, easily scalable, open-source workflow scheduling system with a visual DAG interface, designed for enterprise-level scenarios. It provides a solution for visually operating tasks, workflows, and the entire lifecycle of data processing.

DolphinScheduler untangles the complex task dependencies found in large data projects, including those in data research and ETL processes, and gives applications a way to orchestrate their data operations while monitoring task health. It organizes tasks as a DAG (Directed Acyclic Graph), which allows real-time monitoring of task execution status and supports operations such as retrying, recovering from a specified node, pausing, resuming, and terminating tasks.

Key features of DolphinScheduler include:

  • Easy Deployment: Offers four deployment methods: Standalone, Cluster, Docker, and Kubernetes.
  • User-Friendly: Allows creation and management of workflows through four methods (Web UI, Python SDK, YAML files, and Open API) with visual DAGs and modular operations.
  • High Reliability: Decentralized architecture with multiple masters and multiple workers, natively supporting horizontal scaling.
  • Powerful Performance: Outperforms other orchestration platforms by several times, supporting millions of tasks daily.
  • Cloud-Native: Supports orchestration of multi-cloud/data center workflows and customizable task types.
  • High Scalability: Supports multi-tenancy and online resource management, ensuring stable operation of up to 100,000 data tasks daily.

What Are the Benefits of Using Python to Call API Interfaces in DolphinScheduler?

Anyone who uses ETL tools regularly is likely familiar with calling APIs through HTTP plugins. However, as data security requirements have tightened, API providers often require request parameters to be processed with custom encryption algorithms, which standard HTTP plugins cannot handle.

When complex encryption algorithms are involved, implementing the call in a programming language is usually the most practical choice. Here are several reasons to choose Python:

  • Python is easy to learn and has a rich ecosystem.
  • A single .py file can be directly called in DolphinScheduler, eliminating the need for complex language environments or installation.
  • API providers typically supply Python sample code, making it easier for developers to implement encryption algorithms.
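
To make the encryption point concrete, here is a minimal sketch of the kind of request signing an API provider might require. The scheme itself (sorted parameters, a timestamp, an HMAC-SHA256 digest in a `sign` field) and the names `sign_params` and `verify_params` are assumptions for illustration only; a real provider's documentation defines the actual algorithm.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def _canonical(params: dict) -> bytes:
    # Canonical form (assumed): key=value pairs sorted by key, URL-encoded.
    return urlencode(sorted(params.items())).encode("utf-8")


def sign_params(params: dict, secret: str, timestamp: int) -> dict:
    """Return a copy of params with a timestamp and an HMAC-SHA256 signature."""
    signed = dict(params, timestamp=str(timestamp))
    digest = hmac.new(secret.encode("utf-8"), _canonical(signed), hashlib.sha256)
    signed["sign"] = digest.hexdigest()  # hypothetical field name
    return signed


def verify_params(signed: dict, secret: str) -> bool:
    """Recompute the signature the way a server would and compare it."""
    received = signed.get("sign", "")
    unsigned = {k: v for k, v in signed.items() if k != "sign"}
    expected = hmac.new(
        secret.encode("utf-8"), _canonical(unsigned), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(received, expected)
```

A call such as `sign_params({"store_id": "42"}, "demo-secret", 1727222400)` returns the original parameters plus `timestamp` and `sign`, ready to be sent as a query string. Logic like this is only a few lines in Python but is hard to express in a generic HTTP plugin.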

With the 2.0.5 release, Apache DolphinScheduler introduced Python API functionality, which lets users orchestrate workflows from Python scripts, including creating, updating, and scheduling workflows. This is a significant convenience for Python users.

Specific Practical Steps

  1. When creating a workflow, drag the Python task node onto the canvas, enter the node name, and write the corresponding Python script (any Python file selected in the resource section must first be uploaded as described in step 2).

  2. In the resource center, use file management to upload .py files as resources for use in workflow creation (the file referenced in step 1 comes from here).
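
As a rough illustration of what the script in step 1 might contain, the sketch below builds an API request and reports success or failure through a return code. The endpoint URL, the parameter names, and the helpers `build_request` and `run_task` are hypothetical, and the sketch assumes the scheduler treats a non-zero exit status as a failed task, as is typical for script tasks.

```python
import json
import sys
import urllib.error
import urllib.request
from urllib.parse import urlencode

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint


def build_request(url: str, params: dict) -> urllib.request.Request:
    """Build a GET request with the parameters encoded in the query string."""
    return urllib.request.Request(f"{url}?{urlencode(params)}")


def run_task(request, send=urllib.request.urlopen) -> int:
    """Send the request; return 0 on success and 1 on failure.

    `send` is injectable so the logic can be exercised without a network.
    """
    try:
        with send(request, timeout=30) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, ValueError) as exc:
        # Network errors and malformed JSON both count as task failures.
        print(f"API call failed: {exc}", file=sys.stderr)
        return 1
    print(f"Fetched {len(payload)} records")
    return 0
```

In the task node, the script would end with something like `sys.exit(run_task(build_request(API_URL, {"date": "2024-09-25"})))`, so that a failed call surfaces as a failed node in the DAG rather than passing silently.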

If your enterprise has scheduling needs, consider trying DolphinScheduler. Its diverse task types can meet the complex logic required in real-world scenarios.

I hope this article answers common questions about using Python to call API interfaces in DolphinScheduler. If you wish to learn more about DolphinScheduler, feel free to join the DolphinScheduler open-source community group: https://s.apache.org/dolphinscheduler-slack