Durable Python: Building Bullet-Proof Long-Running Workflows, Made Simple

haimzlato

haimzlato

Posted on September 4, 2024

Durable Python: Building Bullet-Proof Long-Running Workflows, Made Simple

In modern software development, creating robust workflows that connect APIs from various services and handle both synchronous and asynchronous events is a common challenge. The conventional approach involves using a mix of queues, microservices, and state management systems to build scalable applications. While effective, this architecture comes with significant overhead: setting up and maintaining infrastructure like message queues, running servers or lambda functions, managing state in databases, and implementing complex error-handling mechanisms.

What if there was a simpler, more reliable way to handle long-running workflows without the hassle of managing all this infrastructure? That's the goal of Durable Python, to try it, register to Beta.

The Problem with Naive Solutions for Long-Running Processes

Imagine you want to monitor pull requests (PRs) in GitHub. Each time a new PR is opened, you’d like to create a dedicated Slack channel for discussion and send daily reminders until the PR is closed or merged. This sounds straightforward, so you might think you can solve it with a basic Python function (here’s a basic Python function generated by ChatGPT):

@app.route('/webhook', methods=['POST'])
def github_webhook():
    data = request.json
    if 'pull_request' in data and data['action'] == 'opened':
        pr_number = data['pull_request']['number']
        pr_url = data['pull_request']['html_url']
        # Create a new Slack channel for the PR
        channel_id = create_slack_channel(pr_number)
        send_slack_notification(channel_id, pr_number, pr_url)
        # Periodically check the PR status and send reminders until it's closed or merged
        while True:
            time.sleep(3600)  # Wait for 1 hour before checking the status again
            pr_status = check_pr_status(pr_number)
            if pr_status == 'open':
                send_slack_notification(channel_id, pr_number, pr_url)
            else:
                break
    return jsonify({'status': 'ok'})
Enter fullscreen mode Exit fullscreen mode

This code snippet seems to handle the task, but it's only suitable for the “happy flow” scenario. In real-world applications, this naive approach falls short. The while loop relies on continuous server uptime, which isn’t guaranteed. Processes can crash, servers can restart, and suddenly, your workflow is broken.

Real-World Solution: Event-Driven Applications

A more reliable approach involves building an event-driven application. Here, you would use queues to listen for GitHub events, cron jobs to send reminders, databases to store the PR and channel state, and functions to handle these events. Typically, this setup runs on cloud infrastructure, leveraging services like AWS Lambda for deployment and execution.

While this method is feasible and robust, it also requires considerable setup, maintenance, and expertise. Managing the infrastructure, ensuring uptime, and handling error states demand significant resources and a skilled team.

Enter Durable Python: Simplicity Meets Reliability

What if you could combine the simplicity of the naive Python code with the reliability of an asynchronous design? What if Python could guarantee that even if a process crashes or the server restarts, it would pick up right where it left off?

AutoKitteh addresses precisely this challenge with Durable Python. Using Durable Python, the user writes Python code while the system ensures that if a process restarts, it continues running from the same point. While there are limitations (e.g., long downtime might not be ideal), for most use cases, this solution works perfectly.

What Durable-Python Offers

Durable-Python saves you from managing state manually, allowing you to write your workflow as a continuous flow rather than an event-driven state machine, which can be challenging to build and debug. AutoKitteh, as an infrastructure, has built-in queues and integrations with external applications and APIs, making it easy to quickly develop robust workflows in Python.

How It Works

There’s no magic involved—just solid engineering. AutoKitteh is powered by Temporal, a framework for building durable workflows. Temporal requires a specific way of coding, including an understanding of determinism, idempotency, and other concepts to ensure reliability. AutoKitteh abstracts these complexities, allowing developers to write standard Python code. Under the hood, any function with side effects is converted into a Temporal activity. As a developer, you don’t have to worry about these details—just focus on writing the business logic.

For more technical details, refer to the AutoKitteh documentation.

Is There a Cost?

Of course, every abstraction has a price. Under the hood, Durable Python records the flow of the workflow to enable recovery after failure, which incurs some storage and performance costs.

Durable Python is designed for the orchestration of APIs rather than building data applications. If you need high-performance applications, you should consider building a custom solution. However, if you want to quickly develop reliable workflows with minimal development and infrastructure investment, Durable Python might be a good option.

Real-World Applications

Durable Python can be applied to a wide range of workflows, particularly in domains where reliability is crucial, such as:

  • API orchestration - build internal reliable workflows.
  • DevOps Automation: Automate deployment pipelines or code review automation with guaranteed recovery from failures.
  • ChatOps: Integrate with chat platforms to automate team notifications and manage workflows.
  • MLOps: Ensure long-running machine learning workflows continue seamlessly despite interruptions.

Examples to worflows can be found here.

Conclusion: Less Code, Less Hassle

Durable Python concept, implemented powered by AutoKitteh, empowers developers to build, deploy, and manage reliable workflow automation with minimal code. The durable execution and seamless recovery are handled behind the scenes, so you can focus on what truly matters—your business logic.

While there are many excellent tools for achieving durability (like Temporal and Restate), Durable-Python provides a fast, simple, and cost-effective way to achieve the same results.

💖 💪 🙅 🚩
haimzlato
haimzlato

Posted on September 4, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related