Introducing Elyra pipelines with custom component support
Patrick Titzler
Posted on August 10, 2021
The Elyra open source project for JupyterLab aims to simplify common data science tasks. Its most popular feature is the Visual Pipeline Editor, which is used to create pipelines without the need for coding. You can run these pipelines in JupyterLab or on Kubeflow Pipelines or Apache Airflow.
Elyra 3.0 extends the pipeline capabilities by adding experimental support for custom components. Before I dive into specifics and outline why support is still experimental in the initial releases, let's recap a few concepts.
A pipeline comprises nodes that are connected with each other to form a graph. The graph defines dependencies between the nodes, governing the order in which the nodes are executed. The example pipeline shown below executes a Python script and several Jupyter notebooks.
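To make the idea of dependency-driven execution order concrete, here is a minimal sketch using Python's standard `graphlib` module (Python 3.9 or later). It is not how Elyra or any runtime schedules nodes internally. The file names and the `execution_order` helper are hypothetical and only illustrate how a dependency graph yields a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a node, each value is the set of
# nodes that must finish before that node can start.
dependencies = {
    "load_data.py": set(),
    "clean_data.ipynb": {"load_data.py"},
    "analyze_a.ipynb": {"clean_data.ipynb"},
    "analyze_b.ipynb": {"clean_data.ipynb"},
}


def execution_order(deps):
    """Return one valid execution order for the dependency graph."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two analysis notebooks do not depend on each other, so either may come first. The only guarantee is that every node appears after the nodes it depends on.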
Nodes are implemented using components. To create the pipeline shown above, you'll need components that can execute Python scripts and Jupyter notebooks. Most components are configurable to make them reusable. For file-based components, such a configuration might include the file name and the container image in which the file is executed.
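As a rough illustration of how configuration makes a component reusable, the sketch below models the configuration of a file-based node as a small data class. The class name, field names, file names, and image names are all assumptions made for this example. They are not taken from Elyra's pipeline file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileNodeConfig:
    """Hypothetical configuration for one file-based node."""
    filename: str       # notebook or script to execute
    runtime_image: str  # container image the file runs in


# Two nodes backed by the same component, differing only in configuration.
# The image names are placeholders.
load = FileNodeConfig("load_data.py", "python:3.9")
train = FileNodeConfig("train_model.ipynb", "tensorflow/tensorflow:2.5.0")

for node in (load, train):
    print(f"{node.filename} -> {node.runtime_image}")
```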
In Elyra, the processing of Jupyter notebooks, Python scripts, and R scripts is implemented using a single component. This component is referred to as a generic component because it is supported in all runtime environments.
The pipeline editor then exposes this component under different names in the palette, which is located on the left-hand side. (You can add nodes to the pipeline by selecting a component from the palette and dropping it on the canvas.)
Pipelines that only include generic components are referred to as generic pipelines because you can run them in any runtime environment Elyra supports.
Take a look at the tutorials if you are new to Elyra and would like to learn more about how to use the Visual Pipeline Editor to create a pipeline. If you've used Elyra before, we recommend reviewing the recently published Best practices topic in the User Guide. We've only now gotten around to documenting some of the things that make your life easier!
Experimental support for custom components
Custom components are similar to generic components in that they only implement a single task, such as load data, train a model, or send an email. However, these components are only supported for Kubeflow Pipelines and Apache Airflow and are implemented in a runtime specific form.
The screen capture below depicts the pipeline editor for Apache Airflow pipelines. The palette, shown on the left, is by default divided into two categories — one for generic components and one for custom components. Note the Airflow specific components in the second category, such as the BashOperator and the SimpleHttpOperator, which process a bash command and an HTTP request, respectively.
Pipelines that use custom components are called runtime-specific pipelines because a pipeline that was created for Kubeflow Pipelines cannot run on Apache Airflow, and vice versa.
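One way to picture the difference between generic and runtime-specific pipelines is to treat each component as declaring the runtimes it supports. A pipeline can then run only where all of its components can. The sketch below encodes that idea. The `Component` class, the runtime identifiers, and the `supported_runtimes` function are hypothetical and do not reflect Elyra's internal data model.

```python
from dataclasses import dataclass

# Assumed identifiers for the three environments named in this post.
ALL_RUNTIMES = frozenset({"jupyterlab", "kubeflow_pipelines", "apache_airflow"})


@dataclass(frozen=True)
class Component:
    name: str
    runtimes: frozenset  # runtimes on which this component can run


def supported_runtimes(components):
    """Intersect the runtimes of every component in a pipeline."""
    result = set(ALL_RUNTIMES)
    for component in components:
        result &= component.runtimes
    return result


# A generic component runs everywhere; a custom one is tied to one runtime.
notebook = Component("Notebook", ALL_RUNTIMES)
bash_operator = Component("BashOperator", frozenset({"apache_airflow"}))

print(supported_runtimes([notebook]))
print(supported_runtimes([notebook, bash_operator]))
```

A pipeline made only of generic components keeps all three runtimes, while adding a single Airflow component narrows the result to Apache Airflow.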
Get started with pipelines
Once you've installed Elyra, it is easy to get started with pipelines. The JupyterLab launcher now includes tiles for each pipeline type: one for generic pipelines, and one for each supported pipeline runtime platform.
Click the desired pipeline editor tile and you are ready to compose a pipeline from the components that are supported for the selected platform.
To get you going quickly, the component registry includes a few example custom components for each runtime platform. The Elyra examples GitHub repository includes information about those components and pipelines that illustrate their usage. These components are included for illustrative purposes only. Unless stated otherwise, the components were not created by the Elyra community and are therefore provided as is.
Opportunities for growth
In the initial 3.0 release, Elyra’s support for custom components is rather limited. Many features are still under development, planned for a future release, or in the backlog without a specific target release. Some of the high priority features for the next releases are:
- Data exchange between custom components: Components commonly produce outputs that other components require as input. Currently, custom components are isolated from each other and cannot exchange data. (Data exchange between generic components is already supported.)
- Data exchange between generic components and custom components: As with data exchange between custom components, this is not yet supported.
- Manage component registry: Provide a UI and/or CLI that allows for the addition, editing, or deletion of components. Currently, components can only be managed manually.
For up-to-date feature status, please refer to this forum thread.
Use Watson Studio services in pipelines
Pipelines can also take advantage of external services using custom components. If you are looking for a managed solution for Watson Studio services, check out this IBM Watson Studio Pipeline article. It illustrates how to run notebooks, refine data, run AutoAI experiments, and deploy a model.
Your opportunity to help us improve Elyra
Elyra is a fairly new open source project that is currently maintained by a small community of JupyterLab enthusiasts. We welcome contributions of any kind, such as feedback, bug reports, bug fixes, features, or documentation. To learn more about how you can make a difference, refer to the Getting help topic in the documentation.
On behalf of the community: Thank You!