Machine learning on macOS using Keras -> TensorFlow (1.15.0) -> nGraph -> PlaidML -> AMD GPU
Xavier Rey-Robert
Posted on July 23, 2020
Since CUDA is unavailable on macOS, the options for using GPUs for machine learning on Macs are sparse.
After failing to find a practical way to do it, I resorted to using a second Linux computer with an Nvidia GPU for training my networks.
The arrival of macOS Catalina, with Apple support for Navi AMD GPUs, prompted me to give it another try. It turned out to be quite tough, so I decided to write it down and share the experience.
The easy way: Keras with PlaidML - No tensorflow involved
This is quite straightforward and I'm not going to cover it again here. You can check this article: https://medium.com/@bamouh42/gpu-acceleration-on-amd-with-plaidml-for-training-and-using-keras-models-57a9fce883b9
In my case that was not satisfying: here Keras uses PlaidML as a backend, and I want to be able to use Kapre, which requires a TensorFlow backend. Kapre is a neat library providing Keras layers that compute mel spectrograms on the fly.
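For context, a mel spectrogram is essentially an STFT magnitude followed by a mel filterbank. This is not Kapre's actual implementation, just a rough NumPy sketch of the computation such a layer performs on the fly (function name and parameters are illustrative):

```python
import numpy as np

def mel_spectrogram(signal, sr=22050, n_fft=512, hop=128, n_mels=32):
    """Rough sketch: STFT magnitude followed by a triangular mel filterbank."""
    # Frame the signal, window each frame, and take the FFT magnitude
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft, hop)]
    stft = np.abs(np.fft.rfft(np.array(frames) * np.hanning(n_fft), axis=1))

    # Mel <-> Hz conversions (standard formulas)
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Center frequencies equally spaced on the mel scale, mapped to FFT bins
    mel_points = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_points / sr).astype(int)

    # Build triangular filters between consecutive center bins
    fb = np.zeros((n_mels, stft.shape[1]))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)

    return stft @ fb.T  # shape: (num_frames, n_mels)

# One second of a 440 Hz tone
t = np.linspace(0, 1, 22050, endpoint=False)
spec = mel_spectrogram(np.sin(2 * np.pi * 440 * t))
```

The point of doing this inside the model (as Kapre does) is that only raw audio needs to be fed to the network; the spectrogram is computed on the GPU as part of the graph.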
Be aware that the Keras team is stepping away from multi-backend support, so the Keras -> PlaidML approach might be a dead end anyway.
The journey to TensorFlow execution on Mac GPUs / eGPUs
The key element here is nGraph. Without going into details, nGraph pursues a neutral approach, supporting multiple frameworks (TensorFlow, ONNX, etc.) and multiple hardware targets (Intel CPUs, NNPs, etc.), and luckily for us (not so lucky, as it turns out; just wait) nGraph was also integrated with PlaidML to offer support for GPUs (Intel, Nvidia and... AMD).
So on paper all is great, we have a way to go:
Keras -> TensorFlow -> nGraph-bridge -> nGraph -> PlaidML -> Metal -> AMD GPU.
In this domain, as in others, things move fast. So fast that it's not always easy to keep pace, and it's the same for the teams behind these projects. There is a lot of software involved, and things change so quickly that developers don't have time - or don't take the time - to settle things down.
The nGraph-bridge team hasn't made a proper release since August 2019 (v0.18.1), and while they are still actively working on the project, they seem to have been focusing on a big refactoring.
To make things worse, PlaidML support was (silently) dropped from nGraph in April without much explanation or warning, so forget about using the latest GitHub master to sort things out! I spent hours wondering why it wasn't working when it simply wasn't there anymore.
Why was the PlaidML bridge dropped?
It seems that the future path to happiness will be Keras -> TensorFlow -> MLIR -> PlaidML -> ... and everyone is prepping for the jump once MLIR is released as a TensorFlow backend... in 2021! But as of today, users are just left hanging in midair.
What are your options?
At the time of writing, the latest release is ngraph-bridge v0.18.1 (dated 20 Aug 2019!). It uses TensorFlow v1.14.0 - argh! Kapre requires TensorFlow v1.15 - dead end again.
I should also mention that you'd better not use prebuilt wheels: I realized not all of them are compiled with PlaidML backend support. So your best bet is to build nGraph and nGraph-bridge from source, and you'd better have all the stars aligned for that to happen flawlessly. A lot of things can go wrong: Python versions, Bazel versions, library incompatibilities, bugs to fix in the code, etc. - all the joys of Python.
Picking a release candidate to build
v0.19.0-rc9 brings TensorFlow v1.15.0; nGraph 0.28.0-rc1 - the recommended last stable baseline - is on TensorFlow v1.14.0.
I need TF 1.15, so let's try v0.19.0-rc10 then... Of course the standard build crashes miserably, which led me to think that this rc was probably never compiled/tested with PlaidML support on a Mac, as clang fails because of a non-exhaustive switch statement in plaidml_translate.cpp.
We will fix it by adding this line to the switch(dt) statement in the tile_converter function:
case PLAIDML_DATA_BFLOAT16: return "as_bfloat16(" + tensor_name + ", 16)";
See the complete build instructions below.
If everything goes right you should end up with something like this:
TensorFlow version: 1.15.0
C Compiler version used in building TensorFlow: 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
nGraph bridge version: b'0.19.0-rc10'
nGraph version used for this build: b'0.25.1-rc.10+90c70dd'
TensorFlow version used for this build: v1.15.0-rc3-22-g590d6eef7e
CXX11_ABI flag used for this build: 0
nGraph bridge built with Grappler: False
nGraph bridge built with Variables and Optimizers Enablement: False
Final thoughts - Use at your own risk
OK, we have a working environment, but there are so many interlocking (fresh) software bricks that we have no guarantee that all this will run properly in all circumstances.
Using Kapre, for example, I'm able to use the _mel_spectrogram_ layer just fine, but ngraph-bridge will crash with Caught exception while executing nGraph computation: syntax error when trying to use the STFT layer...
I will not abandon my Linux deep learning workhorse quite yet, but at least I have an environment to try out that uses my MacBook Pro GPU on the go and my Catalina / AMD RX 5700 XT setup at home.
The complete build instructions
I'm putting below what worked for me - retested on a fresh Mac after days of messing around.
Make sure you have a proper Python 3 installation (I won't cover that here). I'm using 3.7, managed with brew install python@3.7.
git clone https://github.com/tensorflow/ngraph-bridge.git
cd ngraph-bridge
git checkout v0.19.0-rc10
# Install bazel (bazelisk was a mess)
export BAZEL_VERSION=0.25.2
curl -LO "https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-darwin-x86_64.sh"
chmod +x "bazel-${BAZEL_VERSION}-installer-darwin-x86_64.sh"
./bazel-${BAZEL_VERSION}-installer-darwin-x86_64.sh --user
source ~/.bazel/bin/bazel-complete.bash
# Add $HOME/bin to your PATH in .zshrc (or .bashrc) and source it
echo "\nexport PATH=$PATH:$HOME/bin" >> ~/.zshrc
source ~/.zshrc
# check bazel
bazel version
# I like to start with a fresh venv dedicated to the build
python3 -m venv build-venv
source build-venv/bin/activate
# Recommended virtualenv v16.0.0 didn't work; I ended up using the latest version
python3 -m pip install virtualenv
#Install tensorflow from wheel (find the right one here: https://pypi.org/project/tensorflow/1.15.0/#files)
python3 -m pip install https://files.pythonhosted.org/packages/dc/65/a94519cd8b4fd61a7b002cb752bfc0c0e5faa25d1f43ec4f0a4705020126/tensorflow-1.15.0-cp37-cp37m-macosx_10_11_x86_64.whl
#start the build
python3 build_ngtf.py --use_prebuilt_tensorflow --build_plaidml_backend
# When the build fails edit plaidml_translate.cpp from ngraph to add the missing case
vi build_cmake/ngraph/src/ngraph/runtime/plaidml/plaidml_translate.cpp
#re-start the build
python3 build_ngtf.py --use_prebuilt_tensorflow --build_plaidml_backend
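A note on the TensorFlow wheel picked above: the cp37-cp37m part of the filename encodes the CPython version it was built for, and it has to match your interpreter. A minimal sketch of how to check which tag your Python corresponds to:

```python
import sys

# The "cp37" tag in a wheel filename means CPython 3.7;
# build the equivalent tag for the running interpreter.
tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(tag)
```

If this prints anything other than cp37, pick the matching wheel from the PyPI files page instead.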
Some hints for the record:
When installing Kapre you might run into
AttributeError: module 'enum' has no attribute 'IntFlag'
This is solved by removing the enum34 package (in my case enum34 1.1.10):
pip uninstall enum34
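The reason: enum34 is a backport of the enum module for Python 2, and when it is installed on Python 3 it shadows the standard-library enum, which has provided IntFlag since Python 3.6. A quick sanity check after uninstalling:

```python
import enum

# The standard-library enum module ships IntFlag since Python 3.6;
# the enum34 backport does not, so if this prints False the backport
# is still shadowing the stdlib module.
print(hasattr(enum, "IntFlag"))
```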
When importing Librosa, you might run into:
ModuleNotFoundError: No module named 'numba.decorators'
This is solved by using an older version of numba:
pip install numba==0.48