Many of you have probably heard of, or are already familiar with, DevOps: a development and operations paradigm that integrates everything from planning to coding, deployment and operation. MLOps builds on DevOps and applies the same paradigm to machine learning, taking it a significant step forward.
With MLOps, you can create a machine learning (ML) solution that uses field data to self-train and iterate, enabling organizations to maximize reliability and efficiency while requiring less manual labor or fewer experts.
Through MLOps, data scientists can focus on testing and validating data with a performance model that tunes the solution for performance, accuracy, availability or any other metric worth improving.
This article outlines the general considerations to keep in mind when choosing a cloud-native platform to host Kubeflow, then walks through installing upstream Kubeflow and configuring it with the cloud-native persistent storage and networking stack.
Kubeflow is an open-source ML platform that runs natively on Kubernetes. The Kubeflow project has multiple distinct software components that each address a specific stage of the ML lifecycle, including model development, model training, model serving, automated ML, and CI/CD for models and the surrounding data ecosystem. Kubeflow is ideal for data scientists who want to build and experiment with ML pipelines. It is also for ML engineers and operational teams who want to deploy ML systems in various environments for model development, testing and production-level serving using CI/CD automation.
Rakuten Cloud-Native Platform (Rakuten CNP), formerly known as Symcloud Platform, provides a supercharged Kubernetes platform with native integration between cloud-native storage and a cloud-native networking stack. It also includes an application management system that fully automates the management of both clusters and applications. Rakuten CNP has the built-in capability to create managed application snapshots that enable cloning, backup and migration of applications between on-prem and cloud or between data centers within an enterprise.
Rakuten CNP fully automates the end-to-end cluster provisioning process, even for the most challenging platform deployments, for several applications, including Kubeflow, as well as for custom application configurations.
A Kubeflow installation deploys several components. The image below outlines the components and how they interact with each other.
1. Install Rakuten Cloud-Native Platform - https://docs.robin.io/platform/5.4.1/install.html#
2. Set up the MetalLB load balancer to use a specific IP pool range for the load balancer service.
MetalLB can be installed during Rakuten CNP installation, or afterwards using:
3. Prepare the PVC YAML to reflect the right storage class, and configure storage management options such as replication, encryption, and media type.
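Whichever way MetalLB is installed, it helps to sanity-check the IP pool range before handing it over. The sketch below is a minimal illustration, assuming the pool is written as a `first-last` IPv4 range and that you know the subnet your nodes sit on. The function names and the example addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and not part of MetalLB or Rakuten CNP.

```python
import ipaddress
from typing import List


def pool_addresses(range_spec: str) -> List[str]:
    """Expand a 'first-last' IPv4 range into the individual addresses it covers."""
    first_text, last_text = (part.strip() for part in range_spec.split("-"))
    first = ipaddress.IPv4Address(first_text)
    last = ipaddress.IPv4Address(last_text)
    if last < first:
        raise ValueError(f"range end {last} is before range start {first}")
    return [str(ipaddress.IPv4Address(value)) for value in range(int(first), int(last) + 1)]


def pool_is_inside(range_spec: str, subnet: str) -> bool:
    """Return True when every address in the range belongs to the given subnet."""
    network = ipaddress.ip_network(subnet)
    return all(ipaddress.ip_address(addr) in network for addr in pool_addresses(range_spec))
```

For example, `pool_is_inside("192.0.2.10-192.0.2.13", "192.0.2.0/24")` confirms that a four-address pool fits inside the subnet, which catches typos in the range before any service is left waiting for an address.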
4. Install Kubeflow
Refer to the official Kubeflow install documentation:
Kubeflow release version: v1.6.0
(the latest release is https://github.com/kubeflow/manifests/tree/v1.6.1)
4.1. Download the Kubeflow release tar file, extract it and cd into the manifests directory:
4.1.a. Download kustomize and add it to the host PATH:
4.2. Install Kubeflow with a single command:
cd manifests
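The upstream manifests repository documents a single-command install that pipes `kustomize build example` into `kubectl apply -f -` and retries until it succeeds, because some resources can only be created once their CRDs are registered. As a minimal sketch, assuming that is the approach used here and that `kustomize` and `kubectl` are on the PATH, the same retry loop could be scripted as follows. The function names, the attempt limit and the delay are illustrative choices, not part of Kubeflow.

```python
import subprocess
import time
from typing import Callable


def apply_manifests(manifest_dir: str = "example") -> bool:
    """Render manifests with kustomize and pipe them into kubectl apply."""
    built = subprocess.run(
        ["kustomize", "build", manifest_dir], capture_output=True, text=True
    )
    if built.returncode != 0:
        return False
    applied = subprocess.run(
        ["kubectl", "apply", "-f", "-"], input=built.stdout, text=True
    )
    return applied.returncode == 0


def retry_until_applied(
    apply_once: Callable[[], bool],
    max_attempts: int = 30,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call apply_once until it returns True; return the attempt number that succeeded."""
    for attempt in range(1, max_attempts + 1):
        if apply_once():
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)  # give newly created CRDs time to register
    raise RuntimeError(f"manifests still failing after {max_attempts} attempts")
```

Run from inside the manifests directory, `retry_until_applied(apply_manifests)` keeps applying until every resource is accepted. Unlike an unbounded shell loop, it gives up after a fixed number of attempts so a genuinely broken manifest does not retry forever.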
Check if all the pods are running:
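One way to script this check, as a minimal sketch: feed the parsed output of `kubectl get pods -A -o json` into a small helper that lists every pod that is not yet healthy. The helper name is hypothetical, and treating `Succeeded` pods as healthy (they are usually completed jobs) is an assumption you may want to adjust.

```python
from typing import List


def pods_not_ready(pod_list: dict) -> List[str]:
    """Return 'namespace/name (phase)' for every pod that is not running and ready."""
    problems = []
    for pod in pod_list.get("items", []):
        metadata = pod.get("metadata", {})
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed jobs are not a problem
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if phase != "Running" or not all_ready:
            namespace = metadata.get("namespace", "default")
            name = metadata.get("name", "<unnamed>")
            problems.append(f"{namespace}/{name} ({phase})")
    return problems
```

An empty result means every pod is running with all containers ready. Otherwise the returned list tells you which pods to inspect with `kubectl describe pod`.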
4.3. Deploy MetalLB to provide an external IP for Kubeflow (refer to step 2 if this is not already done)
Ensure that an external load balancer IP is allocated for the Istio ingress gateway service:
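To automate this check, a small helper can read the service definition as JSON and report the allocated address. This is a minimal sketch, assuming the gateway service is named `istio-ingressgateway` in the `istio-system` namespace as in the upstream manifests, so that the input comes from `kubectl get svc istio-ingressgateway -n istio-system -o json`. The function name and the sample address are hypothetical.

```python
from typing import List


def external_addresses(service: dict) -> List[str]:
    """Return the external IPs or hostnames assigned to a LoadBalancer service."""
    if service.get("spec", {}).get("type") != "LoadBalancer":
        return []  # only LoadBalancer services receive an address from MetalLB
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    addresses = []
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            addresses.append(address)
    return addresses
```

An empty list means no address has been allocated yet. That happens either because the service type is not `LoadBalancer` or because the MetalLB pool has no free address to hand out.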
4.4. Check all Kubeflow features are working:
Open the Kubeflow UI in a browser using the load balancer service IP:
http://10.9.232.xx
The default username/password for the Kubeflow application is
user@example.com/12341234
Create a Jupyter Notebook using a PersistentVolumeClaim:
Select the new workspace volume using a Rakuten CNP storage class.
Select the data volume with access mode “ReadWriteMany” using the Rakuten Cloud-Native Platform immediate storage class.
You can choose to create a custom volume with a PVC spec and launch a Jupyter Notebook.
The notebook will now have access to both PVCs, shared and local, provisioned through the CSI storage class.
PVC spec example:
Advanced storage options can be configured by adding the appropriate annotations to the PVC spec file:
https://docs.robin.io/platform/latest/manage_storage.html#readwritemany-rwx-volumes
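As a rough illustration of what such a spec contains, the sketch below builds a PVC manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The function name, the claim name and the storage class name in the usage note are hypothetical placeholders. Annotation keys are left to the caller on purpose and should be taken from the storage documentation linked above.

```python
import json
from typing import Dict, Optional

ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany", "ReadWriteOncePod"}


def build_pvc(
    name: str,
    storage_class: str,
    size: str,
    access_mode: str = "ReadWriteOnce",
    annotations: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dictionary."""
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode}")
    metadata: dict = {"name": name}
    if annotations:
        metadata["annotations"] = dict(annotations)  # advanced storage options go here
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": metadata,
        "spec": {
            "storageClassName": storage_class,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest so it can be saved and applied with kubectl."""
    return json.dumps(manifest, indent=2)
```

For a shared data volume you might call `build_pvc("shared-data", "my-storage-class", "10Gi", access_mode="ReadWriteMany")`, save the output of `to_json` to a file, and apply it in the namespace where the notebook runs.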
To further validate Kubeflow, you can:
In this article, we have covered several considerations for setting up a multi-tenant deployment of Kubeflow and walked through the step-by-step installation of Kubeflow on Rakuten CNP. Rakuten CNP is a fully integrated Kubernetes platform that comes with cloud-native storage, compute, and networking capabilities for running Kubeflow deployments at scale, which provides significant advantages for MLOps applications such as Kubeflow.
Disclaimer - Please note that Rakuten Cloud-Native Platform does not directly support the Kubeflow application on its platform. However, this exercise is a good starting point for organizations looking to implement Kubeflow on Kubernetes in their MLOps journey.
For more information about exploring Rakuten Cloud-Native Platform and Storage, please visit: https://symphony.rakuten.com/cloud