
Deploy Private Azure Managed Endpoint with Custom Environment

Updated: Oct 1

Azure released a new version of its Machine Learning library that is completely different from the earlier SDK. With this release, Managed Endpoint moved out of public preview to general availability, and I am very excited to share this tutorial on how to deploy a Machine Learning model as a web service using a Managed Endpoint and a custom Docker environment.

But first, let me explain why Managed Endpoint is the new hot way to consume your predictions and what other options Azure has to offer. Before Managed Endpoint was publicly released, there were two options for consuming an ML model in real time as a web service:

  • ACI — Using Azure Container Instances, you can deploy a model to a fixed-size container instance. The setup is fairly simple, but this type of deployment is not recommended for production use because of its limited configuration: it lacks secure networking options and cannot be scaled. More about this deployment can be found here. The code for deploying this web service can be found in my git.

  • Kubernetes — Azure offers a Python SDK to deploy a model on a Kubernetes cluster. Kubernetes solves all of the problems of ACI and is better suited for production. Despite the complexity of Kubernetes, the deployment code using the azureml SDK is fairly simple. If your cloud infrastructure has a simple network setup, this service might be for you. But if your network includes a number of subnets, there will be a lot of network-related issues to figure out before the Kubernetes web service is functional.

Managed Endpoint has the best of both worlds. It is easy to deploy, scalable, and network configurable without the networking struggles that come with Kubernetes.

Private Managed Endpoint Deployment

The new azureml SDK uses clients to interact with services and entities to perform operations on these services. This setup is more consistent with the rest of the Azure infrastructure than the previous version of the azureml SDK.

The following code imports the necessary libraries and sets up the MLClient to gain access to the ML workspace where the ML model is registered.

# import required libraries
# os and time are used by the deployment and readiness-check code further below
import os
import time

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

Use DefaultAzureCredential to get an authorisation token. If you log in using your account, this token will be fetched using Azure Active Directory. If you are deploying this service from Azure DevOps, you will need Service Principal credentials available during deployment using the following naming convention: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET. More on service principals here.
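
If you are running from a pipeline rather than an interactive login, a minimal sketch of wiring up these variables looks like the following; the values are placeholders for your own service principal details:

import os

# Hypothetical values: DefaultAzureCredential automatically picks up a service
# principal from these exact environment variable names when no interactive
# login is available.
os.environ['AZURE_CLIENT_ID'] = '<service-principal-application-id>'
os.environ['AZURE_TENANT_ID'] = '<tenant-id>'
os.environ['AZURE_CLIENT_SECRET'] = '<service-principal-secret>'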

def get_ml_client(env_variables):
    """
    Get an authentication token and build the MLClient resource. The MLClient is
    defined by the following ML workspace variables: subscription id, resource
    group and workspace name.
    Args:
        env_variables (dict): environment variables containing the subscription id,
            resource group and workspace name used to establish the MLClient.
    """

    credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)
    ml_client = MLClient(
        credential, env_variables['WS_SUBSCRIPTION_ID'], env_variables['WS_RESOURCE_GROUP'], env_variables['WS_NAME']
    )

    return ml_client
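
For example, the client can be built from a dictionary of workspace variables; the values below are placeholders for your own workspace details:

env_variables = {
    'WS_SUBSCRIPTION_ID': '<subscription-id>',
    'WS_RESOURCE_GROUP': '<resource-group>',
    'WS_NAME': '<workspace-name>',
}
ml_client = get_ml_client(env_variables)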

The following function sets up and deploys a private managed endpoint using a small instance size. Notice that the endpoint has public_network_access set to “enabled” while egress_public_network_access is “disabled”. The public_network_access parameter in the ManagedOnlineEndpoint function allows the endpoint to be called from outside of the private network setup, while the egress_public_network_access parameter in the ManagedOnlineDeployment manages the endpoint’s access to Azure resources. If your ML workspace disables public network access, your endpoint deployment must also disable public network access. This type of setup creates a private secure endpoint through which Azure resources are accessed.

def deploy_webservice(ml_client, env_variables):
    """
    Deploys azure managed endpoint as a private endpoint.
    Args:
        ml_client (azure.ai.ml.MLClient): azure MLClient resource
        env_variables (dict): environment variables
    """

    env = ml_client.environments.get(env_variables['RUN_ENV_NAME'], version=env_variables['RUN_ENV_VERSION'])
    print('Using registered environment:', env.name)
    model = ml_client.models.get(env_variables['MODEL_NAME'], version=env_variables['MODEL_VERSION'])
    endpoint_name = env_variables['ENDPOINT_NAME']
    source_directory = os.path.join(os.getcwd(), os.environ['ARTIFACT_PATH'], 'src')

    # create an online endpoint
    private_endpoint = ManagedOnlineEndpoint(
        name=endpoint_name,
        description="this is a sample online endpoint",
        auth_mode="key",
        public_network_access="enabled",
        tags={"foo": "bar"}
    )

    private_deployment = ManagedOnlineDeployment(name=f'{endpoint_name}-deployment',
                                                 endpoint_name=endpoint_name,
                                                 model=model,
                                                 code_configuration=CodeConfiguration(
                                                     code=source_directory, scoring_script="score.py"
                                                 ),
                                                 environment=env,
                                                 instance_type='Standard_DS3_v2',
                                                 instance_count=1,
                                                 egress_public_network_access="disabled"
                                                 )

    ml_client.online_endpoints.begin_create_or_update(private_endpoint).result()
    ml_client.online_deployments.begin_create_or_update(private_deployment).result()

    # allocate 100% of traffic to the new deployment
    private_endpoint.traffic = {f'{endpoint_name}-deployment': 100}
    ml_client.online_endpoints.begin_create_or_update(private_endpoint)

The deployment and traffic allocation might take some time the first time the endpoint is deployed. You can check whether the deployment is ready to be called with the following code.

endpoint = ml_client.online_endpoints.get(name=env_variables['ENDPOINT_NAME'])
assert endpoint.provisioning_state == 'Succeeded'

while endpoint.traffic.get(env_variables['DEPLOYMENT_NAME'], 0) == 0:
    endpoint = ml_client.online_endpoints.get(name=env_variables['ENDPOINT_NAME'])
    time.sleep(30)

print('Endpoint is ready to be called.')
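
Once traffic is allocated, you can run a quick smoke test with the SDK's invoke method. This is a sketch that assumes a sample_request.json file whose body matches the format expected by the scoring script shown later in this post:

# Hypothetical smoke test: send a sample request file to the live deployment.
response = ml_client.online_endpoints.invoke(
    endpoint_name=env_variables['ENDPOINT_NAME'],
    deployment_name=env_variables['DEPLOYMENT_NAME'],
    request_file='sample_request.json',
)
print(response)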

Custom Environment for Private Managed Endpoint

The deployment uses a custom environment registered in the ML workspace. Follow the steps in my previous post on how to create and publish a custom Docker environment to ACR. Note that the ML workspace has an Azure Container Registry associated with it. When publishing the environment, make sure it is created as a repository within the ACR instance used by the ML workspace.

The following code creates a custom environment from a Docker image published in ACR.

from azure.ai.ml.entities import Environment

# The image reference points at the ACR instance linked to the ML workspace:
# <registry name>.azurecr.io/<repository name>
env = Environment(
    image="{registry_name}.azurecr.io/{repository_name}",
    name='custom-env',
    description='Custom environment'
)

ml_client.environments.create_or_update(env)
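
To confirm the registration and find the version number that was assigned (the value to use for RUN_ENV_VERSION in the deployment above), you can fetch the environment back. This is a sketch assuming you want the latest registered version:

# Hypothetical check: retrieve the newest registered version of the environment.
registered_env = ml_client.environments.get(name='custom-env', label='latest')
print(registered_env.name, registered_env.version)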

Scoring Script

The scoring script passed to the ManagedOnlineDeployment function has two parts:

  • The init() function is invoked when the managed endpoint is deployed. This part typically contains initialisation of resources, such as the prediction model registered in the Azure ML workspace and any resources needed for prediction from blob storage. These resources are passed to the prediction part of the script as global variables.

  • The run() function is invoked when the managed endpoint is called. Data is received in JSON format, processed and prepared for prediction. The model stored as a global variable in the previous step is invoked for prediction and the output is formatted.

import joblib
import json
import logging
import os
import numpy as np
import pandas as pd
import traceback


def jsonifys(status_code=200, **kwargs):
    """Build a JSON response string that carries an explicit status code."""
    return json.dumps({"status_code": status_code, **kwargs})


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """

    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)

    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "model_name"
    )
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    """

    try:
        logging.info("model 1: request received")
        data = json.loads(raw_data)["data"]
        result_data = np.array(data)

        result_df = pd.DataFrame(columns=['Prediction'])
        result_df['Prediction'] = model.predict(result_data)
        logging.info("Request processed")
        return jsonifys(status_code=200, body=result_df.to_json(orient='index'))
    except Exception as e:
        print(e)
        return jsonifys(status_code=400, error=traceback.format_exc())
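
For reference, a request body matching the json.loads(raw_data)["data"] access above would look like the following sketch; the feature values are hypothetical and depend on your model:

import json

# Hypothetical payload: "data" holds a list of feature rows for model.predict.
sample_request = json.dumps({"data": [[0.5, 1.2, 3.4, 0.7]]})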

Final Words

This post covers the deployment of a Managed Online Endpoint as a private endpoint with a custom environment. This is just scratching the surface of what managed endpoints can do. In the next post, I will focus on scaling and blue/green deployment, which is used to test new deployments on a subset of users before fully rolling out changes.

More about managed endpoints can be found in the azure tutorial.
