
Custom Environment with Connection to SQL Database for Azure Machine Learning Pipelines

Updated: Oct 1

The Azure Machine Learning library is a great tool for building training and prediction pipelines. Microsoft offers great tutorials and a lot of examples in their git repository. The real challenge starts when you need to deploy a custom ML pipeline into a custom environment. There are many ways to create an Azure ML Environment for ML pipelines using the azureml Python SDK, but there is a lack of tutorials describing the use and deployment of custom environments. Additionally, access to a SQL database is essential for any data processing and modelling. Therefore, this custom environment includes the installation of the ODBC libraries necessary to connect to a SQL database from a Python script.

In this article, I will focus on:

  • creating an environment using a Dockerfile

  • deploying the custom environment to batch inference

  • deploying the custom environment to an ACI web service

All of this will be achieved using an infrastructure-as-code approach.

Docker File with ODBC Drivers

First, I am going to explain the steps involved in creating a containerised environment using Docker. I will walk through building this environment and share some potential pitfalls to be aware of when debugging your deployment.

The strength of cloud services is that they provide a layer of abstraction, so you do not (or should not) need to manage all of the dependencies. To take advantage of this, I am using an Azure ML curated environment as the base for this Docker image. I chose one with the LightGBM libraries installed, as my prediction model uses it. In addition, this curated environment uses Ubuntu 18.04 and Python 3.7, so these dependencies are managed for me by Azure.

FROM mcr.microsoft.com/azureml/curated/lightgbm-3.2-ubuntu18.04-py37-cpu:45

Next, we install ODBC driver.

If you do not specify the version of Ubuntu, Azure may default to 16.04, which is no longer supported, and your container may randomly crash.

The same goes for Python. If not specified, Azure will default to Python 3.6 and install the Python libraries there. So, if you get an error message that a library you specified in the requirements file does not exist, check where your Python libraries are really installed. This is another reason I recommend using curated environments as the base image.

These are some of the issues I experienced during deployment. They may have been mitigated by Microsoft in later updates.
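When debugging this kind of issue, a quick sanity check is to print which interpreter is running and where pip will install packages. This is a small diagnostic sketch of my own, not part of the original deployment scripts; you can run it inside the container or add it temporarily to your pipeline script:

```python
import sys
import sysconfig

# Which interpreter is actually running
interpreter = sys.executable
version = f"{sys.version_info.major}.{sys.version_info.minor}"

# Where pip installs packages for this interpreter
site_packages = sysconfig.get_paths()["purelib"]

print(interpreter)
print(version)
print(site_packages)
```

If the version printed here differs from the one your requirements were installed into, you have found the source of your "module not found" errors.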

# apt-get and system utilities
RUN apt-get update && apt-get install -y \
   curl apt-transport-https debconf-utils gnupg2

RUN apt-get update && apt-get install -y --no-install-recommends \
    unixodbc-dev \
    unixodbc \
    libpq-dev

# adding custom MS repository
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list > /etc/apt/sources.list.d/mssql-release.list

# install SQL Server drivers and tools
RUN apt-get update && ACCEPT_EULA=Y apt-get install -y msodbcsql18 mssql-tools18
RUN echo 'export PATH="$PATH:/opt/mssql-tools18/bin"' >> ~/.bashrc
RUN /bin/bash -c "source ~/.bashrc"


RUN apt-get -y install locales \
    && rm -rf /var/lib/apt/lists/*
RUN locale-gen en_US.UTF-8
RUN update-locale LANG=en_US.UTF-8

The next step pins the version of Python and its path. Some of these steps may seem like duplicates, but I found them necessary to avoid multiple Python paths being created and additional requirements being installed into Python 3.6, Microsoft's default.

RUN apt-get update && \
  apt-get install -y --no-install-recommends python3 python3.7-distutils && \
  ln -sf /usr/bin/python3 /usr/bin/python

Next, we use pip to install the Python libraries required for the prediction script to run.

# install dependencies
RUN pip install --upgrade pip==20.1.1

WORKDIR /
COPY requirements.txt .
RUN pip install -r requirements.txt
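For reference, a requirements.txt for a pipeline that reads from SQL might look something like this. The package list is illustrative, not the original project's file; pyodbc is the piece that uses the unixodbc and msodbcsql18 libraries installed above:

```
azureml-core
pandas
pyodbc
sqlalchemy
```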

The following commands are just sanity checks you can use when debugging your container deployment to see which libraries were really installed.

RUN pip freeze
RUN conda --version
RUN python3 --version

Now putting it all together, click here to see full Dockerfile.
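With the ODBC driver and Python libraries in place, a script running in this environment can talk to the SQL database. A minimal sketch of building the connection string (the server, database, and credential values are placeholders of my own; the driver name matches the msodbcsql18 package installed in the Dockerfile):

```python
def build_connection_string(server: str, database: str,
                            user: str, password: str) -> str:
    """Assemble an ODBC connection string for the msodbcsql18 driver
    installed in the Dockerfile above."""
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};Database={database};"
        f"Uid={user};Pwd={password};"
        "Encrypt=yes;"
    )

conn_str = build_connection_string(
    "myserver.database.windows.net", "mydb", "myuser", "mypassword"
)

# With pyodbc listed in requirements.txt, the script can then run:
# import pyodbc
# with pyodbc.connect(conn_str) as conn:
#     rows = conn.cursor().execute("SELECT 1").fetchall()
```

In practice, the username and password should come from environment variables or a secret store, never from the script itself.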

Deploy Dockerfile to Azure Container Registry

For this step, you will need to have programmatic access set up for your Azure account. In case you don’t have it, you can read about signing in with Azure CLI here.

Open your terminal and go to the directory where your Dockerfile is located. Log in to your Azure account; this may open a web browser for authentication.

You can either build your image in an existing container registry or create one. More about container registry CLI commands here. The last step builds the image from the Dockerfile created in the previous section.

az login
az acr create --resource-group myResourceGroup --name mycontainerregistry --sku Basic
az acr build --image myImageName:Version --registry mycontainerregistry --file Dockerfile .

Create Run Configuration for Azure ML Pipeline

To use the containerised environment we deployed in the previous step in Azure ML pipelines, we need to create a run configuration. The following script uses environment variables that are passed to the script either from a securely stored local file or from an Azure DevOps variable group. These variables contain sensitive information such as usernames and passwords, so for security reasons they should never be part of the script itself.
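One way to populate that dictionary is to read the process environment first and fall back to a git-ignored local JSON file. This is a sketch of my own; the file name and fallback logic are assumptions, not the original project's code:

```python
import json
import os

def load_env_variables(path: str = "local.settings.json") -> dict:
    """Read secrets from the process environment first, falling back to
    a git-ignored local JSON file for local development."""
    file_values = {}
    if os.path.exists(path):
        with open(path) as f:
            file_values = json.load(f)
    keys = ["IMAGE_USER_NAME", "IMAGE_PWD", "IMAGE_NAME"]
    return {key: os.environ.get(key, file_values.get(key)) for key in keys}

env_variables = load_env_variables()
```

In an Azure DevOps pipeline, the variable group supplies these as environment variables, so the JSON file is only ever needed locally.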

from azureml.core.runconfig import RunConfiguration
from azureml.core.container_registry import ContainerRegistry


sql_run_config = RunConfiguration()
image_registry_details = ContainerRegistry()
image_registry_details.address = f"{env_variables['IMAGE_USER_NAME']}.azurecr.io"
image_registry_details.username = env_variables['IMAGE_USER_NAME']
image_registry_details.password = env_variables['IMAGE_PWD']
sql_run_config.environment.docker.base_image_registry = image_registry_details

# this is an image in the image_registry
sql_run_config.environment.docker.base_image = env_variables['IMAGE_NAME']
sql_run_config.environment.python.user_managed_dependencies = True

Now we can pass this run configuration to an Azure ML pipeline. There are many tutorials from Microsoft on how to create an Azure ML pipeline. This step is pretty straightforward and you can follow the Microsoft tutorials for the full deployment, so I will not go in depth on how to do this. If you would like me to elaborate, please let me know in the comments.

from azureml.pipeline.steps import PythonScriptStep

pipeline_step = PythonScriptStep(name='Step1',
                                 source_directory=experiment_folder,
                                 script_name="script.py",
                                 compute_target=inference_cluster,
                                 runconfig=sql_run_config,  # custom run configuration
                                 allow_reuse=False)

Register Custom Environment in Azure ML Workspace

You can use the Dockerfile created in the previous step to register your custom environment in the Azure ML Workspace using the azureml Python library. This environment can then be used to create a web service with the azureml libraries. In this example, I will show how to deploy the web service as an Azure Container Instance (ACI). Note that this deployment is not recommended for production ML web services due to its security limitations and lack of scalability; the recommended service is a managed online endpoint. The use and deployment of the custom environment is the same as for the ACI web service. I would like to dedicate a separate post to managed online endpoints, so for now, you can read about them here.

First, you need to connect to your Azure ML Workspace where the environment will be registered.

ws = Workspace.from_config()

If you are running these commands from the ML Workspace, this command will use the config.json file to get credentials. You can download this file to connect to the Workspace locally, or pass the credentials directly. Again, for security reasons, I recommend storing these credentials securely and not hard-coding them.
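The downloaded config.json has a simple three-field shape. A sketch of writing and reading it (the placeholder values are mine; the commented lines show how the fields map onto the Workspace constructor):

```python
import json
from pathlib import Path

# Shape of the config.json you can download from the Azure ML portal
Path("config.json").write_text(json.dumps({
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}))

cfg = json.loads(Path("config.json").read_text())

# Pass the values explicitly rather than hard-coding them:
# from azureml.core import Workspace
# ws = Workspace(subscription_id=cfg["subscription_id"],
#                resource_group=cfg["resource_group"],
#                workspace_name=cfg["workspace_name"])
```

Keep config.json out of version control; it identifies your subscription and workspace.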

from azureml.core import Environment

# from_docker_image is a static factory method, so assign its return value
myenv = Environment.from_docker_image(name=env_name,
                                      image=image_name,
                                      container_registry=container_name)
myenv.register(ws)
myenv.build(workspace=ws)

I do not recommend using the command Environment.from_dockerfile(), because the requirements do not install into the correct version of Python. This results in a "module not found" error when deploying the web service.

The following code shows how to use the custom environment to deploy the ML web service using ACI.

from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

inference_config = InferenceConfig(
    entry_script='script.py',
    source_directory=source_directory,
    environment=myenv)

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1.8,
                                                       memory_gb=7,
                                                       dns_name_label=os.environ['DNS_NAME'],
                                                       enable_app_insights=True)
service = Model.deploy(ws, os.environ['WEB_SERVICE_NAME'], [model], inference_config, deployment_config, overwrite=True)
service.wait_for_deployment(show_output=True)

As before, the code example uses environment variables that are passed to the script either locally or as part of an Azure DevOps variable group.

You can find the full script in my git repository.

Final Words

I hope this tutorial helps ML Engineers with their Azure Machine Learning deployments. I am planning to post more tutorials focusing on Azure ML deployments and building CI/CD pipelines for ML services in Azure DevOps. I always prefer the infrastructure-as-code approach due to its reproducibility. If there is an area you would like me to focus on, please let me know in the comments.

 
 
 

© 2022 Sigma Services. All Rights Reserved.