Deploying Prodigy to Azure using Docker
A quick guide on how to deploy the Prodigy data labeling tool to Azure Web Apps using Docker
Introduction
This is a quick guide on how to deploy Explosion's data annotation tool Prodigy to Azure Web Apps using Docker. It's meant as a reference and assumes you know how to use Docker and Azure, and how to deploy/provision a database.
If you get stuck, check out the Prodigy docs or post your question on Prodigy's support forum.
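Prodigy Docker Container
Here is a sample Dockerfile that builds an image which starts your Prodigy server when run (it's the same one I use for deployments). Note that Dockerfile comments have to be on their own line; a trailing # after an instruction is treated as part of the instruction.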
# Pick a Python base image that your Prodigy wheel supports
FROM python:stretch
ENV PYTHONUNBUFFERED=1
# This is important!!! Remember the port number for the Web App step
ENV PRODIGY_PORT=8000
# I found the verbose logs helpful for debugging problems
ENV PRODIGY_LOGGING=verbose
# This is the Prodigy wheel that's given to you
COPY *linux_x86_64.whl /root/
RUN pip install /root/prodigy*.whl
# Download the spaCy model your recipe needs (en_core_web_md includes word vectors)
RUN python -m spacy download en_core_web_md
# Store your config & settings here (Prodigy reads ~/.prodigy by default)
RUN mkdir -p /root/.prodigy/
COPY ProdigyLicenseKey.txt /root/.prodigy/
# Your Prodigy settings
COPY prodigy.json /root/.prodigy/prodigy.json
# Your data
COPY ./your_data.jsonl /root/
# Install your database driver here + any additional packages. We're using PostgreSQL
RUN pip install psycopg2
# Document the port Prodigy listens on (must match PRODIGY_PORT)
EXPOSE 8000
# Start your Prodigy recipe: replace the dataset name, source file, and labels
ENTRYPOINT ["prodigy", "textcat.manual", "your_dataset_task_name", "/root/your_data.jsonl", "--label", "your_labels_go_here"]
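The Dockerfile copies a your_data.jsonl file into the image. If you still need to create one, here's a minimal Python sketch. It assumes the usual Prodigy input shape for text recipes like textcat.manual, one JSON object with a "text" key per line; write_jsonl is a hypothetical helper, not part of Prodigy, so check the docs for any extra fields your recipe expects.

```python
import json
from pathlib import Path


def write_jsonl(texts, path):
    """Write one {"text": ...} object per line (JSON Lines format).

    Assumption: the recipe (textcat.manual here) reads a "text" field per task.
    Returns the number of lines written.
    """
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for text in texts:
            # ensure_ascii=False keeps non-ASCII text readable in the file
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, write_jsonl(["first example", "second example"], "your_data.jsonl") writes two lines next to your Dockerfile, ready for the COPY step.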
prodigy.json
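Here is an example of what your prodigy.json might look like. I had problems with Azure Web Apps when I didn't set "host": "0.0.0.0". By default Prodigy only listens on localhost, which isn't reachable from outside the container. Change db_settings to the database of your choice; refer to the Prodigy documentation for how to do so.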
{
  "feed_overlap": true,
  "host": "0.0.0.0",
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "host": "yourdatabase.postgres.database.azure.com",
      "dbname": "postgres",
      "user": "admin-user@database-name",
      "password": "thepassword"
    }
  }
}
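Since a missing host setting was the problem I ran into, a quick check before building the image can save you a deploy cycle. This is a hypothetical Python helper, not part of Prodigy, written for the PostgreSQL setup in this guide: it flags a host other than 0.0.0.0, a db other than postgresql, and any of the four PostgreSQL keys from the example above that are missing.

```python
import json
from pathlib import Path

# The PostgreSQL keys used in the example prodigy.json above
EXPECTED_POSTGRES_KEYS = ("host", "dbname", "user", "password")


def check_prodigy_config(config):
    """Return a list of human-readable problems found in a parsed prodigy.json dict."""
    problems = []
    if config.get("host") != "0.0.0.0":
        problems.append('"host" is not "0.0.0.0"; Azure may not be able to reach the server')
    if config.get("db") != "postgresql":
        problems.append('"db" is not "postgresql"')
    settings = config.get("db_settings", {}).get("postgresql", {})
    missing = [key for key in EXPECTED_POSTGRES_KEYS if key not in settings]
    if missing:
        problems.append("db_settings.postgresql is missing: " + ", ".join(missing))
    return problems


def check_prodigy_config_file(path):
    """Load prodigy.json from `path` and return the problems found in it."""
    return check_prodigy_config(json.loads(Path(path).read_text(encoding="utf-8")))
```

An empty list means the file matches the shape used in this guide; it doesn't prove the credentials work.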
Deployment
Prereqs
- Dockerfile (like the one above)
- Your database (deployed wherever it is)
- prodigy.json (this should be inside your image)
Database
Make sure your database is configured beforehand. With Prodigy, you can run multiple instances that point to the same database and work on different datasets, so you don't need to spin up a new database for every image. Make sure your database accepts connections from your Web App's IP addresses, or that the proper network permissions are in place. Testing your image locally will show whether Prodigy can connect to the database with your settings; keep in mind that a local test connects from your own IP, so the database firewall also has to allow the Web App.
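To separate firewall problems from configuration problems, here's a small stdlib-only Python sketch (the function names are my own) that reads the database host from prodigy.json and tries a plain TCP connection on 5432, PostgreSQL's default port. A successful connection only tells you the network path is open; it doesn't log in, so it doesn't check your username or password.

```python
import json
import socket
from pathlib import Path


def load_postgres_settings(config_path):
    """Return the "postgresql" entry of db_settings from a prodigy.json file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config["db_settings"]["postgresql"]


def can_reach(host, port=5432, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only proves the network path and firewall allow the connection;
    it does not log in, so credentials are not checked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

For example, can_reach(load_postgres_settings("/root/.prodigy/prodigy.json")["host"]) run inside the container returns False if the database firewall is blocking you.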
Docker
Build and test your Prodigy image, then push it to your container registry. For this tutorial we're using Docker's public container registry (Docker Hub). It's probably best to use a private registry, though: the image contains your data and your database credentials, and you don't want anybody pulling it and annotating your data.
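- Build your Prodigy image:
  docker build -t <tag-name> .
- Test it locally (Prodigy should then load at http://localhost:8000):
  docker run -p 8000:8000 <tag-name>
- Push it to your container registry (for Docker Hub, <tag-name> needs the form <your-username>/<image-name>:<tag>, and you need to be logged in with docker login):
  docker push <tag-name>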
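After docker run, you can simply open http://localhost:8000 in a browser, or script the check. Below is a minimal Python sketch using only the standard library; wait_for_server is a hypothetical helper that polls the URL until it answers with HTTP 200 or the timeout runs out. It assumes Prodigy serves its web app at the root URL on the published port.

```python
import http.client
import time
import urllib.request


def wait_for_server(url="http://localhost:8000", timeout=60.0, interval=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True on a 200 response, False if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Not up yet: connection refused/reset, timeout, or an HTTP error status
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling instead of a single request gives the container time to start, which matters when the image has to load a spaCy model first.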
Azure Web Apps
- Navigate to the resource group you want to use.
- Follow Add -> Marketplace -> Web App.
- On the Web App creation page, under the Instance Details panel:
  - Select Docker Container for Publish.
  - Select Linux for OS.
- Configure everything else on this page for your needs.
- On the Web App Docker page, configure it to pull your Docker image. Make sure you don't have any accidental spaces or misspellings in the image name (which has happened to me plenty of times!).
- Configure any other pages you need, then hit Review + create.
- Head over to your provisioned Web App resource.
- In the left-hand panel, navigate to Settings -> Configuration -> Application settings.
- Add a new application setting.
- IMPORTANT! Name it WEBSITES_PORT and set its value to the same port number as PRODIGY_PORT in your Dockerfile, so in our example WEBSITES_PORT would be set to 8000. App Service otherwise expects your container to listen on port 80 or 8080, and I have found that Prodigy won't load if you forget this setting.
- Save & restart your Web App.
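The two things that bit me most in these steps were typos in the image name and a WEBSITES_PORT that didn't match PRODIGY_PORT. Here's a hypothetical Python pre-flight check for both. It assumes your Dockerfile sets PRODIGY_PORT on an ENV line with a single-word value, as in the example Dockerfile, and it only catches whitespace in the image name, not misspellings.

```python
import re


def dockerfile_env(dockerfile_text, name):
    """Return the last value an ENV line assigns to `name`, or None.

    Simplified parser: handles `ENV NAME=value` and `ENV NAME value` with a
    single-word value; later ENV lines win, as they do in a Docker build.
    """
    # The ENV keyword is case-insensitive; the variable name is not
    pattern = re.compile(rf"^\s*(?i:ENV)\s+{re.escape(name)}(?:=|\s+)(\S+)", re.MULTILINE)
    matches = pattern.findall(dockerfile_text)
    return matches[-1].strip("\"'") if matches else None


def preflight(dockerfile_text, websites_port, image_name):
    """Return a list of problems with the Azure settings you are about to enter."""
    problems = []
    if any(ch.isspace() for ch in image_name):
        problems.append(f"image name {image_name!r} contains whitespace")
    prodigy_port = dockerfile_env(dockerfile_text, "PRODIGY_PORT")
    if prodigy_port is None:
        problems.append("no ENV PRODIGY_PORT found in the Dockerfile")
    elif prodigy_port != str(websites_port):
        problems.append(f"WEBSITES_PORT={websites_port} but PRODIGY_PORT={prodigy_port}")
    return problems
```

Pass it the Dockerfile's text, the WEBSITES_PORT value, and the image name you're about to paste into the portal; an empty list means neither mistake is present.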
After some time (depending on the size of your image), you should be able to navigate to your Web App's URL and start annotating!