Having Fun with Azure Cognitive Services

It’s been a while since I’ve looked at Azure Cognitive Services. Whilst racking my brains for my next experiment, I wondered how easy it would be to use the Computer Vision API to analyze the output of a display attached to my Raspberry Pi (more on that here).

You may be thinking… why do this? You already know what is being written to the display, so why over-complicate things? I was simply looking for a semi-practical use case that would help me learn Azure Cognitive Services and practice Python. I’m more of a hands-on learner; reading endless documentation and running samples isn’t the best way for me to learn. I need to play, experiment, and fail (a lot!) to really get to grips with things.

I have a camera attached to my Raspberry Pi so my idea was to:

  • Take a picture of the display using the camera attached to the Raspberry Pi.
  • Use the Computer Vision API in Azure Cognitive Services to analyze the picture taken of the display.
  • Return the analyzed text from the picture.

My first step was to position the camera and take a picture, which is straightforward using the picamera package. The example below takes a picture named “PicOfDisplay” and saves it to the desktop of the Pi.

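from picamera import PiCamera
camera = PiCamera()
# Take the picture and save it to the desktop of the Pi
camera.capture("/home/pi/Desktop/PicOfDisplay.jpg")
camera.close()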

Here is an example picture of the display.

I then adapted the sample code found here which uses Python to call the REST API for the Computer Vision Service.

My final solution can be found below, this does the following:

  • Submits the picture taken with the Raspberry Pi camera (PicOfDisplay.jpg) to the Computer Vision Service (using Python and the REST API).
  • Polls the service until the analysis has completed (this process runs asynchronously) and stores the results of the analysis in a list named “lines”.
  • Outputs the text that the Computer Vision Service has identified from the picture that relates to the display. As you can see below, the raw output of the “lines” list includes text found on the board of the Pi, which I’m not so interested in 😀. If I ever change the position of the camera or the Pi, I’d need to tweak this, as the order of identified text could vary in the “lines” list.
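
The polling step could also be wrapped in a small helper with a timeout, so that a stuck operation doesn't loop forever. This is only a sketch under a few assumptions: wait_for_result and its fetch parameter are hypothetical names used for illustration, and fetch is assumed to be any function that returns the decoded JSON status from the service (for example a wrapper around requests.get on the Operation-Location URL).

```python
import time


def wait_for_result(fetch, interval=1.0, timeout=30.0, sleep=time.sleep):
    """Poll fetch() until the analysis succeeds, fails, or times out.

    fetch is a hypothetical callable that returns the decoded JSON
    status dictionary for the asynchronous read operation.
    """
    waited = 0.0
    while True:
        analysis = fetch()
        if "analyzeResult" in analysis:
            return analysis  # The recognized text is now available
        if analysis.get("status") == "failed":
            raise RuntimeError("Analysis failed")
        if waited >= timeout:
            raise TimeoutError("Analysis did not finish in time")
        sleep(interval)  # Pause before asking the service again
        waited += interval
```

Passing sleep in as a parameter is just a convenience so the helper can be tried out without actually waiting.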

Actual script output:

The full script in all of its glory.

import json
import time

import requests

# Extract text from image
url = "https://uksouth.api.cognitive.microsoft.com/vision/v3.0/read/analyze" # Replace with the appropriate endpoint
key = "Azure Cognitive Services Key" # Enter the key
image_path = "/home/pi/Desktop/PicOfDisplay.jpg"
# Read the picture as raw bytes so it can be sent in the request body
with open(image_path, "rb") as image_file:
    image_data = image_file.read()
headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/octet-stream"}
r = requests.post(url, headers=headers, data=image_data)
r.raise_for_status()
operation_url = r.headers["Operation-Location"]
# The recognized text isn't immediately available, so poll to wait for completion.
analysis = {}
poll = True
while poll:
    response_final = requests.get(operation_url, headers=headers)
    analysis = response_final.json()
    print(json.dumps(analysis, indent=4))
    if "analyzeResult" in analysis:
        poll = False
    elif analysis.get("status") == "failed":
        raise RuntimeError("The Computer Vision Service could not analyze the image")
    else:
        time.sleep(1)  # Pause between polls rather than hammering the service
# Collect the text of each line that the service recognized
lines = []
for line in analysis["analyzeResult"]["readResults"][0]["lines"]:
    lines.append(line["text"])
print(lines[0] + " " + lines[1] + " " + lines[2]) # The data that I'm interested in (from the display) is found within the first three entries of the list.
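
Relying on the first three entries of the list is fragile if the camera moves. One possible alternative is to pick lines by where they sit in the picture, using the boundingBox that the Read API returns for each line. The sketch below is an assumption-laden illustration rather than part of the script above: lines_in_region is a hypothetical helper, the region coordinates are arbitrary, and the sample data is made up to mimic the shape of one entry of "readResults".

```python
def lines_in_region(read_result, left, top, right, bottom):
    """Return the text of lines whose bounding box centre lies inside the region."""
    selected = []
    for line in read_result["lines"]:
        box = line["boundingBox"]  # [x1, y1, x2, y2, x3, y3, x4, y4]
        center_x = sum(box[0::2]) / 4  # Average of the four x coordinates
        center_y = sum(box[1::2]) / 4  # Average of the four y coordinates
        if left <= center_x <= right and top <= center_y <= bottom:
            selected.append(line["text"])
    return selected


# Made-up sample data in the shape of one entry of "readResults"
sample = {
    "lines": [
        {"boundingBox": [10, 10, 110, 10, 110, 30, 10, 30], "text": "Temp: 21C"},
        {"boundingBox": [400, 300, 480, 300, 480, 320, 400, 320], "text": "Raspberry Pi"},
    ]
}
display_text = lines_in_region(sample, left=0, top=0, right=200, bottom=100)
print(" ".join(display_text))
```

With the real response, the first argument would be analysis["analyzeResult"]["readResults"][0], and the region would need to be measured from an actual picture of the display.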

I could of course have taken this a step further and incorporated logic to take a picture and then submit it to the Computer Vision Service automatically, running this in a continuous loop. However, as this was more of a proof of concept, I didn’t see the need.
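
For anyone who does want to go that step further, a loop along these lines might be a starting point. It is a rough sketch, not something from the proof of concept: run_loop, take_picture and read_text are hypothetical names, where take_picture is assumed to capture an image and return its path, and read_text is assumed to submit that image to the Computer Vision Service and return the recognized text.

```python
import time


def run_loop(take_picture, read_text, iterations, delay=5.0, sleep=time.sleep):
    """Take a picture, read its text, wait, and repeat.

    take_picture and read_text are placeholder callables supplied by the caller.
    """
    results = []
    for _ in range(iterations):
        image_path = take_picture()            # e.g. wraps camera.capture(...)
        results.append(read_text(image_path))  # e.g. wraps the REST calls
        sleep(delay)                           # Wait before the next picture
    return results
```

A fixed number of iterations is used here instead of an endless loop so the sketch is easy to stop and try out; the delay between pictures is an arbitrary choice.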