Object Detection on a Raspberry Pi

Image recognition has become a part of our daily lives, and the technology behind it is advancing at a steady pace. We thought it'd be cool to use the increasing speed and tiny size of lightweight computers like the Raspberry Pi, as well as the efficiency and portability of machine learning libraries such as Tensorflow, to create a standalone, handheld object detector.

The first step is to find out whether running live object detection on a small device such as the Raspberry Pi is viable; until recently the technology to detect multiple objects at the speed we require just wasn’t there. Luckily for us, the folks at Google Brain were kind enough to open-source their object detection API, which does just this.

The use cases for a portable object detector are many and varied - there are places where having a full PC set-up isn't viable, and where an internet connection may not be available for outsourcing the detection to the cloud. The Raspberry Pi is so lightweight that you can even mount it on a drone.

Raspberry Pi scanner

Initial Setup

To get started with object detection on the Raspberry Pi, you of course need a Raspberry Pi. We used a Model 3 running Raspbian Jessie. You also need a camera attached to the Pi. Once the Pi is up and running and connected to a monitor (or reachable over SSH), open a terminal and install the Pi camera library by entering the following commands:

sudo apt-get update

sudo apt-get install python3-picamera

You can then test the camera with the raspistill command-line tool, like so: raspistill -o filename.jpg

Once we’ve confirmed that the hardware is working, we have to make sure we’ve got pip installed, the tool that installs packages from the Python Package Index:

sudo apt-get install python3-pip

This will allow us to install most of the required packages, though because the OS we’re using has a slightly limited package index, we’ll have to do a couple by hand.

Software Installation

There are a number of libraries you need to install to get object detection up and running, the main ones being Tensorflow, OpenCV, and the Object Detection API. Installing these on the Raspberry Pi is a little different to installing them on desktop Unix-like environments, so take care that any tutorials you’re following are compatible with the version of Raspbian that you’re using.


Tensorflow is an open-source machine learning library developed by the Google Brain team. Tensorflow is the core of our object detection, and should be installed first. Regular Tensorflow builds don’t run on the Raspberry Pi, so we’re going to use Sam Abrahams’ TensorFlow on Raspberry Pi 3.

Detailed instructions are available on the Github page, but the main commands required are as follows:

sudo apt-get update
sudo apt-get install python3-pip python3-dev
wget https://github.com/samjabrahams/tensorflow-on-raspberry-pi/releases/download/v1.1.0/tensorflow-1.1.0-cp34-cp34m-linux_armv7l.whl
sudo pip3 install tensorflow-1.1.0-cp34-cp34m-linux_armv7l.whl
sudo pip3 uninstall mock
sudo pip3 install mock

If you run into any errors, check out the official Github page.

object detection on beach

Object Detection API

Object detection comes as part of the official Tensorflow research models.  Its purpose is to detect multiple objects in single images.    

Clone this repository somewhere handy:

git clone https://github.com/tensorflow/models.git

Detailed installation instructions and troubleshooting for the Object Detection API can be found here.

We need a variety of libraries to run object detection, so we first need to install all of these:

sudo apt-get install protobuf-compiler
sudo pip3 install pillow
sudo pip3 install lxml
sudo pip3 install jupyter
sudo pip3 install matplotlib

If any of these fail, you may need to download the wheel file manually. These can be found on pypi.python.org.

I needed to do this for lxml, as well as a couple of other dependencies for OpenCV. These packages may have dependencies of their own, so if you have any trouble installing these through Pip, go to their official installation instructions and install them manually.

Official installation instructions for the above packages:

Pillow, Lxml, Jupyter, Matplotlib
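
Before moving on, it can help to confirm which of these packages Python can actually find. The following is a minimal sketch rather than part of the original setup: `missing_packages` is a hypothetical helper, and the names passed to it are import names, which can differ from the pip package name (Pillow is imported as `PIL`).

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that Python cannot locate."""
    missing = []
    for name in import_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Import names, not pip names (Pillow -> PIL).
wanted = ["tensorflow", "PIL", "lxml", "matplotlib"]
print("Missing:", missing_packages(wanted) or "nothing")
```

Anything reported as missing is a candidate for the manual wheel download described above.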

We now need to compile the Object Detection API using Protobuf. Navigate to your tensorflow/models/research/ directory, and run the following:

sudo protoc object_detection/protos/*.proto --python_out=.

To run object detection, you’ll need to append two directories to your PYTHONPATH. The command below can be run manually in each new terminal you open (from the tensorflow/models/research/ directory), or added to the end of your ~/.bashrc file; if you add it to ~/.bashrc, replace each `pwd` with the absolute path to that directory:

export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
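
If you would rather not depend on the shell environment, roughly the same effect can be had from inside a script. This is a sketch under assumptions: `add_research_paths` is a hypothetical helper, and the directory passed to it must be wherever you cloned tensorflow/models.

```python
import os
import sys


def add_research_paths(research_dir):
    """Put research/ and research/slim on sys.path; return what was added."""
    research_dir = os.path.abspath(research_dir)
    added = []
    for path in (research_dir, os.path.join(research_dir, "slim")):
        if path not in sys.path:
            # Append, similar to PYTHONPATH=$PYTHONPATH:<dir>:<dir>/slim
            sys.path.append(path)
            added.append(path)
    return added


# Example with a hypothetical clone location:
# add_research_paths(os.path.expanduser("~/models/research"))
```

Call it before the first `from object_detection...` import, since the import only succeeds once the directories are on the path.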

Finally, test the installation with the following command:

python3 object_detection/builders/model_builder_test.py

Note: I needed to add a copy of object_detection/data to my project’s src/ directory so that my project could find the required files.


OpenCV is a powerful computer vision framework, containing a huge number of algorithms for processing and analysing images. We’re going to be using it for some of its simpler features, but having the full set of tools available means we can later process our images if we want to. OpenCV is perhaps one of the more error-prone libraries to install on the Raspberry Pi, so take care during this step. You should be able to find a solution on Stack Overflow for most issues, but I’ll make a note of the points that caused us grief.

Initial Commands:

sudo apt-get update
sudo apt-get upgrade
sudo rpi-update
sudo reboot
sudo apt-get install build-essential git cmake pkg-config
sudo apt-get install libjpeg-dev libtiff5-dev libjasper-dev libpng12-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt-get install libxvidcore-dev libx264-dev
sudo apt-get install pkg-config
sudo apt-get install libgtk2.0-dev
sudo apt-get install libatlas-base-dev gfortran
cd ~
git clone https://github.com/Itseez/opencv.git
cd opencv
git checkout 3.1.0
cd ~
git clone https://github.com/Itseez/opencv_contrib.git
cd opencv_contrib
git checkout 3.1.0  


sudo apt-get install python3-dev
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
pip3 install numpy
cd ~/opencv
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules ..
sudo make install
sudo ldconfig

Warning: sudo make install can take a while.

Tip: Depending on your environment, you may need to add sudo to the start of more of the commands above. For example, if pip3 install numpy doesn’t work, try sudo pip3 install numpy instead.

If you receive an error about the GTK version you are running when you attempt to use OpenCV:

  1. Navigate to your matplotlibrc file in /usr/local/lib/python3.4/dist-packages/matplotlib/mpl-data/
  2. Find the backend line, remove the #, and change the line to: backend : TkAgg
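
For anyone who prefers to script that edit, the two steps above can be sketched as a small text transformation. This is only an illustration: `set_backend` is a hypothetical helper, it assumes the file has a commented or uncommented `backend :` line, and you would still need to read and write the matplotlibrc file yourself with suitable permissions.

```python
import re


def set_backend(rc_text, backend="TkAgg"):
    """Return rc_text with its backend line uncommented and set to backend."""
    # Matches "backend : X", "#backend : X" or "# backend: X" on one line.
    pattern = re.compile(r"^#?[ \t]*backend[ \t]*:.*$", re.MULTILINE)
    new_line = "backend : " + backend
    if pattern.search(rc_text):
        return pattern.sub(new_line, rc_text, count=1)
    # No backend line at all: append one instead.
    return rc_text.rstrip("\n") + "\n" + new_line + "\n"


sample = "# example settings\n#backend : GTKAgg\nlines.linewidth : 1.0\n"
print(set_backend(sample))
```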

The fun stuff

If all went well, we’ve now got the Object Detection API all ready to go, and we’ve got OpenCV available to display our detection. The Tensorflow team have provided a great tutorial for getting this up and running, which I have adapted here. 

We can now write a small Python program to:

  1. Initialize object detection with a pre-trained model (a frozen inference graph)
  2. Stream camera frames
  3. Run object detection on each frame
  4. Output resulting image to an OpenCV window

First we import all that stuff we just installed.

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

import cv2

from picamera.array import PiRGBArray

import picamera

from collections import defaultdict
from io import StringIO
from PIL import Image

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

We now download a frozen inference graph – this one was trained on the COCO (Common Objects in Context) dataset. There are several models you can use here, with varying levels of speed and accuracy. Because we want to stream object detection, and are doing so on a Raspberry Pi, we use the fastest one: ssd_mobilenet_v1_coco_11_06_2017. You can find the rest here.

MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'  # fast
# MODEL_NAME = 'faster_rcnn_resnet101_coco_11_06_2017'  # medium speed
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to the frozen graph inside the extracted model directory.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# Label map that translates class ids into names such as 'cup'.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
# The self. prefix indicates these excerpts run inside a class method.
self.IMAGE_SIZE = (12, 8)
# Download and unpack the model only if it isn't already on disk.
fileAlreadyExists = os.path.isfile(PATH_TO_CKPT)
if not fileAlreadyExists:
    print('Downloading frozen inference graph')
    opener = urllib.request.URLopener()
    opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
    tar_file = tarfile.open(MODEL_FILE)
    for file in tar_file.getmembers():
        file_name = os.path.basename(file.name)
        if 'frozen_inference_graph.pb' in file_name:
            tar_file.extract(file, os.getcwd())
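
One detail worth spelling out: `tar_file.extract` keeps the member's path from inside the archive, which is why PATH_TO_CKPT includes the model directory. A standalone sketch of just that extraction step, with a hypothetical `extract_frozen_graph` helper and no network access, might look like this:

```python
import os
import tarfile


def extract_frozen_graph(archive_path, dest_dir):
    """Extract only frozen_inference_graph.pb; return its path, or None."""
    with tarfile.open(archive_path) as tar_file:
        for member in tar_file.getmembers():
            if os.path.basename(member.name) == "frozen_inference_graph.pb":
                # extract() recreates the member's directory under dest_dir.
                tar_file.extract(member, dest_dir)
                return os.path.join(dest_dir, member.name)
    return None
```

Returning the extracted path gives the caller something to check before trying to load the graph.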

We can now use this to create our detection graph, the final piece of Tensorflow setup before we can begin our detection.

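self.detection_graph = tf.Graph()
with self.detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        # Parse the serialized bytes into the GraphDef before importing it.
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
# Maximum class id to load from the label map; 90 is the value the
# TensorFlow tutorial uses for the COCO label map.
NUM_CLASSES = 90
self.label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
self.categories = label_map_util.convert_label_map_to_categories(self.label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
self.category_index = label_map_util.create_category_index(self.categories)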

Set up the PiCamera, and create an array to store our streaming data in. 

camera = picamera.PiCamera()
camera.resolution = (1280, 960)
camera.vflip = True
camera.framerate = 30
rawCapture = PiRGBArray(camera, size=(1280, 960))
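
To make the buffer concrete: each captured frame arrives as a rows by columns by 3 array, so the (width, height) resolution is flipped in the array shape. The arithmetic below is a rough sketch (the helper names are hypothetical, and it assumes one byte per colour channel) of how much data each frame carries.

```python
def frame_shape(resolution, channels=3):
    """Array shape (rows, columns, channels) for a (width, height) resolution."""
    width, height = resolution
    return (height, width, channels)


def frame_bytes(resolution, channels=3):
    """Bytes in one uncompressed frame, assuming 8 bits per channel."""
    rows, cols, chans = frame_shape(resolution, channels)
    return rows * cols * chans


print(frame_shape((1280, 960)))  # (960, 1280, 3)
print(frame_bytes((1280, 960)))  # 3686400 bytes, roughly 3.5 MiB per frame
```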

Our main loop. This initializes each image, runs the Tensorflow session on it, and visualizes the detection boxes onto the image. It then displays the image in an OpenCV window. 

Note: The bottom part of this loop where we exit on a ‘q’ press is required for the OpenCV window to work, so make sure it’s there.

with self.detection_graph.as_default():
    with tf.Session(graph=self.detection_graph) as sess:
        for frame in camera.capture_continuous(rawCapture, format="bgr"):
            image_np = np.array(frame.array)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Define input and output tensors for detection_graph
            image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            detection_boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            detection_scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
            detection_classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = self.detection_graph.get_tensor_by_name('num_detections:0')
            print('Running detection..')
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            print('Done.  Visualizing..')
            # Draw the boxes, labels and scores onto the frame in place.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                self.category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            cv2.imshow('object detection', cv2.resize(image_np, (1280, 960)))
            # Clear the stream so the next frame can be captured into it.
            rawCapture.truncate(0)
            # waitKey lets the OpenCV window refresh; pressing 'q' exits the loop.
            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                break



Objects detected: tv (91%), keyboard (54%), cup (76%), bed (64%)

As you can see, the result is pretty good! The object detection isn’t quite as accurate close up – it seems to think my hand is a bed – but everything else is pretty accurate. Keep in mind that the model we're using was trained on the COCO dataset, which covers 80 common object categories, so you may need to further train this model using your own images if you want something a bit more specific.
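
The percentages in the caption come straight from the arrays returned by `sess.run`. As a rough sketch of how they could be turned into readable labels: the helper below is hypothetical, it works on plain Python lists rather than the NumPy arrays the session returns (so you would pass something like `np.squeeze(scores).tolist()`), and it assumes boxes are normalized `[ymin, xmin, ymax, xmax]` values, which is how the Object Detection API reports them.

```python
def describe_detections(boxes, scores, classes, category_index,
                        image_size, min_score=0.5):
    """Return (label, percent, pixel_box) for detections above min_score."""
    width, height = image_size
    results = []
    for box, score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        # Scale normalized coordinates to pixels: (left, top, right, bottom).
        pixel_box = (int(xmin * width), int(ymin * height),
                     int(xmax * width), int(ymax * height))
        name = category_index.get(int(class_id), {}).get("name", "unknown")
        results.append((name, int(round(score * 100)), pixel_box))
    return results


# Made-up values for illustration only; not output from the real model.
example_index = {72: {"id": 72, "name": "tv"}, 47: {"id": 47, "name": "cup"}}
example_boxes = [[0.125, 0.25, 0.5, 0.75], [0.5, 0.5, 0.75, 1.0]]
example_scores = [0.91, 0.3]
example_classes = [72.0, 47.0]
for name, percent, box in describe_detections(
        example_boxes, example_scores, example_classes,
        example_index, image_size=(1280, 960)):
    print("{} ({}%) at {}".format(name, percent, box))
```

Raising `min_score` is a cheap way to suppress low-confidence labels like the hand mistaken for a bed, at the cost of missing some real objects.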

The above runs at about one frame per second on our Raspberry Pi, which isn’t too bad for real-time object detection on such a small device. This could likely be optimised too – for example, the images could be downscaled or otherwise simplified before passing them into Tensorflow.
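
Before optimising, it helps to measure. Here is a minimal sketch of a rolling frame-rate counter that could be ticked once per loop iteration; the `FpsMeter` class is hypothetical and not part of the original program.

```python
import time
from collections import deque


class FpsMeter:
    """Average frames per second over the last `window` frames."""

    def __init__(self, window=10):
        # window + 1 timestamps span `window` frame intervals.
        self.stamps = deque(maxlen=window + 1)

    def tick(self, now=None):
        """Record a frame; return the current FPS (0.0 until two frames)."""
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Create one meter before the main loop and call `meter.tick()` at the end of each iteration; the `now` argument exists only so the timing can be controlled in tests.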

All in all, we thought this was a pretty good result for a first attempt. With a bit of optimisation, this could work in a variety of scenarios. You could mount a Raspberry Pi inside a robot, allowing it to navigate and recognise objects or people. You could attach a camera to the bottom of a drone, monitoring crops as it flies over. You could mount it onto a beehive, warning you of potential wasp attacks. I’m sure there are plenty of things you could do with portable object detection that we haven’t thought of, and I’m excited to see what this technology can offer in the future.


Tim King is part of Theta’s Innovation Lab, working on new and emerging technologies like the internet of things and mixed reality, as well as being involved with product development.