15 Computer Vision Projects You Can Do Right Now

Computer vision deals with how computers extract meaningful information from images or videos. It has a wide range of applications, including reverse engineering, security inspections, image editing and processing, computer animation, autonomous navigation, and robotics.

In this article, we’re going to explore 15 great OpenCV projects, from beginner-level to expert-level. For each project, you’ll see the essential guides, source code, and datasets, so you can get straight to work on them if you want.

What is Computer Vision?

Computer vision is about helping machines interpret images and videos. It’s the science of capturing a scene through cameras and other sensors, then analyzing the resulting digital images to understand what they show. It’s a broad discipline that’s useful for machine translation, pattern recognition, robotic positioning, 3D reconstruction, driverless cars, and much more.

The field of computer vision keeps evolving and becoming more impactful thanks to constant technological innovations. As time goes by, it will offer increasingly powerful tools for researchers, businesses, and eventually consumers.

Computer Vision today

Computer vision has become a relatively standard technology in recent years due to the advancement of AI. Many companies use it for product development, sales operations, marketing campaigns, access control, security, and more.

Computer vision has plenty of applications in healthcare (including pathology), industrial automation, military use, cybersecurity, automotive engineering, drone navigation—the list goes on.

How does Computer Vision work?

Computer vision relies on machine learning, which finds patterns by learning from its mistakes. Training data is used to fit a model, which then makes predictions on new images. Real-world images are broken down into simple patterns, and the computer recognizes those patterns using a neural network built from many layers.

The first layer takes pixel values and tries to identify edges. The next few layers try to detect simple shapes built from those edges. In the end, everything is put together to understand the image.
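
To make the "first layer finds edges" idea concrete, here is a minimal sketch in plain NumPy. It slides a hand-picked Sobel kernel over a toy image. This is only an illustration: in a trained network the filter weights are learned from data rather than chosen by hand, and the `filter2d` helper and the toy image are assumptions made for this example.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide a small kernel over a 2-D image (valid region only, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for y in range(out_h):
        for x in range(out_w):
            # Weighted sum of the pixels under the kernel
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# Sobel kernel that responds to left-to-right brightness changes (vertical edges)
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Toy 6x6 image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

response = filter2d(image, SOBEL_X)
print(response)
```

The response is zero over the flat regions and large only in the columns where the dark half meets the bright half, which is exactly the kind of signal later layers combine into shapes.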

Computer vision how it works

It can take thousands, sometimes millions, of images to train a computer vision application. Sometimes even that’s not enough: some facial recognition applications perform poorly on people with darker skin because they were trained mostly on images of white people. Sometimes the application can’t tell the difference between a dog and a bagel. Ultimately, the algorithm will only ever be as good as the data that was used for training it.

OK, enough introduction! Let’s get into the projects.

Beginner level Computer Vision projects

If you’re new or learning computer vision, these projects will help you learn a lot.

1. Edge & Contour Detection

If you’re new to computer vision, this project is a great start. Many CV applications detect edges first and then collect other information. There are many edge detection algorithms, and the most popular is the Canny edge detector because it’s more effective than most alternatives. It’s also a multi-stage technique. Below are the steps for Canny edge detection:

  1. Reduce noise by smoothing the image,
  2. Calculate the intensity gradient,
  3. Apply non-maximum suppression,
  4. Apply a double threshold,
  5. Link edges by hysteresis.
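
The last two steps are the least obvious, so here is a rough NumPy sketch of them. It assumes the gradient magnitude has already been computed and thinned (steps 1 to 3), and the function name `double_threshold_hysteresis` is made up for this illustration. `cv2.Canny` performs all five steps internally, so you would not normally write this yourself.

```python
import numpy as np

def double_threshold_hysteresis(magnitude, low, high):
    """Keep strong pixels, plus weak pixels that are connected to a strong one."""
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    edges = strong.copy()
    height, width = edges.shape
    changed = True
    while changed:
        # Mark every pixel that has an edge pixel among its 8 neighbours
        padded = np.pad(edges, 1, mode="constant", constant_values=False)
        neighbours = np.zeros_like(edges)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                neighbours |= padded[1 + dy:1 + dy + height,
                                     1 + dx:1 + dx + width]
        # Promote weak pixels that touch the current edge set
        grown = edges | (weak & neighbours)
        changed = bool((grown != edges).any())
        edges = grown
    return edges

# One row of gradient magnitudes: a strong peak with weak pixels on both
# sides, and an isolated weak pixel at the far right
magnitude = np.array([[0, 50, 120, 250, 120, 0, 120]], dtype=float)
edges = double_threshold_hysteresis(magnitude, low=100, high=200)
print(edges.astype(int))
```

The two weak pixels next to the strong peak survive, while the isolated weak pixel at the end is discarded because nothing links it to a strong edge.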

Code for Canny edge detection:

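    import cv2
    import matplotlib.pyplot as plt

    # Open the image
    img = cv2.imread('dancing-spider.jpg')

    # Apply Canny: thresholds 100 and 200, 3x3 Sobel aperture, L2 gradient norm.
    # apertureSize is passed by keyword because the fourth positional
    # argument of cv2.Canny is the output array, not the aperture size.
    edges = cv2.Canny(img, 100, 200, apertureSize=3, L2gradient=True)

    # Save the edge map, then display it
    plt.imsave('dancing-spider-canny.png', edges, cmap='gray', format='png')
    plt.figure()
    plt.title('Spider')
    plt.imshow(edges, cmap='gray')
    plt.show()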

Contours are curves joining all the continuous points along a boundary that have the same color or intensity. For example, a contour can trace the shape of a leaf along its border. Contours are an important tool for shape analysis and object detection. They are often loosely called outlines or edges, although an edge map marks individual pixels where the intensity changes sharply, while a contour is a connected boundary curve.

Contour detection - computer vision

Code to find contours:

    import cv2

    # Load a simple image with 3 black squares
    image = cv2.imread('C://Users//gfg//shapes.jpg')

    # Grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Find Canny edges
    edged = cv2.Canny(gray, 30, 200)

    # Finding contours
    # Pass a copy of the image, e.g. edged.copy(), since older OpenCV
    # versions of findContours alter the input image.
    # (OpenCV 3.x returns three values here; 2.x and 4.x return two.)
    contours, hierarchy = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    cv2.imshow('Canny Edges After Contouring', edged)
    cv2.waitKey(0)

    print("Number of Contours found = " + str(len(contours)))

    # Draw all contours
    # -1 signifies drawing all contours
    cv2.drawContours(image, contours, -1, (0, 255, 0), 3)

    cv2.imshow('Contours', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

2. Colour Detection & Invisibility Cloak

This project is about detecting color in images. You can use it to edit and recognize colors from images or videos. The most popular project that uses the color detection technique is the invisibility cloak. In movies, invisibility effects are shot against a green screen, but here we’ll do it by replacing the cloak-colored foreground with the stored background. The invisibility cloak process is this:

  1. Capture and store the background frame (just the background),
  2. Detect colors,
  3. Generate a mask,
  4. Generate the final output to create the invisible effect.
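
The four steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not the original project's code: `apply_cloak` and its bounds are hypothetical, and the color test is done directly on the raw channel values to keep the example self-contained, whereas real implementations usually threshold in HSV (for example with `cv2.inRange`).

```python
import numpy as np

def apply_cloak(frame, background, lower, upper):
    """Replace pixels whose channels all lie inside [lower, upper] with the background."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    # Steps 2 and 3: detect the cloak colour and build a boolean mask (H x W)
    mask = np.all((frame >= lower) & (frame <= upper), axis=-1)
    # Step 4: background pixels where the mask is set, frame pixels elsewhere
    output = np.where(mask[..., None], background, frame)
    return output, mask

# Step 1: a stored background frame (here a tiny 2x2 dark-grey image)
background = np.full((2, 2, 3), 10, dtype=np.uint8)

# A live frame: light grey everywhere, with one red "cloak" pixel (BGR order)
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
frame[0, 0] = (0, 0, 255)

output, mask = apply_cloak(frame, background, lower=(0, 0, 200), upper=(50, 50, 255))
print(mask)
print(output[0, 0], output[1, 1])
```

Only the red pixel is swapped for the background, which is what makes the cloak appear transparent.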

Invisibility cloak - computer vision

It works on the HSV (Hue, Saturation, Value) color model, which is particularly useful for isolating, introducing, or removing a specific color in an image or scene.

Hue is the color portion, expressed as an angle from 0 to 360 degrees.

Saturation describes the amount of grey in the color, from 0–100%. Reducing this component toward zero introduces more grey and produces a faded effect.

Value (brightness) works in conjunction with saturation. It describes the brightness or intensity of the color, from 0–100%. So 0 is completely black, and 100 is the brightest and reveals the most color.
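
To make those ranges concrete, here is a small sketch using Python's standard-library `colorsys` module; the helper name `rgb_to_hsv_degrees` is made up for this example. Note that OpenCV stores 8-bit HSV differently: hue is halved to fit 0–179, and saturation and value are scaled to 0–255.

```python
import colorsys

def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation %, value %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(v * 100)

print(rgb_to_hsv_degrees(255, 0, 0))      # pure red
print(rgb_to_hsv_degrees(128, 128, 128))  # mid grey: no saturation
print(rgb_to_hsv_degrees(0, 0, 0))        # black: zero value
```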

3. Text Recognition using OpenCV and Tesseract (OCR)

Here, you use OpenCV and OCR (Optical Character Recognition) on your image to identify each letter and convert them into text. It’s perfect for anyone looking to take information from an image or video and turn it into text-based data. Many apps use OCR, like Google Lens, PDF Scanner, and more.

Ways to detect text from images:

Text recognition - computer vision

Text Detection using OpenCV

Sample code after processing the image and contour detection:

    import cv2
    import pytesseract

    # text detection
    def contours_text(orig, img, contours):
        for cnt in contours:
            x, y, w, h = cv2.boundingRect(cnt)

            # Crop the text block first, so the rectangle drawn below
            # does not end up in the OCR input
            cropped = orig[y:y + h, x:x + w].copy()

            # Draw a rectangle on the copied image and show it
            rect = cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 255, 255), 2)
            cv2.imshow('cnt', rect)
            cv2.waitKey()

            # Apply OCR on the cropped image
            config = ('-l eng --oem 1 --psm 3')
            text = pytesseract.image_to_string(cropped, config=config)
            print(text)

Text Detection with Tesseract

Tesseract is an open-source OCR engine that can recognize text in 100+ languages, and its development has been backed by Google. You can also train it to recognize other languages.

Code to detect text using tesseract:

    # text recognition
    import cv2
    import pytesseract

    # read image
    im = cv2.imread('./testimg.jpg')

    # configurations: English, LSTM engine (--oem 1), automatic page segmentation (--psm 3)
    config = ('-l eng --oem 1 --psm 3')

    # pytesseract
    text = pytesseract.image_to_string(im, config=config)

    # split the result into lines and print them
    text = text.split('\n')
    print(text)

4. Face Recognition with Python and OpenCV

It’s been more than two decades since the American television show CSI: Crime Scene Investigation first aired. During that time, facial recognition software has become increasingly sophisticated. Present-day software isn’t limited by superficial features like skin or hair color. Instead, it identifies faces based on facial features that are more stable through changes in appearance, like eye shape and distance between eyes. This type of facial recognition is called “template matching”. You can use OpenCV, deep learning, or a custom database to create facial recognition systems/applications.

Process of detecting a face from an image:

Face recognition - computer vision

Below is the full code for recognizing faces from images:

    import cv2
    import face_recognition

    # load_image_file returns RGB arrays, which is what face_recognition expects
    imgMain = face_recognition.load_image_file('ImageBasics/Bryan_Cranst.jpg')
    imgTest = face_recognition.load_image_file('ImageBasics/bryan-cranston-el-camino-aaron-paul-1a.jpg')

    # Locate and encode the first face in each image
    # (indexing with [0] raises IndexError if no face is found)
    faceLoc = face_recognition.face_locations(imgMain)[0]
    encodeMain = face_recognition.face_encodings(imgMain)[0]
    faceLocTest = face_recognition.face_locations(imgTest)[0]
    encodeTest = face_recognition.face_encodings(imgTest)[0]

    # Compare the encodings: a True/False match plus a distance (lower means more similar)
    results = face_recognition.compare_faces([encodeMain], encodeTest)
    faceDis = face_recognition.face_distance([encodeMain], encodeTest)
    print(results, faceDis)

    # Convert to BGR for OpenCV drawing and display
    imgMain = cv2.cvtColor(imgMain, cv2.COLOR_RGB2BGR)
    imgTest = cv2.cvtColor(imgTest, cv2.COLOR_RGB2BGR)

    # face_locations returns (top, right, bottom, left)
    cv2.rectangle(imgMain, (faceLoc[3], faceLoc[0]), (faceLoc[1], faceLoc[2]), (255, 0, 255), 2)
    cv2.rectangle(imgTest, (faceLocTest[3], faceLocTest[0]), (faceLocTest[1], faceLocTest[2]), (255, 0, 255), 2)

    # Label the test image with the match result and the distance
    cv2.putText(imgTest, f'{results[0]} {faceDis[0]:.2f}', (50, 50), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255), 2)

    cv2.imshow('Main Image', imgMain)
    cv2.imshow('Test Image', imgTest)
    cv2.waitKey(0)
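
The two comparison calls at the end of that code are simpler than they look. As a rough sketch of the idea (a NumPy re-creation for illustration, not the library's own source), the distance is the Euclidean distance between encoding vectors, and a match is declared when it falls within a tolerance. The `face_recognition` library's default tolerance is 0.6 and its real encodings have 128 dimensions; the 3-dimensional vectors below are toy values.

```python
import numpy as np

def face_distance(known_encodings, candidate):
    """Euclidean distance from each known encoding to the candidate encoding."""
    known = np.asarray(known_encodings, dtype=float)
    return np.linalg.norm(known - np.asarray(candidate, dtype=float), axis=1)

def compare_faces(known_encodings, candidate, tolerance=0.6):
    """True where the distance is within the tolerance (smaller means more similar)."""
    return [bool(d <= tolerance) for d in face_distance(known_encodings, candidate)]

# Toy 3-dimensional "encodings" for two known faces and one candidate
known = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
candidate = [0.3, 0.0, 0.4]

print(face_distance(known, candidate))
print(compare_faces(known, candidate))
```

Lowering the tolerance makes matching stricter, which reduces false matches at the cost of missing some genuine ones.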

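Display loop for recognizing faces from a webcam or live camera (the per-frame recognition step is not shown here):

    import cv2

    video_capture = cv2.VideoCapture(0)

    while True:
        ret, frame = video_capture.read()
        if not ret:
            break

        # (run face recognition on the frame here)

        cv2.imshow("Frame", frame)
        # Press 'q' to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    video_capture.release()
    cv2.destroyAllWindows()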

5. Object Detection

Object detection is the automatic inference of what an object is in a given image or video frame. It’s used in self-driving cars, tracking, face detection, pose detection, and a lot more. There are 3 major types of object detection – using OpenCV, a machine learning-based approach, and a deep learning-based approach.

Object detection - computer vision

Below is the full code to detect objects:

    import cv2

    # Enable camera
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 420)

    # Load the Haar cascade file for frontal face detection
    faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    # To detect another object, for example eyes, load one more classifier:
    # eyeCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye_tree_eyeglasses.xml")

    while True:
        success, img = cap.read()
        if not success:
            break
        imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Get bounding boxes around faces (1.3 = scale factor, 5 = minimum neighbors)
        faces = faceCascade.detectMultiScale(imgGray, 1.3, 5)

        # Draw a bounding box around each face
        for (x, y, w, h) in faces:
            img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)

        # Detect eyes and draw their bounding boxes:
        # eyes = eyeCascade.detectMultiScale(imgGray)
        # for (ex, ey, ew, eh) in eyes:
        #     img = cv2.rectangle(img, (ex, ey), (ex + ew, ey + eh), (255, 0, 0), 3)

        cv2.imshow('face_detect', img)
        # Press 'q' to quit
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyWindow('face_detect')

Intermediate level Computer Vision projects

We’re taking things to the next level with a few intermediate-level projects. These projects will probably be more fun than beginner projects, but also more challenging.

6. Hand Gesture Recognition

In this project, you detect hand gestures and then assign a command to each one. You can even play games with multiple commands using hand gesture recognition.

How gesture recognition works:

Pose detection - computer vision

Pose detection 2 - computer vision

8. Road Lane Detection in Autonomous Vehicles

If you want to get into self-driving cars, this project will be a good start. You’ll detect lanes, edges of the road, and a lot more. Lane detection works like this:

Road detection - computer vision

Road lane detection - computer vision

9. Pathology Classification

Computer vision is emerging in healthcare. The amount of data that pathologists analyze in a day can be too much to handle. Luckily, deep learning algorithms can identify patterns in large amounts of data that humans wouldn’t notice otherwise. As more images are entered and categorized into groups, the accuracy of these algorithms becomes better and better over time.

Deep learning can help detect various diseases in plants, animals, and humans. For this application, the goal is to take the OCT dataset from Kaggle and classify its images into different categories. The dataset has around 85,000 images. Optical coherence tomography (OCT) is a medical imaging technology for performing high-resolution cross-sectional imaging. It uses light waves to look inside a living human body, and it can be used to evaluate thinning skin, broken blood vessels, heart diseases, and many other medical problems.

Over time, it’s gained the trust of doctors around the globe as a quick and effective diagnostic tool. It can also be used to examine tattoo pigments or assess different layers of a skin graft that’s placed on a burn patient.

Pathology classification - computer vision

Code using the tf-explain library’s occlusion sensitivity callback during training:

    import datetime

    import tensorflow as tf
    from tf_explain.callbacks.occlusion_sensitivity import OcclusionSensitivityCallback

    # In a Jupyter notebook, load the TensorBoard extension first:
    # %load_ext tensorboard

    # model_TF, vis_test, vis_lab and fbeta are assumed to be defined earlier

    log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

    o_callbacks = [
        OcclusionSensitivityCallback(validation_data=(vis_test, vis_lab), class_index=2, patch_size=4),
    ]

    model_TF.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                     loss='binary_crossentropy', metrics=[fbeta])

    # Unpack o_callbacks so that callbacks is a flat list
    model_TF.fit(vis_test, vis_lab, epochs=10, verbose=1,
                 callbacks=[tensorboard_callback, *o_callbacks])

10. Fashion MNIST for Image Classification

One of the most widely used datasets in machine learning is MNIST, a database of handwritten digits from 0 to 9 with 60,000 training and 10,000 test images. Inspired by it, researchers at Zalando created Fashion MNIST, a dataset for classifying clothes. On the original MNIST, accuracy in the 96–99% range is common; Fashion MNIST was designed as a harder drop-in replacement, so expect somewhat lower numbers.

Fashion MNIST contains 60,000 training images and 10,000 test images of clothing items taken from Zalando’s online catalogue. Each image is a 28×28 grayscale picture labeled with one of 10 categories, such as T-shirt/top, trouser, sneaker, or bag.

Fashion mnist - computer vision

Advanced level Computer Vision projects

Once you’re an expert in computer vision, you can develop projects from your own ideas. Below are a few advanced-level fun projects you can work with if you have enough skills and knowledge.

11. Image Deblurring using Generative Adversarial Networks

Image deblurring is an interesting technology with plenty of applications. Here, a generative adversarial network (GAN) automatically trains a generative model, like Image DeBlur’s AI algorithm. Before looking into this project, let’s understand what GANs are and how they work.

Generative Adversarial Networks (GANs) are a deep-learning approach that has shown remarkable success in various computer vision tasks, such as image super-resolution. However, how best to train these networks remains an open problem. A GAN can be thought of as two networks competing with one another, much like contestants on game shows like Jeopardy or Survivor: each side has its own task and must adapt its strategy to the opponent’s moves. There are 3 major steps involved in training for deblurring:

Image deblurring - computer vision
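
For reference, the competition between the two networks is usually written as a minimax game. The formula below is the standard objective from the original GAN formulation, where $G$ is the generator, $D$ is the discriminator, $x$ is a real sample and $z$ is the generator's random input. Deblurring models typically adapt this basic form, so treat it as background rather than as the exact loss used in any particular deblurring project.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize $V$ by scoring real samples high and generated samples low, while the generator tries to minimize it by producing samples the discriminator scores as real.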

12. Image Transformation

With this project, you can transform any image into different forms. For example, you can change a real image into a graphical one. This is a creative and fun project to do. With the standard GAN method it is difficult to transform images this way, so for this project most people use CycleGAN.

The idea behind a GAN is that you train two competing neural networks against each other. One network, called the “generator,” creates new data samples, while the other network judges whether each sample is real or fake. The generator alters its parameters to try to fool the judge by producing more realistic samples. In this way, both networks improve as training goes on.

CycleGAN is a different type of GAN: it’s an extension of the GAN architecture. What CycleGAN does is create a cycle that regenerates the input. Let’s say you’re using Google Translate: you translate English to German, open a new tab, copy the German output, and translate it back from German to English. The goal is to get back the original input you had. Below is an example of how transforming images to artwork works.

Image transformation - computer vision
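
The round-trip idea is captured by what the CycleGAN paper calls the cycle-consistency loss. In the notation below, $G$ maps images from domain $X$ to domain $Y$ (say, photos to paintings) and $F$ maps back from $Y$ to $X$. This term is added to the usual adversarial losses; how the terms are weighted is a design choice that is not covered here.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]
```

Just as in the translation example, the loss is small when translating there and back returns something close to the original input.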

13. Automatic Colorization of Photos using Deep Neural Networks

When it comes to coloring black and white images, machines have long struggled to do an adequate job. They can’t understand the boundary between grey and white, leading to a range of monochromatic hues that seem unrealistic. To overcome this issue, scientists from UC Berkeley, along with colleagues at Microsoft Research, developed a new algorithm that automatically colorizes photographs by using deep neural networks.

Deep neural networks are a very promising technique for image classification because they can learn the composition of an image by looking at many pictures. Densely connected convolutional neural networks (CNNs) have been used to classify images in this study. CNNs are trained with large amounts of labeled data, and output a score corresponding to the associated class label for any input image. They can be thought of as feature detectors that are applied to the original input image.

Colourization is the process of adding color to black and white (grayscale) photos or videos. It can be accomplished by hand, but it’s a tedious process that takes hours or days, depending on the level of detail in the photo. With the rapid advance of deep learning in recent years, a Convolutional Neural Network (CNN) can colorize black and white images by predicting what the colors should be on a per-pixel basis. This project helps to colorize old photos. As you can see in the image below, it can even properly predict the color of Coca-Cola, because of the large amount of training data.

Automatic colorization - computer vision

14. Vehicle Counting and Classification

Nowadays, many places are equipped with surveillance systems that combine AI with cameras, from government organizations to private facilities. These AI-based cameras help in many ways, and one of their main features is counting the vehicles that pass by or enter a particular place. This project can be used in many areas like crowd counting, traffic management, number plate recognition, sports, and many more. The process is simple:

And finally, vehicle counting:

Plate scanner - computer vision
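
One common way to do the counting step is to track each vehicle's centroid from frame to frame and count it when it crosses a virtual line. The sketch below assumes that approach; it is an illustration rather than the method used in the original project, and the `tracks` structure, the `count_line_crossings` helper and the sample coordinates are all hypothetical. Detection and tracking are assumed to have happened already.

```python
def count_line_crossings(tracks, line_y):
    """Count tracks whose centroid moves from above line_y to on or below it."""
    count = 0
    for centroids in tracks.values():
        # Compare each position with the next one along the track
        for (_, y_prev), (_, y_next) in zip(centroids, centroids[1:]):
            if y_prev < line_y <= y_next:
                count += 1
                break  # count each vehicle at most once
    return count

# Hypothetical tracker output: vehicle id -> list of (x, y) centroids per frame
tracks = {
    "car_1": [(50, 80), (52, 95), (55, 110)],     # crosses y = 100
    "car_2": [(200, 20), (201, 40), (203, 60)],   # never reaches the line
    "truck_1": [(120, 99), (121, 100)],           # lands exactly on the line
}

total = count_line_crossings(tracks, line_y=100)
print(total)
```

Counting each track only once avoids double counting a vehicle that jitters back and forth around the line.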

Conclusion

And that’s it! Hope you liked the computer vision projects. As a cherry on top, I’ll leave you with several extra projects that you might also be interested in.

Extra projects

Additional research and recommended reading