Podcast: Python Bytes
Episode:

#20 Finding similar but not identical images in 128 bits via Python

Category: Technology
Duration: 00:23:48
Publish Date: 2017-04-05 03:00:00
Description:

Sponsored by Rollbar, thank you! rollbar.com/pythonbytes

#1 Brian: Duplicate image detection with perceptual hashing in Python

  • Ben Hoyt
  • From Jetsetter.com, Invitation-Only Travel Community
  • We use a perceptual image hash called dHash (“difference hash”), which was developed by Neal Krawetz in his work on photo forensics. It’s a very simple but surprisingly effective algorithm that involves the following steps (to produce a 128-bit hash value):
    • Convert the image to grayscale
    • Downsize to a 9x9 square of gray values (or 17x17 for a larger, 512-bit hash)
    • Calculate the “row hash”: for each row, move from left to right, and output a 1 bit if the next gray value is greater than or equal to the previous one, or a 0 bit if it’s less (each 9-pixel row produces 8 bits of output)
    • Calculate the “column hash”: same as above, but for each column, move top to bottom
    • Concatenate the two 64-bit values together to get the final 128-bit hash
  • Fast: Python is not very fast at bit twiddling, but all the hard work of converting to grayscale and downsizing is done by a C library: ImageMagick+wand or PIL.
  • Available on GitHub: https://github.com/Jetsetter/pybktree
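
The dHash steps listed above can be sketched in plain Python. This is a minimal illustration, not the Jetsetter implementation: it assumes the grayscale conversion and downsizing have already been done (for example by PIL), so the input is a 9x9 grid of gray values. It also assumes that only the first 8 rows feed the row hash and only the first 8 columns feed the column hash, which is what makes each half come to 64 bits. The names `dhash_128` and `bit_difference` are hypothetical, and comparing hashes by counting differing bits is an assumption about how "similar" images would be matched.

```python
def dhash_128(grays, size=8):
    """Compute a dHash-style value from a (size+1) x (size+1) grid of grays.

    `grays` is a list of rows, each a list of gray values. With size=8
    (a 9x9 grid) the result is a 128-bit integer.
    """
    row_hash = 0
    # Row hash: for each row, move left to right and emit a 1 bit when the
    # next gray value is greater than or equal to the previous one.
    for y in range(size):
        for x in range(size):
            row_hash = (row_hash << 1) | int(grays[y][x + 1] >= grays[y][x])

    col_hash = 0
    # Column hash: same comparison, but for each column, moving top to bottom.
    for x in range(size):
        for y in range(size):
            col_hash = (col_hash << 1) | int(grays[y + 1][x] >= grays[y][x])

    # Concatenate the two halves: row hash in the high bits, column hash low.
    return (row_hash << (size * size)) | col_hash


def bit_difference(hash_a, hash_b):
    """Count the bits that differ between two hashes (assumed similarity measure)."""
    return bin(hash_a ^ hash_b).count("1")


# Two synthetic 9x9 "images": one brightening toward the bottom right,
# one darkening toward the bottom right.
brightening = [[x + y for x in range(9)] for y in range(9)]
darkening = [[100 - x - y for x in range(9)] for y in range(9)]

print(format(dhash_128(brightening), "032x"))
print(bit_difference(dhash_128(brightening), dhash_128(darkening)))
```

Under this sketch, a small bit difference between two hashes would suggest visually similar images, while the two opposite gradients above differ in every bit.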

#2 Michael: Google Open Source/Python

  • subprocess32: A reliable subprocess module for Python 2
  • Grumpy: A Python to Go transcompiler and runtime
  • Python Fire: Automatically turns any Python object or module into a command line interface (CLI)
  • Python Client for Google Maps Services: Python client library for Google Maps API Web services
  • Hyou: Pythonic Interface to manipulate Google Spreadsheet
  • oauth2l: A simple CLI tool to get an OAuth token
  • mock_maps_apis: Small AppEngine application that can mock some of the Google Maps APIs
  • TensorFlow: A fast, flexible, and scalable open source machine learning library

#3 Brian: How to Handle Missing Data with Python

  • Jason Brownlee
  • Real-world data often has missing values.
  • Data can have missing values for a number of reasons, such as observations that were not recorded and data corruption.
  • Handling missing data is important as many machine learning algorithms do not support data with missing values.
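
As a rough illustration of what handling missing values can look like, here is a minimal standard-library sketch. The two strategies shown (dropping incomplete rows and filling gaps with the column mean) are common choices picked for illustration, not a summary of the linked article. The function names and the use of `None` as the missing-value marker are assumptions.

```python
from statistics import mean


def drop_missing(rows):
    """Keep only the rows that have no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row)]


def impute_mean(rows):
    """Replace each missing value with the mean of its column.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    columns = list(zip(*rows))
    column_means = [
        mean(value for value in column if value is not None)
        for column in columns
    ]
    return [
        [column_means[i] if value is None else value for i, value in enumerate(row)]
        for row in rows
    ]


# A tiny dataset with two missing observations.
data = [
    [1.0, 2.0],
    [None, 4.0],
    [3.0, None],
]

print(drop_missing(data))
print(impute_mean(data))
```

Dropping rows is the simplest option but discards data; mean imputation keeps every row at the cost of inventing values, so which one fits depends on the dataset and the algorithm.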

#4 Michael: hug REST framework

  • Drastically simplify API development over multiple interfaces
  • With hug, design and develop your API once, then expose it however your clients need to consume it (locally, over HTTP, or through the command line)
  • hug describes itself as the fastest and most modern way to create APIs on Python 3
  • hug has been built from the ground up with performance in mind.
    • It is built to consume resources only when necessary
    • It is compiled with Cython to achieve amazing performance
  • Built-in version management
  • Automatic documentation
  • Annotation-powered validation
  • Write once. Use everywhere (CLI, Python package, Web API)

#5 Brian: CLI with Click
