Description:
Sponsored by Rollbar, thank you! rollbar.com/pythonbytes
#1 Brian: Duplicate image detection with perceptual hashing in Python
- Ben Hoyt
- From Jetsetter.com, Invitation-Only Travel Community
- We use a perceptual image hash called dHash (“difference hash”), which was developed by Neal Krawetz in his work on photo forensics. It’s a very simple but surprisingly effective algorithm that involves the following steps (to produce a 128-bit hash value):
- Convert the image to grayscale
- Downsize to a 9x9 square of gray values (or 17x17 for a larger, 512-bit hash)
- Calculate the “row hash”: for each row, move from left to right, and output a 1 bit if the next gray value is greater than or equal to the previous one, or a 0 bit if it’s less (each 9-pixel row produces 8 bits of output)
- Calculate the “column hash”: same as above, but for each column, move top to bottom
- Concatenate the two 64-bit values together to get the final 128-bit hash
- Fast: Python is not very fast at bit twiddling, but all the hard work of converting to grayscale and downsizing is done by a C library: ImageMagick+wand or PIL.
- Available via GitHub: https://github.com/Jetsetter/pybktree
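The dHash steps above can be sketched in a few lines of plain Python. This is a minimal sketch, not the Jetsetter implementation: it assumes the image has already been converted to grayscale and downsized to a 9x9 grid (`grays`, a hypothetical nested list of gray values), and it assumes that only the first 8 rows and the first 8 columns are compared, so that each half of the hash comes to 64 bits. The function names and the `hamming_distance` helper are illustrative, not part of any library.

```python
def dhash_row_col(grays, size=8):
    """Return (row_hash, col_hash) for a (size+1) x (size+1) grid of gray values."""
    # Row hash: in each row, move left to right and emit 1 if the next
    # value is greater than or equal to the previous one, else 0.
    row_hash = 0
    for y in range(size):
        for x in range(size):
            bit = grays[y][x + 1] >= grays[y][x]
            row_hash = (row_hash << 1) | int(bit)

    # Column hash: the same comparison, but moving top to bottom in each column.
    col_hash = 0
    for x in range(size):
        for y in range(size):
            bit = grays[y + 1][x] >= grays[y][x]
            col_hash = (col_hash << 1) | int(bit)

    return row_hash, col_hash


def dhash_int(grays, size=8):
    """Concatenate the two (size*size)-bit halves into one integer hash."""
    row_hash, col_hash = dhash_row_col(grays, size)
    return (row_hash << (size * size)) | col_hash


def hamming_distance(a, b):
    """Count the bits that differ between two hashes (assumed comparison method)."""
    return bin(a ^ b).count("1")


# Hypothetical 9x9 input: brighter to the right, darker toward the bottom.
grays = [[x - y for x in range(9)] for y in range(9)]
image_hash = dhash_int(grays)
```

Two images whose hashes differ in only a few bits are likely near-duplicates. How many differing bits to tolerate is a tuning choice that the notes above do not specify.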
#2 Michael: Google Open Source/Python
- subprocess32: A reliable subprocess module for Python 2
- Grumpy: A Python to Go transcompiler and runtime
- Python Fire: Automatically turns any Python object or module into a command line interface (CLI)
- Python Client for Google Maps Services: Python client library for Google Maps API Web services
- Hyou: Pythonic Interface to manipulate Google Spreadsheet
- oauth2l: A simple CLI tool to get an OAuth token
- mock_maps_apis: Small AppEngine application that can mock some of the Google Maps APIs
- TensorFlow: TensorFlow is a fast, flexible, and scalable open source machine learning library
#3 Brian: How to Handle Missing Data with Python
- Jason Brownlee
- Real-world data often has missing values.
- Data can have missing values for a number of reasons, such as observations that were not recorded or data corruption.
- Handling missing data is important as many machine learning algorithms do not support data with missing values.
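Two common ways to deal with missing values are dropping incomplete rows and filling the gaps with a column statistic. The sketch below uses only the standard library to illustrate both. The helper names and the sample rows are made up for illustration, and the article itself may use different tools.

```python
import math
from statistics import mean


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_rows(rows):
    """Keep only the rows that have no missing values."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


def impute_column_means(rows):
    """Replace each missing value with the mean of its column's observed values.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    columns = list(zip(*rows))
    means = [mean(v for v in col if not is_missing(v)) for col in columns]
    return [
        [means[i] if is_missing(v) else v for i, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical data: two numeric columns with one gap each.
rows = [[1.0, 2.0], [None, 4.0], [3.0, float("nan")]]
complete_only = drop_missing_rows(rows)
imputed = impute_column_means(rows)
```

Dropping rows is the simplest option but throws data away. Imputing keeps every row at the cost of inventing values, so which is better depends on the dataset.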
#4 Michael: hug REST framework
- Drastically simplify API development over multiple interfaces
- With hug, design and develop your API once, then expose it however your clients need to consume it (locally, over HTTP, or through the command line)
- hug describes itself as the fastest and most modern way to create APIs on Python 3
- hug has been built from the ground up with performance in mind.
- It is built to consume resources only when necessary
- It is compiled with Cython for additional performance
- Built-in version management
- Automatic documentation
- Annotation-powered validation
- Write once. Use everywhere (CLI, Python package, Web API)
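To make "write once, use everywhere" and "annotation-powered validation" concrete, here is a standard-library sketch of the idea. It is not hug's API: `call_validated` and `run_cli` are hypothetical helpers, written only to show how one annotated function can be called from Python and also exposed as a CLI.

```python
import argparse
import inspect


def happy_birthday(name: str, age: int = 1) -> str:
    """Say happy birthday to a user."""
    return "Happy {} Birthday {}!".format(age, name)


def call_validated(func, **raw_args):
    """Convert each argument with its parameter's annotation, then call func."""
    params = inspect.signature(func).parameters
    converted = {}
    for key, value in raw_args.items():
        annotation = params[key].annotation
        if annotation is not inspect.Parameter.empty:
            # The annotation acts as the validator, e.g. int("abc") raises ValueError.
            value = annotation(value)
        converted[key] = value
    return func(**converted)


def run_cli(func, argv=None):
    """Expose func on the command line with one --option per parameter."""
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        required = param.default is inspect.Parameter.empty
        parser.add_argument(
            "--" + name,
            required=required,
            default=None if required else param.default,
        )
    args = vars(parser.parse_args(argv))
    return call_validated(func, **args)


# The same function, used as a plain Python call and as a CLI.
from_python = call_validated(happy_birthday, name="Ann", age="30")
from_cli = run_cli(happy_birthday, ["--name", "Bob", "--age", "5"])
```

hug gets a similar effect with decorators on the function, and adds HTTP routing, versioning, and generated documentation on top.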
#5 Brian: CLI with Click