Categories
code ffmpeg python video

Transcribing Videos with Google Cloud Speech-to-Text

Got an hour-long video and not really into creating subtitles by hand? No plans to put it on YouTube for its automated transcription service? Then try Google Cloud Speech-to-Text! In this post I’ll share some scripts for automating the process and creating an .srt file to go along with your video for displaying the subtitles.
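
For reference, the SubRip (.srt) format is simple: a running index, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the subtitle text, and a blank line between entries. Here is a minimal sketch of just that formatting step. The function names srt_timestamp and to_srt are illustrative, and the segments list is made-up sample data; in practice the timings would come from whatever the speech recognition step returns.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start_sec, end_sec, text) tuples as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # Joining with a newline leaves one blank line between entries.
    return "\n".join(blocks)


# Hypothetical transcript segments, for illustration only.
segments = [
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 6.04, "Today we are transcribing a video."),
]
srt_text = to_srt(segments)

with open("subtitles.srt", "w", encoding="utf-8") as f:
    f.write(srt_text)
```

Most players pick up the subtitles automatically when the .srt file sits next to the video and shares its base name.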

Categories
code machine learning python

Take a SWIG out of the Gesture Recognition Toolkit (GRT)

Reporting on a project I worked on for the last few weeks – porting the excellent Gesture Recognition Toolkit (GRT) to Python.
Right now it’s still a pull request: https://github.com/nickgillian/grt/pull/151.
Not exactly porting, rather I’ve simply added Python bindings to GRT that allow you to access the GRT C++ APIs from Python.
Did it using the wonderful SWIG project. Such a wondrous tool, SWIG is. Magical.
Here are the deets

Categories
code machine learning opencv programming python vision

Aligning faces with py opencv-dlib combo

This is my first attempt at using a Jupyter notebook to write a post, hope it makes sense.
I’ve recently taught a class on generative models: http://hi.cs.stonybrook.edu/teaching/cdt450
In class we’ve manipulated face images with neural networks.
One important thing I found that helped is to align the images so the facial features overlap.
It helps the nets learn the variance in faces better, rather than waste their “representation power” on the shift between faces.
The following is some code to align face images using the excellent Dlib (python bindings) http://dlib.net. First I run a standard face detector, and then I use the output of the facial features extractor for a complete alignment of the face.
After the alignment – I’m just having fun with the aligned dataset 🙂
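
To give a feel for what alignment boils down to, here is a minimal sketch that uses only the two eye centers. It is a simplification and not the notebook's code: it assumes the landmarks were already extracted (the coordinates below are made up), and the helper names eye_alignment_matrix and apply_affine are hypothetical. It computes the rotation, uniform scale and translation that move the detected eyes onto fixed target positions.

```python
import math


def eye_alignment_matrix(left_eye, right_eye, target_left, target_right):
    """Return a 2x3 similarity transform mapping the detected eye centers
    onto the target eye positions (rotation + uniform scale + translation)."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tx, ty = target_right[0] - target_left[0], target_right[1] - target_left[1]
    scale = math.hypot(tx, ty) / math.hypot(dx, dy)
    angle = math.atan2(ty, tx) - math.atan2(dy, dx)
    a = scale * math.cos(angle)
    b = scale * math.sin(angle)
    # Linear part is [[a, -b], [b, a]]; the translation is chosen so that
    # left_eye lands exactly on target_left.
    t_x = target_left[0] - (a * left_eye[0] - b * left_eye[1])
    t_y = target_left[1] - (b * left_eye[0] + a * left_eye[1])
    return [[a, -b, t_x], [b, a, t_y]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


# Made-up eye centers from a tilted face, and where we want them to end up.
detected_left, detected_right = (30.0, 40.0), (70.0, 60.0)
wanted_left, wanted_right = (40.0, 50.0), (88.0, 50.0)

M = eye_alignment_matrix(detected_left, detected_right, wanted_left, wanted_right)
aligned_left = apply_affine(M, detected_left)
aligned_right = apply_affine(M, detected_right)
```

The resulting 2x3 matrix has the layout that OpenCV's cv2.warpAffine accepts once converted to a NumPy float array, so the same transform could be applied to the whole image.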

Categories
code linux machine learning python

Build your AWS Lambda Machine Learning Function with Docker

I’ve recently made a tutorial on using Docker for machine learning purposes, and I thought I’d also publish it here: http://hi.cs.stonybrook.edu/teaching/docker4ml
It includes videos, slides and code, with hands-on demonstrations in class.
A GitHub repo holds the code: https://github.com/royshil/Docker4MLTutorial
I made several scripts to make it easy to upload python code that performs an ML inference (“prediction”) operation on AWS Lambda.
Enjoy!
Roy.

Categories
cmake code linux machine learning programming

Cross Compile TensorFlow C++ app for the Jetson TK1

Last time I posted about cross compiling TF for the TK1. That, however, was a canned sample from TF, based on the bazel build system.
Let’s say we want to make our own TF C++ app and just link against TF for inference on the TK1.
Now that’s a huge mess.
First we need to cross-compile TF with everything built in.
Then we need protobuf cross-compiled for the TK1.
Bundle everything together, cross(-compile) our fingers and pray.
The prayer doesn’t help. But let’s see what does…

Categories
3d Augmented Reality code graphics opencv python video vision

Projector-Camera Calibration – the "easy" way

First let me open by saying projector-camera calibration is NOT EASY. But it’s not technically complicated either.
It is, however, an amalgamation of optimizations that accumulate error with each step, so that the end product is not far from a random guess.
So the 3D reconstructions I was able to get from my calibrated pro-cam were just a distorted mess of points.
Nevertheless, here come the deets.

Categories
code graphics opencv python vision work

Revisiting graph-cut segmentation with SLIC and color histograms [w/Python]

As part of the computer vision class I’m teaching at SBU I asked students to implement a segmentation method based on SLIC superpixels. Here is my boilerplate implementation.
This follows the work I did a very long time ago (2010) on the same subject.
For graph-cut I’ve used PyMaxflow: https://github.com/pmneila/PyMaxflow, which installs easily with just pip install PyMaxflow
The method is simple:

  • Calculate SLIC superpixels (the SKImage implementation)
  • Use markings to determine the foreground and background color histograms (from the superpixels under the markings)
  • Set up a graph with a straightforward energy model: Smoothness term = K-L-Div between a superpixel’s histogram and its neighbor’s histogram, and Match term = inf if the superpixel is marked as BG or FG, otherwise the K-L-Div between the superpixel’s histogram and the FG and BG histograms.
  • To find neighbors I’ve used Delaunay tessellation (from scipy.spatial), for simplicity. But a full neighbor finding could be implemented by looking at all the neighbors on the superpix’s boundary.
  • Color histograms are 2D over H-S (from the HSV)
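
The K-L divergence used in both energy terms can be sketched in a few lines. This is a dependency-free illustration with plain nested lists standing in for the 2D H-S histograms; the function names, the eps smoothing constant and the sample bin counts are assumptions for the example, not taken from the boilerplate implementation.

```python
import math


def normalize(hist, eps=1e-10):
    """Flatten a 2D histogram (list of rows) into a probability vector.
    eps keeps empty bins from causing log(0) or division by zero."""
    flat = [count + eps for row in hist for count in row]
    total = sum(flat)
    return [value / total for value in flat]


def kl_divergence(p_hist, q_hist):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i) over the normalized bins."""
    p, q = normalize(p_hist), normalize(q_hist)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Made-up 2x2 H-S histograms: a foreground model and two superpixels.
fg_hist = [[8, 1], [1, 0]]
superpix_a = [[7, 2], [1, 0]]   # looks a lot like the foreground
superpix_b = [[0, 1], [1, 8]]   # looks very different

cost_a = kl_divergence(superpix_a, fg_hist)
cost_b = kl_divergence(superpix_b, fg_hist)
```

Here cost_a comes out much smaller than cost_b, which is the signal the graph edges rely on. Note that K-L divergence is not symmetric, so the order of the two histograms matters when filling in the edge weights.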

Result

Categories
code graphics opencv vision

Laplacian Pyramid Blending with Masks in OpenCV-Python

A small example on how to do Laplacian pyramid blending with an arbitrary mask.
Enjoy
Roy

Categories
cmake code opencv vision work

OMG CMake/OpenCV3 can you be more difficult? Linking order problems with OpenNI2…

So I just spent 1.5 hours figuring this out.
Compiling an example on Ubuntu 16.04 with OpenCV built from scratch with OpenNI2 support.
(OpenNI2 is from Orbbec, but that doesn’t make any difference: https://orbbec3d.com/develop/)
When using this straightforward CMake script for compilation – it doesn’t work:

cmake_minimum_required(VERSION 3.2)
project(MyApp)
find_package(OpenCV 3 REQUIRED)
set(OPENNI2_LIBS "OpenNI2")
link_directories("/home/user/Downloads/2-Linux/OpenNI-Linux-x64-2.3/Redist")
add_executable(myapp main.cpp)
target_link_libraries(myapp ${OpenCV_LIBS} ${OPENNI2_LIBS})

The linker complains of undefined references:

/usr/bin/c++   -g   CMakeFiles/myapp.dir/main.cpp.o  -o myapp  -L/home/user/Downloads/2-Linux/OpenNI-Linux-x64-2.3/Redist -rdynamic -lOpenNI2 /usr/local/lib/libopencv_shape.so.3.2.0 /usr/local/lib/libopencv_stitching.so.3.2.0 /usr/local/lib/libopencv_superres.so.3.2.0 /usr/local/lib/libopencv_videostab.so.3.2.0 /usr/local/lib/libopencv_objdetect.so.3.2.0 /usr/local/lib/libopencv_calib3d.so.3.2.0 /usr/local/lib/libopencv_features2d.so.3.2.0 /usr/local/lib/libopencv_flann.so.3.2.0 /usr/local/lib/libopencv_highgui.so.3.2.0 /usr/local/lib/libopencv_ml.so.3.2.0 /usr/local/lib/libopencv_photo.so.3.2.0 /usr/local/lib/libopencv_video.so.3.2.0 /usr/local/lib/libopencv_videoio.so.3.2.0 /usr/local/lib/libopencv_imgcodecs.so.3.2.0 /usr/local/lib/libopencv_imgproc.so.3.2.0 /usr/local/lib/libopencv_core.so.3.2.0 -Wl,-rpath,/home/user/Downloads/2-Linux/OpenNI-Linux-x64-2.3/Redist:/usr/local/lib
/usr/local/lib/libopencv_videoio.so.3.2.0: undefined reference to `oniStreamGetProperty'
/usr/local/lib/libopencv_videoio.so.3.2.0: undefined reference to `oniRecorderDestroy'
/usr/local/lib/libopencv_videoio.so.3.2.0: undefined reference to `oniDeviceIsCommandSupported'
/usr/local/lib/libopencv_videoio.so.3.2.0: undefined reference to `oniDeviceSetProperty'

You’ll notice that -lOpenNI2 does indeed appear on the link line.
The linker doesn’t complain that the library was not found; it just can’t resolve the references.
This led me to understand it’s a linking order problem (after ~45 minutes of banging my head against the keyboard and swearing profusely).
Some more swearing and head banging got me to understand that CMake is messing around with the link order.
So even if I try:

target_link_libraries(myapp ${OpenCV_LIBS} ${OPENNI2_LIBS} ${OpenCV_LIBS} ${OPENNI2_LIBS})

i.e. making the order effectively meaningless, it still doesn’t work!
More swearing and head banging, another ~40 minutes passed, and I figured out a solution.
The real solution is to slap someone in CMake in the face with a trout, but here’s a solution to my problem:

find_package(OpenCV 3 REQUIRED core highgui videoio) # ORDER MATTERS!!! videoio must be last!
set(OpenCV_LIBS "${OpenCV_LIBS};OpenNI2") #add openni2 at the end (although cmake doesn't keep order anyway)
target_link_libraries(myapp ${OpenCV_LIBS})

Now it compiles.
And look at the output of make VERBOSE=1:

/usr/bin/c++   -g   CMakeFiles/myapp.dir/main.cpp.o  -o myapp  -L/home/user/Downloads/2-Linux/OpenNI-Linux-x64-2.3/Redist -rdynamic /usr/local/lib/libopencv_highgui.so.3.2.0 /usr/local/lib/libopencv_videoio.so.3.2.0 -lOpenNI2 /usr/local/lib/libopencv_core.so.3.2.0 -Wl,-rpath,/home/user/Downloads/2-Linux/OpenNI-Linux-x64-2.3/Redist:/usr/local/lib -Wl,-rpath-link,/usr/local/lib

Can you see how highgui and videoio are before OpenNI2, and core is after?
Why? Whhhhhhy?
The key is to get OpenNI2 linked after videoio.
OMG CMake, OMG OpenCV, OMG you gaiz, W-T-F?
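
One plausible explanation for the behavior above, which I have not verified on this machine: Ubuntu’s toolchain passes --as-needed to the linker by default, so a shared library is kept only if, at the point where it appears on the command line, something earlier has an unresolved symbol that it provides. Under that assumption, -lOpenNI2 placed before libopencv_videoio is silently dropped, and videoio’s oni* references are left dangling. The toy Python model below simulates that rule. The symbol tables are trimmed down, and the cv:: names are illustrative stand-ins, not real mangled symbols.

```python
def link(command_line, provides, needs):
    """Toy model of left-to-right linking with --as-needed.

    Object files are always kept. A shared library is kept only if it
    defines a symbol that is still undefined when the linker reaches it.
    Returns (kept_items, unresolved_symbols).
    """
    undefined, defined, kept = set(), set(), []
    for item in command_line:
        exports = provides.get(item, set())
        is_object = item.endswith(".o")
        if is_object or exports & undefined:
            kept.append(item)
            defined |= exports
            undefined |= needs.get(item, set())
            undefined -= defined
    return kept, undefined


# Illustrative symbol tables (heavily trimmed).
provides = {
    "libOpenNI2.so": {"oniStreamGetProperty", "oniRecorderDestroy"},
    "libopencv_videoio.so": {"cv::VideoCapture"},
    "libopencv_core.so": {"cv::Mat"},
}
needs = {
    "main.o": {"cv::VideoCapture", "cv::Mat"},
    "libopencv_videoio.so": {"oniStreamGetProperty", "oniRecorderDestroy", "cv::Mat"},
}

# OpenNI2 before videoio (as in the failing command) vs. after it.
failing = ["main.o", "libOpenNI2.so", "libopencv_videoio.so", "libopencv_core.so"]
working = ["main.o", "libopencv_videoio.so", "libOpenNI2.so", "libopencv_core.so"]

kept_bad, missing_bad = link(failing, provides, needs)
kept_ok, missing_ok = link(working, provides, needs)
print("failing order, unresolved:", sorted(missing_bad))
print("working order, unresolved:", sorted(missing_ok))
```

In the failing order the model drops libOpenNI2.so before anything has asked for its symbols, which matches the undefined references in the log. In the working order nothing is left unresolved.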
Update:
This method breaks down as soon as more OpenCV components are added. The order goes haywire again, and OpenNI2 comes before videoio, which breaks the link.
As of now the way I can compile it is like so:

set(LINK_LIBS /usr/local/lib/libopencv_core.so.3.2
              /usr/local/lib/libopencv_highgui.so.3.2
              /usr/local/lib/libopencv_videoio.so.3.2
              /usr/local/lib/libopencv_imgproc.so.3.2
              /usr/local/lib/libopencv_calib3d.so.3.2
              OpenNI2)

Categories
code opencv vision

New edition of the Mastering OpenCV book – now with OpenCV3!

Mastering OpenCV 3
I’m happy to announce that the new edition of Mastering OpenCV is out!
You can get it on Amazon: Mastering OpenCV 3
It brings most of the older OpenCV2 book projects up to OpenCV3, including my Toy-SfM (or “Exploring SfM”) project.
A lot has happened in the OpenCV3 APIs with respect to Structure from Motion.
It got much easier!
The book chapter on SfM is a gentle introduction to the subject that focuses on coding and the core concepts while abstracting away the math.
Thanks for listening!
Roy