Extract face images from videos with ffmpeg and OpenVINO

I previously tried OpenVINO’s [interactive_face_detection_demo](https://docs.openvinotoolkit.org/2020.4/omz_demos_interactive_face_detection_demo_README.html). In combination with ffmpeg, only the face images are extracted from a video.
(Screenshot: スクリーンショット 2020-10-03 024809.png)

1. Prepare the video

It is assumed that the file is named `input.mp4` and is located in the Downloads folder on Windows 10.

2. Convert the video to 1 fps

interactive_face_detection_demo runs inference on every frame, so lowering the frame rate both shortens processing time and simplifies the ffmpeg extraction described later. Convert `input.mp4` to 1 frame per second.

```sh
cd /cygdrive/c/Users/${USER}/Downloads/
ls -l input.mp4

mkdir -p output/
ffmpeg -i input.mp4 -r 1 output/input_r1.mp4 -y
```
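
To confirm the conversion, the frame rate of the new file can be checked with ffprobe (assuming ffprobe was installed together with ffmpeg):

```sh
# should report 1/1 for the converted video
ffprobe -v error -select_streams v:0 \
  -show_entries stream=r_frame_rate \
  -of default=noprint_wrappers=1:nokey=1 output/input_r1.mp4
```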

3. Face detection with OpenVINO demo

This is almost the same as what I tried before, except that the raw detection output is also saved with the `-r` option.

```sh
echo 'source ${INTEL_OPENVINO_DIR}/bin/setupvars.sh

# restrict the demo build to interactive_face_detection_demo only, then build
cd ${INTEL_CVSDK_DIR}/inference_engine/demos/
sed -i "s/*)/interactive_face_detection_demo)/g" CMakeLists.txt
./build_demos.sh

# download the face-detection-adas-0001 model (FP32)
${INTEL_CVSDK_DIR}/deployment_tools/tools/model_downloader/downloader.py \
  --name face-detection-adas-0001 \
  --output_dir /content/model/ \
  --precisions FP32

echo `date`: start detection

# run detection on the 1 fps video and dump the raw output (-r) to raw.txt
/root/omz_demos_build/intel64/Release/interactive_face_detection_demo \
  -i /Downloads/output/input_r1.mp4 \
  -m /content/model/intel/face-detection-adas-0001/FP32/face-detection-adas-0001.xml \
  -no_show \
  -no_wait \
  -async \
  -r > /Downloads/output/raw.txt

echo `date`: end detection' | docker run -v /c/Users/${USER}/Downloads:/Downloads -u root -i --rm openvino/ubuntu18_dev:2020.4
```
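
Back on the host, a quick sanity check is to count how many detections cleared the demo's 0.5 threshold:

```sh
# number of candidates the demo marked as faces
grep -c 'WILL BE RENDERED' output/raw.txt
```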

4. Extract from raw.txt with ffmpeg

raw.txt

```
~~
[116,1] element, prob = 0.0198675    (-4,209)-(48,48)
[117,1] element, prob = 0.0198515    (444,146)-(68,68)
[0,1] element, prob = 0.999333    (222,115)-(205,205) WILL BE RENDERED!
[1,1] element, prob = 0.0601832    (405,393)-(94,94)
~~
```

As shown above, raw.txt lists the detection candidates for each frame in descending order of how face-like they are, and candidates judged to be faces (confidence of 0.5 or more) are marked with WILL BE RENDERED!.

```sh
# move into the folder where raw.txt and input_r1.mp4 were written
cd output/

THRESHOLD=0.9
perl -ne '$i++ if m{^\[0,1\]}; printf "ffmpeg -loglevel error -ss ".($i-1)." -i input_r1.mp4 -vframes 1 -vf crop=$4:$5:$2:$3 %05d.jpg -y\n", ++$j if m{([0-9.]+)\s+\((\d+),(\d+)\)-\((\d+),(\d+)\)} and $1 > '${THRESHOLD} raw.txt > ffmpeg.sh
```

The running count of `[0,1]` lines gives the current frame number, and that count minus one is passed to the `-ss` option (because the video is 1 frame per second, the frame number can be used directly where `-ss` expects seconds). Only candidates with a high confidence are kept: WILL BE RENDERED! corresponds to 0.5 or more, but that still lets through plenty of false positives, so the threshold above is set higher. From the coordinates, the [crop](https://ffmpeg.org/ffmpeg-filters.html#crop) filter parameters are derived and an ffmpeg command is printed for each detection (the detected regions always seem to be square).
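
ffmpeg's crop filter takes `crop=w:h:x:y`, so following the same mapping the one-liner uses, the sample line above, `(222,115)-(205,205)`, becomes a 205x205 crop at offset (222,115). The `-ss` value depends on which frame that line belongs to, so the 0 below is only a placeholder:

```sh
# placeholder example: crop the sample detection from frame 0 of the 1 fps video
ffmpeg -loglevel error -ss 0 -i input_r1.mp4 -vframes 1 -vf crop=205:205:222:115 example.jpg -y
```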

ffmpeg.sh

```sh
ffmpeg -loglevel error -ss 32 -i input_r1.mp4 -vframes 1 -vf crop=36:36:109:178 00001.jpg -y
ffmpeg -loglevel error -ss 36 -i input_r1.mp4 -vframes 1 -vf crop=34:34:107:177 00002.jpg -y
ffmpeg -loglevel error -ss 37 -i input_r1.mp4 -vframes 1 -vf crop=32:32:108:178 00003.jpg -y
ffmpeg -loglevel error -ss 39 -i input_r1.mp4 -vframes 1 -vf crop=32:32:109:179 00004.jpg -y
ffmpeg -loglevel error -ss 40 -i input_r1.mp4 -vframes 1 -vf crop=37:37:97:178 00005.jpg -y
ffmpeg -loglevel error -ss 41 -i input_r1.mp4 -vframes 1 -vf crop=34:34:46:176 00006.jpg -y
ffmpeg -loglevel error -ss 44 -i input_r1.mp4 -vframes 1 -vf crop=64:64:552:236 00007.jpg -y
```

Since a file containing the above commands is created, running it in the shell outputs one cropped face image per detection:

```sh
sh ffmpeg.sh
```
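
As a quick check, the number of JPEG files produced should match the number of commands in ffmpeg.sh:

```sh
# these two counts should be identical
wc -l < ffmpeg.sh
ls *.jpg | wc -l
```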

5. Resize as needed

```sh
mogrify -resize 128x128! *.jpg
```

It is convenient to make all the face crops the same size with ImageMagick’s mogrify (the `!` forces the exact 128x128 size, ignoring the aspect ratio).
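
If ImageMagick is not installed, ffmpeg's scale filter can do the same; a minimal sketch that writes resized copies into a separate folder (the `resized/` directory name is just an example):

```sh
mkdir -p resized/
for f in *.jpg; do
  # force an exact 128x128 output, like mogrify's "!" flag (aspect ratio is not kept)
  ffmpeg -loglevel error -i "$f" -vf scale=128:128 "resized/$f" -y
done
```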