Extract face images from videos with ffmpeg and OpenVINO
I previously tried OpenVINO’s [interactive_face_detection_demo](https://docs.openvinotoolkit.org/2020.4/omz_demos_interactive_face_detection_demo_README.html). This time, by combining it with ffmpeg, only the face images are extracted from a video.
1. Preparation of video
It is assumed that the file is named `input.mp4` and is located in the Downloads folder of Windows 10.
2. Modify video to 1 fps
Since interactive_face_detection_demo runs inference on every frame, lowering the frame rate shortens the processing time and also simplifies the ffmpeg extraction described later. Convert `input.mp4` to 1 frame per second.
cd /cygdrive/c/Users/${USER}/Downloads/
ls -l input.mp4
mkdir -p output/
ffmpeg -i input.mp4 -r 1 output/input_r1.mp4 -y
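To confirm that the converted video really is 1 fps, ffprobe (installed alongside ffmpeg) can report the stream frame rate; a quick check along these lines should print something like 1/1:
ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=noprint_wrappers=1:nokey=1 output/input_r1.mp4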
3. Face detection with OpenVINO demo
This is almost the same as what I tried before, but this time the raw output is obtained with the `-r` option.
echo 'source ${INTEL_OPENVINO_DIR}/bin/setupvars.sh
cd ${INTEL_CVSDK_DIR}/inference_engine/demos/
sed -i "s/*)/interactive_face_detection_demo)/g" CMakeLists.txt
./build_demos.sh
${INTEL_CVSDK_DIR}/deployment_tools/tools/model_downloader/downloader.py \
--name face-detection-adas-0001 \
--output_dir /content/model/ \
--precisions FP32
echo `date`: start detection
/root/omz_demos_build/intel64/Release/interactive_face_detection_demo \
-i /Downloads/output/input_r1.mp4 \
-m /content/model/intel/face-detection-adas-0001/FP32/face-detection-adas-0001.xml \
-no_show \
-no_wait \
-async \
-r > /Downloads/output/raw.txt
echo `date`: end detection' | docker run -v /c/Users/${USER}/Downloads:/Downloads -u root -i --rm openvino/ubuntu18_dev:2020.4
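Once the container finishes, raw.txt should be in the output folder on the host. A quick sanity check from the Downloads directory (the grep count is only a rough indicator of how many face-like detections were written):
ls -l output/raw.txt
grep -c 'WILL BE RENDERED' output/raw.txt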
4. Extract from raw.txt with ffmpeg
raw.txt
~~
[116,1] element, prob = 0.0198675 (-4,209)-(48,48)
[117,1] element, prob = 0.0198515 (444,146)-(68,68)
[0,1] element, prob = 0.999333 (222,115)-(205,205) WILL BE RENDERED!
[1,1] element, prob = 0.0601832 (405,393)-(94,94)
~~
As shown above, raw.txt lists the detection candidates for each frame in descending order of face-likeness (confidence), and candidates judged to be faces (confidence of 0.5 or more) are marked with WILL BE RENDERED!.
THRESHOLD=0.9
perl -ne '$i++ if m{^\[0,1\]}; printf "ffmpeg -loglevel error -ss ".($i-1)." -i input_r1.mp4 -vframes 1 -vf crop=$4:$5:$2:$3 %05d.jpg -y\n", ++$j if m{([0-9.]+)\s+\((\d+),(\d+)\)-\((\d+),(\d+)\)} and $1 > '${THRESHOLD} raw.txt > ffmpeg.sh
Since the count of `[0,1]` lines equals the frame number, it is passed to the `-ss` option later (because the video is 1 frame per second, the frame number can be passed as-is to `-ss`, which takes seconds). Only detections with a high face confidence are kept (WILL BE RENDERED! corresponds to 0.5 or more, but since that is quite lax, a higher threshold is used above). From the coordinates, the [crop](https://ffmpeg.org/ffmpeg-filters.html#crop) filter parameters are obtained and an ffmpeg command is printed for each detection (the bounding box always seems to be square).
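For example, the high-confidence line from the raw.txt excerpt above would map to a command like the following (the `-ss 9` here assumes, purely for illustration, that the line belongs to the 10th frame, i.e. the 10th `[0,1]` line; the crop arguments are width:height:x:y taken from the detected box):
# [0,1] element, prob = 0.999333 (222,115)-(205,205) WILL BE RENDERED!
ffmpeg -loglevel error -ss 9 -i input_r1.mp4 -vframes 1 -vf crop=205:205:222:115 00001.jpg -y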
ffmpeg.sh
ffmpeg -loglevel error -ss 32 -i input_r1.mp4 -vframes 1 -vf crop=36:36:109:178 00001.jpg -y
ffmpeg -loglevel error -ss 36 -i input_r1.mp4 -vframes 1 -vf crop=34:34:107:177 00002.jpg -y
ffmpeg -loglevel error -ss 37 -i input_r1.mp4 -vframes 1 -vf crop=32:32:108:178 00003.jpg -y
ffmpeg -loglevel error -ss 39 -i input_r1.mp4 -vframes 1 -vf crop=32:32:109:179 00004.jpg -y
ffmpeg -loglevel error -ss 40 -i input_r1.mp4 -vframes 1 -vf crop=37:37:97:178 00005.jpg -y
ffmpeg -loglevel error -ss 41 -i input_r1.mp4 -vframes 1 -vf crop=34:34:46:176 00006.jpg -y
ffmpeg -loglevel error -ss 44 -i input_r1.mp4 -vframes 1 -vf crop=64:64:552:236 00007.jpg -y
A file containing the above commands is created, so execute it in the shell:
sh ffmpeg.sh
Each face image is then output as an individual JPEG file.
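To count how many face images were produced, something like the following works (assuming the JPEGs were written to the current directory):
ls -1 *.jpg | wc -l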
5. Resize as needed
mogrify -resize 128x128! *.jpg
Making them all the same size with ImageMagick's mogrify seems convenient.
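Since mogrify overwrites the files in place, the resulting dimensions can be checked afterwards with ImageMagick's identify, for example:
identify -format "%f %wx%h\n" *.jpg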