Finding relevant images in a large set of files

I recently had to recover data (images, to be precise) from a 1 TB drive. I used photorec to find the data on the drive, which produced a huge number of images among an even larger number of files. How do you find the relevant images in that amount of data? For initial filtering I focused on a single criterion: image dimensions. In my case I used 1024 x 768 as the minimum resolution (786432 pixels when multiplied). I wrote a little script that finds all JPEG images (*.jpg) in a given directory, checks their size, and symlinks them into a folder sorted by YEAR/MONTH/DAY. You can invoke it as follows:

./findimages.sh /some/directory

It will produce /some/directory/found with one subfolder per image date. Note that the script needs the exiv2 binary and, because it uses bash features, bash itself.
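To illustrate the folder layout, here is a minimal sketch of how a timestamp could be mapped to its subfolder. It assumes the timestamp has the Exif form "2010:05:23 14:32:10"; date_folder is a hypothetical helper name used only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): turn a timestamp such as
# "2010:05:23 14:32:10" into a YEAR/MONTH/DAY path.
date_folder() {
    # Keep only the date part before the first space, then swap ":" for "/".
    printf '%s\n' "${1%% *}" | tr ':' '/'
}

date_folder "2010:05:23 14:32:10"   # prints 2010/05/23
```

An image without a timestamp yields an empty path, so it would end up directly in the found folder.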

#!/bin/bash

export LC_ALL=C
DIR="${1}"
TARGET_DIR="${DIR}/found/"

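# Link an image into TARGET_DIR/YEAR/MONTH/DAY based on its Exif timestamp.
function symlink() {
    IMAGE="${1}"
    EXIV="${2}"
    IMAGE_DATE=$(echo "${EXIV}" | grep "Image timestamp" | sed -e "s/Image timestamp\s*:\s*//")
    # "2010:05:23 14:32:10" becomes "2010/05/23"; empty if there is no timestamp
    DATE_FOLDER=$(echo "${IMAGE_DATE}" | cut -d' ' -f1 | tr ':' '/')
    mkdir -p "${TARGET_DIR}${DATE_FOLDER}"
    # Resolve to an absolute path so the link also works for a relative DIR
    ln -s "$(readlink -f "${IMAGE}")" "${TARGET_DIR}${DATE_FOLDER}"
    sync
}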

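COUNT=$(find "${DIR}" -type f -name '*.jpg' 2> /dev/null | wc -l)
CURRENT=1

# Read NUL-separated paths so names containing spaces survive
find "${DIR}" -type f -name '*.jpg' -print0 2> /dev/null | while IFS= read -r -d '' IMAGE; do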
    echo "${CURRENT}/${COUNT}"
    CURRENT=$((${CURRENT} + 1))
    EXIV="$(exiv2 "${IMAGE}" 2> /dev/null)"
    IMAGE_SIZE=$(echo "${EXIV}" | grep "Image size" | sed -e "s/Image size\s*:\s*//")
    if [[ ${IMAGE_SIZE} =~ ^([0-9]+)\ x\ ([0-9]+) ]]; then
        X=${BASH_REMATCH[1]}
        Y=${BASH_REMATCH[2]}
        # Keep images with at least 1024 x 768 = 786432 pixels
        if [ "$((X * Y))" -ge "786432" ]; then
            symlink "${IMAGE}" "${EXIV}"
        fi
    else
        # Size could not be parsed: report the image and link it anyway
        echo "${IMAGE}: ${IMAGE_SIZE} (${EXIV})"
        symlink "${IMAGE}" "${EXIV}"
    fi
done
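
The size filter from the script above can also be tried on its own. The following is a minimal sketch in plain POSIX shell, assuming the size string has the form "WIDTH x HEIGHT"; is_large_enough and MIN_PIXELS are hypothetical names introduced for this example. Unlike the script, which links images with an unreadable size anyway, this helper simply rejects them.

```shell
#!/bin/sh
# Hypothetical threshold (illustration only): 1024 * 768 pixels.
MIN_PIXELS=786432

# Succeed if a "WIDTH x HEIGHT" string describes at least MIN_PIXELS pixels.
is_large_enough() {
    # Split on the literal " x " between width and height.
    width=${1%% x *}
    height=${1##* x }
    # Reject anything that is not a plain decimal number.
    case "${width}" in ''|*[!0-9]*) return 1 ;; esac
    case "${height}" in ''|*[!0-9]*) return 1 ;; esac
    [ "$((width * height))" -ge "${MIN_PIXELS}" ]
}

for size in "1024 x 768" "640 x 480" "unknown"; do
    if is_large_enough "${size}"; then
        echo "${size}: keep"
    else
        echo "${size}: skip"
    fi
done
```

Keeping the check in a function makes it easy to experiment with other thresholds before running the full scan over a large drive.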

If you have suggestions on how to improve the script, or know of smarter alternatives, please let me know.

Copyright © christophbrill.de, 2002-2018.