Similar-image detection with the Google Vision API

ルーターマガジン

2024.04.30

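I'm kanazawa, an engineer. It has been a while since my last machine-learning post, so this time I tried similar-image detection. You could tackle this with classical image analysis, but here I took the quick-and-dirty route: send each image to the Google Vision API and test whether the similarity between the returned results can be used to judge image similarity.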

How the image features are built

I use the LABEL_DETECTION feature of the Google Vision API to attach labels to each image.

Sample image

Google Vision API result

{
"responses": [
    {
        "labelAnnotations": [
            {
                "mid": "/m/0c9ph5",
                "score": 0.97519124,
                "topicality": 0.97519124,
                "description": "Flower"
            },
            {
                "mid": "/m/05s2s",
                "score": 0.9259831,
                "topicality": 0.9259831,
                "description": "Plant"
            },
            {
                "mid": "/m/07j7r",
                "score": 0.88428324,
                "topicality": 0.88428324,
                "description": "Tree"
            },
            {
                "mid": "/m/016nqt",
                "score": 0.88323385,
                "topicality": 0.88323385,
                "description": "Twig"
            },
            {
                "mid": "/m/0b5gs",
                "score": 0.8688874,
                "topicality": 0.8688874,
                "description": "Branch"
            },
            {
                "mid": "/m/016q19",
                "score": 0.84059125,
                "topicality": 0.84059125,
                "description": "Petal"
            },
            {
                "mid": "/m/01fklc",
                "score": 0.82767916,
                "topicality": 0.82767916,
                "description": "Pink"
            },
            {
                "mid": "/m/02tcwp",
                "score": 0.7772147,
                "topicality": 0.7772147,
                "description": "Trunk"
            },
            {
                "mid": "/m/02q_bfg",
                "score": 0.760938,
                "topicality": 0.760938,
                "description": "Tints and shades"
            },
            {
                "mid": "/m/04sjm",
                "score": 0.74587005,
                "topicality": 0.74587005,
                "description": "Flowering plant"
            },
            {
                "mid": "/m/0j7ty",
                "score": 0.7320647,
                "topicality": 0.7320647,
                "description": "Blossom"
            },
            {
                "mid": "/m/083vt",
                "score": 0.7303314,
                "topicality": 0.7303314,
                "description": "Wood"
            },
            {
                "mid": "/m/018ssc",
                "score": 0.718404,
                "topicality": 0.718404,
                "description": "Groundcover"
            },
            {
                "mid": "/m/0hlzt",
                "score": 0.70114774,
                "topicality": 0.70114774,
                "description": "Deciduous"
            },
            {
                "mid": "/m/08t9c_",
                "score": 0.6235262,
                "topicality": 0.6235262,
                "description": "Grass"
            },
            {
                "mid": "/m/01ttd6",
                "score": 0.5967256,
                "topicality": 0.5967256,
                "description": "Plant stem"
            },
            {
                "mid": "/m/0gqbt",
                "score": 0.5924945,
                "topicality": 0.5924945,
                "description": "Shrub"
            },
            {
                "mid": "/m/03d28y3",
                "score": 0.5875379,
                "topicality": 0.5875379,
                "description": "Natural landscape"
            },
            {
                "mid": "/m/01qd72",
                "score": 0.57774043,
                "topicality": 0.57774043,
                "description": "Malus"
            },
            {
                "mid": "/m/0ckc5",
                "score": 0.55246717,
                "topicality": 0.55246717,
                "description": "Magenta"
            },
            {
                "mid": "/m/0hs32",
                "score": 0.54021865,
                "topicality": 0.54021865,
                "description": "Prunus"
            },
            {
                "mid": "/m/01wh_8",
                "score": 0.5165784,
                "topicality": 0.5165784,
                "description": "Cherry blossom"
            }
        ]
    }
]
}

For the labels, I ignore the topicality values this time and join the descriptions as they are, separated by single ASCII spaces.

Flower Plant Tree Twig Branch Petal Pink Trunk Tints and shades Flowering plant Blossom Wood Groundcover Deciduous Grass Plant stem Shrub Natural landscape Malus Magenta Prunus Cherry blossom

The fact that this is a flowering tree is nicely encoded!
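As a minimal sketch of that joining step: the helper name `labels_to_text` and the optional `min_score` cutoff are illustrative assumptions, not part of the experiment, which keeps every label. The sample input is a shortened copy of the response shown above.

```ruby
require 'json'

# Hypothetical helper: turn a Vision API LABEL_DETECTION response into a
# space-joined label string. min_score is an assumed optional cutoff; the
# experiment in this article keeps every label (min_score = 0.0).
def labels_to_text(api_result, min_score: 0.0)
  annotations = api_result.dig('responses', 0, 'labelAnnotations') || []
  annotations
    .select { |a| a['score'].to_f >= min_score }
    .map { |a| a['description'] }
    .join(' ')
end

# Shortened copy of the response above (three of the labels).
sample = JSON.parse(<<~JSON)
  {"responses": [{"labelAnnotations": [
    {"mid": "/m/0c9ph5", "score": 0.97519124, "topicality": 0.97519124, "description": "Flower"},
    {"mid": "/m/05s2s", "score": 0.9259831, "topicality": 0.9259831, "description": "Plant"},
    {"mid": "/m/01wh_8", "score": 0.5165784, "topicality": 0.5165784, "description": "Cherry blossom"}
  ]}]}
JSON

puts labels_to_text(sample)                  # Flower Plant Cherry blossom
puts labels_to_text(sample, min_score: 0.9)  # Flower Plant
```

Multi-word labels such as "Cherry blossom" lose their boundaries once everything is joined by spaces, which is acceptable here because the later search works on individual words.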

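Sample images

The images come from PAKUTASO, which allows commercial use. I spread them across the following 10 categories with 3 images each, for a total of 30 test images.

Cherry blossoms
Men
Mountains
Rivers
Factories
Full moon
Dogs
Birds
Cars
Money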

DB table schema

PostgreSQL ships with decent full-text search out of the box, so I created the tables there.

Table for running Google Vision API requests

CREATE TABLE public.image_label_annotation_results (
    id bigserial PRIMARY KEY, -- auto-generated so that INSERTs can omit id
    status smallint DEFAULT 0,
    server character varying(50) DEFAULT NULL::character varying,
    image_hash text NOT NULL,
    image_s3_url text NOT NULL,
    api_result jsonb,
    created_at timestamp(0) without time zone DEFAULT CURRENT_TIMESTAMP(0) NOT NULL,
    updated_at timestamp(0) without time zone DEFAULT CURRENT_TIMESTAMP(0) NOT NULL
);

Labels parsed out of the API response (JSON)

CREATE TABLE public.image_annotation_labels (
    id bigserial PRIMARY KEY, -- auto-generated; ActiveRecord needs a primary key for find and rec.id
    image_hash text NOT NULL,
    label character varying(1000) DEFAULT NULL::character varying,
    label_search tsvector,
    created_at timestamp(0) without time zone DEFAULT CURRENT_TIMESTAMP(0) NOT NULL,
    updated_at timestamp(0) without time zone DEFAULT CURRENT_TIMESTAMP(0) NOT NULL
);

Scripts

The following SQL registers the images to be sent to the API in the DB.

INSERT INTO image_label_annotation_results (image_hash, image_s3_url) VALUES
('test_001', 'https://example.com/test_001.jpg'),
('test_002', 'https://example.com/test_002.jpg'),
('test_003', 'https://example.com/test_003.jpg'),
('test_004', 'https://example.com/test_004.jpg');

Note: image_hash is a key that uniquely identifies an image.
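The article uses hand-written keys such as 'test_001'. If you would rather derive the key from the file itself, one option is a content digest. The sketch below is an assumption for illustration; `image_hash_for` is a hypothetical helper and SHA-256 is just one reasonable choice.

```ruby
require 'digest'

# Hypothetical helper: derive image_hash from the image bytes so that the same
# file always maps to the same key, regardless of its file name or URL.
def image_hash_for(bytes)
  Digest::SHA256.hexdigest(bytes)
end

# With a real file this would be: image_hash_for(File.binread('test_001.jpg'))
puts image_hash_for('dummy image bytes').length  # 64 (hex characters)
```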

Next, the following script calls the Google Vision API.

require 'net/http'
require 'open-uri'
require 'json'
require 'active_record'
require 'pg'

ActiveRecord::Base.establish_connection(
    adapter: 'postgresql',
    host: '127.0.0.1',
    username: 'postgres',
    password: 'postgres!',
    database: 'image_annotation',
    schema_search_path: 'public',
    reconnect: true
)

ActiveRecord.default_timezone = :local
Time.zone_default = Time.find_zone! 'Tokyo'

class ImageLabelAnnotationResult < ActiveRecord::Base
    has_one :image_annotation_label, primary_key: :image_hash, foreign_key: :image_hash
end

def request_label_api(image_url:)
    # Generate an access token with the gcloud CLI (strip the trailing newline)
    access_token = `gcloud auth application-default print-access-token`.strip
    uri = URI('https://vision.googleapis.com/v1/images:annotate')
    body = {
        "requests": [
            {
                "features": [
                    {
                        "maxResults": 50,
                        "type": 'LABEL_DETECTION'
                    }
                ],
                "image": { "source": { "imageUri": "#{image_url}" } }
            }
        ]
    }.to_json

    headers = {
        'Authorization': "Bearer #{access_token}",
        'Content-Type': 'application/json'
    }

    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    request = Net::HTTP::Post.new(uri.request_uri, headers)
    request.body = body

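    response = http.request(request)
    JSON.parse(response.body)
end

# Identifier of the machine that processed a record. The original post does not
# show how SERVER is defined; using the local hostname here is an assumption.
# (Socket is already loaded by net/http.)
SERVER = Socket.gethostname

def create_vision_api_results(request_record)
    # status as used in this script: 0 = queued, 1 = in progress, 2 = done
    request_record.update(status: 1)
    json = request_label_api(image_url: request_record.image_s3_url)
    request_record.update(status: 2, server: SERVER, api_result: json)
end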

if __FILE__ == $0
    loop do
        request_id = ImageLabelAnnotationResult.where(status: 0).limit(50).pluck(:id).sample
        break unless request_id # nothing left to process, so leave the loop

        request_record = ImageLabelAnnotationResult.find(request_id)
        create_vision_api_results(request_record)
    end
end
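The script above stores whatever JSON comes back, including failures. The Vision API reports request-level problems in a top-level `error` object and per-image problems in `responses[i].error`, so a small check before marking a record as done may be useful. This is a sketch only: `vision_api_error` is a hypothetical helper, and the error messages in the examples are made up.

```ruby
# Hypothetical helper: return an error message if the Vision API response
# reports a failure, or nil if the label annotations can be used.
def vision_api_error(json)
  # Request-level failure (for example an expired access token).
  top = json['error']
  return top['message'] if top

  # Per-image failure (for example an image URL that could not be fetched).
  json.dig('responses', 0, 'error', 'message')
end

ok        = { 'responses' => [{ 'labelAnnotations' => [{ 'description' => 'Flower' }] }] }
bad_image = { 'responses' => [{ 'error' => { 'code' => 14, 'message' => 'image could not be fetched' } }] }
bad_auth  = { 'error' => { 'code' => 401, 'message' => 'invalid credentials', 'status' => 'UNAUTHENTICATED' } }

p vision_api_error(ok)         # nil
p vision_api_error(bad_image)  # "image could not be fetched"
p vision_api_error(bad_auth)   # "invalid credentials"
```

One way to use it would be to write a separate status value for failed records instead of 2, so that the parsing step can skip them; that status value is not part of the original schema design.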

Finally, the following script parses the JSON API results and populates the full-text search table.

require 'json'
require 'active_record'
require 'pg'

ActiveRecord::Base.establish_connection(
    adapter: 'postgresql',
    host: '127.0.0.1',
    username: 'postgres',
    password: 'postgres!',
    database: 'image_annotation',
    schema_search_path: 'public',
    reconnect: true
)

ActiveRecord.default_timezone = :local
Time.zone_default = Time.find_zone! 'Tokyo'

class ImageLabelAnnotationResult < ActiveRecord::Base
    has_one :image_annotation_label, primary_key: :image_hash, foreign_key: :image_hash
end

class ImageAnnotationLabel < ActiveRecord::Base
    has_one :image_label_annotation_result, primary_key: :image_hash, foreign_key: :image_hash
end

def parse_vision_api_result(result_record)
    annotation = result_record.api_result.dig('responses', 0, 'labelAnnotations')
    return unless annotation

    label_str = annotation.map do |label_hash|
        label_hash['description']
    end.join(' ')
    rec = ImageAnnotationLabel.create({ image_hash: result_record.image_hash, label: label_str })
    ImageAnnotationLabel.where(id: rec.id).update_all("label_search = to_tsvector('english', label)")
end

if __FILE__ == $0
    ImageLabelAnnotationResult.where(status: 2).order(:id).pluck(:id).each do |id|
        result_record = ImageLabelAnnotationResult.find(id)
        parse_vision_api_result(result_record)
    end
end

Experiment

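The query below compares labels against each other, and the top three results are treated as the similar images. To run the full-text search, the labels of the query image are first joined with OR (|).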
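The OR-joining step can be sketched as follows. `build_or_tsquery` is a hypothetical helper, not code from the original experiment. It also drops characters that `to_tsquery` would read as operators, because some labels contain them (for example the "&" in "Automotive tail & brake light").

```ruby
# Hypothetical helper: turn a space-joined label string into an OR query
# suitable for PostgreSQL's to_tsquery.
def build_or_tsquery(label)
  label
    .split                                      # one token per word
    .map { |w| w.gsub(/[^\p{Alnum}-]/, '') }    # keep letters, digits and hyphens only
    .reject(&:empty?)                           # tokens such as "&" become empty
    .join('|')
end

puts build_or_tsquery('Flower Plant Tints and shades')
# Flower|Plant|Tints|and|shades
puts build_or_tsquery('Automotive tail & brake light Mid-size car')
# Automotive|tail|brake|light|Mid-size|car
```

Duplicate words are kept on purpose so that the output matches the hand-built queries in the examples.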

Example 1: cherry blossoms

Query image

Labels of the query image

Flower Plant Tree Twig Branch Petal Pink Trunk Tints and shades Flowering plant Blossom Wood Groundcover Deciduous Grass Plant stem Shrub Natural landscape Malus Magenta Prunus Cherry blossom

Query

select
ts_rank_cd(label_search, to_tsquery('Flower|Plant|Tree|Twig|Branch|Petal|Pink|Trunk|Tints|and|shades|Flowering|plant|Blossom|Wood|Groundcover|Deciduous|Grass|Plant|stem|Shrub|Natural|landscape|Malus|Magenta|Prunus|Cherry|blossom')) as rank,
ilar.image_s3_url 
from
image_annotation_labels ial
join image_label_annotation_results ilar
on
ial.image_hash = ilar.image_hash
order by
rank desc
limit 3;
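To run this kind of query from Ruby without pasting the labels into the SQL text, the tsquery string can be passed as a bind parameter. The sketch below rests on some assumptions: `similar_images` and `SIMILAR_IMAGES_SQL` are hypothetical names, and the query passes 'english' to `to_tsquery` to match the `to_tsvector('english', label)` call in the indexing script, whereas the hand-written query relies on the database default.

```ruby
# Assumed query: same join and ranking as above, with the tsquery text and the
# limit passed as bind parameters ($1, $2).
SIMILAR_IMAGES_SQL = <<~SQL
  SELECT ts_rank_cd(ial.label_search, to_tsquery('english', $1)) AS rank,
         ial.label,
         ilar.image_s3_url
  FROM image_annotation_labels ial
  JOIN image_label_annotation_results ilar ON ial.image_hash = ilar.image_hash
  ORDER BY rank DESC
  LIMIT $2
SQL

# conn only needs to respond to exec_params(sql, params), as PG::Connection
# from the pg gem does. Returns the result rows as an array of hashes.
def similar_images(conn, tsquery, limit: 3)
  conn.exec_params(SIMILAR_IMAGES_SQL, [tsquery, limit]).to_a
end

# Usage (needs the pg gem and the tables from this article):
#   require 'pg'
#   conn = PG.connect(dbname: 'image_annotation')
#   similar_images(conn, 'Flower|Plant|Tree').each { |row| puts row['image_s3_url'] }
```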

Results

Similarity rank 1

Flower Plant Tree Twig Branch Petal Pink Trunk Tints and shades Flowering plant Blossom Wood Groundcover Deciduous Grass Plant stem Shrub Natural landscape Malus Magenta Prunus Cherry blossom

Similarity rank 2

Plant Flower Botany Tree Branch Twig Natural landscape Biome Woody plant Grass Groundcover Tints and shades Shrub Blossom Flowering plant Landscape Garden Cherry blossom Trunk Landscaping Plant stem Road surface Petal City Prunus Road House Sidewalk

Similarity rank 3

Water Plant Water resources Fluvial landforms of streams Natural landscape Natural environment Branch Spring Waterfall Vegetation Watercourse Riparian zone Terrestrial plant Chute Landscape Groundcover Formation Grass Mountain river Tree Forest Non-vascular land plant Water feature Creek Stream Arroyo Temperate broadleaf and mixed forest Rapid Woodland Jungle Wildlife Mineral spring Vascular plant Rock Riparian forest Old-growth forest Tropical and subtropical coniferous forests Ravine Valdivian temperate rain forest Northern hardwood forest Stream bed Rainforest Shrub Moss Garden Algae Tributary Antarctic flora River Tropics

Example 2: cars

Query image

Labels of the query image

Wheel Car Tire Land vehicle Vehicle Vehicle registration plate Building Automotive lighting Motor vehicle Window Infrastructure Automotive design Mode of transport Automotive exterior Plant Classic car Road City Automotive wheel system Classic Art Street Antique car Family car Parking Luxury vehicle Rolling Mid-size car Town Pedestrian Compact car Vintage car Grille Asphalt Sidewalk Personal luxury car Hardtop Coupé City car Van Sedan Facade Transport Cobblestone Bumper House Sky Traffic

Query

select
ts_rank_cd(label_search, to_tsquery('Wheel|Car|Tire|Land|vehicle|Vehicle|Vehicle|registration|plate|Building|Automotive|lighting|Motor|vehicle|Window|Infrastructure|Automotive|design|Mode|of|transport|Automotive|exterior|Plant|Classic|car|Road|City|Automotive|wheel|system|Classic|Art|Street|Antique|car|Family|car|Parking|Luxury|vehicle|Rolling|Mid-size|car|Town|Pedestrian|Compact|car|Vintage|car|Grille|Asphalt|Sidewalk|Personal|luxury|car|Hardtop|Coupé|City|car|Van|Sedan|Facade|Transport|Cobblestone|Bumper|House|Sky|Traffic')) as rank,
ial."label", 
ilar.image_s3_url 
from
image_annotation_labels ial
join image_label_annotation_results ilar
on
ial.image_hash = ilar.image_hash
order by
rank desc
limit 3;

Results

Similarity rank 1

Wheel Car Tire Land vehicle Vehicle Vehicle registration plate Building Automotive lighting Motor vehicle Window Infrastructure Automotive design Mode of transport Automotive exterior Plant Classic car Road City Automotive wheel system Classic Art Street Antique car Family car Parking Luxury vehicle Rolling Mid-size car Town Pedestrian Compact car Vintage car Grille Asphalt Sidewalk Personal luxury car Hardtop Coupé City car Van Sedan Facade Transport Cobblestone Bumper House Sky Traffic

Similarity rank 2

Car Tire Wheel Land vehicle Vehicle Vehicle registration plate Automotive lighting Hood Automotive design Motor vehicle Automotive tire Exhaust system Chevrolet corvette c6 zr1 Automotive exhaust Alloy wheel Automotive exterior Rolling Fender Personal luxury car Bumper Automotive wheel system Hardtop Spoiler Rim Asphalt Road Grand tourer Kit car Performance car Building Lotus Muffler Ferrari f430 Sky Sports car Automotive parking light Luxury vehicle Auto part Muscle car Vehicle door Automotive tail & brake light Electric blue Race car Coupé Chevrolet corvette Tree Automotive mirror Supercar Tesla Chevrolet

Similarity rank 3

Bus Vehicle Tire Wheel Window Motor vehicle Building Tree Mode of transport Vehicle registration plate Automotive lighting Road surface Thoroughfare Gas Electricity Metropolitan area City Public transport Road Metropolis Winter Lane Electric blue Pedestrian Street Commercial vehicle Automotive exterior Tour bus service Sidewalk Downtown Public utility Car Automotive wheel system Mixed-use Asphalt Traffic Cobblestone Transport Electrical supply

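Summary

I used the label detection feature of the Google Vision API to convert images into labels, then ran a full-text search over those labels to implement similar-image search.
When the image contains a physical object such as a car, label detection is accurate, so this method looks capable of pulling up similar images reasonably well.
When there is no distinct object, label detection returns keywords that carry little meaning, which makes this approach struggle.
There is still plenty of room to refine the full-text search logic.

Incidentally, the API costs $1.50 per 1,000 images.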
