Building a Simple Speech Recognition App on Android

ルーターマガジン

2022.09.02

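In this article we build a minimal Android app that uses speech recognition and changes its output when it hears a specific word.
The motivation is that voice-driven device control through assistants such as Siri on iOS or Google Assistant on Android works at too coarse a command granularity, and I feel it has reached its limits. iOS already ships a feature called "Voice Control" that goes beyond Siri's app-level operations and lets you perform fine-grained screen operations such as swipes and double taps by voice. Android, however, does not officially provide such fine-grained voice control as a built-in feature.
To achieve finer-grained device control on Android, the first requirement is being able to recognize speech at all. So this time we use Android's SpeechRecognizer API to recognize speech continuously.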

SDK versions

compileSdk: 32
minSdk: 27
targetSdk: 32

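Implementation steps

1. Preparing libraries and other setup
2. Granting the recording permission
3. Calling SpeechRecognizer
4. Handling the recognized speech
5. Starting recording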

1. Preparing libraries and other setup

Import the required libraries

import android.Manifest
import android.content.Intent
import android.content.pm.PackageManager
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.util.Log
import android.widget.*
import androidx.appcompat.app.AlertDialog
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

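The following line in build.gradle (Module) enables the Kotlin Android Extensions plugin (kotlinx.android.synthetic). This plugin is deprecated and was removed in Kotlin 1.8, and the code in this article looks up views with findViewById, so the line is not required for kotlinx.coroutines and can be omitted on newer toolchains.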

apply plugin: 'kotlin-android-extensions'

Also, to use the kotlinx.coroutines library, add the following inside dependencies in build.gradle (Module)

def coroutines_version = "1.3.2"
implementation "org.jetbrains.kotlinx:kotlinx-coroutines-core:$coroutines_version"
implementation "org.jetbrains.kotlinx:kotlinx-coroutines-android:$coroutines_version"

Create a button in activity_main.xml to start speech recognition

<Button
    android:id="@+id/button"
    android:layout_width="140dp"
    android:layout_height="86dp"
    android:text="スタート"
    tools:layout_editor_absoluteX="135dp"
    tools:layout_editor_absoluteY="591dp" />

Define the required fields and create the function that runs when the app starts

class MainActivity : AppCompatActivity() {
    private val PERMISSION_AUDIO = 200
    private val TAG = "RecognitionListener"
    private var mSpeechRecognizer: SpeechRecognizer? = null

    private var speechRecognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    private var listItems = mutableListOf("")

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        var button = findViewById<Button>(R.id.button)
    }
}

2. Granting the recording permission

Declare the recording permission by adding the following directly under the manifest tag in AndroidManifest.xml.

<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<queries>
    <intent>
        <action android:name="android.speech.RecognitionService" />
    </intent>
</queries>

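Create a method for requesting the recording permission
After the onCreate method, add a method that shows the user a message asking them to confirm the recording permission

    fun requestAudioPermission() {
        if (shouldShowRequestPermissionRationale(Manifest.permission.RECORD_AUDIO)) {
            // Explain why the permission is needed before asking again.
            // Use the Activity (this) as context: baseContext carries no app theme for the dialog.
            AlertDialog.Builder(this)
                .setMessage("Permission Here")
                .setPositiveButton(android.R.string.ok) { _, _ ->
                    requestPermissions(arrayOf(Manifest.permission.RECORD_AUDIO), PERMISSION_AUDIO)
                }
                .setNegativeButton(android.R.string.cancel) { _, _ ->
                    finish()
                }
                .show() // create() alone builds the dialog but never displays it
        } else {
            requestPermissions(arrayOf(Manifest.permission.RECORD_AUDIO), PERMISSION_AUDIO)
        }
    }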
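
Note that the method above only sends the request. The user's answer comes back later through the Activity's onRequestPermissionsResult callback, which this article does not implement. As a minimal sketch, the decision that callback has to make could look like the following. The helper name isAudioPermissionGranted is hypothetical, and both constants are redefined locally (PERMISSION_GRANTED mirrors PackageManager.PERMISSION_GRANTED, which is 0) so that the snippet runs without the Android SDK.

```kotlin
// Mirrors PackageManager.PERMISSION_GRANTED (0); copied here so the sketch
// runs without the Android SDK.
val PERMISSION_GRANTED = 0

// Same request code as PERMISSION_AUDIO in MainActivity.
val PERMISSION_AUDIO = 200

// Hypothetical helper: returns true only when the callback belongs to our
// RECORD_AUDIO request and the single requested permission was granted.
fun isAudioPermissionGranted(requestCode: Int, grantResults: IntArray): Boolean {
    // Ignore results that belong to some other permission request.
    if (requestCode != PERMISSION_AUDIO) return false
    // An empty array means the request was cancelled before the user answered.
    return grantResults.isNotEmpty() && grantResults[0] == PERMISSION_GRANTED
}
```

In MainActivity, one could override onRequestPermissionsResult(requestCode, permissions, grantResults) and call mSpeechRecognizer?.startListening(speechRecognizerIntent) when this helper returns true, so that recognition starts as soon as the permission is granted.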

3. Calling SpeechRecognizer

override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        var button = findViewById<Button>(R.id.button)

        // Speech recognition setup
        listItems.clear() // drop the empty placeholder entry created at declaration
        val lang = "ja-JP"
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, lang)
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, lang)
        // EXTRA_ONLY_RETURN_LANGUAGE_PREFERENCE (a boolean) and EXTRA_RESULTS_PENDINGINTENT
        // (a PendingIntent) do not take a language string, so they are not set here.
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, this.packageName)
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 1)
        mSpeechRecognizer = SpeechRecognizer.createSpeechRecognizer(applicationContext)
        mSpeechRecognizer?.setRecognitionListener(object : RecognitionListener {
            override fun onReadyForSpeech(params: Bundle?) {
                Log.d(TAG, "onReadyForSpeech")
            }
            override fun onRmsChanged(rmsdB: Float) {
                Log.d(TAG, "onRmsChanged")
            }
            override fun onBufferReceived(buffer: ByteArray?) {
                Log.d(TAG, "onBufferReceived")
            }
            override fun onBeginningOfSpeech() {
                Log.d(TAG, "onBeginningOfSpeech")
            }
            override fun onEndOfSpeech() {
                Log.d(TAG, "onEndOfSpeech")
            }
            override fun onPartialResults(p0: Bundle?) {
            }
            override fun onError(error: Int) {
                // Translate the error code into a readable message for the log
                val errorCode = when (error) {
                    SpeechRecognizer.ERROR_AUDIO -> "Audio recording error"
                    SpeechRecognizer.ERROR_CLIENT -> "Other client side errors"
                    SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS -> "Insufficient permissions"
                    SpeechRecognizer.ERROR_NETWORK -> "Network related errors"
                    SpeechRecognizer.ERROR_NETWORK_TIMEOUT -> "Network operation timed out"
                    SpeechRecognizer.ERROR_NO_MATCH -> "No recognition result matched"
                    SpeechRecognizer.ERROR_RECOGNIZER_BUSY -> "RecognitionService busy"
                    SpeechRecognizer.ERROR_SERVER -> "Server sends error status"
                    SpeechRecognizer.ERROR_SPEECH_TIMEOUT -> "No speech input"
                    else -> "Unknown error"
                }
                Log.d(TAG, "onError:$errorCode")
                // Cancel the current session, wait one second, then start listening again.
                // SpeechRecognizer must be called on the main thread, hence runOnUiThread.
                GlobalScope.launch {
                    runOnUiThread {
                        mSpeechRecognizer?.cancel()
                    }
                    delay(1000)
                    runOnUiThread {
                        mSpeechRecognizer?.startListening(speechRecognizerIntent)
                    }
                }
            }
            override fun onEvent(eventType: Int, params: Bundle?) {
                Log.d(TAG, "onEvent")
            }
            // The listener object continues in step 4 (onResults, onResultsResponse),
            // where the object and the setRecognitionListener(...) call are closed.
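
The onError handler in this step restarts recognition after every error, including errors that a restart cannot fix. One possible refinement, sketched here under assumptions, separates describing the error from deciding whether to restart. The function names describeError and shouldRestart are hypothetical, the choice of which errors to retry is an assumption of this sketch rather than something the article tested, and the integer values are copies of the SpeechRecognizer error constants so that the snippet runs without the Android SDK.

```kotlin
// Copies of the android.speech.SpeechRecognizer error constants,
// redefined here so the sketch runs without the Android SDK.
val ERROR_NETWORK_TIMEOUT = 1
val ERROR_NETWORK = 2
val ERROR_AUDIO = 3
val ERROR_SERVER = 4
val ERROR_CLIENT = 5
val ERROR_SPEECH_TIMEOUT = 6
val ERROR_NO_MATCH = 7
val ERROR_RECOGNIZER_BUSY = 8
val ERROR_INSUFFICIENT_PERMISSIONS = 9

// Hypothetical helper: maps an error code to the log message used in onError.
fun describeError(error: Int): String = when (error) {
    ERROR_AUDIO -> "Audio recording error"
    ERROR_CLIENT -> "Other client side errors"
    ERROR_INSUFFICIENT_PERMISSIONS -> "Insufficient permissions"
    ERROR_NETWORK -> "Network related errors"
    ERROR_NETWORK_TIMEOUT -> "Network operation timed out"
    ERROR_NO_MATCH -> "No recognition result matched"
    ERROR_RECOGNIZER_BUSY -> "RecognitionService busy"
    ERROR_SERVER -> "Server sends error status"
    ERROR_SPEECH_TIMEOUT -> "No speech input"
    else -> "Unknown error ($error)"
}

// Hypothetical policy (an assumption of this sketch): restarting cannot help
// while the RECORD_AUDIO permission is missing, so only that error stops the loop.
fun shouldRestart(error: Int): Boolean = error != ERROR_INSUFFICIENT_PERMISSIONS
```

With helpers like these, onError could log describeError(error) and run the cancel, delay, startListening sequence only when shouldRestart(error) is true.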

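4. Handling the recognized speech

The code below continues inside the RecognitionListener object started in step 3. If the recognized speech contains the specific word, the string "**ワード検知**" ("word detected") is written to the log; otherwise the recognized text itself is logged.
Adapt the handling of recognized speech and the reaction to the specific word to your own needs.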

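            // Handle the recognized text
            override fun onResults(results: Bundle?) {
                val result = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                // Take the top candidate and strip whitespace; empty string if nothing was recognized
                val speechText = result?.firstOrNull()?.replace("\\s".toRegex(), "").orEmpty()

                // Pass the recognized text to our own handler
                onResultsResponse(speechText)

                // Keep only the 10 most recent results
                if (speechText.isNotEmpty()) {
                    listItems.add(speechText)
                    if (listItems.count() > 10) {
                        listItems.removeAt(0)
                    }
                }
                // Joined history; show it in a TextView instead if the layout has one
                val text = listItems.joinToString("\n")
                Log.d(TAG, text)

                // Listen again so that recognition continues after each result
                // (added on the assumption that continuous recognition is intended)
                mSpeechRecognizer?.startListening(speechRecognizerIntent)
            }

            // Reacts when the recognized text contains the specific word
            fun onResultsResponse(speechText: String) {
                // Placeholder pattern: replace with the word you want to detect
                val targetWord = Regex("特定のワード")

                if (targetWord.containsMatchIn(speechText)) {
                    // What to do when the specific word is heard
                    Log.d(TAG, "**ワード検知**")
                } else {
                    // Otherwise, log the recognized text
                    Log.d(TAG, speechText)
                }
            }
        })

        button.setOnClickListener {
            startListening()
        }
    } // end of onCreate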
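
The text handling in this step (whitespace removal, word matching, and the 10-entry history) does not depend on Android, so it can be tried on its own. The following is a minimal sketch of that logic as plain Kotlin functions. The names normalize, containsTargetWord, and addToHistory are hypothetical, and the sketch uses a plain substring check where the article uses a Regex.

```kotlin
// Hypothetical helper: strips all whitespace, as onResults does before matching.
// A null input (nothing recognized) becomes an empty string.
fun normalize(raw: String?): String = raw?.replace("\\s".toRegex(), "") ?: ""

// Hypothetical helper: true when the normalized text contains the target word.
// A plain substring check is used here instead of a Regex.
fun containsTargetWord(speechText: String, targetWord: String): Boolean =
    targetWord.isNotEmpty() && speechText.contains(targetWord)

// Keeps only the newest `limit` entries, like the listItems handling in onResults.
fun addToHistory(history: MutableList<String>, entry: String, limit: Int = 10) {
    history.add(entry)
    // Drop the oldest entries until the list fits the limit again.
    while (history.size > limit) {
        history.removeAt(0)
    }
}
```

For example, containsTargetWord(normalize("電気 を つけて"), "電気") is true, and after adding 12 entries with addToHistory only the last 10 remain.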

5. Starting recording

Create a method that starts speech recognition, at the same level in the class as the onCreate method
If the recording permission has not been granted, it calls the permission request method; if it has been granted, it starts speech recognition.

// Pass the intent to SpeechRecognizer and start speech recognition
    private fun startListening() {
        val audioPermission = ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
        if (audioPermission != PackageManager.PERMISSION_GRANTED) {
            requestAudioPermission()
        } else {
            mSpeechRecognizer?.startListening(speechRecognizerIntent)
        }
    }

6. Summary

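This article presented a minimal implementation of a speech recognition app on Android. It converts recognized speech to text, performs special handling for a specific word, and otherwise writes the recognized text to the log. Building on this implementation makes it possible to control an Android device by voice at a fine-grained level, similar to "Voice Control" on iOS.
Incidentally, although I said that Android does not officially implement fine-grained voice control, "Voice Access", an app that Google provides on Google Play, also allows fine-grained device operations such as screen control by voice, much like "Voice Control" on iOS. However, "Voice Access" does not yet support Japanese speech recognition, so SpeechRecognizer, which recognizes Japanese fairly accurately, should remain a useful API for Japanese-focused use.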
