Speech Input

앙망 2011. 7. 15. 10:16
While entering a quick memo, it occurred to me that I should try out Android's speech input feature. Let's give it a try. Go go go~

Speech input adds another dimension to staying in touch. Google's Voice Search application, which is pre-installed on many Android devices and available in Android Market, provides powerful features like "search by voice" and Voice Actions like "Navigate to." Further enhancing the voice experience, Android 2.1 introduces a voice-enabled keyboard, which makes it even easier to stay connected. Now you can dictate your message instead of typing it. Just tap the new microphone button on the keyboard, and you can speak in just about any context in which you would normally type.

We believe speech can fundamentally change the mobile experience. We would like to invite every Android application developer to consider integrating speech input capabilities via the Android SDK. One of our favorite apps in the Market that integrates speech input is Handcent SMS, because you can dictate a reply to any SMS with a quick tap on the SMS popup window. Here is Speech input integrated into Handcent SMS:

Handcent SMS: http://www.handcent.com/ScreenShot.php

Usage

The Android SDK makes it easy to integrate speech input directly into your own application. Just copy and paste from this sample application to get started. The sample application first verifies that the target device is able to recognize speech input:

// Check to see if a recognition activity is present
PackageManager pm = getPackageManager();
List<ResolveInfo> activities = pm.queryIntentActivities(
  new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH), 0);
if (activities.size() != 0) {
  speakButton.setOnClickListener(this);
} else {
  speakButton.setEnabled(false);
  speakButton.setText("Recognizer not present");
}

The full VoiceRecognition sample shown below also ships with the SDK's ApiDemos project, under <sdk>/samples/android-<version>/.

/* 
 * Copyright (C) 2008 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.example.android.apis.app;

import com.example.android.apis.R;

import android.app.Activity;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.ArrayAdapter;
import android.widget.Button;
import android.widget.ListView;

import java.util.ArrayList;
import java.util.List;

/**
 * Sample code that invokes the speech recognition intent API.
 */
public class VoiceRecognition extends Activity implements OnClickListener {

    private static final int VOICE_RECOGNITION_REQUEST_CODE = 1234;

    private ListView mList;

    /**
     * Called when the activity is first created.
     */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        // Inflate our UI from its XML layout description.
        setContentView(R.layout.voice_recognition);

        // Get display items for later interaction
        Button speakButton = (Button) findViewById(R.id.btn_speak);

        mList = (ListView) findViewById(R.id.list);

        // Check to see if a recognition activity is present
        PackageManager pm = getPackageManager();
        List<ResolveInfo> activities = pm.queryIntentActivities(
                new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH), 0);
        if (activities.size() != 0) {
            speakButton.setOnClickListener(this);
        } else {
            speakButton.setEnabled(false);
            speakButton.setText("Recognizer not present");
        }
    }

    /**
     * Handle the click on the start recognition button.
     */
    public void onClick(View v) {
        if (v.getId() == R.id.btn_speak) {
            startVoiceRecognitionActivity();
        }
    }

    /**
     * Fire an intent to start the speech recognition activity.
     */
    private void startVoiceRecognitionActivity() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speech recognition demo");
        startActivityForResult(intent, VOICE_RECOGNITION_REQUEST_CODE);
    }

    /**
     * Handle the results from the recognition activity.
     */
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == VOICE_RECOGNITION_REQUEST_CODE && resultCode == RESULT_OK) {
            // Fill the list view with the strings the recognizer thought it could have heard
            ArrayList<String> matches = data.getStringArrayListExtra(
                    RecognizerIntent.EXTRA_RESULTS);
            mList.setAdapter(new ArrayAdapter<String>(this, android.R.layout.simple_list_item_1,
                    matches));
        }

        super.onActivityResult(requestCode, resultCode, data);
    }
}

The sample application then uses startActivityForResult() to fire an intent that requests voice recognition, including an extra parameter that specifies one of two language models. The voice recognition application that handles the intent processes the voice input, then passes the recognized string back to your application by calling the onActivityResult() callback.
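
RecognizerIntent also defines a few optional extras that refine the request. A minimal sketch, reusing the sample's request code and assuming it runs inside the Activity above (the specific values here are illustrative):

Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
// Ask for up to five alternative transcriptions instead of
// only the single best match.
intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5);
// Hint the expected spoken language as an IETF tag; if omitted,
// the recognizer falls back to the device's default locale.
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
startActivityForResult(intent, VOICE_RECOGNITION_REQUEST_CODE);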

Voice Search

Android is an open platform, so your application can potentially make use of any speech recognition service on the device that's registered to receive a RecognizerIntent. Google's Voice Search application, which is pre-installed on many Android devices, responds to a RecognizerIntent by displaying the "Speak now" dialog and streaming audio to Google's servers -- the same servers used when a user taps the microphone button on the search widget or the voice-enabled keyboard. You can check whether Voice Search is installed in Settings > Applications > Manage applications.

One important tip

One important tip: for speech input to be as accurate as possible, it's helpful to have an idea of what words are likely to be spoken. While a message like "Mom, I'm writing you this message with my voice!" might be appropriate for an email or SMS message, you're probably more likely to say something like "weather in Mountain View" if you're using Google Search. You can make sure your users have the best experience possible by requesting the appropriate language model: free_form for dictation, or web_search for shorter, search-like phrases. We developed the "free form" model to improve dictation accuracy for the voice keyboard, while the "web search" model is used when users want to search by voice.
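
For example, a search-style feature would request the web search model instead of the free-form model used in the dictation sample above. A minimal sketch of that one-line difference (the prompt text is illustrative):

Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
// Use the "web search" model for short, query-like phrases;
// dictation uses LANGUAGE_MODEL_FREE_FORM instead.
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_WEB_SEARCH);
intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say a search query");
startActivityForResult(intent, VOICE_RECOGNITION_REQUEST_CODE);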

Google's servers support many languages for voice input, with more arriving regularly. You can use the ACTION_GET_LANGUAGE_DETAILS broadcast intent to query for the list of supported languages. The web search model is available in all of the supported languages, while free-form has primarily been optimized for English. As we work hard to support more models in more languages, and to improve the accuracy of the speech recognition technology we use in our products, Android developers who integrate speech capabilities directly into their applications can reap the benefits as well.
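
ACTION_GET_LANGUAGE_DETAILS is delivered as an ordered broadcast, so the language list comes back through the receiver's result extras rather than an activity result. A minimal sketch of the query, assuming it runs inside an Activity and that android.content.BroadcastReceiver, android.content.Context, and android.util.Log are imported alongside the sample's imports (the log tag is illustrative):

Intent detailsIntent = new Intent(RecognizerIntent.ACTION_GET_LANGUAGE_DETAILS);
sendOrderedBroadcast(detailsIntent, null, new BroadcastReceiver() {
    @Override
    public void onReceive(Context context, Intent intent) {
        Bundle results = getResultExtras(true);
        // The user's current language preference, if the recognizer reports one.
        String preferred = results.getString(
                RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE);
        // All languages this recognizer supports, as IETF language tags.
        ArrayList<String> supported = results.getStringArrayList(
                RecognizerIntent.EXTRA_SUPPORTED_LANGUAGES);
        Log.d("VoiceDemo", "Preferred: " + preferred + ", supported: " + supported);
    }
}, null, Activity.RESULT_OK, null, null);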
