Voice Recognition

Objectives

Ideas

Simple Offline Control

  • Pocketsphinx with very limited vocabulary (see the sketch after this list)
  • Every command is keyword triggered
  • Quick timeouts
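A minimal sketch of this idea (keyword file name and thresholds are illustrative, not a tested configuration):

pocketsphinx_continuous -inmic yes -kws kodi.kws -logfn /dev/null

where kodi.kws lists one keyphrase per line with a detection threshold, e.g.:

pause movie /1e-20/
resume movie /1e-20/
volume up /1e-30/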

Hybrid Pocketsphinx Google API

  • Recognise trigger using pocketsphinx (the whole flow is sketched after this list)
  • Acknowledge with beeps
    • Need to manage mixer controls
  • Pass commands to online STT engine (http://wit.ai)
  • Process and control Kodi and openHAB
  • Fall-back to Simple Offline Control
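A rough shell sketch of the hybrid flow (token, beep files and timings are placeholders; error handling and the fall-back path are omitted):

#!/bin/bash
# Assumes the trigger word has already been spotted by pocketsphinx (see above).
WIT_TOKEN="A0VERY0LONG0ALPHA0NUMERIC0STRING"   # from the wit.ai console

aplay beep-start.wav                           # acknowledge the trigger
arecord -f S16_LE -r 16000 -d 5 cmd.wav        # quick timeout: record at most 5 s
aplay beep-stop.wav

# Send the recording to wit.ai; the JSON reply carries the recognised text
# and intent, to be mapped onto Kodi (JSON-RPC) and openHAB (REST) calls.
curl -s -H "Authorization: Bearer $WIT_TOKEN" \
     -H "Content-Type: audio/wav" \
     --data-binary @cmd.wav \
     https://api.wit.ai/speech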

Hardware

Software


Prerequisites

Support packages

sudo apt-get install alsa-utils python-pip python-yaml python-dateutil python-pyaudio
sudo pip install apscheduler # need newer versions, the apt versions are too old

ALSA playback

sudo modprobe snd_usb_audio # USB mic, loads as card1 on RPi (after snd-bcm2835)

THIS DOES NOT WORK:

options snd-usb-audio index=0
options snd-bcm2835 index=1

Don't even bother trying to force index=1 for snd-bcm2835; it doesn't support the index parameter:

osmc@osmc:~$ /sbin/modinfo snd-bcm2835
filename:       /lib/modules/4.3.3-3-osmc/kernel/sound/arm/snd-bcm2835.ko
alias:          platform:bcm2835_alsa
license:        GPL
description:    Alsa driver for BCM2835 chip
author:         Dom Cobley
srcversion:     46AE410DEA6D239DB70D2C9
alias:          of:N*T*Cbrcm,bcm2835-audio*
depends:        snd-pcm,snd
intree:         Y
vermagic:       4.3.3-3-osmc preempt mod_unload modversions ARMv6 
parm:           force_bulk:Force use of vchiq bulk for audio (bool)

Let snd-bcm2835 be card0 and load snd-usb-audio as card1:

osmc@osmc:~$ cat /etc/modprobe.d/jasper.conf 
options snd-usb-audio index=1

Then configure defaults in .asoundrc accordingly.

osmc@osmc:~$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Audio configuration for PS3 Eye

The PS3 Eye is a camera with a 4-channel array mic.

Local ~/.asoundrc

## Suggested by http://julius.sourceforge.jp/forum/viewtopic.php?f=9&t=66
pcm.array {
  type hw
  card 0
}

pcm.array_gain {
  type softvol
  slave {
    pcm "array"
  }
  control {
    name "Mic Gain"
    count 2
  }
  min_dB -10.0
  max_dB 5.0
}

pcm.cap {
  type plug
  slave {
    pcm "array_gain"
    channels 4
  }
  route_policy sum
}

pcm.!default {
    type asym

    playback.pcm {
      type plug
      slave.pcm {
        @func getenv
        vars [ ALSAPCM ]
        default "hw:0,0"
      }
    }
    capture.pcm {
        type plug
        slave.pcm "cap"
    }
}
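Once the cap PCM has been opened at least once, the softvol plugin registers its "Mic Gain" control on card 0; it can then be set with amixer and the capture chain tested (values below are only examples):

arecord -D cap -f S16_LE -r 16000 -d 1 /dev/null   # open the PCM so softvol creates the control
amixer -c 0 sset 'Mic Gain' 80%                    # adjust the capture gain
arecord -D cap -f S16_LE -r 16000 -d 5 test.wav    # record a short test clip
aplay test.wav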

Jasper

Project: http://jasperproject.github.io/
Passive STT: pocketsphinx
Active STT: wit.ai
TTS: Flite

Integrates STT and TTS systems. Python-based.

Configuration

~/.jasper/profile.yml

...
stt_passive_engine: sphinx
stt_engine: witai
witai-stt:
  access_token: A0VERY0LONG0ALPHA0NUMERIC0STRING
tts_engine: flite-tts
flite-tts:
  voice: slt
...
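Once the packages below are installed, the profile can be tested by running Jasper from its checkout (the path is an assumption; --debug just adds verbose logging):

cd ~/jasper
python jasper.py --debug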

For split active and passive STT we need pocketsphinx and related packages.

RPi2 Installation

For RPi2 (armv7) we can use packages from Debian experimental:

sudo su -c "echo 'deb http://ftp.debian.org/debian experimental main contrib non-free' > /etc/apt/sources.list.d/experimental.list"
sudo apt-get update
sudo apt-get -t experimental install cmuclmtk phonetisaurus m2m-aligner mitlm libfst-tools libfst1-plugins-base libfst-dev

RPi1 Installation

For RPi1 (armv6) we can't use packages from Debian experimental, so we must build from source or install from elsewhere.

Install cognomen packages

# add repo
sudo su -c "echo 'deb http://cognomen.co.uk/apt/debian jessie main' > /etc/apt/sources.list.d/cognomen.list"
# import pgp key
gpg --keyserver keyserver.ubuntu.com --recv  FC88E181D61C9391C4A49682CF36B219807AA92B && gpg --export --armor keymaster@cognomen.co.uk | sudo apt-key add -
# update
sudo apt-get update
sudo apt-get install pocketsphinx pocketsphinx-hmm-en-hub4wsj python-pocketsphinx python-yaml phonetisaurus m2m-aligner mitlm libfst-tools libfst1-plugins-base libfst-dev cmuclmtk python-semantic
 

Building RPi1 dependencies from source

Trying to Cross Compile

We don't need crosstool-ng; we can use the prebuilt raspberrypi-tools x86-32 Linaro cross-compiler.

Naïve openfst cross-compile

export PATH=~/src/raspberrypi-tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian/bin:$PATH
./configure --host arm-linux-gnueabihf --enable-compact-fsts --enable-const-fsts --enable-far --enable-lookahead-fsts --enable-pdt
make -j 8

Cross compilation works but Debian RaspberryPi Packaging doesn't.

Build natively

apt-get source phonetisaurus m2m-aligner mitlm openfst
# for each
dpkg-buildpackage -us -uc -rfakeroot
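Each package's build dependencies have to be present first; assuming deb-src entries are configured (they are needed for apt-get source anyway), something like:

sudo apt-get build-dep phonetisaurus m2m-aligner mitlm openfst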

Install to repo

On the system with the signing keys:

sshfs yuggoth:/ yuggoth-ssh
cd yuggoth-ssh/var/www/data/cognomen.co.uk/apt/debian
for i in *.deb
do
    reprepro includedeb jessie "$i"
done

Other methods

wit.ai Standalone

Not used by jasper.

sudo apt-get install libsox2
wget https://github.com/wit-ai/witd/releases/download/v0.1/witd-armv6
chmod a+x witd-armv6
./witd-armv6

Voice Command for RPi

CMU Sphinx, PocketSphinx, KodiVC

sudo apt-get install build-essential sshfs automake libtool

RaspBMC/Kodi uses PulseAudio, so use that for kodivc.

sudo apt-get install bison libpulse-dev

KodiVC: github
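Once cloned from GitHub, kodivc builds with the usual autotools sequence (a sketch only; check the project's README for the exact steps and options):

cd kodivc
./autogen.sh      # needs the automake/libtool packages installed above
./configure       # PulseAudio support should be picked up via libpulse-dev
make
sudo make install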

Google Voice API

V1 API probably doesn't work any more. V2 needs at least a new API key (limited to 50 calls per day).
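For V2 the request looks roughly like this (endpoint and parameters as commonly documented for the Chromium speech API; the key is a placeholder):

# record/convert to 16 kHz FLAC as in the old script below, then:
wget -q -U "Mozilla/5.0" --post-file file.flac \
     --header "Content-Type: audio/x-flac; rate=16000" -O - \
     "https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=YOUR_API_KEY"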

Old script

From http://blog.oscarliang.net/raspberry-pi-voice-recognition-works-like-siri/ :

#!/bin/bash

echo "Recording... Press Ctrl+C to Stop."
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1

echo "Processing..."
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt

echo -n "You Said: "
cat stt.txt

rm file.flac  > /dev/null 2>&1

Resources
