Chrome extension: Improving the vocal recognition instance of my Chrome extension

news/2024/7/20 4:07:55 Tags: chrome, java, mini-program, python, c++

Chrome extension

Recently I looked into improving the vocal recognition instance of my Chrome extension, orange. Basically, it lets the user control a webpage with their voice. I mainly use it to control playback on sites like Netflix, Spotify, and YouTube.

Here’s a quick demo of the updated version:

Demo link

The main issue I had with the older version was that the sound from whatever I was listening to would drown out my voice, causing the recognition to fail.

In this article I am going to write about some of the techniques I used to improve the extension's voice recognition instance. I have previously written about the basic workings of the extension, which can be found here:

…chrome-extension-that-uses-vocal-recognition-to-control-dom-elements-on-a-webpage-a3a5cb19a584

It is also published to the Chrome Web Store and can be downloaded here.

Polarity Invert

The first idea I had for improving the speech recognition was to filter the browser audio out of the mic's audio stream.

This is similar to a technique I used when I worked as an audio engineer. Basically, if you have two copies of the same sound and invert the polarity of one of them, they cancel each other out. The image below shows two waves with opposite polarity: when the red wave's amplitude is at +1, the blue wave's is at -1, so they cancel each other out.

[Image: a red wave and a blue wave of opposite polarity canceling each other out]
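A minimal TypeScript sketch of the idea, treating audio as plain arrays of samples (this is not code from the extension; `invertPolarity` and `mix` are hypothetical helper names). Inverting polarity just negates every sample, and mixing two streams adds them sample by sample, so an exact copy plus its inverse sums to silence.

```typescript
// Invert polarity: multiply every sample by -1 (+1 becomes -1, as in the image).
function invertPolarity(samples: number[]): number[] {
  return samples.map((s) => -s);
}

// Mix two equal-length streams by adding them sample by sample.
function mix(a: number[], b: number[]): number[] {
  if (a.length !== b.length) {
    throw new Error("streams must have the same length");
  }
  return a.map((s, i) => s + b[i]);
}

// One cycle of a sine wave sampled at 8 points.
const wave: number[] = Array.from({ length: 8 }, (_, n) => Math.sin((2 * Math.PI * n) / 8));

// An exact copy mixed with its inverted copy cancels to silence.
const cancelled = mix(wave, invertPolarity(wave));
console.log(cancelled.every((s) => s === 0)); // true
```

Note that this only works because the two streams are sample-for-sample identical.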

So if I grabbed the audio from the browser, inverted it, and then merged it with the mic input stream, it should filter out the unwanted audio in the mic stream and leave just the spoken command word, right?

Unfortunately, this is more complicated than I am making it seem. For polarity inversion to work, the two sounds have to match up exactly. The technique as I describe it doesn't work because it does not account for the transfer function of the acoustic path from the speakers to the microphone: by the time the browser audio reaches the mic, it has been delayed, attenuated, and colored by the speakers and the room.
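The failure is easy to see with numbers. The sketch below is illustrative only: it assumes a hypothetical `acousticPath` helper that models the speaker-to-mic path as a pure 20-sample delay with a gain of 0.6, which is far simpler than a real room (no reverb, no frequency coloring). The browser audio is a 440 Hz test tone, also an assumption.

```typescript
// Crude, hypothetical model of the speaker-to-mic path: the mic hears the
// browser audio `delay` samples late and scaled by `gain`. Real rooms also
// add reverb and frequency coloring; this only captures delay and attenuation.
function acousticPath(x: number[], delay: number, gain: number): number[] {
  return x.map((_, n) => (n >= delay ? gain * x[n - delay] : 0));
}

function energy(x: number[]): number {
  return x.reduce((sum, s) => sum + s * s, 0);
}

// Browser audio: a 440 Hz tone at a 16 kHz sample rate (0.1 s).
const sampleRate = 16000;
const browserAudio = Array.from({ length: 1600 }, (_, n) =>
  Math.sin((2 * Math.PI * 440 * n) / sampleRate),
);

// What the mic picks up: no speech, only the leaked browser audio.
const mic = acousticPath(browserAudio, 20, 0.6);

// Naive approach: add the inverted browser audio straight to the mic stream.
const naive = mic.map((s, n) => s - browserAudio[n]);

// Same idea, but the inverted copy first goes through the same delay and gain.
const matched = acousticPath(browserAudio, 20, 0.6);
const compensated = mic.map((s, n) => s - matched[n]);

console.log(energy(naive) / energy(mic)); // well above 1 with these values
console.log(energy(compensated)); // 0
```

With these particular numbers the naive mix is louder than the leak it was meant to remove, while the compensated version cancels completely. The catch is that the compensation needs the path (here 20 samples and 0.6), which a real setup does not know in advance.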

However, this problem is solvable: with acoustic echo cancellation (AEC) you can account for the transfer function of the acoustic path. Once I got to this point, however, I felt like I was reaching a bit too far and decided to look at other alternatives.
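For a rough idea of what AEC involves, here is a minimal sketch of one common building block, a normalized least-mean-squares (NLMS) adaptive filter. This is a swapped-in illustration, not something the extension implements: it learns an estimate of the speaker-to-mic path from the browser (far-end) signal and subtracts the predicted echo from the mic signal. The class name, the 8-tap filter length, the step size of 0.5, the xorshift noise used as stand-in browser audio, and the 3-sample, 0.5-gain echo path are all assumptions chosen for the demo.

```typescript
// Minimal NLMS adaptive filter: learns an FIR estimate of the
// speaker-to-mic path and subtracts the predicted echo from the mic signal.
class NlmsEchoCanceller {
  private readonly weights: number[]; // estimated echo-path impulse response
  private readonly history: number[]; // recent far-end samples, newest first
  private readonly stepSize: number; // NLMS step size, between 0 and 2

  constructor(taps: number, stepSize = 0.5) {
    this.weights = new Array<number>(taps).fill(0);
    this.history = new Array<number>(taps).fill(0);
    this.stepSize = stepSize;
  }

  // farEnd: browser audio sample; mic: microphone sample.
  // Returns the mic sample with the estimated echo removed.
  process(farEnd: number, mic: number): number {
    this.history.pop();
    this.history.unshift(farEnd);

    let predictedEcho = 0;
    let power = 0;
    for (let k = 0; k < this.weights.length; k++) {
      predictedEcho += this.weights[k] * this.history[k];
      power += this.history[k] * this.history[k];
    }

    // Normalized update: the step is scaled by the far-end power in the window.
    const error = mic - predictedEcho;
    const step = (this.stepSize * error) / (1e-9 + power);
    for (let k = 0; k < this.weights.length; k++) {
      this.weights[k] += step * this.history[k];
    }
    return error;
  }
}

// Deterministic noise (xorshift32) standing in for browser audio.
function noise(length: number, seed = 1): number[] {
  let x = seed >>> 0;
  const out: number[] = [];
  for (let i = 0; i < length; i++) {
    x ^= x << 13;
    x ^= x >>> 17;
    x ^= x << 5;
    out.push((x >>> 0) / 4294967296 - 0.5);
  }
  return out;
}

const farEnd = noise(4000);
// Hypothetical echo path: 3-sample delay, gain 0.5, no speech in the mic.
const micSignal = farEnd.map((_, n) => (n >= 3 ? 0.5 * farEnd[n - 3] : 0));

const aec = new NlmsEchoCanceller(8);
const residual = farEnd.map((s, n) => aec.process(s, micSignal[n]));

const energy = (xs: number[]): number => xs.reduce((sum, v) => sum + v * v, 0);
// Fraction of the echo energy left over in the last 1000 samples.
const leftover = energy(residual.slice(-1000)) / energy(micSignal.slice(-1000));
console.log(leftover); // a value very close to 0
```

A production AEC needs much more than this, for example far longer filters to cover room reverberation and double-talk detection so the filter does not adapt while the user is speaking, which helps explain why this felt like reaching too far for a browser extension.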

Volume Control

Stepping back, I decided to just log the output of the recognition and see how it responds when I play audio from my computer at full volume. What I found was that the recognition would pick up at least one of the two words in the command (e.g. “orange play”), and most of the time that word would be “orange”.

To give a better idea of how this command event works, here is a diagram.

[Diagram: how the command event works]

As you can see, I look for an instance of the word “orange” and then call whatever word comes after it. This is the code that needs improvement, since it relies on both words being recognized.
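The original code is only shown as an image, so here is a minimal TypeScript sketch of the logic as described, assuming the recognizer hands over each result as a plain transcript string. `handleTranscript` and the `commands` map are hypothetical names, not the extension's API.

```typescript
// Original approach: find the wake word in the transcript and run the
// command named by the word right after it. Both words must be recognized.
function handleTranscript(
  transcript: string,
  commands: Map<string, () => void>,
  wakeWord = "orange",
): string | null {
  const words = transcript.toLowerCase().trim().split(/\s+/);
  const index = words.indexOf(wakeWord);
  // Bail out unless the wake word AND a following word were both recognized.
  if (index === -1 || index + 1 >= words.length) return null;
  const command = words[index + 1];
  const action = commands.get(command);
  if (action === undefined) return null;
  action();
  return command;
}

// Hypothetical commands that just record what ran.
const log: string[] = [];
const commands = new Map<string, () => void>([
  ["play", () => log.push("play")],
  ["pause", () => log.push("pause")],
]);

console.log(handleTranscript("Orange play", commands)); // play
console.log(handleTranscript("orange", commands)); // null
```

The second call shows the weakness: when loud playback swallows the command word, the wake word alone does nothing.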

Since I was able to pick up “orange” most of the time, I changed the command event to this:

[Image: the updated command event]

By using a boolean to manage the state of the command event, I can improve recognition. When the word “orange” is spoken, the audio in the current browser is lowered and the state changes to listening. This state lasts for around 4 seconds, giving the user time to speak the second command word.

One reason I like this method is that it not only improves the recognition but also shows the user how to call a command. To emphasize this, I added a sound effect that plays once the audio has been lowered and the state is set to listening. When the sound effect plays, the user knows to speak the second command word.
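The updated command event is also only shown as an image, so here is a minimal TypeScript sketch of the two-stage idea under several assumptions: the recognizer delivers words one at a time; the current time is passed in explicitly (in the extension a timer would call `expire`); the hook names in `CommandHooks` are hypothetical stand-ins for lowering the page audio, playing the sound effect, and running a command; and restoring the volume when the window closes or a command runs is my assumption, not something the post states.

```typescript
// Hypothetical hooks the extension would supply.
interface CommandHooks {
  lowerVolume: () => void; // duck the page audio
  restoreVolume: () => void; // bring the page audio back (assumed behavior)
  playCue: () => void; // sound effect telling the user to speak
  runCommand: (word: string) => boolean; // true if the word was a known command
}

class TwoStageCommandListener {
  private listening = false; // the boolean that manages the command state
  private deadline = 0; // time (ms) when the listening window closes
  private readonly hooks: CommandHooks;
  private readonly windowMs: number;

  constructor(hooks: CommandHooks, windowMs = 4000) {
    this.hooks = hooks;
    this.windowMs = windowMs;
  }

  // Call for every recognized word, with the current time in milliseconds.
  onWord(word: string, nowMs: number): void {
    this.expire(nowMs);
    const w = word.toLowerCase();
    if (!this.listening) {
      if (w === "orange") {
        // Stage 1: wake word heard; lower the audio first, then play the cue.
        this.listening = true;
        this.deadline = nowMs + this.windowMs;
        this.hooks.lowerVolume();
        this.hooks.playCue();
      }
      return;
    }
    // Stage 2: a known command word ends the listening state.
    if (this.hooks.runCommand(w)) this.stopListening();
  }

  // Call periodically so the window closes even if nobody speaks.
  expire(nowMs: number): void {
    if (this.listening && nowMs >= this.deadline) this.stopListening();
  }

  isListening(): boolean {
    return this.listening;
  }

  private stopListening(): void {
    this.listening = false;
    this.hooks.restoreVolume();
  }
}

// Usage with hooks that just record what happened.
const events: string[] = [];
const listener = new TwoStageCommandListener({
  lowerVolume: () => events.push("lower"),
  restoreVolume: () => events.push("restore"),
  playCue: () => events.push("cue"),
  runCommand: (word) => {
    if (word !== "play" && word !== "pause") return false;
    events.push(`run:${word}`);
    return true;
  },
});

listener.onWord("orange", 0); // wake word: duck the audio, play the cue
listener.onWord("play", 1500); // heard inside the 4 s window
listener.onWord("orange", 10000);
listener.expire(14000); // nobody spoke: the window closes after 4 s
console.log(events.join(" ")); // lower cue run:play restore lower cue restore
```

Because only “orange” has to survive the loud playback, the second word gets a quieter room to be heard in, which is the point of lowering the audio before the cue.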

Anyway, I hope you found this interesting. Feel free to drop a comment below if you want to chat about the extension!

Translated from: https://medium.com/dev-genius/improving-the-vocal-recognition-instance-of-my-chrome-extension-73a36d470542

http://www.niftyadmin.cn/n/1468209.html