Online Image Describer and OCR

Author
NVDA compatibility: 2018.3 to 2020.2
Download development version

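This add-on adds online image recognition engines to NVDA.

Two kinds of recognition are available: OCR and image description.

OCR extracts text from an image.

An image describer describes the visual features of an image in text form, such as a general description, color type and landmarks.

An Internet connection is required to use this add-on, because the image description services are provided by web APIs.

These services are called engines in this add-on.

Three types of engine are available.

Online OCR
Online image describer
Windows 10 OCR engine (offline)

You also need to choose the source of the image to recognize.

Current navigator object
Current window
The whole screen
Image data or file from the clipboard
Image file path name or image URL from the clipboard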
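The two choices above (engine type and image source) can be pictured as two small enumerations that a gesture steps through in order. The following Python sketch is illustrative only: the class names, member names and the `next_member` helper are assumptions made for this example and are not taken from the add-on's code.

```python
from enum import Enum


class EngineType(Enum):
    # Hypothetical names for the three engine types listed above.
    ONLINE_OCR = "Online OCR"
    ONLINE_IMAGE_DESCRIBER = "Online image describer"
    WIN10_OCR = "Windows 10 OCR engine (offline)"


class SourceType(Enum):
    # Hypothetical names for the five image sources listed above.
    NAVIGATOR_OBJECT = "Current navigator object"
    CURRENT_WINDOW = "Current window"
    WHOLE_SCREEN = "The whole screen"
    CLIPBOARD_IMAGE = "Image data or file from the clipboard"
    CLIPBOARD_PATH_OR_URL = "Image file path name or image URL from the clipboard"


def next_member(current):
    """Return the member that follows `current`, wrapping around at the end."""
    members = list(type(current))
    return members[(members.index(current) + 1) % len(members)]
```

Calling `next_member` repeatedly walks through every type and then starts again from the first one, which is the behaviour a "cycle through" gesture would need.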

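Keyboard shortcuts

After choosing these types, you can start recognition with a single gesture.

NVDA+Alt+P: recognizes the current navigator object, then reads the result. If pressed twice, opens a virtual result document.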
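The single-press and double-press behaviour can be sketched as a small decision function. This is a minimal illustration, not the add-on's implementation: `choose_action` and its return values are hypothetical names. In an NVDA add-on the repeat count would typically come from `scriptHandler.getLastScriptRepeatCount()`, but whether this add-on does so is an assumption. The `swap_repeated_gesture` parameter models the general setting that swaps the two behaviours.

```python
def choose_action(repeat_count, swap_repeated_gesture=False):
    """Decide what a recognition gesture should do.

    repeat_count is 0 for the first press and 1 or more for quick repeats.
    """
    wants_viewer = repeat_count >= 1
    if swap_repeated_gesture:
        # With the swap option on, a single press opens the viewer instead.
        wants_viewer = not wants_viewer
    return "open_result_document" if wants_viewer else "read_result"
```

With the default settings a single press reads the result and a second quick press opens the result document; with the swap option the roles are reversed.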

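Four more gestures are unassigned. Please assign them before use.

Cycle through the different recognition engine types.

Cycle through the different recognition source types.

Cancel the current recognition.

This gesture can be useful if you think you have waited too long and want to cancel.

It also helps when you do not want to be disturbed by the recognition message because you need to review something else after starting a recognition.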

Show the previous result in a virtual result document.

There is a setting to copy the result to the clipboard, but character position information cannot be preserved that way, so this gesture was added to solve the problem.

Four old gestures are also left unassigned for users who prefer the gestures of previous versions.

It is recommended to use the new gesture and switch engine type according to your needs.

Recognize the current navigator object with the online OCR engine, then read the result. If pressed twice, open a virtual result document.

Recognize the image in the clipboard with the online OCR engine, then read the result. If pressed twice, open a virtual result document.

Recognize the current navigator object, then read the result. If pressed twice, open a virtual result document.

Recognize the image in the clipboard, then read the result. If pressed twice, open a virtual result document.

Engine settings

You can choose the recognition engines and configure them in detail in the Online Image Describer category of the NVDA settings dialog.

The add-on author has registered accounts with free API quota and set up a proxy server on www.nvdacn.com to make this add-on easier to try out at first. The test quota is limited and may be cancelled by the API providers at any time.

It is highly recommended to register your own key by following the guide given for each engine.

The following settings apply to all engines.

Copy recognition result to the clipboard: if enabled, the recognition result text is copied to the clipboard after recognition.
Use browseable message for text result: if enabled, the recognition result text is shown in a popup window instead of a speech or braille message.
Swap the effect of repeated gesture with non-repeated ones: by default, a virtual result document is shown only if you press the corresponding gesture twice. If you use it frequently, you can enable this option so that you only need to press once to get a result viewer.
Enable more verbose logging for debug purposes: some logs are essential for debugging but affect performance and take up a lot of space. Only turn this on if specifically instructed to by the add-on author or an NVDA developer.
Proxy type: which type of proxy you are using. If you do not know what a proxy is, just leave it as is.
Proxy address: the full URL of your proxy. If you do not know what a proxy is, just leave it as is. If you choose to use a proxy, it is verified before the settings are saved, and a prompt tells you the result of the verification.
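As an illustration of what "full URL of your proxy" means, here is a minimal Python sketch that checks a proxy address and turns it into a protocol-to-proxy mapping of the shape accepted by `urllib.request.ProxyHandler`. The function name, the `"none"` value for the proxy type and the restriction to `http` and `https` schemes are assumptions made for this example. The add-on's own verification also contacts the proxy, which this sketch does not do.

```python
from urllib.parse import urlsplit


def build_proxies(proxy_type, proxy_address):
    """Return a protocol-to-proxy mapping, or {} when no proxy is configured."""
    if proxy_type == "none" or not proxy_address:
        return {}
    parts = urlsplit(proxy_address)
    # A full URL has both a scheme and a host, e.g. http://127.0.0.1:8080
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("Proxy address must be a full URL such as http://host:port")
    return {"http": proxy_address, "https": proxy_address}
```

An address typed without a scheme, such as `127.0.0.1:8080`, is rejected, which is the most common mistake this kind of check is meant to catch.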

The following settings mean the same in all engines and are described here to save space.

API Access Type: this controls how you get access to the corresponding API endpoints.
- If you choose "Use public quota", you are using the free quota of an account registered by the add-on author.
- If you choose "Use your own API key", this add-on uses the quota of your own account.
APP ID, API key or API Secret Key: if you want to use the quota of your own account, the corresponding access tokens are required. Some engines only need an API key. Some engines require two tokens. These settings only take effect if you choose "Use your own API key" in API Access Type.
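The relationship between the access type and the tokens can be sketched as follows. This is a hypothetical helper written for illustration: the function name, the `"public"` and `"own"` values and the token names are assumptions, not the add-on's actual configuration keys.

```python
def required_tokens(access_type, tokens, needed=("api_key",)):
    """Return the tokens to send, or raise if a needed one is missing.

    `needed` lists the token names one engine requires; some engines need
    a single API key, others need two tokens.
    """
    if access_type == "public":
        # The shared quota is used, so no personal tokens are required.
        return {}
    missing = [name for name in needed if not tokens.get(name)]
    if missing:
        raise ValueError("Missing access tokens: " + ", ".join(missing))
    return {name: tokens[name] for name in needed}
```

For an engine that needs two tokens, the call could look like `required_tokens("own", settings, needed=("api_key", "secret_key"))`, where `settings` is whatever dictionary holds the user's input.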

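Note that the quality and accuracy of results are affected by many factors:

The models and techniques used by the engine provider
The quality of the uploaded image
Whether the navigator object is hidden behind something else
The screen resolution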

Online image describer

Two engines are available here.

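Microsoft Azure Image Analyser

This engine extracts a rich set of visual features based on the image content.

This engine only supports English. If you need descriptions in other languages, you can use Microsoft Azure Image Describer.

Visual features include:

Adult: detects whether the image is pornographic in nature (depicts nudity or a sex act). Sexually suggestive content is also detected.
Brands: detects various brands within an image, including their approximate location. The Brands argument is only available in English.
Categories: categorizes the image content according to a taxonomy defined in the documentation.
Color: determines the accent color, the dominant color, and whether the image is black and white.
Description: describes the image content with a complete sentence in supported languages.
Faces: detects whether faces are present in the image. If present, generates coordinates, gender and age.
ImageType: detects whether the image is clip art or a line drawing.
Objects: detects various objects within an image, including their approximate location. The Objects argument is only available in English.
Tags: tags the image with a detailed list of words related to the image content.

Some features also provide additional details:

Celebrities: identifies celebrities if detected in the image.
Landmarks: identifies landmarks if detected in the image.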
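To show how such a feature selection maps onto a web request, here is a minimal Python sketch that builds an Analyze Image URL for the Azure Computer Vision REST API. The `visualFeatures`, `details` and `language` query parameters belong to that public API. The regional host name pattern and the `v3.2` API version are assumptions for this example, and the add-on may use a different version or go through its own proxy server. The function name is hypothetical.

```python
from urllib.parse import urlencode

VISUAL_FEATURES = ("Adult", "Brands", "Categories", "Color", "Description",
                   "Faces", "ImageType", "Objects", "Tags")
DETAILS = ("Celebrities", "Landmarks")


def build_analyze_url(region, features, details=(), language="en"):
    """Build the request URL only; no network traffic happens here."""
    unknown = [f for f in features if f not in VISUAL_FEATURES]
    unknown += [d for d in details if d not in DETAILS]
    if unknown:
        raise ValueError("Unsupported values: " + ", ".join(unknown))
    query = {"visualFeatures": ",".join(features), "language": language}
    if details:
        query["details"] = ",".join(details)
    base = "https://%s.api.cognitive.microsoft.com/vision/v3.2/analyze" % region
    # Keep commas readable instead of percent-encoding them.
    return base + "?" + urlencode(query, safe=",")
```

Requesting fewer features keeps the response shorter, which matters when the result is read aloud.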

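Microsoft Azure Image Describer

This engine generates a simple description of an image. The description is composed from a collection of detected tags.

More than one description can be generated for each image. Descriptions are ordered by their confidence score.

There are two settings for this engine.

Language: the language in which the service returns the image description. English by default.
Maximum candidates: the maximum number of candidate descriptions to be returned. The default is 1.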
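The two settings correspond to the `language` and `maxCandidates` query parameters of the Describe Image operation in the Azure Computer Vision REST API. The sketch below builds such a request and orders the returned captions by confidence. The regional host name pattern and the `v3.2` API version are assumptions, the function names are hypothetical, and the response shape (`description.captions` entries with `text` and `confidence`) follows the public API documentation rather than the add-on's code.

```python
from urllib.parse import urlencode


def build_describe_request(region, api_key, language="en", max_candidates=1):
    """Return (url, headers) for a Describe Image call; nothing is sent."""
    if max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    query = urlencode({"maxCandidates": max_candidates, "language": language})
    url = "https://%s.api.cognitive.microsoft.com/vision/v3.2/describe?%s" % (region, query)
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/octet-stream",  # raw image bytes as body
    }
    return url, headers


def captions_by_confidence(response):
    """Return the captions of a parsed response, most confident first."""
    captions = response.get("description", {}).get("captions", [])
    return sorted(captions, key=lambda c: c["confidence"], reverse=True)
```

Raising the maximum candidates only makes sense if the extra, less confident descriptions are actually presented to the user.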

Online OCR

Online engines rely on the use and presence of the following services.

https://www.nvdacn.com

https://ocr.space/ocrapi

https://azure.microsoft.com/en-us/services/cognitive-services/

http://ai.qq.com

http://ai.baidu.com

http://ai.sogou.com/

https://intl.cloud.tencent.com

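Engines

Five engines are available.

Tencent Cloud OCR

This API is sponsored by Tencent Cloud and the Accessibility Research Association, with a daily quota of 15,000.

This engine supports 19 languages.

Mixed Chinese and English
Japanese
Korean
Spanish
French
German
Portuguese
Vietnamese
Malay
Russian
Italian
Dutch
Swedish
Finnish
Danish
Norwegian
Hungarian
Thai
Latin

Here is the setting of this engine.

Language: the text language for recognition. Detected automatically by default.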

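OCR Space

This is a paid API with free quota provided by OCR Space (https://ocr.space).

It supports 24 languages.

Arabic
Bulgarian
Chinese (Simplified)
Chinese (Traditional)
Croatian
Czech
Danish
Dutch
English
Finnish
French
German
Greek
Hungarian
Korean
Italian
Japanese
Polish
Portuguese
Russian
Slovenian
Spanish
Swedish
Turkish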

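Here are the settings of this engine:

Language: the text language for recognition. English by default.
Detect image orientation: if enabled, the API automatically rotates the image correctly.
Scale image for better quality: if enabled, the API upscales the image internally. This can significantly improve OCR results, especially for low-resolution PDF scans.
Optimize for table recognition: if checked, the OCR engine makes sure the parsed text result is always returned line by line. Enabling this is recommended for tables, receipts, invoices and other documents with a similar table structure.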
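These settings line up with form fields of the public OCR.space API (`language`, `detectOrientation`, `scale`, `isTable`, plus `apikey` and `base64Image`). The following Python sketch builds such a form payload without sending it. The function name is hypothetical, the PNG content type is an assumption about how the screenshot is encoded, and the add-on itself may build its request differently or route it through its proxy server.

```python
import base64


def build_ocr_space_payload(image_bytes, api_key, language="eng",
                            detect_orientation=False, scale=False, is_table=False):
    """Return the form fields for a POST to https://api.ocr.space/parse/image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "apikey": api_key,
        "language": language,  # three-letter code, "eng" for English
        "detectOrientation": str(detect_orientation).lower(),
        "scale": str(scale).lower(),
        "isTable": str(is_table).lower(),
        # The API expects a data URI prefix in front of the base64 content.
        "base64Image": "data:image/png;base64," + encoded,
    }
```

The boolean options are sent as the strings `true` and `false`, which is why they are converted before being placed in the form.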

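If you want to use your own key, you also need to specify the API key.

You can get your own free API key by registering on OCR Space (https://ocr.space).

Here is a simple guide.

Find the link named "Register for free API key".

Click it and you will find a form.

The form asks you to enter the following data:

Email address
First name
Last name
How do you plan to use the OCR API?

Fill it in and submit it. You may also need to pass a captcha.

You will then receive a confirmation email.

In that email, find the link named "Yes, subscribe me to this list". Visit that link and you will receive your API key by email shortly afterwards.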

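Microsoft Azure OCR

This engine uses the OCR API in Microsoft Azure Cognitive Services Computer Vision.

It supports 26 languages, including

Chinese (Simplified)
Chinese (Traditional)
Czech
Danish
Dutch
English
Finnish
French
German
Greek
Hungarian
Italian
Japanese
Korean
Norwegian
Polish
Portuguese
Russian
Spanish
Swedish
Turkish
Arabic
Romanian
Serbian (Cyrillic)
Serbian (Latin)
Slovak

Here are the settings of this engine:

Language: the text language for recognition. Detected automatically by default.
Detect image orientation: if enabled, the API automatically rotates the image correctly.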

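If you use your own key, get a subscription key for the Microsoft Computer Vision API by following these steps:

Step 1: Create an account on the Azure website (https://azure.microsoft.com/en-us/try/cognitive-services/)

Note that you must create a key for the Computer Vision API. It is the first "Get API Key" button you encounter when using single-letter navigation. Currently, Microsoft offers the option to create a trial key valid for 7 days. You can also sign up for a free Azure account for a longer trial. Signing up requires a credit card. If you already have a subscription account, you can skip this step.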

Step 2: Deploy Cognitive Services

Now you have an Azure account.

First, log in to the Azure Portal (http://portal.azure.com).

Wait until you get the message "Portal is ready"; you are then logged in to the Azure portal.

Find the link named "All resources" after the "All services" button, and activate it.

Wait until you get the message "Blade All resources is ready". Your focus will be in an edit box. Press Shift+Tab to find a menu item named "Add" and activate it.

Wait until you get the message "Search the Marketplace", type Cognitive Services and press Down Arrow.

Wait until you get the message "List of options Cognitive Services one of five", then press Enter.

Wait until you get the message "Blade Cognitive Services is ready", then press Tab or B to find a button named "Create" and activate it.

Wait until you get the message "Blade Create is ready". Your focus will be in an edit box; type a name for this resource. Note that your resource name can only include alphanumeric characters, '_' and '-', and can't end with '_' or '-'.

I chose NVDA_OCR.

Press Tab to go to the Subscription combo box. Usually you can leave it as is.

Press Tab to go to the Location combo box. Choose one close to your current location.

Be sure to remember this, since the location is required in the engine configuration.

Press Tab to go to the Pricing tier combo box. Usually a free tier like F0 is adequate. If that is not enough, you can choose another tier after reading the full pricing details in the "View full pricing details" link.

Press Tab to go to the "Create new Resource group" edit box. You should create one if you do not have any resource group. Press Tab to find the "Create new" button.

Then press Tab to go to the "Create" button and activate it to create this resource.

Wait until you get the message "Deployment succeeded". Then find the "Go to resource" button. Sometimes you need to go up and activate the "Notifications" button before you can find the "Go to resource" button.

Wait until you get the message "Blade Quick start is busy". Find the link named "Keys" and activate it.

Wait until you get the message "Blade Manage keys is ready".

Find the edit box named key 1 or key 2. The content of that edit box is the API key required in the engine configuration. Press Ctrl+C to copy it for the engine configuration.

Then you can fill in the two settings that are required if you use your own API key.

Azure resource region: the region you chose when deploying Cognitive Services in the Azure Portal.
API key: the key you get after successfully deploying Cognitive Services in the Azure Portal. Key 2 is recommended.
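To show where the region and the key end up, here is a minimal Python sketch of a request to the OCR operation of the Azure Computer Vision REST API, together with a helper that turns a parsed response into text lines. The `language` and `detectOrientation` parameters, the `Ocp-Apim-Subscription-Key` header and the regions/lines/words response layout belong to that public API. The regional host name pattern and the `v3.2` version are assumptions, and both function names are hypothetical.

```python
from urllib.parse import urlencode


def build_azure_ocr_request(region, api_key, language="unk", detect_orientation=True):
    """Return (url, headers) for an OCR call; nothing is sent here."""
    query = urlencode({
        "language": language,  # "unk" asks the service to detect the language
        "detectOrientation": str(detect_orientation).lower(),
    })
    url = "https://%s.api.cognitive.microsoft.com/vision/v3.2/ocr?%s" % (region, query)
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/octet-stream",  # raw image bytes as body
    }
    return url, headers


def lines_from_response(response):
    """Join the words of each recognized line into plain text lines."""
    return [
        " ".join(word["text"] for word in line["words"])
        for region in response.get("regions", [])
        for line in region["lines"]
    ]
```

The region becomes part of the host name, which is why a key created in one region does not work when another region is configured.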

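Baidu OCR

This is also a paid API with free quota provided by Baidu.

Baidu OCR supports 10 languages, including

Mixed Chinese and English
English
Portuguese
French
German
Italian
Spanish
Russian
Japanese
Korean

This engine can also return the position of every character.

Here are its settings:

Get position of every character: allows you to perform more precise operations in some inaccessible applications. Enabling this makes recognition slightly slower.
Use Accurate API: if enabled, a different endpoint is used. The accurate endpoint takes longer but gives higher quality. If you use your own API key, its price is also higher.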

It has four endpoints with separate quota limits.

Basic OCR without any information about text location. Currently 50000 times a day.
Basic OCR with information about text location. Currently 500 times a day.
Accurate OCR without any information about text location. Currently 500 times a day.
Accurate OCR with information about text location. Currently 50 times a day.

If you press a gesture which only reads the result, you are using the endpoints without any information about text location.

If you press a gesture which shows a result viewer, you are using the endpoints with information about text location.
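The choice among the four endpoints can be written as a lookup on two flags. In this Python sketch the quota figures are the ones listed above, while the URL paths (`general_basic`, `general`, `accurate_basic`, `accurate`) are my understanding of Baidu's public REST API and should be treated as an assumption. The add-on may also reach these services through its proxy server rather than directly. The function and constant names are hypothetical.

```python
BAIDU_OCR_BASE = "https://aip.baidubce.com/rest/2.0/ocr/v1/"

# (use_accurate, want_location) -> (endpoint name, free calls per day)
ENDPOINTS = {
    (False, False): ("general_basic", 50000),
    (False, True): ("general", 500),
    (True, False): ("accurate_basic", 500),
    (True, True): ("accurate", 50),
}


def select_endpoint(use_accurate, want_location):
    """Return (url, daily_quota) for the given combination of options."""
    name, daily_quota = ENDPOINTS[(bool(use_accurate), bool(want_location))]
    return BAIDU_OCR_BASE + name, daily_quota
```

Because the endpoints with location information have far smaller quotas, reading the result without opening the result viewer is the cheaper choice when positions are not needed.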

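Although it provides a fairly generous free quota, its website is only available in Chinese and is not very accessible.

Tencent Youtu

This API is free to use but has a rate limit of about two queries per second.

If you want to bypass the limit, you can register your own API key. The website of this API is only available in Chinese and is not very accessible.

The documentation gives no information about language support. According to my tests, Chinese, English and mixed Chinese and English are supported.

This API has no other configuration.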
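A rate limit such as the roughly two queries per second mentioned for Tencent Youtu can be respected on the client side with a small throttle. The Python sketch below is a generic technique, not something taken from the add-on: the `Throttle` class and its parameters are hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Space calls so that at most max_per_second of them start per second."""

    def __init__(self, max_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until the next call is allowed; return the delay applied."""
        now = self.clock()
        delay = 0.0
        if self.last_call is not None:
            delay = max(0.0, self.min_interval - (now - self.last_call))
        if delay:
            self.sleep(delay)
        # Record when this call is considered to have started.
        self.last_call = now + delay
        return delay
```

Calling `wait()` right before each request keeps consecutive requests at least half a second apart with the default setting.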

Change log

0.19

Compatible with NVDA 2020.2.
Add the Tencent Cloud OCR engine, sponsored by Tencent Cloud and the Accessibility Research Association.
Remove unavailable engines: the AI engine provided by Oliver Edholm and Sogou OCR.
Fix the public quota on the Chinese site.

0.18

Compatible with Python 3.
Introduce the concepts of recognition source type and engine type to reduce the number of gestures.
Add a new unassigned gesture to cycle through the different recognition source types.
Add a new unassigned gesture to cycle through the different recognition engine types.
Add a new gesture to recognize according to the image source and engine type settings.
Add a new unassigned gesture to show the previous result in a virtual result document.

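0.17

Fixed the following issues:
- Jump directly to the panel when switching to onlineImageDescriber in the settings dialog
- Fix wrong description in the Azure analyser

0.16

Add a gesture to cancel recognition.
Fixed the following issues:
- State changes in the checkable list are not announced
- "Swap the effect of repeated gesture" not working in the online image describer

0.15

Add an option to pop up a window containing the message instead of a speech or braille message for text results.
Change the visual feature checkboxes in Microsoft Azure Image Analyser to a checkable list.
Fixed the following issues:
- Cannot load a JPG image file from the clipboard
- The result document object is not shown after recognition
- Positions in the result document object are unreliable if the image is resized internally
- Results of Microsoft Azure Image Describer are on a single line, which makes them hard to navigate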

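0.14

Fixed some bugs:
- Cannot use your own API key in Microsoft Azure engines
- Cannot get the text result if a braille display is present

0.13

Ensure the add-on works properly when add-ons are reloaded without restarting (NVDA+Control+F3).

0.12

Fix the browse mode message of Microsoft Azure Image Describer.
Describe the accent color with NVDA's built-in method.
Improve the result format of Microsoft Azure Image Analyser.
Improve the documentation according to review comments.
Fix inconsistent gestures: Control+Shift+NVDA for the clipboard, NVDA+Alt for the navigator object.
Fix the missing imageInfo error during recognition.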

0.11

Add the image description feature.
Change the add-on summary to Online Image Describer.

0.10

Fix an error when using the user's own API key with the Sogou API.
Fix the unknown panel issue by adding settings to supportedSettings.

0.9

Fix repeated gestures having no effect.
Revise the documentation to reflect the code changes.
Clarify what kind of clipboard image is supported and how to copy an image for recognition.
Fix clipboard recognition being unable to open the result viewer.
Add support for recognizing a copied local image file path in the clipboard.

0.8

Notify the user when the recognition result is empty.
Fix another place that did not work well with a non-ASCII config path.

0.6

Add proxy settings for people who access the Internet behind a specific proxy.
Add several general options.
Fix a Unicode decode error caused by sending a Unicode URL to urllib3.

0.5

Fix a Unicode error that occurred when an OCR engine uploads the image file directly instead of base64-encoding it.
Change the gesture for recognizing the clipboard to Control+Shift+NVDA+R, since NVDA+Shift+R is used in Word and Excel to define row headers in tables, or to delete the definitions when pressed twice.

0.4

Fix an installation error when the config path contains non-ASCII characters.
Change gestures to avoid conflicts with Golden Cursor.
Change the default engine to Microsoft Azure because it can detect the text language automatically.

0.3

Add detailed documentation on how to get a Microsoft Azure OCR key.
Fix an issue with fresh installations.
Remove auto OCR, since this feature is problematic and may be confused with the online engines. Auto OCR will become a separate add-on when it is stable enough.