Online Image Describer
- Author: Larry Wang
- NVDA compatibility: from 2018.3 to 2020.2
- Download development version
This add-on aims to add online image recognition engines to NVDA.
There are two kinds of engine: OCR and image describer.
OCR extracts text from an image.
An image describer describes the visual features of an image in text form, such as a general description, color type, landmarks, and so on.
An Internet connection is required to use this add-on, since the image description services are provided by API endpoints on the Internet.
They are called engines in this add-on.
This add-on has three engine types:
- Online OCR engine
- Online image describer engine
- Windows 10 OCR engine (offline)
You also need to choose the source of the image to recognize:
- Current navigator object
- Current foreground window
- The whole screen
- Image data or file from clipboard
- Image file pathname or image url from clipboard
Keyboard commands:
After choosing these types, you can start recognition with a single gesture.
NVDA+Alt+P: performs recognition according to the source and engine type settings, then reads the result. If pressed twice, opens a virtual result document.
Four additional gestures are left unassigned. Please assign them before use.
- Cycle through the recognition engine types.
- Cycle through the recognition source types.
- Cancel the current recognition. This is useful if you think you have waited too long and want to cancel, or if you do not want to be disturbed by the recognition message because you need to review messages that arrived after recognition started.
- Show the previous result in a virtual result document. Although the result can be copied to the clipboard, character position information cannot be preserved that way, so this gesture was added to solve that problem.
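The two cycling gestures can be pictured as stepping through a fixed list and wrapping around at the end. The sketch below is only an illustration of that idea; `next_choice` and the two list names are hypothetical and are not taken from the add-on's code.

```python
# Engine and source types as listed in this document.
ENGINE_TYPES = [
    "Online OCR engine",
    "Online image describer engine",
    "Windows 10 OCR engine (offline)",
]
SOURCE_TYPES = [
    "Current navigator object",
    "Current foreground window",
    "The whole screen",
    "Image data or file from clipboard",
    "Image file pathname or image url from clipboard",
]


def next_choice(options, current):
    """Return the option that follows `current`, wrapping to the first one."""
    index = options.index(current)
    return options[(index + 1) % len(options)]
```

Calling `next_choice(ENGINE_TYPES, "Windows 10 OCR engine (offline)")` wraps back to the first engine type, which is how a single "cycle" gesture can reach every option.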
Four old gestures are also left unassigned, for users who prefer the gestures from previous versions.
It is recommended to use the new gesture and switch the engine type as needed.
- Recognize the current navigator object with the online OCR engine, then read the result. If pressed twice, open a virtual result document.
- Recognize the image in the clipboard with the online OCR engine, then read the result. If pressed twice, open a virtual result document.
- Recognize the current navigator object, then read the result. If pressed twice, open a virtual result document.
- Recognize the image in the clipboard, then read the result. If pressed twice, open a virtual result document.
Engine configuration
You can choose recognition engines and configure them in detail in the Online Image Describer category of the NVDA settings dialog.
The add-on author has registered accounts with free API quota and set up a proxy server to make this add-on easier to test at first. The test quota is limited and may be cancelled by the API provider at any time.
It is highly recommended to register your own key by following the guide for each engine.
The following settings apply to all engines:
- Copy recognition result to the clipboard: if enabled, recognition result text will be copied to clipboard after recognition.
- Use browseable message for text result: if enabled, recognition result text will be shown in a popup window instead of speech or braille message.
- Swap the effect of repeated gesture with non-repeated ones: by default, a virtual result document is shown only if you press the corresponding gesture twice. If you use the result document frequently, you can enable this option so that a single press opens the result viewer.
- Enable more verbose logging for debug purposes: some logs are essential for debugging but affect performance and take up a lot of space. Only turn this on if specifically instructed to by the add-on author or an NVDA developer.
- Proxy type: the type of proxy you are using. If you do not know what a proxy is, leave it as is.
- Proxy address: the full URL of your proxy. If you do not know what a proxy is, leave it as is. If you choose to use a proxy, it will be verified before saving, and a prompt will then tell you the result.
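The interaction between a double press and the "Swap the effect of repeated gesture" option can be summarised in a few lines. This is a minimal sketch of the behaviour described above, not the add-on's implementation; `choose_action` and its return values are hypothetical names.

```python
def choose_action(press_count: int, swap_repeated: bool = False) -> str:
    """Return "read" to speak the result or "viewer" to open the result document."""
    repeated = press_count >= 2
    # By default only a repeated press opens the viewer.
    # With the swap option enabled the two behaviours trade places.
    open_viewer = repeated != swap_repeated
    return "viewer" if open_viewer else "read"
```

With the option disabled, one press reads the result and two presses open the viewer; with it enabled, one press opens the viewer.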
The following settings mean the same in all engines and are described here once to save space.
API Access Type: this controls how you get access to the corresponding API endpoints.
- If you choose "Use public quota", you are using free quota in an account registered by the add-on author.
- If you choose "Use your own API key", this add-on will use quota from your own account.
APP ID, API key or API Secret Key: if you want to use quota from your own account, the corresponding access tokens are required. Some engines only need an API key. Some engines require two tokens. These are only valid if you choose "Use your own API key" in API Access Type.
Note that the quality and accuracy of results are affected by many factors.
- Models and techniques used by the engine provider
- Quality of the submitted image
- Whether the navigator object is hidden behind something else
- Screen resolution
Online image description
The following engines are available.
Microsoft Azure Image Analyzer
This engine extracts a rich set of visual features based on the image content.
This engine is English only. If you want descriptions in other languages, use the Microsoft Azure Image Describer engine.
Visual Features include:
- Adult - detects if the image is pornographic in nature (depicts nudity or a sex act). Sexually suggestive content is also detected.
- Brands - detects various brands within an image, including the approximate location. The Brands argument is only available in English.
- Categories - categorizes image content according to a taxonomy defined in documentation.
- Color - determines the accent color, dominant color, and whether an image is black and white.
- Description - describes the image content with a complete sentence in supported languages.
- Faces - detects if faces are present. If present, generates coordinates, gender and age.
- ImageType - detects if image is clip art or a line drawing.
- Objects - detects various objects within an image, including the approximate location. The Objects argument is only available in English.
- Tags - tags the image with a detailed list of words related to the image content.
Some features also provide additional details:
- Celebrities - identifies celebrities if detected in the image.
- Landmarks - identifies landmarks if detected in the image.
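As an illustration of how a selection of visual features and details might be turned into a request, the sketch below builds a query string. The parameter names `visualFeatures` and `details` follow the Azure Computer Vision "Analyze Image" REST API as far as I know; check them against the official reference. `build_analyze_query` is a hypothetical helper, not part of the add-on.

```python
from urllib.parse import urlencode

# Feature and detail names as listed in this document.
VISUAL_FEATURES = [
    "Adult", "Brands", "Categories", "Color", "Description",
    "Faces", "ImageType", "Objects", "Tags",
]
DETAILS = ["Celebrities", "Landmarks"]


def build_analyze_query(features, details=()):
    """Build a query string selecting the given visual features and details."""
    unknown = [f for f in features if f not in VISUAL_FEATURES]
    unknown += [d for d in details if d not in DETAILS]
    if unknown:
        raise ValueError("Unknown feature or detail: " + ", ".join(unknown))
    params = {"visualFeatures": ",".join(features)}
    if details:
        params["details"] = ",".join(details)
    # Keep commas readable instead of percent-encoding them.
    return urlencode(params, safe=",")
```

For example, `build_analyze_query(["Description", "Tags"], ["Landmarks"])` selects two features and one detail in a single request.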
Microsoft Azure Image Describer
This engine generates a description of an image in human readable language with complete sentences. The description is based on a collection of content tags, which are also returned by the operation.
More than one description can be generated for each image. Descriptions are ordered by their confidence score.
There are two settings for this engine.
- Language: the language in which the service will return a description of the image. English by default.
- Maximum Candidates: maximum number of candidate descriptions to be returned. The default is 1.
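To show how candidate descriptions ordered by confidence could be read from a response, here is a small sketch. The response shape (`description.captions` with `text` and `confidence`) is my understanding of the Azure "Describe Image" result and should be treated as an assumption; the sample data is made up for illustration and `ranked_captions` is a hypothetical helper.

```python
def ranked_captions(response: dict, max_candidates: int = 1) -> list:
    """Return up to `max_candidates` caption texts, highest confidence first."""
    captions = response.get("description", {}).get("captions", [])
    ordered = sorted(captions, key=lambda c: c["confidence"], reverse=True)
    return [c["text"] for c in ordered[:max_candidates]]


# Invented sample response, for illustration only.
sample = {
    "description": {
        "tags": ["dog", "grass"],
        "captions": [
            {"text": "a dog on grass", "confidence": 0.71},
            {"text": "a dog running in a field", "confidence": 0.93},
        ],
    }
}
```

With the default of one candidate only the most confident caption is returned; raising `max_candidates` returns more, still ordered by confidence.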
Online OCR
Online engines rely on the use and availability of the following services.
Five engines are available:
Tencent Cloud OCR
This API is sponsored by Tencent Cloud and Accessibility Research Association, with a quota of 15000 per day.
This engine supports 19 languages:
- Chinese-English mix
- Japanese
- Korean
- Spanish
- French
- German
- Portuguese
- Vietnamese
- Malay
- Russian
- Italian
- Dutch
- Swedish
- Finnish
- Danish
- Norwegian
- Hungarian
- Thai
- Latin
Here are the settings for this engine:
- Language: text language for recognition
OCR space
This is a paid API with a free quota provided by OCR Space.
It supports 24 languages:
- Arabic
- Bulgarian
- Chinese(Simplified)
- Chinese(Traditional)
- Croatian
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- German
- Greek
- Hungarian
- Korean
- Italian
- Japanese
- Polish
- Portuguese
- Russian
- Slovenian
- Spanish
- Swedish
- Turkish
Here are the settings for this engine:
- Language: text language for recognition. English by default.
- Detect image orientation: if set to true, the API autorotates the image correctly.
- Scale image for better quality: if set to true, the API does some internal upscaling. This can improve the OCR result significantly, especially for low-resolution PDF scans.
- Optimize for table recognition: if set to true, the OCR logic makes sure that the parsed text result is always returned line by line. This switch is recommended for table OCR, receipt OCR, invoice processing and all other type of input documents that have a table like structure.
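The settings above map naturally onto form fields of an OCR request. The sketch below shows one way to assemble them; the field names (`apikey`, `language`, `detectOrientation`, `scale`, `isTable`) and the `eng` language code follow the public OCR.space API as I understand it and should be checked against its documentation. `build_ocr_space_payload` is a hypothetical helper and the key is a placeholder.

```python
def build_ocr_space_payload(api_key, language="eng", detect_orientation=False,
                            scale=False, is_table=False):
    """Return form fields for an OCR request based on the engine settings."""

    def flag(value):
        # Boolean options are sent as the strings "true" / "false".
        return "true" if value else "false"

    return {
        "apikey": api_key,
        "language": language,
        "detectOrientation": flag(detect_orientation),
        "scale": flag(scale),
        "isTable": flag(is_table),
    }


# Example: upscale a low-resolution scan, leave the other options off.
payload = build_ocr_space_payload("YOUR-API-KEY", scale=True)
```

The image itself (file, URL, or base64 data) would be sent alongside these fields; that part is omitted here.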
If you want to use your own key, you also need to specify the API key.
You can get your own free API key by registering on OCR space.
Here is a simple guide.
Find the link "Register for free API key".
Click on it and you will find a form to fill in.
The form asks you to enter the following data:
- Email Address
- First Name
- Last Name
- How do you plan to use the OCR API?
After filling in the form, submit it. You may also need to pass a captcha.
You will then receive a confirmation e-mail.
Find the link named "Yes, subscribe me to this list." in that e-mail. Follow that link and you will soon receive the API key by e-mail.
Microsoft Azure OCR
This engine uses the OCR API of Microsoft Azure Cognitive Services Computer Vision.
It supports the following 26 languages:
- Chinese Simplified
- Chinese Traditional
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- German
- Greek
- Hungarian
- Italian
- Japanese
- Korean
- Norwegian
- Polish
- Portuguese
- Russian
- Spanish
- Swedish
- Turkish
- Arabic
- Romanian
- Serbian Cyrillic
- Serbian Latin
- Slovak
Here are the settings for this engine:
- Language: text language for recognition. Auto detection by default.
- Detect image orientation: if set to true, the API autorotates the image correctly.
If you use your own key, you need a subscription key for the Microsoft Computer Vision API, obtained as follows:
Step 1: Create an account on the Azure website
Please note that the key must be created for the Computer Vision API, which is the first "GET API key" button you encounter with single key navigation. Currently Microsoft provides the option to create a trial key for 7 days. You can also sign up for a free Azure account for a longer trial. Signing up requires a credit card. If you already have a subscription account, you can skip this step.
Step 2: Deploy Cognitive Services
Now you have an Azure account.
First, log in to the Azure Portal.
Wait until you get the message Portal is Ready, which means you are logged in to the Azure Portal.
Find the link called All resources after the All services button and activate it.
Wait until you get the message Blade All resources are ready. Your focus will be in an edit box. Press Shift+Tab to find a menu item called Add and activate it.
Wait until you get the message Search the Marketplace, type Cognitive Services and press down arrow.
Wait until you get the message List of options Cognitive Services one of five, then press Enter.
Wait until you get the message Blade Cognitive Services is ready, then press Tab or B to find a button named Create and activate it.
Wait until you get the message Blade Create is ready. Your focus will be in an edit box. Type a name for this resource. Note that your resource name can only include alphanumeric characters, '_', and '-', and can't end with '_' or '-'.
I chose NVDA_OCR.
Press tab to go to Subscription combo box. Usually you can leave it as is.
Press tab to go to Location combo box. Choose one close to your current location.
Be sure to remember this since location is required in engine configuration.
Press Tab to go to the Pricing tier combo box. Usually a free tier like F0 is adequate. If that is not enough, you can choose another tier after reading the full pricing details in the View full pricing details link.
Press Tab to go to the Create new Resource group edit box. You should create one if you do not have any resource group. Press Tab to find the Create new button.
Then press Tab to go to the Create button and create this resource.
Wait until you get the message Deployment succeeded.
Then find the Go to resource button. Sometimes you need to go up and activate the Notifications button before you can find the Go to resource button.
Wait until you get the message Blade Quick Start is busy.
Find the link named keys, then activate it.
Wait until you get the message Blade Manage keys is ready.
Find the edit box named key 1 or key 2. The content of that edit box is the API key required in the engine configuration. Press Ctrl+C to copy it for the engine configuration.
Then you can fill in these two settings required if you use your own API key.
- Azure resource Region: the region you choose when deploying Cognitive Services in Azure Portal.
- API key: the key you get after successfully deploying Cognitive Services in Azure Portal, KEY 2 is recommended.
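The region and key are the two pieces needed to address your own Computer Vision resource. The sketch below shows how they could be combined into a request URL and headers. The host pattern, the `v2.0` path, the `unk` value for automatic language detection, and the `Ocp-Apim-Subscription-Key` header reflect my understanding of the Azure Computer Vision OCR REST API and are assumptions to verify against the official reference; `build_azure_ocr_request` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_azure_ocr_request(region, api_key, language="unk", detect_orientation=True):
    """Return (url, headers) for an OCR request against the given Azure region."""
    query = urlencode({
        "language": language,  # "unk" is assumed to mean automatic detection
        "detectOrientation": "true" if detect_orientation else "false",
    })
    # API version in the path is an assumption; adjust to the version you deployed.
    url = f"https://{region}.api.cognitive.microsoft.com/vision/v2.0/ocr?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return url, headers


# Placeholder region and key, for illustration only.
url, headers = build_azure_ocr_request("westus", "YOUR-KEY")
```

If requests fail with an authorization error, a mismatch between the region in the URL and the region chosen at deployment is a likely cause, which is why the location must be remembered.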
Baidu OCR
This is also a paid API with a free quota provided by Baidu.
Baidu OCR supports 10 languages:
- Chinese and English mixture
- English
- Portuguese
- French
- German
- Italian
- Spanish
- Russian
- Japanese
- Korean
This engine can also get the position of every character.
Here are the settings:
- Get position of every character: lets you perform more precise recognition in some inaccessible applications. Enabling this makes recognition slightly slower.
- Use Accurate API: if enabled, a different endpoint is used. The accurate endpoint takes longer but gives higher quality. If you use your own API key, its price is also higher.
It has four endpoints with separate quota limits.
- Basic OCR without any information about text location. Currently 50000 times a day.
- Basic OCR with information about text location. Currently 500 times a day.
- Accurate OCR without any information about text location. Currently 500 times a day.
- Accurate OCR with information about text location. Currently 50 times a day.
If you press the gesture that only reads the result, you are using the endpoints without text location information.
If you press the gesture that shows a result viewer, you are using the endpoints with text location information.
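The choice among the four endpoints depends on two yes/no decisions: whether the accurate API is enabled and whether text location is needed. A minimal sketch of that lookup follows. The endpoint labels are hypothetical placeholders rather than Baidu's real endpoint names; the daily quotas are the ones listed above.

```python
# (use_accurate, with_location) -> (hypothetical label, free calls per day)
ENDPOINTS = {
    (False, False): ("basic", 50000),
    (False, True): ("basic_with_location", 500),
    (True, False): ("accurate", 500),
    (True, True): ("accurate_with_location", 50),
}


def pick_endpoint(use_accurate: bool, show_result_viewer: bool):
    """Pick the endpoint and its daily quota for one recognition."""
    # The result viewer needs text positions; a plain read-out does not.
    return ENDPOINTS[(use_accurate, show_result_viewer)]
```

This makes the trade-off visible: opening the result viewer with the accurate API draws on the smallest quota, so it is worth reserving for cases where precise positions matter.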
Although it provides a fairly generous free quota, its website is in Chinese and is not accessible.
Tencent AI OCR
This API is free to use, with a frequency limit of about two queries per second.
If you want to bypass the limit, you can register your own API key. The website of this API is in Chinese only and is not very accessible.
The documentation gives no information about language support. According to my tests, Chinese, English, and mixtures of the two are supported.
There is no additional configuration for this API.
- Compatible with NVDA 2020.2
- Add Tencent Cloud OCR engine sponsored by Tencent Cloud and Accessibility Research Association
- Removed unavailable Sougou OCR and Machine Learning Engine by Oliver Edholm.
- Fix public endpoint on NVDA China Site
- Compatible with python3
- Introduce the concept of recognition source type and engine type to reduce gesture usage.
- Add a new unassigned gesture to cycle through different recognition source types.
- Add a new unassigned gesture to cycle through different recognition engine types.
- Add a new gesture to recognize according to image source and engine type setting.
- Add a new unassigned gesture to show previous result in a virtual result document.
Fixed following issues:
- Jump directly to panel when switch to onlineImageDescriber in settings dialog
- Fix wrong description in azure analyzer
- Add a cancel recognition gesture
Fixed following issues:
- CheckListBox state change not announced
- Swap the effect of repeated gesture not working in online image describer
- Add an option to pop up a window containing message instead of speech or braille message for text results
- Change checkboxes for visual features in Microsoft Azure Image Analyzer into a CheckBoxList.
Fix following issues:
- Cannot load jpg image file from clipboard
- Result document object does not show up after recognition.
- Positions in result document objects are not reliable if the image is resized internally.
- Result from Microsoft Azure Image Describer is in the same line which makes it hard to navigate around.
Fixed some bugs:
- Cannot use your own API key in Microsoft Azure engines
- Cannot get text result if there is a braille display
- Make sure that the add-on works when reloading the plug-ins without restart (NVDA+Control+F3)
- Fixed browse mode message of Microsoft Azure Image Describer
- The accent color is now represented as NVDA colour descriptions.
- Improved result format of Microsoft Azure Image Analyser
- Improve document according to review comments
- Fixed gesture inconsistency.
- Control+Shift+NVDA for clipboard while NVDA+ALT for navigator object
- Fix missing imageInfo error while recognizing.
- Added image description capability
- Change addon summary to online image describer
- Fix error using user's own API key in sougou API.
- Fix unknown panel issue by adding settings to supportedSettings
- Fix double press gesture no effect issue.
- Revised document to reflect changes in code.
- Clarified what kind of clipboard image is supported and how to copy image for recognition.
- Fixed the clipboard recognition cannot open result viewer issue.
- Added support to recognize copied local image file path in clipboard.
- Added friendly notice if recognition result is empty.
- Fixed another place that did not work well with a non-ASCII config path
- Added proxy settings for people with access of Internet behind a specific proxy.
- Added several general options.
- Fix Unicode decode error due to sending Unicode URL to urllib3.
- Fix Unicode error if OCR engine upload image file directly instead of base64 encode.
- Change gesture of recognizing clipboard to Control+Shift+NVDA+R since NVDA+Shift+R is used in Word and Excel to define row headers in tables, or to delete the definitions when pressed twice.
- Fix installation error when config path contains non-ascii characters
- Change gesture to avoid collision with golden cursor.
- Change default engine to Microsoft azure because it can detect text language automatically.
- Add detail documentation on how to get API key of Microsoft Azure OCR
- Fix issue about new installation.
- Removed auto OCR since this feature is problematic and may confuse with online engines. Auto OCR will be a separate addon, when it is stable enough.