Online image and text recognition

Author: Larry Wang
NVDA compatibility: from 2018.3 to 2020.2
Download development version

This addon adds online image recognition engines to NVDA.

There are two kinds of engines: OCR and image describer.

An OCR engine extracts text from an image.

An image describer describes the visual features of an image in text form, such as a general description, color type, landmarks and so on.

An Internet connection is required to use this addon, since the image description services are provided by API endpoints on the Internet.

These services are called engines in this addon.

There are three types of engine in this addon.

Online OCR engine
Online image describer engine
Windows 10 OCR engine (offline)

You also need to choose the source of the image to be recognized.

Current navigator object
Current foreground window
The whole screen
Image data or file from clipboard
Image file pathname or image url from clipboard
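As a rough illustration of the last source type, text found on the clipboard has to be told apart as an image URL or a local file path. The sketch below is not the addon's actual code: the function name and the list of image extensions are assumptions made for this example.

```python
import os
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as image files in this sketch.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def classify_clipboard_text(text):
    """Return "url", "file" or None for a piece of clipboard text."""
    # Paths copied from a file manager are sometimes wrapped in quotes.
    candidate = text.strip().strip('"')
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return None
    # Any http(s) address with a host is treated as a URL.
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"
    # Otherwise fall back to checking the file extension.
    extension = os.path.splitext(candidate)[1].lower()
    return "file" if extension in IMAGE_EXTENSIONS else None
```

The function only inspects the text; checking that the file exists or that the URL really points to an image is left out of this sketch.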

Keyboard shortcuts

After choosing these types, you can start recognition with a single gesture.

NVDA+Alt+P: Performs recognition according to the source and engine type settings, then reads the result. If pressed twice, opens a virtual result document.
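The single press versus double press behaviour can be pictured with a small decision function. This is a minimal sketch, not the addon's implementation: the function name and the `swap_repeated` parameter (standing for the "Swap the effect of repeated gesture" option in the general settings) are hypothetical.

```python
def choose_action(press_count, swap_repeated=False):
    """Return "speak" or "viewer" for a gesture pressed press_count times."""
    if press_count < 1:
        raise ValueError("press_count must be at least 1")
    repeated = press_count >= 2
    # By default only a repeated press opens the result viewer.
    # The swap option inverts that choice.
    show_viewer = repeated != swap_repeated
    return "viewer" if show_viewer else "speak"
```

With the default settings `choose_action(1)` gives "speak" and `choose_action(2)` gives "viewer"; with the swap option enabled the two results are exchanged.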

There are four additional gestures left unassigned. Please assign them before using.

Cycle through different recognition engine types.

Cycle through different recognition source types.

Cancel the current recognition.

This gesture can be useful if you think you have waited too long and want to cancel.

It is also useful when you do not want to be disturbed by the recognition message, because you need to review messages that arrived after recognition started.

Show the previous result in a virtual result document.

Although there is an option to copy the result to the clipboard, character position information cannot be preserved that way, so this gesture was added to solve the problem.
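The two "cycle through" gestures above can be thought of as stepping through a fixed list and wrapping around at the end. The following sketch uses the engine and source names from this document; the function name is hypothetical and the addon may store these values differently.

```python
ENGINE_TYPES = [
    "Online OCR engine",
    "Online image describer engine",
    "Windows 10 OCR engine (offline)",
]

SOURCE_TYPES = [
    "Current navigator object",
    "Current foreground window",
    "The whole screen",
    "Image data or file from clipboard",
    "Image file pathname or image url from clipboard",
]


def next_option(options, current):
    """Return the option after `current`, wrapping around to the first one."""
    index = options.index(current)
    return options[(index + 1) % len(options)]
```

For example, calling `next_option(ENGINE_TYPES, "Windows 10 OCR engine (offline)")` returns to the first engine type.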

There are also four old gestures, left unassigned, for users who prefer the gestures from previous versions.

It is recommended to use the new gesture and switch the engine type according to your needs.

Recognize the current navigator object with the online OCR engine, then read the result. If pressed twice, open a virtual result document.

Recognize the image in the clipboard with the online OCR engine, then read the result. If pressed twice, open a virtual result document.

Recognize the current navigator object, then read the result. If pressed twice, open a virtual result document.

Recognize the image in the clipboard, then read the result. If pressed twice, open a virtual result document.

Engine configuration

You can configure the engines in the NVDA settings dialog, in the Online image and text recognition category.

The author of this addon has registered an account with a free API quota and set up a proxy server on www.nvdacn.com to make the addon easier to test at first. The test quota is limited and may be cancelled by the API provider at any time.

It is highly recommended to register your own key by following the guide for each engine.

The following settings apply to all engines.

Copy recognition result to the clipboard: if enabled, the recognition result text is copied to the clipboard after recognition.
Use browseable message for text result: if enabled, the recognition result text is shown in a popup window instead of a speech or braille message.
Swap the effect of repeated gesture with none repeated ones: by default, a virtual result document is shown only if you press the corresponding gesture twice. If you use that frequently, you can enable this option so that you only need to press once to get a result viewer.
Enable more verbose logging for debug purposes: some logs are essential for debugging but affect performance and take up a lot of space. Only turn this on if specifically instructed to by the addon author or an NVDA developer.
Proxy type: the type of proxy you are using. If you do not know what a proxy is, just leave it as is.
Proxy address: the full URL of your proxy. If you do not know what a proxy is, just leave it as is. If you choose to use a proxy, it is verified before saving; after verification, a prompt tells you the result.
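To show what a "full URL" for the proxy address looks like, here is a minimal sketch of a purely syntactic check. It is not the verification the addon performs before saving. The function name and the set of accepted schemes are assumptions for this example.

```python
from urllib.parse import urlsplit

# Assumed proxy schemes for this sketch; the addon's own list may differ.
ALLOWED_SCHEMES = {"http", "https", "socks4", "socks5"}


def is_plausible_proxy_url(url):
    """Return True if url looks like scheme://host:port."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    try:
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return False
    return bool(parts.hostname) and port is not None
```

Under these assumptions `http://127.0.0.1:8080` passes, while `127.0.0.1:8080` (no scheme) and `http://proxy.example` (no port) do not.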

The following settings mean the same in all engines and are described here to save space.

API Access Type: this controls how you get access to the corresponding API endpoints.
- If you choose "Use public quota", you are using the free quota of an account registered by the addon author.
- If you choose "Use your own API key", this addon will use quota from your own account.
APP ID, API key or API Secret Key: if you want to use the quota of your own account, the corresponding access tokens are required. Some engines only need an API key; some require two tokens. These are only valid if you choose "Use your own API key" in API Access Type.

Note that the quality and accuracy of results are affected by many factors.

The processing method used by the particular engine
The quality of the uploaded image
Whether the navigator object is covered by another object
The screen resolution

Image description

The following engines are available.

Microsoft Azure Image Analyser

This engine extracts a rich set of visual features based on the image content.

This engine is English only. If you want descriptions in other languages, you can use Microsoft Azure Image Describer.

It recognizes the following image features:

Adult content: determines whether the image contains pornographic material (nudity and sexual acts).
Brand: detects various companies and brands and reports their location (English only).
Category: determines the category of the image. The categorization is described in the service documentation.
Color: determines the dominant colors in the image, the background color, and whether the image is black and white.
Description: describes the content of the image in complete sentences.
Face detection: detects people, their gender and age.
Image type: determines whether the image is a painting or a photograph.
Objects: describes the objects in the image and their location (English only).
Tags: tags the image with keywords.

In some cases it also provides more specific details:

Celebrities: identifies celebrities if detected in the image.
Landmarks: identifies well-known places and objects.

Microsoft Azure Image Describer

This engine generates a description of an image in human readable language with complete sentences. The description is based on a collection of content tags, which are also returned by the operation.

More than one description can be generated for each image. Descriptions are ordered by their confidence score.

There are two settings for this engine.

Language: the language in which the service will return a description of the image. English by default.
Maximum Candidates: maximum number of candidate descriptions to be returned. The default is 1.
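The effect of Maximum Candidates and the confidence ordering can be illustrated with a short sketch that picks the best descriptions out of a response. The response layout used here (a "description" object holding "captions", each with "text" and "confidence") is an assumption about the service's JSON that should be checked against the Azure documentation. The function name and the sample data are made up for this example.

```python
def top_captions(response, max_candidates=1):
    """Return up to max_candidates caption texts, best confidence first."""
    captions = response.get("description", {}).get("captions", [])
    ranked = sorted(
        captions,
        key=lambda caption: caption.get("confidence", 0.0),
        reverse=True,
    )
    return [caption["text"] for caption in ranked[:max_candidates]]


# Made-up sample response, only for illustration.
sample = {
    "description": {
        "tags": ["dog", "grass"],
        "captions": [
            {"text": "a dog in a field", "confidence": 0.71},
            {"text": "a dog sitting on grass", "confidence": 0.93},
        ],
    }
}
```

With this sample, `top_captions(sample)` keeps only the caption with confidence 0.93, and `top_captions(sample, 2)` returns both captions, most confident first.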

Online OCR

The following services are available.

https://www.nvdacn.com

https://ocr.space/ocrapi

https://azure.microsoft.com/en-us/services/cognitive-services/

http://ai.qq.com

http://ai.baidu.com

http://ai.sogou.com/

https://intl.cloud.tencent.com

Available engines

The following engines are available.

Tencent Cloud OCR

This API is sponsored by Tencent Cloud and the Accessibility Research Association, with a quota of 15000 per day.

This engine supports 19 languages.

Chinese-English mix
Japanese
Korean
Spanish
French
German
Portuguese
Vietnamese
Malay
Russian
Italian
Dutch
Swedish
Finnish
Danish
Norwegian
Hungarian
Thai
Latin

Here are the settings of this engine.

Language: the language of the document to recognize. Auto detection by default.

OCR space

This one is a paid API with a free quota provided by OCR Space.

It supports 24 languages:

Arabic
Bulgarian
Chinese (Simplified)
Chinese (Traditional)
Croatian
Czech
Danish
Dutch
English
Finnish
French
German
Greek
Hungarian
Korean
Italian
Japanese
Polish
Portuguese
Russian
Slovenian
Spanish
Swedish
Turkish

The following settings are available:

Language: text language for recognition. English by default.
Detect image orientation: if set to true, the API autorotates the image correctly.
Scale image for better quality: if set to true, the API does some internal upscaling. This can improve the OCR result significantly, especially for low-resolution PDF scans.
Optimize for table recognition: if set to true, the OCR logic makes sure that the parsed text result is always returned line by line. This switch is recommended for table OCR, receipt OCR, invoice processing and all other types of input documents that have a table-like structure.
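The four settings above map naturally onto the form fields of an OCR request. The sketch below only builds the fields and sends nothing. The endpoint URL and the field names (`apikey`, `language`, `detectOrientation`, `scale`, `isTable`) are given from memory of the OCR Space API and should be treated as assumptions to verify against its documentation. The function name is hypothetical.

```python
# Assumed endpoint of the OCR Space API; verify before use.
OCR_SPACE_URL = "https://api.ocr.space/parse/image"


def build_ocr_space_fields(api_key, language="eng", detect_orientation=False,
                           scale=False, is_table=False):
    """Return the form fields for one OCR request as a dict of strings."""

    def flag(value):
        # Booleans are sent as the strings "true" or "false".
        return "true" if value else "false"

    return {
        "apikey": api_key,
        "language": language,
        "detectOrientation": flag(detect_orientation),
        "scale": flag(scale),
        "isTable": flag(is_table),
    }
```

A caller would post these fields together with the image to `OCR_SPACE_URL`; that step is left out so the sketch stays free of network access.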

If you create an account with this service, you need to enter your own key.

You can get your own free API key by registering on OCR space.

Steps:

Find the link "Register for free API key"

Click on it and you will find a form to fill in.

The form asks you to enter the following data

Email Address
First name
Last name
How do you plan to use the OCR API?

After filling in the form, submit it. You may also need to pass a captcha.

Then you will receive a confirmation e-mail.

Find the link named "Yes, subscribe me to this list." in that e-mail. Open that link and you will receive the API key by e-mail soon.

Microsoft Azure OCR

This engine uses technology from Microsoft Azure Cognitive Services Computer Vision.

It supports 24 languages including

Chinese (Simplified)
Chinese (Traditional)
Czech
Danish
Dutch
English
Finnish
French
German
Greek
Hungarian
Italian
Japanese
Korean
Norwegian
Polish
Portuguese
Russian
Spanish
Swedish
Turkish
Arabic
Romanian
Serbian (Cyrillic)
Serbian (Latin)
Slovak

The following settings are available:

Language: text language for recognition. Auto detection by default.
Detect image orientation: if set to true, the API autorotates the image correctly.

If you use your own key, you should get a subscription key for using Microsoft Computer Vision API from the link below:

Step 1: Create an account on Azure website

The key must be created for the Computer Vision API, so look for the Get API key button. Currently you can get a free 7-day trial key. You can also create a free Azure account, which gives you additional free days. Registration requires a credit card.

Step 2: Deploy Cognitive Services

Now you have an Azure account.

First, sign in to the Azure Portal.

Wait for the message Portal is ready, you are logged into Azure portal.

Find and activate the All resources link. It is located below the All services button.

Wait for the message All resources are ready. Your focus should be in an edit box. Press Shift+Tab to go back to the list, then select and activate the Add item.

Wait for the message Search the Marketplace. Type Cognitive Services and press Down Arrow.

Wait for the message Cognitive Services 1 of 5 and press Enter to confirm the selection.

Wait for the message Cognitive Services is ready. Find and activate the Create button.

Wait until you get the message Blade Create is ready. Your focus will be in an edit box; type a name for this resource. Note that the resource name can only include alphanumeric characters, '_' and '-', and can't end with '_' or '-'.

I choose NVDA_OCR.
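The naming rule for the resource can be checked with a short regular expression. This sketch assumes that the allowed special characters are the underscore and the hyphen (as the example name NVDA_OCR suggests) and that "alphanumeric" means ASCII letters and digits. The function name is hypothetical.

```python
import re

# Any mix of letters, digits, '_' and '-', ending in a letter or digit.
_RESOURCE_NAME = re.compile(r"[A-Za-z0-9_-]*[A-Za-z0-9]")


def is_valid_resource_name(name):
    """Return True if name satisfies the naming rule described above."""
    return _RESOURCE_NAME.fullmatch(name) is not None
```

Under these assumptions `NVDA_OCR` is accepted, while `NVDA_` (ends with an underscore) and `my resource` (contains a space) are rejected.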

Press tab to go to Subscription combo box. Usually you can leave it as is.

Press tab to go to Location combo box. Choose one close to your current location.

Be sure to remember this since location is required in engine configuration.

Press tab to go to the Pricing tier combo box. Usually a free tier like F0 is adequate. If that is not enough, you can choose another tier after reading the full pricing details in the View full pricing details link.

Press tab to go to the Create new Resource group edit box. You should create one if you do not have any resource group. Press tab to find the Create new button.

Then press tab to go to the Create button and create this resource.

Wait for the message Deployment succeeded.

Then find and activate the resource button. If you cannot find it, try activating the Notifications button first.

Wait for the message Quick Start is busy.

Find and activate the Keys link.

Wait for the message Manage keys is ready.

Find the edit box named key 1 or key 2. The content of that edit box is the API key required in the engine configuration. Press Ctrl+C to copy it for the engine configuration.

Then you can fill in the two settings required if you use your own API key.

Azure resource Region: the region you chose when deploying Cognitive Services in Azure Portal.
API key: the key you get after successfully deploying Cognitive Services in Azure Portal; KEY 2 is recommended.
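To show why both the region and the key are needed, here is a sketch of how an OCR request to the Computer Vision API could be addressed. The URL pattern, the `v2.0` version segment, the `unk` value for automatic language detection and the `Ocp-Apim-Subscription-Key` header are given from memory of the Azure REST API and should be treated as assumptions to verify. The function name is hypothetical and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_azure_ocr_request(region, api_key, language="unk",
                            detect_orientation=True, api_version="v2.0"):
    """Return (url, headers) for an OCR call; the image body is not included."""
    query = urlencode({
        "language": language,  # "unk" is assumed to mean auto detection
        "detectOrientation": "true" if detect_orientation else "false",
    })
    # The region chosen during deployment becomes part of the host name.
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/vision/{api_version}/ocr?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/octet-stream",
    }
    return url, headers
```

A key created in one region is generally not accepted by the endpoint of another, which is why the location chosen during deployment has to be remembered.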

Baidu OCR

This one is also a paid API with free quota provided by Baidu.

Baidu OCR supports 10 languages including

Chinese-English mix
English
Portuguese
French
German
Italian
Spanish
Russian
Japanese
Korean

It can detect the position of each character individually.

The following settings are available:

Detect the exact position of each character: this enables precise recognition, for example in windows of inaccessible applications. Enabling this option also slows recognition down.
Use Accurate API: if enabled, a different endpoint is used. The accurate endpoint takes longer but gives higher quality (if you use your own API key, its price is also higher).

Four types of free access are available.

Basic OCR without any information about text location. Currently 50000 times a day.
Basic OCR with information about text location. Currently 500 times a day.
Accurate OCR without any information about text location. Currently 500 times a day.
Accurate OCR with information about text location. Currently 50 times a day.

If you press the gesture that only reads the result, you are using the endpoints without information about text location.

If you press the gesture that shows a result viewer, you are using the endpoints with information about text location.
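The combination of the Use Accurate API setting and the gesture used decides which of the four quotas is consumed. The following sketch encodes the daily limits listed above. The endpoint labels and the function name are hypothetical and are not Baidu's real endpoint names.

```python
# (use_accurate, with_location) -> free requests per day, as listed above.
DAILY_QUOTA = {
    (False, False): 50000,
    (False, True): 500,
    (True, False): 500,
    (True, True): 50,
}


def select_baidu_endpoint(use_accurate, show_viewer):
    """Return (label, daily_quota) for the chosen setting and gesture."""
    # The result viewer needs text positions; plain reading does not.
    with_location = show_viewer
    kind = "accurate" if use_accurate else "basic"
    suffix = "with_location" if with_location else "no_location"
    return f"{kind}_{suffix}", DAILY_QUOTA[(use_accurate, with_location)]
```

For instance, reading the result aloud with the basic API draws on the 50000 per day quota, while opening the result viewer with the accurate API draws on the much smaller 50 per day quota.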

Although it offers free services, the website is in Chinese only and is not very accessible.

Tencent AI OCR

This API is free to use, with a frequency limit of about two queries per second.

If you want to bypass the limit, you can register your own API key. The website of this API is in Chinese only and not quite accessible.
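A client that wants to stay under a limit of about two queries per second can space its requests at least half a second apart. The class below is a minimal sketch of that idea, not something the addon is known to do. The class name and parameters are hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Keep calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second=2.0, clock=time.monotonic,
                 sleep=time.sleep):
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self):
        """Sleep if needed before the next request; return the delay used."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self._min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` before each request is enough: the first call returns immediately, and later calls sleep only for whatever part of the half second interval has not yet elapsed.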

There is no information about language support in the documentation. According to my tests, Chinese, English and a mixture of the two are supported.

There is no additional configuration for this API.

Change log

0.19

Compatible with NVDA 2020.2
Add Tencent Cloud OCR engine sponsored by Tencent Cloud and the Accessibility Research Association
Removed unavailable Sogou OCR and Machine Learning Engine by Oliver Edholm.
Fix public endpoint on NVDA China Site

0.18

Compatible with Python 3
Introduce the concept of recognition source type and engine type to reduce gesture usage.
Add a new unassigned gesture to cycle through different recognition source types.
Add a new unassigned gesture to cycle through different recognition engine types.
Add a new gesture to recognize according to image source and engine type setting.
Add a new unassigned gesture to show previous result in a virtual result document.

0.17

Fixed the following issues:
- Jump directly to panel when switch to onlineImageDescriber in settings dialog
- Fix wrong description in azure analyzer

0.16

Added the ability to cancel recognition
Fixed the following issues:
- Fixed the reporting of checkboxes
- Swap the effect of repeated gesture not working in online image describer

0.15

Add an option to pop up a window containing message instead of speech or braille message for text results
Checkboxes in the Microsoft Azure settings were changed to a checkable list.
Fixed bugs:
- Cannot load jpg image file from clipboard
- The result is now actually shown in browse mode after recognition.
- Positions of recognized text are accurate even if the image was rescaled internally.
- The result from Microsoft Azure is no longer all on a single line.

0.14

Minor fixes:
- Cannot use your own API key in Microsoft Azure engines
- The result is now shown even when a braille display is connected

0.13

The addon works correctly after plugins are reloaded (NVDA+Control+F3)

0.12

Fixed a bug in browse mode for Microsoft Azure
Colors are interpreted the same way NVDA usually interprets colors.
Improved the result format of the Microsoft Azure image describer
Updated documentation
Changed keyboard shortcuts.
Control+Shift+NVDA is used for the clipboard and NVDA+Alt for the navigator object
Fixed a bug in image processing.

0.11

Added the ability to recognize images
Changed addon summary to online image describer

0.10

Fix error using user's own API key in Sogou API.
Fix unknown panel issue by adding settings to supportedSettings

0.9

Fixed a bug where pressing a gesture twice did not work
Updated documentation.
Clarified what kind of clipboard image is supported and how to copy image for recognition.
Fixed opening the result in browse mode when recognizing an image from the clipboard.
Added support to recognize copied local image file path in clipboard.

0.8

Added a notification when the result contains no text.
Fixed another place that did not work well with a non-ASCII config path

0.6

Added proxy settings for people who access the Internet through a specific proxy.
Added general settings.
Fixed problems with passing Unicode characters to urllib3.

0.5

Fixed problems with Unicode characters when the image was sent directly instead of as base64.
Change gesture of recognizing clipboard to Control+Shift+NVDA+R since NVDA+Shift+R is used in Word and Excel to define row headers in tables, or to delete the definitions when pressed twice.

0.4

Fixed a bug that made it impossible to install the addon if the path to the user data contained non-ASCII characters
Fixed keyboard shortcut conflicts with the Pokročilý kurzor myši (advanced mouse cursor) addon.
The Microsoft Azure engine is used by default because it can detect the language automatically.

0.3

Added an explanation of how to get a Microsoft Azure key.
Fixed installation errors.
Removed auto OCR since this feature is problematic and may be confused with the online engines. Auto OCR will be a separate addon when it is stable enough.