Is anyone aware of any OCR freeware for Windows that understands Welsh orthography?
Diolch o flaen llaw.
FWIW, I've ended up using this "FreeOCR", at least for the moment:
http://www.paperfile.net/lang.html
I had used it before for German, so at least it's familiar. As distributed, it does not support Welsh. In theory, you can add configuration files for other languages. I found one for Welsh on GitHub, put it into the appropriate folder and reloaded the software, but it would not work at all.
For now, I am compromising, using the language setting for French, as it understands most of the to bachs that way, although it makes a mess of some other things.
Would love to help, but I'm not even sure what an OCR is :S
Optical character recognition. It lets a computer read text (printed or handwritten) from an image provided by a photo or scanner.
I didn't even know they came in different languages (though it makes complete sense now mikeellwood mentions it!) so I'm of absolutely no help either!
Although I worked in IT for years, I didn't know much about OCR, having rarely had a need to use it, although I did start using it in semi-earnest a few years ago, without knowing much about the "innards".
However, even just looking at western European languages, a lot of them have their own unique special characters, accent marks, etc., and at first I thought it was mainly a question of the software being able to recognise those. However, when I had a (quick) look at what came with the language pack I got from GitHub for Welsh, even without understanding exactly how it was used, I could see it included a lot of common words. I assume that these can be used by the software to help recognise words when perhaps the scan is not of the best quality (which can easily happen, in spite of best efforts). It's a pity I couldn't actually get the Welsh language pack to work with this software. As it is, I think it's a bit of a compromise, and, for example, it very often confuses "w" for "vv", perhaps because it doesn't expect to find "w"s in places where they tend to occur in Welsh words.
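To illustrate the word-list idea, here is a rough Python sketch of how a list of common words could be used to tidy up the "vv" for "w" mistakes after the OCR has run. This is only my guess at the general principle, not how FreeOCR or the GitHub language pack actually works; the tiny word list and the function names are made up for the example.

```python
import re

# Hypothetical mini word list; a real language pack would ship thousands of words.
WELSH_WORDS = {"wedi", "wrth", "gwaith", "newydd", "llaw", "gwlad"}

def fix_vv(token, words=WELSH_WORDS):
    """Replace 'vv' with 'w' only when the result is a known word."""
    if "vv" not in token.lower():
        return token
    # Handle lower-case and capitalised forms; anything else is left alone.
    candidate = token.replace("vv", "w").replace("VV", "W").replace("Vv", "W")
    return candidate if candidate.lower() in words else token

def fix_text(text):
    """Apply fix_vv to every word, leaving spacing and punctuation untouched."""
    # Split into alternating runs of word and non-word characters.
    return "".join(fix_vv(t) for t in re.findall(r"\w+|\W+", text))

if __name__ == "__main__":
    print(fix_text("gvvaith nevvydd, savvy"))
```

Checking against the word list is what stops a genuine "vv" (as in the English "savvy") from being changed by mistake.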
Well, I can manage with this for now, but it would be nice to find a better solution in the long term.
I was wondering if the people at Bangor Uni had anything, but I'm not finding anything relevant on their website.
I found out Google Docs can do it. Go to Google Drive. In settings, enable Cymraeg as a language you understand. Upload an image file, then open it in Google Docs.
Wow, I would never have thought of that.
Many thanks!
I'm not very Google Docs-oriented, although I do have an account, and can probably get up to speed.
Doing a quick bit of googling suggests it can even recognise the language automatically (well, Google Translate often can, so this seems reasonable), but if it doesn't, it won't be too much effort to set it manually, I'm sure.
It recognized Welsh automatically for me and the results were pretty good. I did not see an option for selecting manually. Not sure if you have to set Cymraeg as a known language or not. The tutorial I saw said to but it was old.
It's not working exactly as per the tutorial I have found (also probably old), but it is working!
E.g. I ticked a box in the settings to ask it to convert to Google Docs as it was uploaded, but it's not doing that. However, if I right-click on the uploaded .jpg, it gives me the option to open it in Google Docs… it thinks about that for a minute or so, and then voila, it's opened as text in Google Docs. There is a language setting in the "settings" and I have both English and Welsh set, but I'm not even sure if that's relevant to the OCR-ing. Just for fun, I ran the option to translate, and I still had to select a language for that. (It didn't make a very good job of the translation either!)
Not quite perfect (the OCR-ing): a few little mistakes, but it's pretty good, and better than the FreeOCR I was using before.
It hasn't been tested yet with "ŷ" or "ŵ", but it has picked up the other vowels with to bachs correctly.
So far then, very promising! Diolch!
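For anyone wanting to check which to bach vowels actually turn up in a piece of OCR output, something like the following Python sketch might do. The function name is just something I made up; the code points come from Python's standard `unicodedata` module, and the sample sentence is only test input.

```python
import unicodedata

# The seven lower-case Welsh vowels with a to bach (circumflex):
# a, e, i, o, u, w, y with circumflex, written as escapes to avoid encoding trouble.
TO_BACH_VOWELS = "\u00e2\u00ea\u00ee\u00f4\u00fb\u0175\u0177"

def to_bach_report(text):
    """Return {vowel: count} for each circumflexed vowel found in the text."""
    # NFC folds a plain letter plus combining circumflex (U+0302)
    # into the single precomposed character, so both forms are counted.
    normalised = unicodedata.normalize("NFC", text).lower()
    return {v: normalised.count(v) for v in TO_BACH_VOWELS if v in normalised}

if __name__ == "__main__":
    for v in TO_BACH_VOWELS:
        print(f"U+{ord(v):04X}  {unicodedata.name(v)}")
    print(to_bach_report("D\u0175r a t\u0177 yn y t\u0175r"))
```

If "ŵ" or "ŷ" is missing from the report for a page that you know contains them, the OCR has probably dropped the accent.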
That's how it worked for me too. The example I ran did have some "ŵ" in it and they were recognized. But the image was pretty clean, coming from a screen cap of a news article.
Do you happen to know if there are Unicode characters for the Welsh "ff", "ll" and "dd", which are strictly speaking single letters in Welsh? (Or a way of inputting them as single characters, rather than just doubling up the single characters?)
Using the Google Docs OCR, they are coming through as double letters, which is probably fine for my purposes. However, I think at least sometimes with what I was using before, they were actually coming through as single characters. I've had a look around, and haven't seen a way to create those (the welsh.typeit, for example, only covers the vowels).
…
I've been playing around with the Windows "charmap", and not found them in there so far, either.
If anyone has the Windows Welsh keyboard installed, is there a way of inputting them with that?
…
Update… managed to get this one using font "Welsh Cambria" in charmap:
ﬀ
"U+FB00 Latin Small Ligature FF" apparently.
(But it doesn't seem to work in "normal" fonts, at least in notebook, although it's working here).
…
Edit2: I suspect this is something not really worth worrying about. If it's this fiddly getting them in as single characters, I can't see the average Welsh writer doing anything other than doubling up the single letters! As ever, open to correction / edification though.
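In case OCR output does come through with the ligature as a single character, here is a small Python sketch that identifies U+FB00 and turns it back into a plain doubled "ff". It relies on Unicode compatibility normalisation (NFKC) from the standard `unicodedata` module; the helper name is made up, and NFKC also rewrites other compatibility characters, so it may change more than just ligatures.

```python
import unicodedata

LIGATURE_FF = "\ufb00"  # the single-character "ff" found via charmap

def unligature(text):
    """Expand ligatures such as U+FB00 into ordinary letters.

    NFKC is broader than ligatures alone: it also folds other
    compatibility characters (e.g. full-width letters).
    """
    return unicodedata.normalize("NFKC", text)

if __name__ == "__main__":
    print(unicodedata.name(LIGATURE_FF))
    word = "gor" + LIGATURE_FF + "en"
    print(len(word), len(unligature(word)))
    print(unligature(word))
```

The length goes up by one after normalising, which shows the ligature really was one character standing in for two letters.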
I don't think these would be typically used. Printers often used ligature characters to improve kerning of certain letter combinations such as "fi". I would imagine many of these are in Unicode but would not be used unless you were printing a book.
Yes, thinking about it, I'm sure you are right.
And even in English, things like "th" and "ph" are really "single letters" (and at least "th" used to be represented by a single letter in Old English), but they just happen to be represented by double letters in the modern English language (and Welsh, as it happens).
So it's another case of "paid a phoeni" I think.