Big news - exciting times - finally, finally we're moving on to other languages

mary-8 · June 14, 2024, 1:41pm

I am also interested to know that, as Southern Welsh seems closer to Cornish, which I already learned through you guys.

Brew · June 15, 2024, 2:39pm

Arabic would be brilliant.
What dialect would you choose? Levantine (Lebanon/Syria/Jordan/Palestine) is popular because a lot of TV is in this dialect, and Egyptian is also well known.
Hopefully you won’t choose Fusha/Modern Standard Arabic (Quran Arabic), because it’s only used by TV news/politics.

LiamS · June 20, 2024, 10:34am

A really exciting development. SSIW really is the most effective method I’ve come across. Hope the French course is released very soon as I need to engage in some intense study for work!

aran · June 24, 2024, 12:58pm

Yes - we’re just working on how to extract the latest recordings to the new app - we’ve had some misunderstandings in the past about how much content was available in AutoMagic (it was always meant to be everything) and we are now prioritising getting up to Level 3 for north and south available as soon as possible after the app launches - which we’re reasonably confident will be less than a month.

Well done on being able to understand that very fast natural usage, @SteakAndEggs! It’s going to be exciting when we launch Japanese - and we hope that it won’t be too long (maybe end of the year) before we’ll be able to make the Japanese (and the other languages) available to learn through the medium of Welsh, which will be an interesting extra challenge for advanced Welsh learners

We’ll want to offer multiple dialects in due course, but we’ll probably start with Egyptian, as it has long been the most familiar TV standard

This is very much ‘watch this space’ territory - hoping to have the content fully available in the new app by next week, which means we’d be able to make it available to testers in advance of the app launch

ellie-holden · July 3, 2024, 11:53am

I keep checking back on this thread to see if there’s any news. Can’t wait to start German with my kiddos!

dougmorgan6 · July 3, 2024, 4:44pm

Ha, same here. I’ve been waiting for level 3 southern to be on Automagic since January and may well hit some French after that comes on. Exciting times

aran · July 3, 2024, 5:32pm

It’s possible that migrating the remaining Welsh levels will happen very shortly after going live with the new courses - the database migration involved is a bit hairy - but the content for the new languages is coming along excellently, and we’re very, very confident it will be ready before the end of July

aran · July 3, 2024, 5:34pm

The German text (apart from intros) is done, and we just have to generate the voices - which now takes about twenty minutes per voice - so we’ll have the content ready to go by the end of next week, and it will be available as soon as the app goes live

mary-8 · July 4, 2024, 7:18am

Just wondering how it’s going with the Italian and other languages. Is the app up?

verity-davey · July 4, 2024, 12:13pm

Another curious inquiry: up to what level will the other languages’ materials go? Comparable to current Welsh level 2, or 3?
Have you considered stating a rough CEFR equivalency, since a lot of language learners will want to use that or some other grading system to measure fluency?

aran · July 4, 2024, 5:07pm

We’re not launched, but still on target for a July launch

The app is currently in test flight (which means it’s in the app stores, but not visible/available) - we’re currently hoping that we will be able to integrate our existing learner database before launch (so that the new app will know where you are in the course) - this is a little fiddly, so if necessary we’ll launch without integration and fix it as soon as possible.

In terms of levels, the new languages will actually have considerably more material than the old courses - probably more than 100 hours of learning time for most students - and once we’re launched and stable, we’ll be looking to make some hefty additions to our content for Welsh.

It’s very difficult to map our users to CEFR levels without actually doing an exam - we can’t just say ‘you’ve done Level 1 so now you’re B1’ - but we do have some ideas for possible solutions, and we’ll be working on them for the next iteration of the app

M2017 · July 4, 2024, 7:42pm

This is excellent news.

I’d disagree here. You are selling yourself short.

CEFR levels are poorly used and quoted in the language teaching world. The way they’re defined is all about being able to handle certain situations, and I’d argue the difference between levels is sometimes a matter of confidence in those situations. They don’t prescribe any particular ability to use or understand “grammar”. I don’t think they really prescribe particular vocab either, though I’d accept the argument that it might be necessary early on, where the definition is a bit more restrictive.

I think SSi fits the model better than most courses. You know the content the users will know by a certain point, and you know SSi users will know it well. With the app based courses you even know exactly what they have learnt well or are struggling with. I think you can have a lot more confidence in mapping to CEFR than most course books, where sadly they’re often just marketing.

verity-davey · July 4, 2024, 8:20pm

This sounds exciting. Presumably human involvement again since online translations / apps are somewhat lacking when it comes to Welsh? What might this look like? More advanced content, more dialect variations, more…?

tatjana · July 5, 2024, 12:19pm

Does this mean you’d use TTS generated voices and not real human recordings? Having had experience of doing things (like learning, reading etc) with generated voices, this sounds a bit so-so to me, or else the adjustment of the speech has to be really precisely refined …

aran · July 6, 2024, 10:11am

I think you’re right (and thank you!) in terms of where we fit CEFR fairly neatly - where we know we’re not ready yet is in terms of the significant variance between SSi learners who’ve been engaging in conversation versus those who haven’t.

A Level 1 learner who has regular conversations can be ahead of a Level 2 learner who doesn’t - but there are clues to these profiles in the nature of the response to the prompts - so to be confident in our predictions, we need to be able to measure learner responses in ways that we’re not currently doing.

I do think we’ll get there in 2025, though

@verity-davey the main thing I’d like to focus on would be adding a considerable amount of content - we’ll certainly need a higher level of human involvement, but the best tools are now producing very encouraging consistency in translation, so we’ll probably only need proofreading and internal consistency with previous vocabulary decisions.

Dialectal variation is something I think is best solved as part of our listening work, and we still need to build an in-app solution for listening work - something else we’ll be focusing on in 2025.

@tatjana we’ll be using TTS generated voices wherever we think they are indistinguishable from recorded humans (which is for quite a few languages now, in comparatively short phrases such as the ones we need). We may also use TTS to add a lot of courses quickly as interface languages - if someone already knows a language and wants to learn through the medium of that language, we can tolerate the models being slightly more overtly machine generated as long as they’re clear and easy to understand

owainlurch-1 · July 6, 2024, 10:16am

I’ve been producing audiobooks for a little while, and AI voices are heading towards taking over a lot of the production. They are getting impressive now, but you can always tell. The intonation is often off. And even when it isn’t, it is never as good, never as “real”, as a real person. (Unless, of course, that person is not good at reading out loud themselves!) They can be as good as someone not particularly good at reading out loud. They can even be better than someone who is terrible at it. But yeah, to me, having natural speakers saying the words in a natural intonation is the thing - being able to rely on the sound and intonation of the voice coming from a natural speaker. But AI voices are cheaper of course, and can turn things out much quicker. Hence the way they are taking over a lot of jobs in the narration world. At some point in the future, people needing voices will probably only pay real people for the real quality stuff. Whether that is necessary in a course for learning languages, I couldn’t say. I would think it would be.

owainlurch-1 · July 6, 2024, 10:18am

Can you tell me where these generated voices which are indistinguishable from human voices are being produced? As someone who produces audiobooks and whose work is being taken over by them, that’s worrying! But I haven’t seen any sign of them - and I would have thought they would have popped up pretty tout de suite after being developed! Here’s hoping they aren’t, but good to know if they are about!

aran · July 6, 2024, 10:21am

The most impressive we’ve worked with so far have been from ElevenLabs. But audiobooks are a different kettle of fish to the comparatively short phrases we use, of course, so your mileage may vary etc.

owainlurch-1 · July 6, 2024, 10:27am

About them being indistinguishable? No, I don’t think so. Books are made up of many short phrases, it’s not a different kettle of fish at all- and natural intonation as well as basic pronunciation is important when learning languages. But yes, I always put more importance on having access to natural speakers and natural phrasing and intonation through SSiW than the rest of the SSiW method, as it were. So I think that is where our mileage varies. If you think the rest of the SSiW method is more important than access to natural speakers to such an extent that you can use AI voices as a way of producing them, then that’s a logical decision, albeit one over which our mileage differs.

tatjana · July 6, 2024, 11:36am

I’d agree with @owainlurch-1 here, or else my ears are too precise and refined for all this. For me almost any artificial voice is distinguishable, and a lot of them are just copies of one of the voices IVONA produced in the first place, which were for a long time among the best in the world. I’m not a true Welsh speaker (yet), but if I take (for example) IVONA’s Geraint and Gwynedd (I think I’ve spelled this right), they’re a work in progress to me although they were finished long ago.

Maybe it isn’t so much the speech itself that is disturbing as the intonation of the read content, even when it is short phrases.

Well, the biggest surprise to me was that you’d go with generated voices, because other companies - Memrise for example - went back to native speakers to produce courses and vocabularies for various languages. Even English/American voices, which are the most refined and ready to be used, are very distinguishable from real human voices - well, at least to me.

Many times when I give the TTS something to read in Welsh (and sometimes even in English), at the end I find I still want to read the same text myself again, because I’ve found that I didn’t actually understand well enough what had been read.

So the artificial voices you’d use should be really fine-tuned. But to me it is still better to hear a human voice with some accent, which gives you the gist of the language, than a sterile, clear TTS voice, which for that reason alone sounds so wrong …

Well, this is just my opinion though, and I surely don’t want all the announced new things to fail for that reason. Regardless of what I’ve said in all those messages, I am curious what this all brings and will surely try some of those languages, if for nothing else than out of curiosity. Maybe I’d like the Automagic way of learning better when starting a new language than I like it at the moment with Welsh. …

So … dal ati!