Big news - exciting times - finally, finally we're moving on to other languages

Regular forum visitors will have seen me answering questions about ‘SSi for other languages’ frequently over the years - usually with a mixture of fluffy optimism and bitter caution.

But things have just changed :slight_smile:

The last time I answered a question about other languages, it was about French, and I said I was very confident that we’d have something in the new app by early May.

I was, as always, wrong.

But…

Middle/end May instead isn’t too bad, for once, is it?!

Two nights ago, I saw a beta French course live in our new app.

That’s exciting, but what it implies is even more exciting. Thanks to our AI whisperer Tom, that French course was automated from start to finish (apart from some of the fiddly stuff around running programs and storing results, which we’re going to get automated in the near future) and required precisely zero input from me.

This means that over the next couple of months, we can confidently expect to publish a new language every week (and we can probably go faster, but I don’t want to claim that until I can see it happening).

We currently have a limit of about 40 languages that we can do in this way, because we’re dependent on the availability of the necessary large language models - but Google have just released a tool that can produce translation for a language if given a grammar book for that language, which is an extraordinary development.

So, our new aim is to have 40 new languages for English and Welsh speakers, and English and Welsh available to learn through the medium of those 40 languages, by the end of this year.

Next year (if not sooner) we’ll start matching up all the language pairs inside that 40 - think French for Spanish speakers, Spanish for French speakers etc.

And then also next year, we’ll be able to start producing beta courses for languages that don’t have robust LLMs - so our long term dream of building a course for every language is now starting to look very achievable.

The one final thing you’ll need to be patient about is when we go live with the new app in the app stores.

We’ve got one final piece of work to do for this - refining the API so that it points to our live customer databases - but everything else is ready.

I’ve got it wrong too many times to be willing to make a prediction! It could potentially be as little as two weeks - but I think it’s fair to say that everyone on the team is going to be pretty disappointed if we don’t launch the new app before the end of June :crossed_fingers: :slight_smile:

17 Likes

Can you tell us what these 40 languages will be? :grinning:

1 Like

Yes, they’ll be the languages for which OpenAI can currently offer translation, more or less (depending slightly on the availability of really good automated voices).

https://help.openai.com/en/articles/8357869-how-to-change-your-language-setting-in-chatgpt

2 Likes

Does that mean no more recording with actual humans?

3 Likes

Firstly, this is really exciting news :star_struck:
Secondly, while I wait for Irish to get on the OpenAI list (unless I missed it already on there), I might now have to give in and wrestle with the prospect of getting a phone I can actually put apps on (I’m a bit app-phobic tbh) to find out just how much school French and German actually still lurks in the recesses of my brain :sweat_smile:

1 Like

Congratulations. This is really exciting, and probably one of the best uses of LLMs I can think of. I can’t imagine AI-generated content having the… character… that the existing courses do, but perhaps that is the price to pay? There are several languages in the list I’m interested in learning, so please don’t think I’m a party-pooper for being a little cautious and having a few questions:

  1. Will there be speakers of those languages verifying the course content? LLMs are impressive, but they aren’t perfect and it would be awful to use the excellent SSi method to cement nonsense (or worse) into somebody’s brain.
  2. LLMs can only be as good as the content they have been trained on, and the amount of that content will be much higher for more “dominant” languages. I have doubts that some less common languages will ever be on there. If they are, the quality of output will probably be lower. I hope some of the money this brings in will go towards creating content for those languages the old-fashioned way?
  3. One of the wonderful things about SSi is it teaches speaking how people actually speak. Will that still be the case given the majority of training material for LLMs will be from written sources? (I guess that takes me full circle back to question 1).
7 Likes

Excellent news Aran! I hope Russian is on that pathway somewhere? Sooner rather than later? Hwyl fawr!

2 Likes

Irish is an interesting case, because the automated voices for it are excellent - but we may have to look at paying for translations - we have a bid in with Irish gov at the moment to do this, so fingers crossed…!

2 Likes

The character comes partly from the inline commentary (which will be the same) and partly (in the older courses) from the ability to segue from the prompt to the commentary, which isn’t possible with a more automated approach - but when we’ve got the hang of producing a lot of new courses, we’ll be able to take a little time to think about how to make sure they’re entertaining as well :slightly_smiling_face:

We will be checking content, yes - everything we’ve checked so far has been flawless, but we expect more hiccups as we move outside the dominant languages.

And yes, the aim is to build a course for every language - which will mean paying for translation or opening up to crowdsourcing for languages that don’t have LLMs or useable grammar models. This will of course be dependent on turnover and/or partnership possibilities with governments etc…

And on how natural the languages will be, so far OpenAI has been excellent at responding to requests to use colloquial forms, so that seems quite promising at this stage :slightly_smiling_face:

4 Likes

Where the voice models are good enough, yes - but we’re expecting to have to pay (where possible) or crowdsource for the less tech-supported languages.

And Russian, yes, seems very much to be on that list :slightly_smiling_face:

3 Likes

Oh my goodness, it’s Norwegian and Irish that I am desperate to learn. I’ve been using SaySomething for learning Welsh, and I think that the SaySomething method is really good.

6 Likes

I’ve heard a lot of your fluffy optimism and bitter caution over the years Aran so this is really great news and I’m over the moon for you.

7 Likes

Great news Aran!

I am particularly looking forward to trying the Russian course, a language I have been struggling with using traditional language methods for well over a decade.

I know you’ve probably been asked this before, but any plans for a Breton / Breizh course via the medium of French?

Best wishes,
Kev

3 Likes

Diolch Gruntius! I’ll show you the French taster this afternoon :wink:

Kev - yes, Breton through the medium of French is very much on our target list - I’m not sure how well (or if at all) OpenAI supports Breton, so it’s possible that it will be in the second wave when we have some money to put into translations - but it will be right up at the top of that second wave, because other Celtic languages are very much at the front of our thinking :slightly_smiling_face:

6 Likes

@aran Just popping my head in here to reiterate that I’m still available for anything related to German, like checking translations or evaluating the quality of voice-overs – whatever helps to get German as a source (or target) language off the ground :slight_smile:

5 Likes

That’s amazing, Aran. I detest AI and we’re doing a fabulous job of training the monster in waiting (yn fy marn i). However, we can’t go back to the dark ages unless we can do something along the lines of the wonderful “The World’s End”. So that being the case, let’s run with the positive response. Exciting times for SSi!!

3 Likes

Sad face

I’m really uncomfortable with the idea of AI too, but if it brings in money for translations and recording with humans for some minority languages, maybe it’s worth it in the end? The tech is definitely impressive.
I even stick to the static mp3 files as I don’t much like the cut-and-paste feel of AutoMagic’s generated sentences… even though all the words were at least recorded by humans! (Btw, are there / were there ever such files for Spanish?)
But, a bit uncomfortable for me or not, it clearly works really well for a lot of people. That’s what we want, at the end of the day.
I’m sure the SSi team will do their very best with whatever’s next.

2 Likes

If you are asking if there are mp3s for Spanish, they’re here. I assume you can access them, but when I did the Spanish course it was a pay-per-course model rather than subscription like Welsh.

Like many, I have concerns about AI - technical, existential and ethical - but the genie isn’t going back in the bottle. In SSi’s case the use is at least to the overall benefit of society and will help humans better themselves, rather than make themselves obsolete.

4 Likes

As promised, Aran showed me the French taster yesterday and I have to say that it looks and sounds really, really good … I’d like to put everyone who’s mentioned AI at ease, the voices sound COMPLETELY natural. I’ll probably never learn more than Welsh, Spanish and maybe French but the prospect of all that language learning choice at our fingertips is extremely exciting. And the idea that in the future anyone in the world will be able to learn any other language they fancy is incredible. Well done SSi, amazing work.

10 Likes