Smooth experience! I loved the details such as the assistant getting a bit annoyed when you go to the vending machine for a drink or “I regret to inform you…” when you try to use the internet terminal on board.
Yes, most Chinese characters are phono-semantic compounds.[0] However, this makes the most sense in the original Old Chinese[1] phonology, in which these forms were coined.
For example,
偒 was pronounced /*l̥ʰaːŋʔ/ and 陽 was pronounced /*laŋ/, but the modern pronunciations are tǎng (/tʰɑŋ²¹⁴/) and yáng (/jɑŋ³⁵/) respectively. So the phonetic part 昜 /*laŋ/ no longer consistently represents that sound, although in this case the final -aŋ is still present.
As for sounds that were present in Old Chinese but absent from Middle Chinese and Mandarin (for example,[2] 巽 was pronounced /*sqʰuːns/ but is now xùn /ɕyn⁵¹/), they underwent a series of regular sound shifts that leave them sounding quite different when used as phonetic components in Mandarin.
Also, Old Chinese was not a tonal language; tones first appeared in Middle Chinese, from which the modern system derives (with changes). Tones never had a chance to appear in writing.
In this example I can’t imagine anyone preferring the second style, but there are cases where it’s nicer. For example, compare the tacit:
foo = h . g . f
With the more verbose:
foo x =
  let a = f x
      b = g a
      c = h b
  in c
If a, b, and c have useful names that help you understand the code, then the second version might be preferable, but in a lot of cases the intermediate variables just add noise and make it harder to follow. The tacit version makes exactly what's happening clear at a glance.
My personal rule of thumb is that if you are passing combinators as arguments to other combinators you should probably stop, but straightforward chaining is usually fine.
tacit programming means you don't use argument names to direct your data to the desired output. what's interesting to me about that is the unexplored possibilities of how data could be directed without names.
Probably protects you against a decent amount of tracking, but there are numerous markers that don't rely on cookies at all and that, used together, can be correlated back to your profile: IP address, user-agent/OS string, timezone, hardware capabilities exposed via the browser, etc.
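To make the correlation concrete, here is a deliberately simplified sketch (all names and values are hypothetical, and real trackers use many more signals, e.g. canvas rendering, installed fonts, audio stack): several cookie-free markers are combined into one stable identifier.

```python
# Illustrative only: combining cookie-free browser signals into a
# single stable fingerprint. Any one marker is weak; together they
# can re-identify a visitor without cookies.
import hashlib

def fingerprint(markers):
    """markers: dict mapping signal name -> observed value."""
    # Canonical ordering so the same signals always hash the same way.
    canonical = "|".join(f"{k}={markers[k]}" for k in sorted(markers))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Hypothetical visitor profile built from the markers mentioned above.
visitor = {
    "ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "timezone": "Europe/Warsaw",
    "hardware_concurrency": 8,
}
```

Clearing cookies changes nothing here: as long as the underlying signals stay the same, `fingerprint(visitor)` returns the same identifier on every visit.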
Interested to see how it performs for Mandarin Chinese speech synthesis, especially with prosody and emotion. The highest quality open source model I've seen so far is EmotiVoice[0], which I've made a CLI wrapper around to generate audio for flashcards.[1] For EmotiVoice, you can apparently also clone your own voice with a GPU, but I have not tested this.[2]
Hi, WhisperSpeech dev here. We only support Polish and English at the moment, but we just finished some inference optimizations and are looking to add more languages.
What we seem to need is high-quality speech recordings in any language (audiobooks are great) and some recordings for each target language which can be low-quality but need varied prosody/emotions (otherwise everything we generate will sound like an audiobook).
Last I checked, LibriVox had about 11 hours of Mandarin audiobooks and Common Voice has 234 validated hours of "Chinese (China)" (probably corresponding to Mandarin as spoken on the mainland paired with text in Simplified characters, but who knows) and 77 validated hours of "Chinese (Taiwan)" (probably Taiwanese Mandarin paired with Traditional characters).
Not sure whether that's enough data for you. (If you need paired text for the LibriVox audiobooks, I can provide you with versions where I "fixed" the original text to match the audiobook content e.g. when someone skipped a line.)
Librivox seems like a great source, being public domain, though the quality is highly variable.
I can recommend Elizabeth Klett as a good narrator. I've sampled her recordings of the Jane Austen books Emma, Pride and Prejudice, and Sense and Sensibility.
For Polish I have around 700 hours. I suspect we will need fewer hours per language if we add more languages, since they do overlap to some extent.
Fixed transcripts would be nice, although we need to align them with the audio really precisely (we cut the audio into 30-second chunks and pretty much need the exact text for every chunk). It seems this can be solved with forced-alignment algorithms, but I have not dug into that yet.
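The chunking step described above can be sketched as follows (a minimal sketch, not WhisperSpeech's actual pipeline; the tuple format and function name are assumptions): given word-level timestamps from a forced aligner, greedily group consecutive words into chunks of at most 30 seconds, so each chunk carries its exact text.

```python
# Hypothetical sketch: split force-aligned words into <= max_len-second
# chunks, keeping word boundaries so every chunk has exact text.

def chunk_words(aligned, max_len=30.0):
    """aligned: list of (word, start_sec, end_sec), sorted by time."""
    chunks = []
    current = []
    chunk_start = None
    for word, start, end in aligned:
        # Start a new chunk if adding this word would exceed max_len.
        if current and end - chunk_start > max_len:
            chunks.append(current)
            current = []
        if not current:
            chunk_start = start
        current.append((word, start, end))
    if current:
        chunks.append(current)
    return chunks
```

Cutting at word boundaries like this is what makes the precise alignment matter: a chunk boundary in the middle of a word would leave the text and audio out of sync.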
E.g., for The True Story of Ah Q: https://github.com/Yorwba/LiteratureForEyesAndEars/tree/mast... The .align.json is my homegrown alignment format, the .srt files are standard subtitles, and the .txt is the text, but note that in some places I have [[original text||what it is pronounced as]] annotations to make the forced alignment work better. (E.g. the "." in LibriVox.org, pronounced as 點 "diǎn" in Mandarin.) Oh, and cmn-Hans is the same thing transliterated into Simplified Chinese.
Just listened to the demo voices for EmotiVoice and WhisperSpeech. I think WhisperSpeech edges out EmotiVoice. EmotiVoice sounds like it was trained on English spoken by non-native speakers.
Not OP, but I develop Mochi [0], a spaced-repetition flashcard app with text-to-speech and a bunch of other features built in (transcription, dictionaries, etc.) that you might be interested in.
I’ve been using FSRS for 3 months and it’s finally resolved some of my pain points with trial-and-error tuning of the old SM-2 scheduling algorithm, since the content of each deck can greatly affect the optimal retention. Now you can just retrain the weights for each deck every few months and it will adapt appropriately. The paper[0] is also definitely worth reading if you want to see some rigorous analysis of large-scale, real-world spaced repetition.
Because of the extensive benchmarking, most people probably will not benefit from refitting the weights to their own collection until they have thousands of reviews (the author recommends 1k+).
Note it still works fine even if you review cards late, since the recall probability is computed from the stability and the time since you last reviewed the card, and the stability gets a larger boost if you somehow still managed to recall a card after its due date.
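The reason late reviews still work falls out of the forgetting curve itself. A minimal sketch, assuming the power forgetting curve used by FSRS v4 (exact constants vary between FSRS versions): stability s is defined as the interval in days at which recall probability has decayed to 90%.

```python
# FSRS-v4-style power forgetting curve (constants differ across versions).
# stability: interval in days at which recall probability reaches 90%.
# elapsed_days: days since the last review.

def retrievability(stability, elapsed_days):
    return (1 + elapsed_days / (9 * stability)) ** -1
```

At the due date, `retrievability(s, s)` is exactly 0.9 for any s; reviewing late (elapsed_days > s) just yields a lower but still well-defined probability, and a successful recall at that lower probability is what earns the card the larger stability increase.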
"...our very expensive grinder." To save you the effort: that's about $4300 of coffee grinder, in that video. (Weber EG-1, Black. White is $400 "cheaper.")
Is there a high-quality speech synthesizer (ideally local) for Mandarin that you have found? There are some subtleties with tone sandhi rules and how they interact with prosody that I feel are lacking in the current TTS voices I’ve tried.
I love the idea of LLMs being super-efficient language tutors. And you have a good point; coming soon: "We've been getting a lot of these tourists here lately, they're eerily fluent, but all seem to have the same minor speech impediment" (read: messed-up weights in a commonly used speech model).
I've been using ChatGPT-4 to translate and explain various texts in Mandarin, and it's been very on point (I check with native speakers or internet searches from time to time). As expected, it has trouble with slang and cross-language loanwords now and then. However, for languages with much less information online, it hallucinates like crazy.
> coming soon: "We've been getting a lot of these tourists here lately, they're eerily fluent, but all seem to have the same minor speech impediment"
Haha, if that came to pass, it would still be a far better outcome than our current situation of completely blind machine translation (especially for various Asian languages that are very sensitive to phrasing) and mispronunciation by non-native speakers.
Kind of. Accents are typically derived from the intersection of natural languages, specifically which ones' phonetics you learned first. (With the exception of the Mid-Atlantic accent...)
This would be something quite novel, as the speech irregularities would not have their origin in people.
I don't know what you would call it, but it needs at least some adjective before "accent" to differentiate it, IMO.
I don't have the expertise to judge the quality of the Mandarin pronunciation myself, being a beginner. But it sounds OK in English, and it's made by native Mandarin speakers in China, so I expect it sounds better in Mandarin than in English.
the azure neural tts voices in chinese are the best i’ve heard, specifically the “xiaochen” voice. i use it in anki daily to generate sentences for my mandarin decks with an api key/plugin. it’s not something you run locally of course, but they have a decent enough free tier.
i’m hoping a voice as realistic as this becomes a local app soon, but i’ve not found anything that’s nearly as natural sounding yet. (also, honorable mention to chatgpt’s “sky.” she pronounces mandarin with a funnily american accent, but it sounds natural and not as robotic as the open-source alternatives i’ve tried)
[0] https://downloads.reactivemicro.com/Electronics/Reverse%20En...
[1] https://cloud.siraben.dev/s/z9GTFfjDDgGXHSQ