
Plan

Radical Audio / Pinyin Mismatch (Open Issue)

Problem

Users report that audio for some radical cards does not match the displayed pinyin. For example:

  • 阝 (side form of 阜): pinyin "fù" — audio reportedly played "shǎn" (陝).
  • 宀 (roof radical): pinyin "mián" — audio reportedly did not match.

Root Causes (Suspected)

  1. Character confusion (阝/陝): The deck data contained 陝 (U+965D, shǎn) instead of 阝 (U+961D, fù). This was a data-entry error: the wrong character was stored. The manifest correctly mapped the stored character to its UTF-8 filename, so the audio matched the stored character — but the card was labeled as 阝 (阜) with pinyin "fù". Fix applied: corrected the deck data to use 阝 (U+961D).

  2. TTS mispronunciation of radicals: Edge-TTS (and TTS in general) may mispronounce radicals when they are spoken in isolation. Radicals like 宀, 讠, 氵 are rarely used alone in natural speech; the model may not have reliable pronunciation for them.

Approach Attempted (Not Verified)

A RADICAL_PRONUNCIATION_FALLBACK map was added to generate-audio.py: when generating audio for certain radicals, pass a different character to TTS (e.g. 棉 for 宀, 言 for 讠) that shares the same pinyin. The generated audio is saved under the radical's filename.
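A minimal sketch of what that map might look like, assuming a plain dict keyed by the displayed radical; the actual structure in generate-audio.py may differ, and only the two pairs named above are taken from this plan:

```python
# Hypothetical sketch of the fallback map; the real generate-audio.py
# may structure this differently. Keys are radicals as shown on cards,
# values are same-pinyin characters actually sent to TTS.
RADICAL_PRONUNCIATION_FALLBACK = {
    "宀": "棉",  # both "mián" -- roof radical spoken via a common character
    "讠": "言",  # both "yán"  -- speech radical spoken via its full form
}

def tts_input_for(char: str) -> str:
    """Return the text handed to TTS: the fallback if one exists, else the character itself."""
    return RADICAL_PRONUNCIATION_FALLBACK.get(char, char)
```

The lookup itself is trivial; the concern is that nothing at the call site or in the output filename records that a substitution happened.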

Problems With This Approach

  1. Display/audio mismatch: The user sees 宀 but hears 棉. We are playing audio for a different character than the one displayed. Pedagogically questionable.
  2. No verification: We have not listened to original vs. new audio. We assumed TTS mispronounces and that fallbacks fix it — unverified.
  3. Hidden behavior: The fallback map is opaque; the code does not make the substitution visible to readers.
  4. Scope creep: Extended to 11 radicals without evidence that all needed it.

What We Do Not Know

  • Whether edge-tts actually mispronounces these radicals.
  • Whether the fallback audio is correct.
  • Whether the original audio (before fallbacks) was wrong.
  • How to verify audio correctness programmatically (no automated check exists).

Constraints

  • Edge-TTS: Does not support pinyin input for proper Chinese pronunciation. Passing "mián" as text yields Latin-style pronunciation, not Mandarin.
  • Pinyin source: Deck pinyin is curated in deck-data.json; there is no runtime hanzi→pinyin conversion for audio.

Next Steps (To Be Decided)

  1. Listen to original and fallback audio; document which (if any) are actually wrong.
  2. Decide whether to revert RADICAL_PRONUNCIATION_FALLBACK and accept TTS limitations.
  3. Consider alternatives: human recordings for radicals, different TTS/voice, or no audio for unreliable radicals.
  4. If keeping fallbacks: add tests or documentation that make the substitution explicit.

Mnemonic Curation

Design principles, quality rules, and tooling: docs/mnemonic-curation.md

Run after any data edit:

node scripts/audit-mnemonics.js --mode all --fail-on-violations
node scripts/validate-anchor-stories.js
node scripts/test-mnemonic-curation.js
node scripts/test-hint-safety.js

Next Steps

  1. Expand phoneticAnchorAliases for anchors where exact token usage harms sentence quality.
  2. Add narrative quality snapshot fixtures in CI to catch accidental template drift.
  3. Pilot stricter anchor-placement cap (reduce max anchor-at-start from 60% to 40% if quality holds).
  4. Diversify remaining "family anchors" — SHE/ZOO/YOU/SHEER still cover 15–20 syllables each; further per-syllable splits need new anchor words or alias support.
  5. Improve anchor grammar gate heuristic to reduce false positives on natural pronoun/adverb subjects before enabling for vocab (125 false positives at current sensitivity).
  6. Multi-syllable phonetic hints — Add more than one phonetic anchor to stories when pinyin has multiple syllables (e.g. 电脑 → "Think of DEAN, NOW."). See docs/pinyin-multi-syllable-hints-investigation.md.
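To illustrate step 1, a hedged sketch of how alias support could let the audit accept variant forms of an anchor. The real tooling is JavaScript (scripts/audit-mnemonics.js), and the shape of phoneticAnchorAliases is an assumption here, as is the DEAN/DEANS entry:

```python
# Hypothetical alias table -- the real phoneticAnchorAliases config may differ.
PHONETIC_ANCHOR_ALIASES = {
    "DEAN": ["DEAN", "DEANS"],  # plural alias assumed for illustration
}

def story_uses_anchor(anchor: str, story: str) -> bool:
    """True if the story contains the anchor or a registered alias as a whole token."""
    tokens = story.upper().replace(",", " ").replace(".", " ").split()
    allowed = PHONETIC_ANCHOR_ALIASES.get(anchor, [anchor])
    return any(alias in tokens for alias in allowed)
```

Whole-token matching matters: a substring check would wrongly accept "DEANERY" as a use of DEAN.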

Priority 2: Cultural Tidbit Quality Expansion

Goal

Improve quote quality/coverage now that tidbits are available across all decks.

Work

  1. Expand tidbit corpus for more radical concepts with concrete semantic overlap.
  2. Curate/adjust relevance tags for better radical matches while preserving precision.
  3. Add spot-check fixtures for additional radicals beyond those already covered.

Acceptance

  1. More radical cards surface high-quality tidbits where overlap is meaningful.
  2. Weak/abstract radicals still return null (no forced matches).
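The "no forced matches" rule can be sketched as a minimum-overlap threshold on relevance tags. The function name, data shape, and threshold below are illustrative, not the real matcher:

```python
def pick_tidbit(radical_tags, tidbits, min_overlap=2):
    """Return the tidbit with the most tag overlap, or None when overlap is too weak."""
    best, best_score = None, 0
    for tidbit in tidbits:
        score = len(set(radical_tags) & set(tidbit["tags"]))
        if score > best_score:
            best, best_score = tidbit, score
    # Weak or abstract radicals fall below the threshold and get no tidbit.
    return best if best_score >= min_overlap else None
```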

Priority 3: Test/Quality Gates

Goal

Strengthen guardrails for future data updates.

Work

  1. Add stronger mnemonic fixture snapshots for larger HSK1 subsets.
  2. Add CI gate for scripts/audit-mnemonics.js --fail-on-violations.

Acceptance

  1. CI fails on mnemonic audit regressions.
  2. Fixture tests catch narrative-quality regressions early.

Future

  1. Study guidance for combining decks effectively.
  2. Tone imagery pilot (measure recall benefit vs added cognitive load).
  3. Post-lesson "Expand/Continue" option so learners can keep going after the daily queue: offer to start the next lesson immediately (same deck), with an optional small cap on extra new cards; include a simple replay of finished cards without timers.
  4. Quiz mode — see below.

Quiz Mode

Implemented. Quiz mode tests recall with multiple-choice questions, offered after a deck's daily queue is completed via a "Take a quiz" button (no prompt).

  • Study flow: Manual Hard/Medium/Easy ratings replaced with a "Next" button. Study exposes cards (tap to reveal) but does not update SRS.
  • Quiz flow: After deck completion, a "Take a quiz" button starts a multiple-choice quiz over the lesson cards. All answers are multiple choice.
  • Performance-based difficulty: Quiz results drive SRS. Correct → easy; incorrect → hard. No timing; answer is either right or wrong.
  • Self-rating removed: No manual difficulty buttons; quiz performance is the sole signal for scheduling.
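The performance-based mapping above is simple enough to state as code. This is a sketch of the rule only; the app itself is JavaScript and its identifiers will differ:

```python
def grade_quiz(answers):
    """Map multiple-choice quiz answers to SRS ratings.

    answers: list of (card_id, correct) pairs.
    Returns {card_id: rating} -- correct answers schedule as "easy",
    incorrect as "hard"; there is no timing component.
    """
    return {card_id: ("easy" if correct else "hard") for card_id, correct in answers}
```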

Visual Shape Hints for Hanzi

Stories currently carry two hooks: sound (phonetic anchor) and meaning (scene). A third hook — visual shape of the character — could strengthen recall, especially for simple pictographic characters.

Approaches, ranked by feasibility:

| Approach | Difficulty | Coverage | Notes |
| --- | --- | --- | --- |
| Separate shapeHint field | Low | ~60-70 simple chars | Decoupled from story; UI renders it as a small visual note |
| Shape woven into story (pictographs) | Medium | ~20-30 chars | Tight 12-word budget when the story also needs anchor + meaning |
| Component layout in story | Hard | ~100+ compound chars | Conflicts with design rule: stories must not rely on radical knowledge |
| Stroke-level narratives | Very Hard | All chars | Essentially a different mnemonic system (Heisig-style) |

Recommended starting point: Add an optional shapeHint field to mnemonicData for the 35 radicals, where visual form ≈ meaning (口 is a square = mouth, 人 is two legs = person). Display it below the story without touching story text. Evaluate whether it measurably helps recall before expanding to compound characters.

mnemonicData: {
  soundAnchor: "Think of RUN.",
  story: "A lone traveler breaks into a RUN across the empty bridge.",
  shapeHint: "Two legs, one striding forward",
  components: []
}

Key challenges:

  • Authoring quality shape descriptions requires knowing what each character looks like to a first-time learner — a different skill from writing phonetic stories. LLM generation would need character images or reliable pictographic etymology data.
  • For compound characters, shape hints risk requiring radical knowledge the learner doesn't have yet (violating the existing design rule).
  • For complex/abstract characters, forcing a visual description may hurt more than it helps — leave shapeHint empty for those.

Sentence Deck

Add a fourth deck of full Chinese sentences to bridge the gap between single-word recall and reading comprehension.

Card shape:

{
  "id": "tatoeba-1234",
  "hanzi": "我喜欢西瓜。",
  "pinyin": "wǒ xǐhuān xīguā.",
  "english": "I like watermelon.",
  "audioId": 1234,
  "vocabWords": ["", "喜欢", "西瓜"]
}
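If a build step is added, a shape check for entries in data/sentence-data.json could look like this; the field list is taken from the example above, and the helper name is hypothetical:

```python
# Required fields per the card shape sketched in this plan.
REQUIRED_FIELDS = {"id", "hanzi", "pinyin", "english", "audioId", "vocabWords"}

def missing_fields(card: dict) -> list:
    """Return sorted names of required fields the card lacks (empty when complete)."""
    return sorted(REQUIRED_FIELDS - card.keys())
```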

Data sources (ranked):

| Source | License | Sentences | Audio | Notes |
| --- | --- | --- | --- | --- |
| Tatoeba | CC BY 2.0 FR (some CC0) | ~40k+ Mandarin with English translations | 5,814 Mandarin recordings | Bulk TSV exports for sentences, translation links, audio IDs, and pinyin transcriptions |
| LLM-generated | Original (no license concern) | Unlimited | None (syllable fallback or TTS) | Full editorial control; can constrain vocab to HSK1/HSK2 words the learner already knows |

Recommended approach — hybrid:

  1. Write a build script (like build-phonetic-hints.js) that ingests Tatoeba exports, filters for sentences where all/most characters come from the HSK1 vocab set, and keeps only entries with audio recordings.
  2. Curate ~50–100 sentences into data/sentence-data.json.
  3. Supplement with LLM-generated sentences for vocab words that Tatoeba doesn't cover well.
  4. Sentence audio at runtime: https://tatoeba.org/audio/download/{audioId} (same CDN-at-runtime pattern as hugolpz/audio-cmn). Fallback to syllable-by-syllable speakPinyin() for sentences without recordings.
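Step 1's coverage filter can be sketched directly; the "all/most" relaxation becomes a ratio parameter (the threshold default and function name are assumptions):

```python
def covered_by_vocab(sentence: str, vocab_chars: set, min_ratio: float = 1.0) -> bool:
    """True when at least min_ratio of the sentence's Hanzi come from the known set.

    Non-Hanzi characters (punctuation, Latin letters, digits) are ignored.
    """
    # CJK Unified Ideographs block; enough for a first-pass filter.
    hanzi = [c for c in sentence if "\u4e00" <= c <= "\u9fff"]
    if not hanzi:
        return False
    known = sum(c in vocab_chars for c in hanzi)
    return known / len(hanzi) >= min_ratio
```

With min_ratio=1.0 only fully-covered sentences pass; lowering it admits the "most characters" cases mentioned above.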

UI/UX:

  • Reuse the existing three-stage card flow: stage 0 shows the Chinese sentence, stage 1 adds pinyin + audio, stage 2 reveals the English translation plus word-by-word gloss chips (e.g. 我 I · 喜欢 like · 西瓜 watermelon).
  • The gloss chips reuse the existing .chip-link styling.
  • New deck entry in DECKS (sentence_to_english) + DECK_STORAGE_KEYS.

Implementation cost:

| Task | Effort |
| --- | --- |
| Build script to filter/curate Tatoeba exports | Medium |
| New data/sentence-data.json with curated entries | Medium |
| Register deck in DECKS + storage key | Low |
| Sentence audio playback (Tatoeba URL or syllable fallback) | Low–Medium |
| Word-by-word gloss chips on reveal | Medium |

Open questions:

  • Should sentences be graded by lesson (e.g. "Lesson 1 sentences use only the first 10 vocab words") or offered as a single flat pool?
  • Is a reverse mode (english_to_sentence) valuable, or is comprehension-only enough to start?
  • Minimum audio-coverage threshold: skip sentences without audio, or accept the syllable fallback?