# AAC Context-Aware Demo: To-Do Document

## Goal

Create a proof-of-concept offline-capable RAG (Retrieval-Augmented Generation) system for ALS AAC users that:

* Uses a lightweight knowledge graph (JSON)
* Supports utterance suggestion and correction
* Uses local/offline LLMs (e.g., Gemma, Flan-T5)
* Includes a semantic retriever to match context (e.g., conversation partner, topics)
* Provides a Gradio-based UI for deployment on HuggingFace

---

## Phase 1: Environment Setup

* [ ] Install Gradio, Transformers, Sentence-Transformers
* [ ] Choose and install inference backends (a quick load check is sketched after this list):

  * [ ] `google/flan-t5-base` (via HuggingFace Transformers)
  * [ ] Gemma 2B via Ollama or Transformers (check support for offline use)
  * [ ] Sentence similarity model (`sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` or similar)
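
A minimal load check, as a sketch only: the model IDs come from the list above, the prompt is a throwaway placeholder, and both models are assumed to already sit in the local HuggingFace cache (download once online, then run offline).

```python
# Quick offline load check for the two core backends (sketch only).
# Assumes the models were downloaded once into the local HF cache.
from transformers import pipeline
from sentence_transformers import SentenceTransformer

generator = pipeline("text2text-generation", model="google/flan-t5-base")
print(generator("Rewrite politely: want water now", max_new_tokens=32)[0]["generated_text"])

embedder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
print(embedder.encode(["hello world"]).shape)  # MiniLM-L12-v2 embeddings are 384-dim -> (1, 384)
```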

---

## Phase 2: Knowledge Graph

* [ ] Create example `social_graph.json` (people, topics, relationships)
* [ ] Define function to extract relevant context given a selected person

  * Name, relationship, typical topics, frequency
* [ ] Format for prompt injection: inline context for LLM use (example graph and lookup sketched below)
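
One possible shape for `social_graph.json` plus the context-extraction function. Every field name here (`people`, `relationship`, `topics`, `frequency`) is an illustrative assumption, not a settled schema.

```python
# Illustrative graph shape + lookup; field names are assumptions, not a fixed schema.
import json

EXAMPLE_GRAPH = {
    "people": [
        {
            "name": "Bob",
            "relationship": "brother",
            "topics": ["football", "weekend plans", "medication schedule"],
            "frequency": "daily",
        },
        {
            "name": "Alice",
            "relationship": "speech therapist",
            "topics": ["exercises", "device settings"],
            "frequency": "weekly",
        },
    ]
}

def get_person_context(graph: dict, person_name: str) -> str:
    """Build the inline context string injected into the LLM prompt."""
    for person in graph.get("people", []):
        if person["name"].lower() == person_name.lower():
            return (
                f"{person['name']} is the user's {person['relationship']} "
                f"(contact: {person['frequency']}). Typical topics: "
                f"{', '.join(person['topics'])}."
            )
    return ""

if __name__ == "__main__":
    # In the demo this would be: graph = json.load(open("social_graph.json"))
    print(get_person_context(EXAMPLE_GRAPH, "Bob"))
```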

---

## Phase 3: Semantic Retriever

* [ ] Load sentence-transformer model
* [ ] Create index from the social graph topics/descriptions
* [ ] Match transcript to closest node(s) in the graph
* [ ] Retrieve context for prompt generation (retriever sketch below)
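
A sketch of the retriever, assuming the graph shape from the Phase 2 example and the MiniLM model listed in Phase 1: one embedding per person (name + topics), matched against the incoming transcript.

```python
# Retriever sketch: embed one descriptive string per person, then match the
# transcript against that index. Graph shape follows the Phase 2 example.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

def build_index(graph: dict):
    """Embed one descriptive string per person into a tensor index."""
    texts = [f"{p['name']}: {', '.join(p['topics'])}" for p in graph["people"]]
    return texts, model.encode(texts, convert_to_tensor=True)

def retrieve(transcript: str, texts, embeddings, top_k: int = 1):
    """Return the top_k (text, score) graph nodes closest to the transcript."""
    query = model.encode(transcript, convert_to_tensor=True)
    hits = util.semantic_search(query, embeddings, top_k=top_k)[0]
    return [(texts[h["corpus_id"]], float(h["score"])) for h in hits]

# Usage, with the hypothetical EXAMPLE_GRAPH from the Phase 2 sketch:
# texts, index = build_index(EXAMPLE_GRAPH)
# print(retrieve("did you watch the match on Saturday?", texts, index))
```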

---

## Phase 4: Gradio UI

* [ ] Simple interface:

  * Dropdown: Select "Who is speaking?" (Bob, Alice, etc.)
  * Record Button: Capture audio input
  * Text area: Show transcript
  * Toggle tabs:

    * [ ] "Suggest Utterance"
    * [ ] "Correct Message"
  * Output: Generated message
* [ ] Implement Whisper transcription (use `whisper`, `faster-whisper`, or `whisper.cpp`)
* [ ] Pass transcript + retrieved context to the LLM (see the UI skeleton below)
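
A Gradio Blocks skeleton matching the layout above. The `transcribe` and `generate` callbacks are stubs still to be wired to Whisper and the LLM, and the component keyword arguments assume Gradio 4.x.

```python
# UI skeleton only -- callbacks are stubs; kwargs assume Gradio 4.x
# (older Gradio uses source="microphone" instead of sources=["microphone"]).
import gradio as gr

def transcribe(audio_path):
    # TODO: run whisper / faster-whisper / whisper.cpp on the recorded file
    return "(transcript placeholder)"

def generate(transcript, person, mode):
    # TODO: retrieve graph context for `person`, build the prompt, call the LLM
    return f"[{mode}] reply for {person}: {transcript}"

with gr.Blocks() as demo:
    person = gr.Dropdown(["Bob", "Alice"], label="Who is speaking?")
    audio = gr.Audio(sources=["microphone"], type="filepath", label="Record")
    transcript = gr.Textbox(label="Transcript")
    audio.change(transcribe, inputs=audio, outputs=transcript)

    with gr.Tab("Suggest Utterance"):
        suggestion = gr.Textbox(label="Generated message")
        gr.Button("Suggest").click(
            lambda t, p: generate(t, p, "suggest"), [transcript, person], suggestion
        )
    with gr.Tab("Correct Message"):
        correction = gr.Textbox(label="Generated message")
        gr.Button("Correct").click(
            lambda t, p: generate(t, p, "correct"), [transcript, person], correction
        )

if __name__ == "__main__":
    demo.launch()
```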

---

## Phase 5: Model Comparison

* [ ] Test both Flan-T5 and Gemma (a rough timing harness is sketched below):

  * [ ] Evaluate speed/quality tradeoffs
  * [ ] Compare correction accuracy and context-specific generation
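
A rough, assumption-laden harness for the speed side of the comparison: the prompts are placeholders, Flan-T5 is loaded via the Transformers pipeline as in Phase 1, and the Gemma lines are commented out because the model is gated (or would go through the Ollama client instead). Quality still needs a manual side-by-side read of the outputs.

```python
# Rough speed check (sketch). Prompts are placeholders.
import time
from transformers import pipeline

TEST_PROMPTS = [
    "Correct this noisy AAC message: 'wnt go outsde latr'",
    "Suggest a short reply to Bob about weekend plans.",
]

def benchmark(name, generate):
    for prompt in TEST_PROMPTS:
        start = time.perf_counter()
        output = generate(prompt)
        print(f"{name:>14} | {time.perf_counter() - start:5.2f}s | {output!r}")

flan = pipeline("text2text-generation", model="google/flan-t5-base")
benchmark("flan-t5-base", lambda p: flan(p, max_new_tokens=40)[0]["generated_text"])

# Gemma requires gated-model access (or use Ollama instead of Transformers):
# gemma = pipeline("text-generation", model="google/gemma-2b-it")
# benchmark("gemma-2b-it", lambda p: gemma(p, max_new_tokens=40)[0]["generated_text"])
```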

---

## Optional Phase 6: HuggingFace Deployment

* [ ] Clean up the UI and remove any dependencies that require a GPU
* [ ] Upload Gradio demo to HuggingFace Spaces
* [ ] Add documentation and example graphs/transcripts

---

## Notes

* Keep user privacy and safety in mind (prefer offline Whisper; avoid cloud transcription)
* Keep JSON editable for later expansion (add sessions, emotional tone, etc.)
* Option to cache LLM suggestions for fast recall (caching sketch below)
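
One possible caching sketch: an in-memory LRU keyed on (person, transcript). `generate` here is a placeholder for the hypothetical callback from the Phase 4 skeleton; a persistent store (JSON/SQLite) could replace the LRU if suggestions should survive restarts.

```python
# In-memory LRU cache keyed on (person, transcript); repeated pairs return instantly.
from functools import lru_cache

def generate(transcript: str, person: str, mode: str) -> str:
    # Placeholder for the real LLM call (see the Phase 4 skeleton)
    return f"[{mode}] reply for {person}: {transcript}"

@lru_cache(maxsize=256)
def cached_suggestion(person: str, transcript: str) -> str:
    return generate(transcript, person, "suggest")
```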

---

## Future Features (Post-Proof of Concept)

* Add visualisation of social graph (D3 or static SVG)
* Add editable profile page for caregivers
* Add chat history / rolling transcript viewer
* Add emotion/sentiment detection for tone-aware suggestions