## Interactive Learning Environment & Cognitive Science
Here we discuss the learning science evidence supporting Explorable's pedagogical design, examining how each phase of the learning experience—Explore, Build, Reflect & Consolidate, and Revise & Remember—is grounded in cognitive science research and learning theory.
### Table of Contents
- [Learning UX](#learning-ux)
- [Scientific evidence for effectiveness of UX](#scientific-evidence-for-effectiveness-of-ux)
  - [1. Explore phase](#1-explore-phase)
    - [Interactive text-adventure game using Prophetic narrative engine](#interactive-text-adventure-game-using-prophetic-narrative-engine)
  - [2. Build phase](#2-build-phase)
    - [Activity tracking using Vestige VSCode Extension](#activity-tracking-using-vestige-vscode-extension)
  - [3. Reflect & Consolidate](#3-reflect--consolidate)
    - [Post-session notes using Elderleaf](#post-session-notes-using-elderleaf)
  - [4. Revise & Remember](#4-revise--remember)
    - [Revision for long-term retention using Elderleaf test mode and dashboard](#revision-for-long-term-retention-using-elderleaf-test-mode-and-dashboard)
- [Summary Table](#summary-table)
- [Potential Improvements](#potential-improvements)
# Learning UX
Take as an example a game that teaches debugging of production issues in distributed systems:
1. **Explore** (tool: Prophetic narrative engine): start the text-adventure game
    1. Read a short story paragraph: the learner explores a choice-based narrative that simulates the situation. Text is never shown all at once, to maximize engagement.
    2. Choose the next step: the learner picks one of the 3-7 available choices.
    3. Guess consequences: the AI tutor asks the learner what they expect to happen on the path they chose (the guess can be right or wrong).
    4. Socratic dialogue: the AI tutor asks questions that guide the learner toward the right mental models.
    5. See consequences: the learner sees the actual result of the action and can compare it against their guess.
    6. Capture understanding: the AI tutor captures their understanding or misconceptions and saves them for future use.
2. **Build** (tool: Vestige):
    1. *Some* narrative branches require the learner to do something hands-on.
    2. These actions are tracked by the Vestige VSCode extension, which reports them back to Prophetic to continue the narrative.
    3. For example: deploy a backend with simulated traffic, then examine the situation on a real dashboard.
3. **Reflect & Consolidate** (tool: Elderleaf): post-session notes
    1. Recall and consolidation:
        1. Try to recall from memory what you learned while playing the game.
        2. Create text/Markdown (MDX) notes with pictures, animations, etc. It's important that this is active note-taking, not copy-paste.
    2. Feedback:
        1. Elderleaf reminds you of (a) notes you might have missed, and (b) misconceptions worth clarifying and revising later.
4. **Revise & Remember** (tool: Elderleaf)
    1. Log in to the Elderleaf dashboard to see your revision schedule (spaced repetition).
    2. Use Test mode and try to recreate your notes from scratch. You can use hints if you're stuck for more than 30 seconds.
    3. Elderleaf tracks how many points you recalled correctly and how many hints you needed, giving you a score for revision quality.
# Scientific evidence for effectiveness of UX
## 1. Explore phase
### Interactive text-adventure game using Prophetic narrative engine
#### Step 1: Read small story paragraph (chunked text, never shown all at once)
**[Segmenting Principle / Cognitive Load Theory]**: Presenting information in smaller, learner-paced segments **reduces cognitive load** and improves both retention and transfer.
**How Explorable implements it:** Text is revealed progressively in small paragraphs, never showing the full narrative at once, allowing learners to process each chunk before moving forward.
**Citations:**
- [1] Mayer, R. E., & Pilegard, C. (2014). Principles for managing essential processing in multimedia learning: segmenting, pre-training, and modality principles. _Cambridge Handbook of Multimedia Learning_
- [2] Rey, G. D., et al. (2019). A meta-analysis of the segmenting effect. _Educational Psychology Review_. Meta-analysis of 56 investigations showing segmented presentation produces small-to-medium effect sizes for retention and transfer.
- [3] Sweller, J. (2016). Cognitive load theory. _Cambridge University Press_. Foundational theory showing working memory has limited capacity (~5-7 chunks), and overload impairs learning.
---
#### Step 2: Choose next step in narrative
**[Self-Determination Theory / Autonomy & Choice]**: Providing meaningful choices increases intrinsic motivation, engagement, and learning outcomes by satisfying the psychological need for autonomy.
**How Explorable implements it:** Learners select from 3-7 narrative paths, giving them agency over their learning journey while keeping choices manageable.
**Citations:**
- [4] Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation. _American Psychologist_, 55(1), 68-78. Foundational paper showing autonomy support enhances intrinsic motivation.
- [5] Schneider, S., et al. (2018). The autonomy-enhancing effects of choice on cognitive load, motivation and learning with digital media. _Learning and Instruction_, 58, 161-172. Found choice options improved retention and transfer performance, mediated by perceived autonomy.
- [6] Patall, E. A., Cooper, H., & Robinson, J. C. (2008). The effects of choice on intrinsic motivation and related outcomes: A meta-analysis. _Psychological Bulletin_. Meta-analysis confirming positive effects of choice on intrinsic motivation.
---
#### Step 3: Guess consequences (predict before seeing results)
**[Prediction Error Learning / Generation Effect]**: Making predictions before receiving information creates "prediction errors" that enhance memory encoding and deepen conceptual understanding when the actual outcome is revealed.
**How Explorable implements it:** Before seeing consequences, learners explicitly state what they expect to happen, activating prediction error mechanisms when reality is revealed.
**Citations:**
- [7] Sinclair, A. H., & Barense, M. D. (2019). Prediction error and memory reactivation. _Trends in Neurosciences_. Shows prediction errors strengthen memory through reconsolidation mechanisms, with implications for education.
- [8] Brod, G., et al. (2023). The effect of prediction error on episodic memory encoding. _npj Science of Learning_. Demonstrates prediction violations enhance hippocampal encoding of new information.
- [9] den Ouden, H. E. M., Kok, P., & de Lange, F. P. (2012). How prediction errors shape perception, attention, and motivation. _Frontiers in Psychology_, 3, 548. Reviews how prediction errors are fundamental learning signals across cognitive domains.
---
#### Step 4: Socratic dialogue (AI asks guiding questions)
**[Socratic Questioning / Elaborative Interrogation]**: Questioning that prompts learners to explain "why" and "how" promotes deeper processing, critical thinking, and conceptual understanding.
**How Explorable implements it:** The AI tutor asks probing questions that guide learners to examine their reasoning and mental models rather than directly providing answers.
**Citations:**
- [10] Paul, R., & Elder, L. (2007). Critical thinking: The art of Socratic questioning. _Journal of Developmental Education_, 31(2), 32-33. Framework showing Socratic questioning develops critical thinking by making reasoning explicit.
- [11] Chen, T. H. (2023). Thinking more wisely: using the Socratic method to develop critical thinking skills amongst healthcare students. _BMC Medical Education_. Empirical study showing Socratic questioning improved students' intellectual dimensions through repeated engagement.
- [12] Hagos, T. (2025). Socratic method of questioning: the effect on improving students' understanding and application of chemical kinetics concepts. _Chemistry Education Research and Practice_. Quasi-experimental study showing significant improvements in conceptual understanding through Socratic questioning vs. lecture.
---
#### Step 5: See consequences (compare against prediction)
**[Feedback & Prediction Error Correction]**: Immediate feedback following a prediction creates a powerful learning moment by allowing comparison between expected and actual outcomes.
**How Explorable implements it:** After guessing, learners see actual consequences, enabling direct comparison that triggers prediction error learning mechanisms.
**Citations:**
- [13] Hattie, J., & Timperley, H. (2007). The power of feedback. _Review of Educational Research_, 77(1), 81-112. Foundational meta-analysis showing feedback is among the most powerful influences on learning (d = 0.73).
- [14] Sinclair, A. H., et al. (2021). Prediction errors disrupt hippocampal representations and update episodic memories. _PNAS_. Shows prediction errors trigger memory updating mechanisms when reality differs from expectations.
- [15] Kapur, M. (2014). Productive failure in learning math. _Cognitive Science_, 38(5), 1008-1022. Demonstrates that experiencing "failure" (incorrect predictions) before instruction produces deeper conceptual learning than instruction-first approaches (d = 0.36-0.97).
---
#### Step 6: Capture understanding/misconceptions
**[Formative Assessment / Misconception Tracking]**: Identifying and documenting misconceptions enables targeted remediation and prevents reinforcement of incorrect mental models.
**How Explorable implements it:** The AI captures the learner's current understanding and misconceptions for later review and targeted feedback during consolidation.
**Citations:**
- [16] Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. _Educational Assessment, Evaluation and Accountability_, 21, 5-31. Landmark review showing formative assessment produces among the largest effect sizes reported for educational interventions.
- [17] Tanner, K. D. (2012). Promoting student metacognition. _CBE—Life Sciences Education_, 11(2), 113-120. Shows explicit attention to misconceptions improves metacognitive awareness and learning.
- [18] Sadler, P. M., & Sonnert, G. (2016). Understanding misconceptions. _American Educator_. Reviews research showing misconceptions must be explicitly addressed to prevent interference with correct learning.
---
## 2. Build phase
### Activity tracking using Vestige VSCode Extension
#### Hands-on practice with real tools, tracked activity
**[Learning by Doing / Situated Learning / Transfer]**: Hands-on practice in authentic contexts produces deeper learning and better transfer than passive instruction, especially when immediately following conceptual exploration.
**How Explorable implements it:** Narrative branches require learners to perform real tasks (e.g., deploying backends, examining dashboards) in VSCode, with actions tracked and reported back to the narrative.
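As a rough illustration of this tracking loop, the sketch below shows what a tracked-activity event and the report back to the narrative engine might look like. The event shape, field names, and `report_to_narrative` function are assumptions for illustration, not the actual Vestige or Prophetic API.

```python
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class ActivityEvent:
    """One hands-on action observed in the IDE (hypothetical schema)."""
    session_id: str
    action: str   # e.g. "deploy", "open_dashboard", "run_tests"
    detail: str
    timestamp: float

def report_to_narrative(events: list[ActivityEvent]) -> str:
    """Serialize tracked events into a payload the narrative engine
    could consume to decide which branch to continue."""
    return json.dumps({"events": [asdict(e) for e in events]})

events = [
    ActivityEvent("s1", "deploy", "backend with simulated traffic", time.time()),
    ActivityEvent("s1", "open_dashboard", "latency panel", time.time()),
]
payload = report_to_narrative(events)
```

The key design point is that the Build phase produces structured evidence of what the learner actually did, rather than self-reported progress.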
**Citations:**
- [19] Kolb, D. A. (1984). _Experiential Learning: Experience as the Source of Learning and Development_. Foundational theory showing learning is a cycle of concrete experience, reflection, conceptualization, and experimentation.
- [20] Chi, M. T. H., et al. (2024). Learning by doing or doing without learning? _Educational Psychology Review_. Review showing activity-based learning is effective when motor activity is relevant to the task and doesn't create extraneous cognitive load.
- [21] Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. _Educational Researcher_, 18(1), 32-42. Foundational paper arguing knowledge is situated in authentic activity and context.
- [22] Bransford, J. D., Brown, A. L., & Cocking, R. R. (2000). _How People Learn_. National Research Council report showing transfer requires practice in varied, authentic contexts.
- [23] Skillable Learning Pyramid research: Practice by doing averages 75% retention vs. 20% for audiovisual and 10% for reading alone.
---
## 3. Reflect & Consolidate
### Post-session notes using Elderleaf
#### Step 1: Recall and consolidation (recreate from memory, active note-taking)
**[Retrieval Practice / Testing Effect]**: Actively retrieving information from memory strengthens retention far more than re-reading or passive review. The act of reconstruction is itself a powerful learning event.
**How Explorable implements it:** Learners attempt to recall what they learned without looking at source material, then create their own notes—not copying/pasting, but actively reconstructing understanding.
**Citations:**
- [24] Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. _Psychological Science_, 17(3), 249-255. Seminal study showing retrieval practice dramatically outperforms restudying (68% vs 54% on delayed tests).
- [25] Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. _Science_, 331, 772-775. Found 84% of students performed better with retrieval practice vs. concept mapping.
- [26] Roediger, H. L., & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. _Perspectives on Psychological Science_, 1(3), 181-210. Comprehensive review showing testing nearly stops forgetting and multiple tests slow forgetting more than single tests.
- [27] Adesope, O. O., et al. (2017). Rethinking the use of tests: A meta-analysis of practice testing. _Review of Educational Research_, 87(3), 659-701. Meta-analysis confirming robust testing effects across domains.
---
#### Step 2: Feedback (reminders of missed notes, misconception clarification)
**[Metacognition / Feedback on Learning Gaps]**: External feedback on what was missed or misunderstood supports metacognitive monitoring and helps calibrate future learning efforts.
**How Explorable implements it:** Elderleaf identifies gaps between what was captured in the game and what appears in the learner's notes, plus flags misconceptions for later revision.
**Citations:**
- [28] Dunlosky, J., & Rawson, K. A. (2012). Overconfidence produces underachievement: Inaccurate self evaluations undermine students' learning and retention. _Learning and Instruction_, 22(4), 271-280. Shows students frequently misjudge their learning; external feedback corrects this.
- [29] Koriat, A. (2012). The relationships between monitoring, regulation and performance. _Learning and Instruction_, 22(4), 296-298. Reviews how accurate self-monitoring is essential for effective learning regulation.
- [30] Tanner, K. D. (2012). Promoting student metacognition. _CBE—Life Sciences Education_, 11(2), 113-120. Shows explicit metacognitive feedback improves learning monitoring and strategy selection.
---
## 4. Revise & Remember
### Revision for long-term retention using Elderleaf test mode and dashboard
#### Step 1: Spaced repetition schedule
**[Spacing Effect / Distributed Practice]**: Spacing review sessions over increasing intervals dramatically improves long-term retention compared to massed practice ("cramming").
**How Explorable implements it:** Elderleaf calculates optimal review intervals based on spaced repetition algorithms, presenting revision schedules on the dashboard.
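To make the scheduling concrete, here is a minimal sketch of one step of an SM-2-style interval calculation, the classic family of spaced-repetition algorithms. The constants are the standard SM-2 defaults; this is an illustration of the technique, not Elderleaf's actual algorithm.

```python
def next_review(interval_days: float, ease: float, quality: int) -> tuple[float, float]:
    """One step of an SM-2-style spaced-repetition schedule.

    quality: graded recall from 0 (blackout) to 5 (perfect).
    Returns (next interval in days, updated ease factor).
    """
    # Update the ease factor (standard SM-2 formula), clamped at the 1.3 floor.
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if quality < 3:
        return 1.0, ease               # failed recall: restart at a short interval
    if interval_days < 1:
        return 1.0, ease               # first successful review
    if interval_days == 1:
        return 6.0, ease               # second successful review
    return interval_days * ease, ease  # thereafter: intervals grow multiplicatively
```

For example, a perfectly recalled item last reviewed 6 days ago would next be scheduled roughly 15-16 days out, while a failed item drops back to a 1-day interval.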
**Citations:**
- [31] Ebbinghaus, H. (1885/1964). _Memory: A Contribution to Experimental Psychology_. Original discovery of the spacing effect and forgetting curve.
- [32] Cepeda, N. J., et al. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. _Psychological Bulletin_, 132(3), 354-380. Meta-analysis of 254 studies confirming spacing effects across 259 of 271 conditions.
- [33] Kang, S. H. K. (2016). Spaced repetition promotes efficient and effective learning. _Policy Insights from the Behavioral and Brain Sciences_, 3(1), 12-19. Reviews over a hundred studies showing spacing produces superior long-term retention, with effect sizes often exceeding d = 0.5.
- [34] Rohrer, D. (2015). Student instruction should be distributed over long time periods. _Educational Psychology Review_. Recommends spacing learning across days/weeks for durable retention.
---
#### Step 2: Test mode (recreate notes from scratch with hints)
**[Retrieval Practice with Scaffolding / Desirable Difficulties]**: Attempting full recall before receiving hints creates "desirable difficulty" that strengthens memory more than immediate scaffolding.
**How Explorable implements it:** Learners attempt to recreate their notes from memory first; hints appear only after 30 seconds of being stuck, balancing challenge with support.
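The hint-gating rule above can be sketched in a few lines. The 30-second delay comes from the UX description; the `max_hints` cap and function name are illustrative assumptions.

```python
HINT_DELAY_SECONDS = 30  # from the UX spec: hints unlock after 30s of being stuck

def hint_available(seconds_since_last_progress: float,
                   hints_used: int,
                   max_hints: int = 3) -> bool:
    """A hint unlocks only after the learner has been stuck past the delay,
    preserving the full retrieval attempt (desirable difficulty) first.
    max_hints is a hypothetical cap, not a documented Elderleaf setting."""
    return seconds_since_last_progress >= HINT_DELAY_SECONDS and hints_used < max_hints
```

The point of the delay is pedagogical, not ergonomic: even a failed retrieval attempt before the hint arrives strengthens subsequent learning [36].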
**Citations:**
- [35] Bjork, R. A., & Bjork, E. L. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In _Psychology and the Real World_ (pp. 56-64). Foundational paper on how optimal challenge level enhances learning.
- [36] Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. _Journal of Experimental Psychology: Learning, Memory, and Cognition_, 35(4), 989-998. Shows even failed retrieval attempts (before getting hints) strengthen subsequent memory.
- [37] Kapur, M., & Bielaczyc, K. (2012). Designing for productive failure. _Journal of the Learning Sciences_, 21(1), 45-83. Demonstrates delayed scaffolding produces better conceptual understanding than immediate support.
---
#### Step 3: Scoring revision quality (points, hints needed)
**[Self-Monitoring / Metacognitive Calibration]**: Quantitative feedback on retrieval success helps learners accurately assess their knowledge state and allocate future study time.
**How Explorable implements it:** Elderleaf tracks how many points were recollected vs. hints needed, providing objective data for learners to assess mastery.
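One plausible way to combine recalled points and hint usage into a single quality score is a clamped recall rate with a per-hint penalty. The 0.1 penalty weight is an illustrative choice, not Elderleaf's actual formula.

```python
def revision_score(points_recalled: int, total_points: int, hints_used: int,
                   hint_penalty: float = 0.1) -> float:
    """Score revision quality as recall rate minus a per-hint penalty,
    clamped to [0, 1]. The penalty weight is an assumption for illustration."""
    if total_points == 0:
        return 0.0
    raw = points_recalled / total_points - hint_penalty * hints_used
    return max(0.0, min(1.0, raw))
```

A score like this can feed directly back into the spacing schedule: low scores shorten the next review interval, high scores lengthen it.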
**Citations:**
- [38] Dunlosky, J., et al. (2013). Improving students' learning with effective learning techniques. _Psychological Science in the Public Interest_, 14(1), 4-58. Comprehensive review showing self-testing with feedback is among the most effective study strategies.
- [39] Soderstrom, N. C., & Bjork, R. A. (2015). Learning versus performance. _Perspectives on Psychological Science_, 10(2), 176-199. Shows objective performance metrics are essential because subjective judgments are often miscalibrated.
- [40] Schraw, G. (1998). Promoting general metacognitive awareness. _Instructional Science_, 26, 113-125. Reviews how explicit monitoring supports development of self-regulated learning skills.
---
## Summary Table
|UX Step|Primary Learning Principles|Key Effect Sizes|
|---|---|---|
|Chunked text|Segmenting, Cognitive Load|d = 0.3-0.5 (meta-analysis)|
|Choice-based narrative|Self-Determination Theory|Medium-large effects on motivation|
|Predict consequences|Prediction Error Learning|Enhanced hippocampal encoding|
|Socratic dialogue|Elaborative Interrogation|Critical thinking gains|
|See consequences|Feedback, Error Correction|d = 0.73 (feedback meta-analysis)|
|Capture misconceptions|Formative Assessment|Among largest educational effect sizes|
|Hands-on practice|Learning by Doing|75% retention vs 10% reading|
|Recall from memory|Testing Effect|d = 0.5+ vs restudying|
|Feedback on gaps|Metacognition|Calibration improvement|
|Spaced repetition|Spacing Effect|d = 0.5+ long-term retention|
|Test with delayed hints|Desirable Difficulties|Enhanced consolidation|
|Revision scoring|Self-Monitoring|Improved self-regulation|
# Potential Improvements
The critique below identifies gaps and shortcomings in the Explorable UX flow, phase by phase.
## 1. Explore Phase Critiques
### Missing: Worked Examples Before Productive Failure
**Issue:** Productive failure research (Kapur) shows it works best when learners have _some_ prior knowledge to activate. For complete novices, starting with problem-solving can cause unproductive struggle and frustration.
**What's missing:** No worked examples or modeling phase before exploration. CS education research strongly supports worked examples for novices.
**Evidence:** Worked examples produce large effect sizes for novices, and productive failure is less effective for younger/less experienced learners.
- Atkinson, R. K., Derry, S. J., Renkl, A., & Wortham, D. (2000). Learning from examples: Instructional principles from the worked examples research. _Review of Educational Research_, 70(2), 181-214. Meta-analysis showing d = 0.8 effect for novices.
- Sinha, T., & Kapur, M. (2021). When problem solving followed by instruction works: Evidence for productive failure. _Review of Educational Research_. Meta-analysis finding PF less effective for younger/less experienced learners.
**Fix:** Add optional "watch an expert" mode before exploration for true novices.
**Impact:** Critical — without this, novices may experience unproductive failure rather than productive failure, leading to frustration and dropout.
---
### Missing: Explicit Self-Explanation Prompts
**Issue:** Socratic dialogue asks questions, but doesn't explicitly prompt learners to _explain_ the narrative content to themselves as they read.
**What's missing:** Research shows self-explanation during learning (not just Q&A) doubles learning gains. The UX relies on AI questions rather than learner-generated explanations.
**Evidence:** Passive consumption of narrative, even with Socratic interruptions, is less effective than active self-explanation.
- Chi, M. T. H., Bassok, M., Lewis, M. W., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. _Cognitive Science_, 13(2), 145-182.
- Chi, M. T. H., De Leeuw, N., Chiu, M. H., & LaVancher, C. (1994). Eliciting self-explanations improves understanding. _Cognitive Science_, 18(3), 439-477.
**Fix:** Add explicit "explain what just happened in your own words" prompts before choices.
**Impact:** Critical — self-explanation is one of the highest-yield learning strategies; its absence significantly reduces learning efficiency.
---
### Missing: Subgoal Labeling for Procedural Knowledge
**Issue:** Debugging is procedural. Research shows learners need explicit subgoal labels to chunk complex procedures.
**What's missing:** Narrative describes situations but may not explicitly label transferable debugging subgoals (e.g., "Step 1: Reproduce → Step 2: Isolate → Step 3: Hypothesize").
**Evidence:** Without explicit subgoal labels, learners encode surface features rather than transferable structure.
- Margulieux, L. E., Catrambone, R., & Guzdial, M. (2012). Subgoal-labeled instructional material improves performance and transfer in learning to develop mobile applications. _Proceedings of ICER 2012_. Showed 20-30% improvement in transfer for programming tasks.
**Fix:** Overlay explicit subgoal structure on narrative arcs.
**Impact:** Critical for transfer — without subgoal labeling, learners may solve the specific scenario but fail to transfer debugging strategies to new problems.
---
### Problem: Text-Based Narrative May Build Incorrect Mental Models
**Issue:** Distributed systems require accurate "notional machines" — mental models of how execution actually works across nodes, time, and failure modes.
**What's missing:** Text narratives risk creating folk theories. No visual/dynamic representations of system state, message passing, timing, or failure propagation.
**Evidence:** Text alone is insufficient for temporal/spatial reasoning required in distributed systems; even experts frequently hold incorrect mental models.
- Sorva, J. (2013). Notional machines and introductory programming education. _ACM Transactions on Computing Education_, 13(2), 1-31.
- Alvaro, P., et al. (2015). Lineage-driven fault injection. _SIGMOD_. Demonstrates complexity of distributed system reasoning.
- Kingsbury, K. (Jepsen studies). Empirical evidence that even expert-built systems contain subtle distributed systems bugs due to incorrect mental models.
**Fix:** Integrate dynamic visualizations (sequence diagrams, Lamport diagrams, state machines) directly into narrative.
**Impact:** Critical for this domain — distributed systems debugging fundamentally requires accurate mental models of concurrency, timing, and partial failure that text cannot adequately convey.
---
### Problem: Choice Architecture May Not Reflect Real Debugging
**Issue:** Offering 3-7 discrete choices is artificial. Real debugging involves continuous hypothesis generation, not menu selection.
**What's missing:** Authentic debugging requires _generating_ hypotheses, not selecting from provided options. This trains recognition, not recall.
**Evidence:** Multiple-choice formats train different cognitive skills than open-ended problem-solving; debugging is fundamentally hypothesis-driven.
- Fitzgerald, S., Lewandowski, G., McCauley, R., Murphy, L., Simon, B., Thomas, L., & Zander, C. (2008). Debugging: Finding, fixing and flailing, a multi-institutional study of novice debuggers. _Computer Science Education_, 18(2), 93-116.
- Ko, A. J., & Myers, B. A. (2004). Designing the Whyline: A debugging interface for asking questions about program behavior. _CHI 2004_. Research on interrogative debugging and hypothesis generation.
**Fix:** Add free-form hypothesis entry before showing choices; score based on match to choice space.
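A minimal sketch of how such matching could work, using only lexical similarity from the Python standard library. The function name is hypothetical, and a real system would likely need semantic (embedding-based) matching rather than string similarity.

```python
from difflib import SequenceMatcher

def match_hypothesis(hypothesis: str, choices: list[str]) -> tuple[int, float]:
    """Return (index of best-matching choice, similarity in [0, 1]).

    SequenceMatcher gives a crude lexical similarity; it cannot recognize
    paraphrases, which is why this is only an illustrative baseline.
    """
    scores = [SequenceMatcher(None, hypothesis.lower(), c.lower()).ratio()
              for c in choices]
    best = max(range(len(choices)), key=scores.__getitem__)
    return best, scores[best]
```

Even this crude version changes the cognitive task: the learner must generate a hypothesis before seeing the menu, and the system can then surface the nearest canonical choice for comparison.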
**Impact:** Nice-to-have — choice-based learning still has value for building recognition skills, but may not fully develop generative debugging abilities needed in practice.
---
## 2. Build Phase Critiques
### Missing: Graduated Scaffolding (Parsons Problems, Fading)
**Issue:** Jumping from narrative to full hands-on deployment is a large cognitive leap.
**What's missing:** Intermediate scaffolding like Parsons problems (rearranging code blocks), completion problems, or faded worked examples.
**Evidence:** Abrupt removal of scaffolding causes cognitive overload; intermediate steps are essential for complex skill acquisition.
- Weinman, N., & Fisler, K. (2017). Fostering shared understanding of data-driven design in a block-based language. _ICER 2017_. Research on Parsons problems in CS education.
- Renkl, A., & Atkinson, R. K. (2003). Structuring the transition from example study to problem solving in cognitive skill acquisition. _Educational Psychologist_, 38(1), 15-22.
**Fix:** Add intermediate "guided Build" steps before fully open-ended practice.
**Impact:** Critical for novices — the gap between guided narrative and unscaffolded IDE work may cause cognitive overload and learned helplessness.
---
### Missing: Metacognitive Prompts During Activity
**Issue:** Vestige tracks _actions_ but doesn't prompt _reflection_ during the hands-on phase.
**What's missing:** Research shows prompting "what are you trying to do?" or "why did you just do that?" during problem-solving improves learning.
**Evidence:** Passive action tracking misses metacognitive scaffolding opportunities that could deepen learning.
- Aleven, V., & Koedinger, K. R. (2002). An effective metacognitive strategy: Learning by doing and explaining with a computer-based Cognitive Tutor. _Cognitive Science_, 26(2), 147-179.
- Roll, I., Aleven, V., McLaren, B. M., & Koedinger, K. R. (2011). Improving students' help-seeking skills using metacognitive feedback in an intelligent tutoring system. _Learning and Instruction_, 21(2), 267-280.
**Fix:** Vestige should periodically prompt learners to articulate their current hypothesis/goal.
**Impact:** Nice-to-have — tracking alone provides data for spaced repetition, but adding prompts would significantly enhance in-the-moment learning.
---
### Problem: Context-Switching Cost
**Issue:** Transition from Prophetic (narrative) to VSCode (IDE) incurs cognitive switching costs.
**What's missing:** The mental model built in narrative must be re-instantiated in a completely different interface. This transfer is not scaffolded.
**Evidence:** Each context switch can cost 15-25% of cognitive resources, potentially disrupting learning.
- Rubinstein, J. S., Meyer, D. E., & Evans, J. E. (2001). Executive control of cognitive processes in task switching. _Journal of Experimental Psychology: Human Perception and Performance_, 27(4), 763-797.
- Altmann, E. M., & Trafton, J. G. (2002). Memory for goals: An activation-based model. _Cognitive Science_, 26(1), 39-83.
**Fix:** Tighter integration — embed terminal/dashboard within narrative interface, or provide explicit bridging prompts.
**Impact:** Nice-to-have — some context switching may actually promote transfer, but the current abrupt transition could be smoother.
---
### Problem: Simulated vs. Real Environment Transfer
**Issue:** "Simulated traffic and a real dashboard" may not transfer to actual production debugging.
**What's missing:** Ecological validity research shows skills learned in simplified environments often fail to transfer. Real production has noise, incomplete information, time pressure.
**Evidence:** Simulation fidelity matters significantly for transfer of complex skills.
- Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy for far transfer. _Psychological Bulletin_, 128(4), 612-637.
- Perkins, D. N., & Salomon, G. (1992). Transfer of learning. _International Encyclopedia of Education_ (2nd ed.).
**Fix:** Include "chaos engineering" elements — incomplete logs, red herrings, missing metrics — to increase authenticity.
**Impact:** Narrowly applicable — important for production debugging specifically, less relevant for learning foundational concepts.
---
## 3. Reflect & Consolidate Phase Critiques
### Problem: Post-Session Recall May Be Too Late
**Issue:** Waiting until after the session to recall may miss the optimal window for retrieval practice.
**What's missing:** Research suggests retrieval should be interleaved _during_ learning, not only after.
**Evidence:** Retrieval during learning is more effective than massed end-of-session retrieval alone.
- Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. _Science_, 319(5865), 966-968.
- Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students' learning with effective learning techniques. _Psychological Science in the Public Interest_, 14(1), 4-58.
**Fix:** Add micro-retrieval prompts during Explore and Build phases, not just in Reflect.
**Impact:** Critical — the testing effect is most powerful when retrieval is distributed throughout learning, not concentrated at the end.
---
### Missing: Comparative/Contrastive Cases
**Issue:** Learner recalls one path through the narrative, but may not understand _why_ other paths fail.
**What's missing:** Research on analogical learning shows comparing cases (especially correct vs. incorrect) deepens understanding.
**Evidence:** Seeing only your chosen path limits abstraction and understanding of underlying principles.
- Gentner, D., Loewenstein, J., & Thompson, L. (2003). Learning and transfer: A general role for analogical encoding. _Journal of Educational Psychology_, 95(2), 393-405.
- Schwartz, D. L., Chase, C. C., Oppezzo, M. A., & Chin, D. B. (2011). Practicing versus inventing with contrasting cases: The effects of telling first on learning and transfer. _Journal of Educational Psychology_, 103(4), 759-775.
**Fix:** Reflect phase should surface "what if you had chosen X?" comparisons.
**Impact:** Critical for conceptual understanding — without contrasting cases, learners may know _what_ works but not _why_, limiting transfer.
---
### Problem: Active Note-Taking Without Structure
**Issue:** "Create text/markdown notes" is underspecified. Unguided note-taking often produces shallow, disorganized output.
**What's missing:** Research shows structured note-taking (Cornell method, concept maps with prompts) outperforms free-form notes.
**Evidence:** Learners need scaffolding for effective note organization; unstructured notes often capture surface features only.
- Kiewra, K. A. (1989). A review of note-taking: The encoding-storage paradigm and beyond. _Educational Psychology Review_, 1(2), 147-172.
- Bui, D. C., Myerson, J., & Hale, S. (2013). Note-taking with computers: Exploring alternative strategies for improved recall. _Journal of Educational Psychology_, 105(2), 299-309.
**Fix:** Provide note templates aligned with debugging subgoals; prompt for causal explanations, not just descriptions.
**Impact:** Nice-to-have — active note-taking is still better than passive review, but structured templates would increase effectiveness.
---
## 4. Revise & Remember Phase Critiques
### Problem: Spaced Repetition Optimized for Declarative, Not Procedural Knowledge
**Issue:** Standard SR algorithms (SM-2, etc.) are designed for fact recall, not complex procedural debugging skills.
**What's missing:** Debugging expertise requires pattern recognition and strategic knowledge, which may need different spacing schedules than vocabulary-style flashcards.
**Evidence:** Procedural and declarative knowledge have different memory characteristics and require different practice approaches.
- Anderson, J. R. (1993). _Rules of the Mind_. ACT-R theory distinguishing declarative and procedural memory systems.
- van Merriënboer, J. J. G. (1997). _Training Complex Cognitive Skills_. Framework for complex skill acquisition emphasizing procedural practice over declarative recall.
**Fix:** Supplement SR with spaced _problem-solving_ sessions, not just note recreation.
**Impact:** Critical — recreating notes tests declarative knowledge but may not maintain procedural debugging skills, which require actual practice.
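To make this critique concrete, here is a minimal sketch of the classic SM-2 update referenced above (the function name and exact formulation are a common rendering of Wozniak's 1987 algorithm, not Explorable's actual scheduler). Note that the entire schedule hinges on one self-graded recall score per item — there is nothing in it that models procedural skill decay:

```python
def sm2_update(quality, reps, interval, ease):
    """One review step of the classic SM-2 spacing algorithm.

    quality: self-graded recall, 0 (blackout) .. 5 (perfect).
    reps:    consecutive successful reviews so far.
    Returns (reps, interval_days, ease) for the next review.
    """
    if quality < 3:                      # failed recall: restart the streak
        return 0, 1, ease
    if reps == 0:
        interval = 1
    elif reps == 1:
        interval = 6
    else:
        interval = round(interval * ease)
    # Ease factor drifts with answer quality, floored at 1.3
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return reps + 1, interval, ease
```

A "spaced problem-solving" supplement would keep the same scheduling skeleton but replace the single recall grade with an assessment of an actual debugging attempt.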
---
### Missing: Interleaved Practice Across Scenarios
**Issue:** Revising notes from one debugging scenario reinforces that scenario but may not build transfer.
**What's missing:** Interleaving different problem types (network issues vs. race conditions vs. resource exhaustion) during revision.
**Evidence:** Interleaved practice dramatically improves transfer compared to blocked practice on single problem types.
- Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. _Instructional Science_, 35(6), 481-498.
- Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the "enemy of induction"? _Psychological Science_, 19(6), 585-592. Shows d = 0.5-0.8 advantage for interleaved practice.
**Fix:** Revision sessions should mix scenarios, not just repeat the same one.
**Impact:** Critical for transfer — without interleaving, learners may master individual scenarios but fail to abstract generalizable debugging strategies.
---
### Problem: Hints May Undermine Desirable Difficulty
**Issue:** The "use hints if stuck for more than 30 seconds" threshold may be too short. Brief struggle is productive.
**What's missing:** Research suggests optimal hint delay is task-dependent and 30 seconds may be too aggressive for complex debugging concepts.
**Evidence:** Too-quick hints prevent the productive struggle that consolidates memory; optimal difficulty varies by task and learner.
- Bjork, R. A., & Bjork, E. L. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In _Psychology and the Real World_ (pp. 56-64).
- Kalyuga, S. (2007). Expertise reversal effect and its implications for learner-tailored instruction. _Educational Psychology Review_, 19(4), 509-539.
**Fix:** Adaptive hint timing based on item difficulty and learner history; allow learners to control hint timing.
**Impact:** Nice-to-have — 30 seconds may be appropriate for simple items, but complex debugging concepts likely need longer struggle time.
---
## 5. Cross-Cutting/Missing Elements
### Missing: Expert Modeling / Cognitive Apprenticeship
**Issue:** No explicit phase where learners observe expert debugging thinking.
**What's missing:** Cognitive apprenticeship requires modeling before coaching. The AI asks questions but never demonstrates expert reasoning.
**Evidence:** "Watch one, do one, teach one" is not "do one, get questioned" — modeling is foundational for skill acquisition.
- Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: Teaching the crafts of reading, writing, and mathematics. In _Knowing, Learning, and Instruction: Essays in Honor of Robert Glaser_ (pp. 453-494).
**Fix:** Add "expert playthrough" option showing think-aloud debugging with explicit strategy articulation.
**Impact:** Critical — without modeling, learners lack a reference point for what expert debugging looks like, making it harder to develop expert-like strategies.
---
### Missing: Collaborative/Social Learning
**Issue:** Entirely individual. No peer discussion, pair debugging, or collaborative problem-solving.
**What's missing:** CS education research strongly supports peer instruction and pair programming for debugging.
**Evidence:** Social metacognition (thinking about others' thinking) is a powerful learning mechanism absent from the current design.
- Porter, L., Bailey Lee, C., & Simon, B. (2013). Halving fail rates using peer instruction: A study of four computer science courses. _SIGCSE 2013_.
- Williams, L., & Kessler, R. (2000). All I really need to know about pair programming I learned in kindergarten. _Communications of the ACM_, 43(5), 108-114.
**Fix:** Add multiplayer narrative branches or async peer comparison features.
**Impact:** Nice-to-have for individual learners; Critical for classroom/cohort deployments where social learning is feasible.
---
### Missing: Explicit Far Transfer Training
**Issue:** No systematic variation to promote abstraction and far transfer.
**What's missing:** Learners may encode scenario-specific patterns ("when Kafka lags, check X") without abstracting general debugging principles.
**Evidence:** Far transfer requires explicit abstraction activities; it does not emerge automatically from varied practice.
- Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy for far transfer. _Psychological Bulletin_, 128(4), 612-637.
- Perkins, D. N., & Salomon, G. (1992). Transfer of learning. _International Encyclopedia of Education_ (2nd ed.).
**Fix:** After multiple scenarios, add explicit "what's common across all these?" abstraction prompts.
**Impact:** Critical for real-world application — without explicit abstraction, learners may excel at practiced scenarios but struggle with novel debugging challenges.
---
### Missing: Affective/Motivational Scaffolding for Frustration
**Issue:** Debugging is frustrating. No explicit support for emotional regulation during struggle.
**What's missing:** Research on academic emotions shows frustration during problem-solving predicts dropout unless scaffolded.
**Evidence:** Productive failure can become unproductive if learner affect is ignored; confusion is beneficial only when resolved.
- D'Mello, S., & Graesser, A. (2012). Dynamics of affective states during complex learning. _Learning and Instruction_, 22(2), 145-157.
- Pekrun, R. (2006). The control-value theory of achievement emotions: Assumptions, corollaries, and implications for educational research and practice. _Educational Psychology Review_, 18(2), 315-341.
**Fix:** Add affective prompts ("This is hard — that's normal"); calibrate difficulty to maintain flow state.
**Impact:** Nice-to-have for experienced learners; Critical for novices and in high-frustration domains like distributed systems debugging.
---
## Summary: Critical Gaps
|Gap|Impact|Research Basis|
|---|---|---|
|No worked examples for novices|Critical|Atkinson et al., Kapur boundary conditions|
|No expert modeling phase|Critical|Cognitive apprenticeship (Collins et al.)|
|Text-only (no dynamic visualizations)|Critical|Notional machines (Sorva), distributed systems complexity|
|No interleaved practice across scenarios|Critical|Rohrer & Taylor, Kornell & Bjork|
|No self-explanation prompts|Critical|Chi et al. self-explanation effect|
|No subgoal labeling|Critical|Margulieux, Catrambone, & Guzdial|
|Post-session recall (not during learning)|Critical|Karpicke & Roediger, Dunlosky et al.|
|No contrasting cases in reflection|Critical|Gentner et al., Schwartz et al.|
|SR optimized for declarative not procedural|Critical|van Merriënboer, ACT-R|
|No explicit far transfer training|Critical|Barnett & Ceci, Perkins & Salomon|
|No graduated scaffolding (Parsons, fading)|Critical|Renkl, Weinman & Fisler|
|No collaborative/peer elements|Nice-to-have|Porter, Williams & Kessler|
|No metacognitive prompts during Build|Nice-to-have|Aleven & Koedinger, Roll et al.|
|Context-switching costs (narrative → IDE)|Nice-to-have|Rubinstein et al., Altmann & Trafton|
|Unstructured note-taking|Nice-to-have|Kiewra, Bui et al.|
|Hints too aggressive (30s)|Nice-to-have|Bjork & Bjork desirable difficulties|
|Affective scaffolding|Nice-to-have / Critical|D'Mello & Graesser, Pekrun|
|Choice-based vs. generative debugging|Nice-to-have|Fitzgerald et al., Ko & Myers|
|Simulated vs. real environment|Narrowly applicable|Barnett & Ceci, Perkins & Salomon|
# Interactive Principles for Education Game Design
We compare our progress against [this catalog of interactive principles for education game design](https://eharpste.github.io/interactive-principles/#/)
## Legend
- ✅ **Implemented**: Present in current UX flow
- ⚠️ **Noted as Missing**: Identified as a gap in our critique
- 🔶 **Noted as Problem**: Partially implemented but with issues identified
- ❌ **Not Considered**: Not addressed in our critique
---
## Category 1: Memory/Fluency (Principles 1-6)
### 1. Spacing
**Definition:** Leave some time in between practice/study/play sessions. Space practice across time (✓) vs. Mass practice all at once (✗).
**Status:** ✅ Implemented
**Where in Explorable:** Elderleaf's spaced repetition system schedules review sessions across days/weeks.
**Notes:** Core design element. Our critique noted this is well-implemented but raised concerns about SR algorithms being optimized for declarative rather than procedural knowledge.
---
### 2. Scaffolding
**Definition:** Structure content modularly. Sequence instruction toward higher goals (✓) vs. No sequencing (✗).
**Status:** 🔶 Noted as Problem
**Where in Explorable:** Narrative structure exists in Prophetic with sequential choices leading to outcomes.
**Problem identified:** We noted "No graduated scaffolding (Parsons problems, fading)" — the jump from narrative to full VSCode deployment is too large. Missing intermediate steps.
**Implementation idea:** Add Parsons problems (code rearrangement), completion problems, and faded worked examples between Explore and Build phases. Create a "Guided Build" mode with partially-completed code.
---
### 3. Exam Expectations
**Definition:** Make it clear how key concepts being learned will be put to use later. Students expect to be tested (✓) vs. No testing expected (✗).
**Status:** ✅ Implemented
**Where in Explorable:** Elderleaf's Revise phase explicitly frames content as "you will be tested on this." Learners know recall is coming.
**Notes:** The testing expectation is built into the UX flow from the start.
---
### 4. Quizzing
**Definition:** Re-expose players to core concepts by forcing them to recall material. Quiz for retrieval practice (✓) vs. Study same material (✗).
**Status:** ✅ Implemented
**Where in Explorable:** Elderleaf's test mode requires recreating notes from memory. Active retrieval, not passive review.
**Notes:** Our critique noted this is well-implemented but raised concerns that SR is optimized for declarative knowledge. Consider adding spaced _problem-solving_ sessions, not just note recreation.
---
### 5. Segmenting
**Definition:** Break material down into units in order to allow the player/learner to approach content at their own pace. Present lesson in learner-paced segments (✓) vs. As a continuous unit (✗).
**Status:** ✅ Implemented
**Where in Explorable:** Prophetic's chunked narrative (1-3 short paragraphs at a time) with choices between chunks.
**Notes:** Directly cited in our evidence base (Mayer's segmenting principle, d = 0.3-0.5).
---
### 6. Feedback
**Definition:** Provide feedback to the player on their performance as they are learning. Provide feedback during learning (✓) vs. No feedback provided (✗).
**Status:** 🔶 Noted as Problem
**Where in Explorable:**
- Explore: Prediction vs. reality comparison
- Reflect: AI feedback on gaps/misconceptions
**Problem identified:** We noted "No metacognitive prompts during Build" — Vestige tracks actions but doesn't prompt reflection during the hands-on phase. Feedback is concentrated at certain points, not distributed throughout.
**Implementation idea:** Vestige should periodically prompt "What are you trying to do?" or "Why did you just run that command?" during Build phase. Add micro-feedback loops throughout, not just at phase transitions.
---
## Category 2: Induction/Refinement (Principles 7-17)
### 7. Pretraining
**Definition:** Provide some exposure to key concepts before the actual lesson. Practice key prior skills before lesson (✓) vs. Jump in (✗).
**Status:** ⚠️ Noted as Missing
**Where in Explorable:** Not currently implemented. Learners jump directly into narrative exploration.
**Problem identified:** We noted "No worked examples for novices" and "No expert modeling phase" — complete novices need some orientation before productive failure works.
**Implementation idea:** Add an optional "Primer" phase before Explore that introduces key terminology, shows the system architecture visually, and demonstrates what a debugging session looks like. Could be a 2-minute video or interactive diagram.
---
### 8. Worked Example
**Definition:** Combine problem-solving practice with reviewing examples of solved problems. Worked examples + problem-solving practice (✓) vs. Practice alone (✗).
**Status:** ⚠️ Noted as Missing (CRITICAL)
**Where in Explorable:** Not currently implemented. Learners are thrown into problem-solving without seeing expert solutions first.
**Problem identified:** Explicitly noted: "No worked examples for novices" — Atkinson et al. (2000) meta-analysis showed d = 0.8 effect for novices. Kapur's own research shows productive failure is less effective for inexperienced learners.
**Implementation idea:**
1. Add "Watch Expert" mode showing annotated expert playthrough with think-aloud
2. Before each scenario, show a worked example of a _different but related_ debugging problem
3. After learner completes scenario, show expert solution for comparison
4. Use example-problem pairs: worked example → similar problem → worked example → similar problem
---
### 9. Concreteness Fading
**Definition:** When explaining key concepts, use both abstract and concrete representations. Concrete to abstract (✓) vs. Starting with abstract representations (✗).
**Status:** 🔶 Noted as Problem
**Where in Explorable:** Narrative uses concrete scenarios (specific services, specific errors), but:
**Problem identified:** We noted "Text-only (no dynamic visualizations)" — there's no visual abstraction layer. Learners see concrete text descriptions but don't see the abstract patterns (sequence diagrams, state machines) that would help transfer.
**Implementation idea:**
1. Start with concrete narrative scenario
2. Mid-scenario, introduce simplified diagram of what's happening
3. End with abstract principle/pattern diagram
4. In Reflect phase, prompt learner to draw their own diagram before showing canonical version
---
### 10. Guided Attention
**Definition:** Provide explicit instructions as needed to help the player anticipate where they should focus. Words include cues about organization (✓) vs. No organization cues (✗).
**Status:** ⚠️ Noted as Missing
**Where in Explorable:** Narrative provides implicit guidance, but:
**Problem identified:** We noted "No subgoal labeling for procedural knowledge" — debugging is procedural (Reproduce → Isolate → Hypothesize → Test → Fix) but the narrative doesn't explicitly label these transferable subgoals.
**Implementation idea:**
1. Overlay subgoal labels on narrative: "STEP 1: REPRODUCE" appears as header
2. Use consistent visual cues for each debugging phase
3. At end of scenario, show subgoal sequence diagram: "You just completed: Reproduce → Isolate → Hypothesize → Test"
4. In Reflect phase, ask learner to identify which subgoal each action served
---
### 11. Linking
**Definition:** Highlight connections between instructional units. Integrate instructional components (✓) vs. No integration (✗).
**Status:** ❌ Not Considered
**Where in Explorable:** Not explicitly addressed in current design.
**Gap:** As learners complete multiple scenarios, there's no explicit mechanism showing how concepts connect across scenarios.
**Implementation idea:**
1. Add a "Concept Map" view in Elderleaf showing connections between scenarios
2. When introducing a new scenario, show "This builds on [previous scenario] because..."
3. In revision sessions, occasionally ask "How is this similar to [other scenario]?"
4. Create explicit "bridge" content between scenarios highlighting shared principles
---
### 12. Goldilocks
**Definition:** Account for the player's prior knowledge and skill level when considering difficulty. Instruct at intermediate difficulty level (✓) vs. Too hard or too easy (✗).
**Status:** ⚠️ Noted as Missing
**Where in Explorable:** Fixed difficulty across all learners.
**Problem identified:** We noted "Hints too aggressive (30s)" — this is a symptom of non-adaptive difficulty. 30s may be appropriate for some items/learners but not others.
**Implementation idea:**
1. Add initial diagnostic to assess prior knowledge
2. Adaptive hint timing based on item difficulty and learner history
3. Allow learners to self-select difficulty: "I want to struggle longer" vs. "Give me more support"
4. Track success rate and adjust scenario complexity accordingly
5. Expertise reversal: reduce scaffolding as learner demonstrates mastery
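Ideas 2 and 5 above can be sketched as a simple heuristic — harder items and stronger learners earn a longer productive-struggle window before a hint is offered, with the delay clamped to a sane range. The function name, scaling factors, and the 30-second baseline are illustrative assumptions, not empirically tuned values:

```python
def hint_delay_seconds(base=30.0, difficulty=0.5, success_rate=0.5,
                       lo=15.0, hi=180.0):
    """Adaptive hint delay (illustrative heuristic).

    difficulty:   0 (trivial) .. 1 (very hard), e.g. from historical error rates.
    success_rate: learner's recent accuracy on similar items, 0 .. 1.
                  Higher accuracy -> longer struggle time (expertise reversal).
    """
    delay = base * (1 + 2 * difficulty) * (0.5 + success_rate)
    return min(hi, max(lo, delay))
```

A learner-facing "I want to struggle longer" toggle could simply scale `base` up or down.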
---
### 13. Activate Preconceptions
**Definition:** Prompt players to examine what they already know (or think they know) to help surface misconceptions. Cue student's prior knowledge (✓) vs. No prior knowledge cues (✗).
**Status:** ✅ Implemented
**Where in Explorable:**
- Prediction prompts before seeing outcomes
- Choice architecture forces learners to commit to what they think will happen
- Misconceptions are captured for later remediation
**Notes:** This is a strength of the design. Prediction error learning is well-supported.
---
### 14. Immediate Feedback
**Definition:** Provide immediate feedback on errors. Immediate feedback on errors (✓) vs. Delayed feedback (✗).
**Status:** ✅ Implemented
**Where in Explorable:**
- Predictions immediately compared to outcomes
- Socratic dialogue provides immediate response to choices
- AI tutor responds in real-time
**Notes:** Well-implemented in Explore phase. Less clear in Build phase (see Principle 6).
---
### 15. Interleaving
**Definition:** Practice different skills at the same time, or in alternation. Intermix practice on different skills (✓) vs. Block practice all at once (✗).
**Status:** ⚠️ Noted as Missing (CRITICAL)
**Where in Explorable:** Not currently implemented. Revision sessions focus on single scenario.
**Problem identified:** Explicitly noted: "No interleaved practice across scenarios" — revising one scenario doesn't build transfer. Need mixing of problem types.
**Implementation idea:**
1. Revision sessions should mix scenarios: network issue → race condition → resource exhaustion → back to network
2. Add "mixed practice" mode that randomly selects items across all completed scenarios
3. Interleave during initial learning too: don't complete entire Scenario A before starting Scenario B
4. Create "discrimination training" exercises: "Is this a network issue or a race condition?"
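A mixed-practice session builder along the lines of ideas 1-2 might look like the following sketch (function name and data shape are assumptions): it draws revision items from per-scenario pools while avoiding back-to-back items from the same scenario, so practice stays interleaved rather than blocked.

```python
import random

def interleaved_session(pools, n_items, rng=None):
    """Draw a mixed revision session from per-scenario item pools.

    pools: dict mapping scenario name -> list of revision items.
    Returns a list of (scenario, item) pairs, never repeating a
    scenario twice in a row while an alternative pool remains.
    """
    rng = rng or random.Random()
    session, last = [], None
    remaining = {k: list(v) for k, v in pools.items() if v}
    while remaining and len(session) < n_items:
        choices = [k for k in remaining if k != last] or list(remaining)
        k = rng.choice(choices)
        session.append((k, remaining[k].pop(0)))
        if not remaining[k]:
            del remaining[k]
        last = k
    return session
```

With pools for network issues, race conditions, and resource exhaustion, each session forces the discrimination step ("which kind of problem is this?") that blocked practice skips.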
---
### 16. Application
**Definition:** Challenge the player to put their newly acquired knowledge to use solving real problems. Practice applying new knowledge (✓) vs. No application (✗).
**Status:** ✅ Implemented
**Where in Explorable:** Build phase in VSCode with real tools and simulated traffic.
**Notes:** This is the core purpose of the Build phase. Our critique noted some concerns about simulation fidelity and context-switching costs, but the application principle itself is addressed.
---
### 17. Variability
**Definition:** Use multiple and varied examples to teach and practice the use of abstract concepts. Practice with varied instances (✓) vs. Similar instances (✗).
**Status:** ⚠️ Noted as Missing
**Where in Explorable:** Multiple scenarios exist, but:
**Problem identified:** We noted "No explicit far transfer training" — learners may encode scenario-specific patterns without abstracting general principles. Variability exists but isn't _systematic_.
**Implementation idea:**
1. Design scenarios to systematically vary surface features while maintaining deep structure
2. After 3-4 scenarios, add explicit abstraction prompt: "What's common across all these?"
3. Include "near transfer" and "far transfer" test scenarios
4. Vary the debugging context: sometimes start with logs, sometimes with metrics, sometimes with user report
---
## Category 3: Sense-making/Understanding (Principles 18-30)
### 18. Comparison
**Definition:** Use multiple examples to explain a new concept, and highlight similarities and differences. Compare multiple instances (✓) vs. Only one instance (✗).
**Status:** ⚠️ Noted as Missing
**Where in Explorable:** Learner sees one path through the narrative.
**Problem identified:** Explicitly noted: "No comparative/contrastive cases" — learner recalls their chosen path but doesn't see _why_ other paths fail.
**Implementation idea:**
1. After completing scenario, show "What if you had chosen X?" comparison
2. Add "Comparison View" showing two paths side-by-side
3. In Reflect phase, explicitly compare correct vs. incorrect approaches
4. Use contrastive explanation: "This worked BECAUSE X, unlike option B which would have failed BECAUSE Y"
---
### 19. Multimedia
**Definition:** Use visual aids to enhance verbal descriptions. Graphics + verbal descriptions (✓) vs. Verbal descriptions alone (✗).
**Status:** ⚠️ Noted as Missing (CRITICAL)
**Where in Explorable:** Text-only narrative in Prophetic.
**Problem identified:** Explicitly noted: "Text-only (no dynamic visualizations)" — distributed systems require accurate mental models of timing, message passing, failure propagation. Text alone risks folk theories.
**Implementation idea:**
1. Add sequence diagrams showing message flow between services
2. Include Lamport diagrams for timing/causality
3. Animate state machines showing system state transitions
4. Show real-time system topology as narrative progresses
5. Use dashboard visualizations from Build phase earlier in Explore phase
---
### 20. Modality Principle
**Definition:** Present verbal descriptions in audio rather than text when possible (when visuals are present). Verbal descriptions presented in audio (✓) vs. In written form (✗).
**Status:** ❌ Not Considered
**Where in Explorable:** All verbal content is text-based.
**Gap:** If visualizations are added (per #19), narration could be audio to reduce split attention.
**Implementation idea:**
1. Add optional audio narration for narrative segments
2. When showing diagrams, provide audio explanation rather than text labels
3. Allow learner to choose modality preference
4. Use text for reference material that learners may want to re-read
---
### 21. Redundancy
**Definition:** If you provide verbal descriptions in audio, refrain from also including them as text. Verbal descriptions in audio (✓) vs. Both audio & written (✗).
**Status:** ❌ Not Considered
**Where in Explorable:** Not applicable currently (no audio).
**Gap:** If audio is added, avoid duplicating as text (except for accessibility options).
**Implementation idea:** When implementing audio (per #20), don't show same text on screen. Do provide transcripts/captions for accessibility, but as opt-in.
---
### 22. Spatial Contiguity
**Definition:** Position descriptive text as closely as possible to the part of the visualization it describes. Present description next to image element described (✓) vs. Separated (✗).
**Status:** ❌ Not Considered
**Where in Explorable:** Not applicable currently (no visualizations with labels).
**Gap:** When diagrams are added, text labels should be proximate to relevant elements.
**Implementation idea:** When adding system diagrams (#19):
1. Use inline labels on diagram elements, not legends below
2. For animated sequences, highlight relevant element when describing it
3. Avoid "see Figure 3" references — embed explanation at point of reference
---
### 23. Temporal Contiguity
**Definition:** Present complementary audio and visual elements at the same time to encourage deeper processing. Present audio & image element at the same time (✓) vs. Separated (✗).
**Status:** ❌ Not Considered
**Where in Explorable:** Not applicable currently (no audio/visual pairing).
**Gap:** If audio narration and visualizations are added, they should be synchronized.
**Implementation idea:** When implementing audio + visuals:
1. Animate diagram elements in sync with narration
2. Highlight relevant components as they're discussed
3. Avoid "now look at the diagram we showed earlier" — show it again
---
### 24. Coherence
**Definition:** Refrain from including extraneous words, pictures, and sounds that do not serve the learning objective. Extraneous elements excluded (✓) vs. Included (✗).
**Status:** ❌ Not Considered (Likely OK)
**Where in Explorable:** Not explicitly addressed, but text-based narrative is presumably focused.
**Gap:** No specific issues identified, but worth auditing narrative for tangential content.
**Implementation idea:**
1. Audit narrative for content that doesn't serve learning objectives
2. Avoid decorative graphics if/when visuals are added
3. Keep AI tutor responses focused on debugging concepts, not tangential explanations
4. Remove "interesting but irrelevant" system details
---
### 25. Anchored Learning
**Definition:** Engage the player in realistic rather than theoretical problem-solving. Real-world problems (✓) vs. Abstract problems (✗).
**Status:** ✅ Implemented
**Where in Explorable:**
- Simulated traffic on real dashboard
- Actual VSCode with real debugging tools
- Authentic distributed systems scenarios
**Notes:** This is a design strength. Our critique noted some concerns about simulation fidelity (missing noise, time pressure), but the anchoring principle is addressed.
---
### 26. Metacognition
**Definition:** Empower the player to self-reflect and self-correct. Metacognition supported (✓) vs. No support for metacognition (✗).
**Status:** 🔶 Noted as Problem
**Where in Explorable:**
- AI feedback in Reflect phase
- Self-scoring in test mode
**Problem identified:** We noted "No metacognitive prompts during Build" — passive action tracking without reflection prompts. Also noted that free-form note-taking lacks structure for metacognitive support.
**Implementation idea:**
1. Vestige prompts during Build: "What are you trying to find out?" "What would change your hypothesis?"
2. Structured note templates in Elderleaf with metacognitive prompts: "What confused me was..." "I now understand that..."
3. Calibration feedback: "You rated this 4/5 confidence but got it wrong — why?"
4. Periodic "what do you know so far?" checkpoints
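The calibration feedback in idea 3 can be computed directly from review history. This sketch (hypothetical function and data shape) maps 1-5 confidence ratings onto probabilities, scores calibration with a Brier score, and flags the high-confidence misses that deserve a "why?" prompt:

```python
def calibration_report(records):
    """Flag confidence-accuracy mismatches from review history.

    records: list of (confidence_1_to_5, correct_bool) pairs.
    Returns (brier_score, overconfident_indices); lower Brier score
    means better-calibrated confidence, and overconfident indices are
    high-confidence (>= 4) answers that were wrong.
    """
    probs = [(c - 1) / 4 for c, _ in records]          # map 1..5 -> 0..1
    brier = sum((p - float(ok)) ** 2
                for p, (_, ok) in zip(probs, records)) / len(records)
    overconfident = [i for i, (c, ok) in enumerate(records)
                     if c >= 4 and not ok]
    return brier, overconfident
```

Each flagged index would trigger a prompt such as "You rated this 4/5 confidence but got it wrong — why?"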
---
### 27. Explanation
**Definition:** Prompt the player to reason through the underlying explanatory principles. Prompt for self-explanation (✓) vs. Give explanation / No prompt (✗).
**Status:** ⚠️ Noted as Missing
**Where in Explorable:** Socratic dialogue asks questions, but doesn't explicitly prompt self-explanation.
**Problem identified:** Explicitly noted: "No self-explanation prompts" — being asked questions ≠ generating your own explanations. Chi et al. showed self-explanation doubles learning gains.
**Implementation idea:**
1. Before each choice: "In your own words, explain what you think will happen and why"
2. After seeing outcome: "Explain why this happened" before AI provides explanation
3. In Reflect phase: "Explain this concept to a junior engineer" prompt
4. Require typed explanation before proceeding, not just selection
---
### 28. Questioning
**Definition:** Empower your player to embrace and explore uncertainties they encounter. Time for reflection & questioning (✓) vs. Instruction alone (✗).
**Status:** 🔶 Noted as Problem
**Where in Explorable:** Socratic dialogue provides questions, but learner-initiated questioning is limited.
**Problem identified:** Choice architecture is instructor-led. Learner selects from options but doesn't generate their own questions.
**Implementation idea:**
1. Add "I have a question" button that lets learner ask AI tutor anything
2. Before choices appear, prompt: "What questions do you have at this point?"
3. Track questions asked and revisit unanswered ones
4. Reward curiosity: "Good question!" for insightful learner queries
5. Add "explore tangent" paths for learners who want to dig deeper
---
### 29. Cognitive Dissonance
**Definition:** Present incorrect or alternative perspectives of the subject matter to afford opportunities for reflection. Present incorrect or alternative perspective (✓) vs. Only correct (✗).
**Status:** ✅ Implemented
**Where in Explorable:**
- Productive failure: learner predictions often wrong
- Choices include incorrect options
- Misconceptions are surfaced and addressed
**Notes:** This is core to the productive failure approach. Seeing predictions fail creates the cognitive dissonance that drives learning.
---
### 30. Interest
**Definition:** Engage your player by structuring content around their interests and goals. Instruction relevant to student interests (✓) vs. Not relevant (✗).
**Status:** 🔶 Noted as Problem
**Where in Explorable:** Scenarios are presumably relevant to target audience (distributed systems engineers).
**Problem identified:** We noted "No affective scaffolding" — debugging is frustrating, and there's no explicit support for emotional regulation or motivational maintenance.
**Implementation idea:**
1. Allow learner to choose scenario context (e-commerce, gaming, fintech) based on interests
2. Add progress indicators and achievement milestones
3. Acknowledge difficulty: "This is hard — that's normal" prompts
4. Connect to learner's stated goals: "This skill will help you with [goal]"
5. Include variety: some quick wins, some challenging scenarios
---
## Summary Table
|#|Principle|Status|Category|
|---|---|---|---|
|1|Spacing|✅ Implemented|Memory/Fluency|
|2|Scaffolding|🔶 Problem noted|Memory/Fluency|
|3|Exam Expectations|✅ Implemented|Memory/Fluency|
|4|Quizzing|✅ Implemented|Memory/Fluency|
|5|Segmenting|✅ Implemented|Memory/Fluency|
|6|Feedback|🔶 Problem noted|Memory/Fluency|
|7|Pretraining|⚠️ Missing|Induction/Refinement|
|8|Worked Example|⚠️ Missing (CRITICAL)|Induction/Refinement|
|9|Concreteness Fading|🔶 Problem noted|Induction/Refinement|
|10|Guided Attention|⚠️ Missing|Induction/Refinement|
|11|Linking|❌ Not considered|Induction/Refinement|
|12|Goldilocks|⚠️ Missing|Induction/Refinement|
|13|Activate Preconceptions|✅ Implemented|Induction/Refinement|
|14|Immediate Feedback|✅ Implemented|Induction/Refinement|
|15|Interleaving|⚠️ Missing (CRITICAL)|Induction/Refinement|
|16|Application|✅ Implemented|Induction/Refinement|
|17|Variability|⚠️ Missing|Induction/Refinement|
|18|Comparison|⚠️ Missing|Sense-making/Understanding|
|19|Multimedia|⚠️ Missing (CRITICAL)|Sense-making/Understanding|
|20|Modality Principle|❌ Not considered|Sense-making/Understanding|
|21|Redundancy|❌ Not considered|Sense-making/Understanding|
|22|Spatial Contiguity|❌ Not considered|Sense-making/Understanding|
|23|Temporal Contiguity|❌ Not considered|Sense-making/Understanding|
|24|Coherence|❌ Not considered (likely OK)|Sense-making/Understanding|
|25|Anchored Learning|✅ Implemented|Sense-making/Understanding|
|26|Metacognition|🔶 Problem noted|Sense-making/Understanding|
|27|Explanation|⚠️ Missing|Sense-making/Understanding|
|28|Questioning|🔶 Problem noted|Sense-making/Understanding|
|29|Cognitive Dissonance|✅ Implemented|Sense-making/Understanding|
|30|Interest|🔶 Problem noted|Sense-making/Understanding|
---
## Statistics
|Status|Count|Percentage|
|---|---|---|
|✅ Implemented|9|30%|
|🔶 Problem noted|6|20%|
|⚠️ Missing|9|30%|
|❌ Not considered|6|20%|
**Coverage by Category:**
- Memory/Fluency (1-6): 4 implemented, 2 problems = 67% implemented
- Induction/Refinement (7-17): 3 implemented, 1 problem, 6 missing, 1 not considered = 27% implemented
- Sense-making/Understanding (18-30): 2 implemented, 3 problems, 3 missing, 5 not considered = 15% implemented
---
## Priority Recommendations
### Critical (Must Fix)
1. **#8 Worked Example**: Add expert modeling before productive failure
2. **#15 Interleaving**: Mix scenario types in revision sessions
3. **#19 Multimedia**: Add dynamic visualizations for distributed systems concepts
### High Priority (Should Fix)
4. **#7 Pretraining**: Add primer/orientation phase for novices
5. **#10 Guided Attention**: Add subgoal labeling to narrative
6. **#27 Explanation**: Add explicit self-explanation prompts
7. **#18 Comparison**: Add contrastive case analysis
### Medium Priority (Nice to Have)
8. **#12 Goldilocks**: Implement adaptive difficulty
9. **#17 Variability**: Systematize variation for transfer
10. **#11 Linking**: Add concept map connecting scenarios
11. **#6 Feedback**: Add metacognitive prompts during Build phase
### Lower Priority (Consider Later)
12. **#20-23 Modality/Contiguity**: Implement with audio if adding multimedia
13. **#28 Questioning**: Enable learner-initiated questions
14. **#30 Interest**: Add scenario personalization and affective support