Quantized Tiny Recursive Models for Edge Deployment

Overview
Tiny Recursive Models (TRMs) are naturally compact and recur over their latent state, making them attractive for ultra-efficient deployment on microcontrollers. This thesis develops low-precision (INT8/INT4) TRMs, analyzes their stability under recursion and deep supervision, and deploys them on real hardware (GAP9 or Cortex-M) to study latency, memory, and energy trade-offs together with adaptive halting.
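
To fix ideas, the sketch below shows the kind of recursion loop involved. It is a minimal PyTorch illustration with assumed shapes, layer choices, and halting head, not the reference implementation this thesis will build:

```python
import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    """Illustrative TRM-style cell (sizes and layers are assumptions):
    one tiny network recursively refines a latent state z, then updates
    the answer embedding y; a small head emits an ACT-style halting logit."""

    def __init__(self, dim: int = 128, n_inner: int = 6):
        super().__init__()
        self.core = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.halt_head = nn.Linear(dim, 1)
        self.n_inner = n_inner

    def forward(self, x, y, z):
        # Inner recursion: refine the latent given input x and current answer y.
        for _ in range(self.n_inner):
            z = z + self.core(torch.cat([x, y, z], dim=-1))
        # Answer update from the refined latent (x zeroed out in this sketch).
        y = y + self.core(torch.cat([torch.zeros_like(x), y, z], dim=-1))
        return y, z, self.halt_head(z)

# Deep supervision runs several such rounds, attaches a loss to each round's
# answer, and uses the halting logit to stop early at inference time.
```

Quantization interacts with this structure in a non-obvious way: rounding error is injected at every inner step, so it can either accumulate across the recursion or be absorbed by the refinement, which is precisely what the stability analysis will probe.
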
Expected Outcomes
- Reference PyTorch TRM implementation ready for quantization (building on the time-series thesis or existing TRM code)
- Post-training quantization (PTQ) and quantization-aware training (QAT) pipelines (INT8, INT4/mixed precision) with detailed accuracy–precision curves (a minimal PTQ sketch follows this list)
- Investigation of how quantization affects recursive dynamics and deep supervision signals
- Deployment of a quantized TRM on GAP9 or Cortex-M, with measured latency, memory footprint, and energy per inference
- Evaluation of learned halting on-device versus fixed-compute baselines
- (Optional) Technical report or workshop paper summarizing the quantization and deployment findings
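
As a concrete starting point for the PTQ half of the pipeline, PyTorch's built-in dynamic quantization gives an INT8 baseline in a few lines. A minimal sketch, reusing the illustrative model above; static PTQ with calibration data and INT4/mixed precision need more machinery:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

model = TinyRecursiveModel().eval()   # FP32 reference from the sketch above

# Dynamic PTQ: Linear weights are stored as INT8 and activations are
# quantized on the fly; the lowest-effort point on the accuracy-precision curve.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
y0 = torch.zeros(1, 128)
z0 = torch.zeros(1, 128)
y, z, halt = qmodel(x, y0, z0)        # same interface, INT8 weights
```

Because the same weights are reused at every recursion step, a single quantized layer's error shows up n_inner times per round; per-step accuracy curves are therefore more informative than end-to-end accuracy alone.
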
Prerequisites
- Strong programming skills in Python and C/C++
- Solid ML/DL foundations (PyTorch or equivalent)
- Interest in embedded systems / edge AI and willingness to work with hardware toolchains
- Prior experience with quantization or model compression is a plus but not required
Tools & Skills
- Python, PyTorch, quantization frameworks (PyTorch built-ins, Brevitas, etc.; see the Brevitas sketch after this list)
- C/C++ for embedded deployment (GAP SDK / PMSIS, or ARM microcontroller toolchain)
- Linux, Git, scripting for experiment automation
- Access to lab tools for power/latency measurements (provided)
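
For the QAT and INT4 side, Brevitas replaces standard layers with fake-quantized counterparts that train with straight-through estimators. A minimal sketch of a 4-bit variant of the core MLP from the first sketch; the bit-widths are illustrative, and ReLU stands in for GELU because Brevitas ships a quantized ReLU:

```python
import torch.nn as nn
from brevitas.nn import QuantLinear, QuantReLU

dim = 128

# QAT core: 4-bit weights, 8-bit activations, trained with fake quantization
# and exported to integer kernels for the target toolchain afterwards.
qat_core = nn.Sequential(
    QuantLinear(3 * dim, dim, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=8),
    QuantLinear(dim, dim, bias=True, weight_bit_width=4),
)
```

Training then proceeds as in FP32: the quantizers round in the forward pass and pass gradients straight through in the backward pass.
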
What You Will Learn
- How TRMs behave under low-precision arithmetic and recursive refinement
- Practical PTQ and QAT techniques for sequence/recursive models
- Mapping compact neural networks onto MCUs or PULP SoCs while respecting memory and timing constraints
- Measuring and interpreting energy/latency/accuracy trade-offs for edge deployment (a host-side timing sketch follows this list)
- Integrating ML algorithms with embedded software stacks to deliver a full-stack thesis
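
Board-level energy and latency come from the lab's measurement tools, but a host-side timing proxy is handy during development for tracking relative changes (absolute host numbers say little about MCU behaviour). A minimal sketch, reusing `model` and `qmodel` from the PTQ sketch above:

```python
import time
import torch

def host_latency_ms(net, inputs, warmup=10, iters=100):
    """Rough host-side latency proxy; real numbers come from the target board."""
    with torch.inference_mode():
        for _ in range(warmup):          # warm caches and allocator first
            net(*inputs)
        t0 = time.perf_counter()
        for _ in range(iters):
            net(*inputs)
    return (time.perf_counter() - t0) * 1e3 / iters

inputs = (torch.randn(1, 128), torch.zeros(1, 128), torch.zeros(1, 128))
print(f"FP32 {host_latency_ms(model, inputs):.3f} ms | "
      f"INT8 {host_latency_ms(qmodel, inputs):.3f} ms")
```
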
Application
Please email thoriri@iis.ee.ethz.ch with the subject "[MSc Thesis] Quantized Tiny Recursive Models for Edge Deployment". Attach a short motivation, CV, transcripts, and note any experience with PyTorch, quantisation, or embedded platforms.
Related Links
- TRM paper (Jolicoeur-Martineau): https://arxiv.org/abs/2510.04871
- PULP Platform overview: https://pulp-platform.org/
- IIS foundation model work (LUNA, FEMBA, CEReBrO)