MicroSafe-RL: A Safety Layer for AI on Microcontrollers, in Under 2 Microseconds

April 3, 2026

When a reinforcement learning agent (an AI that learns by trial and error) runs on a microcontroller controlling physical hardware, things can go wrong in ways that software-only AI never faces. Mechanical wear, sensor noise, and thermal drift push the hardware into states the AI never encountered during training. The AI's response? Sometimes it issues extreme control commands that physically destroy the equipment.

MicroSafe-RL v1.0, released this week by developer Dimitar Kretski, is a safety interception layer that sits between the AI agent and the hardware. It monitors commands in real time and clamps dangerous ones before they reach the motors or actuators.

The performance numbers are the headline here: 1.18 microseconds worst-case execution time on an STM32 Cortex-M3 running at 72 MHz, or roughly 85 clock cycles. The entire library uses 20 bytes of RAM. It is a header-only C++ library with zero external dependencies, so integrating it means adding a single #include line to your existing code.

The approach is statistical rather than physics-based. You record about two minutes of normal motor operation, run a profiling script that generates C++ configuration parameters, and deploy. At runtime, the system uses exponential moving averages and mean absolute deviation to detect when the hardware's behavior drifts outside its "Operational Stability Signature", essentially a profile of what normal looks like. When it detects an anomaly, it clamps the AI's output to a safe range.
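To make the two stages concrete, here is a minimal sketch that assumes a single monitored signal (say, motor current) and one scalar action. The names `Baseline`, `profile`, `SafetyFilter`, `alpha`, `k` and `safe_limit` are hypothetical. The decision rule is also an assumption, because the article does not say how MicroSafe-RL combines the two statistics: the sketch flags drift when the smoothed signal strays more than `k` mean absolute deviations from the profiled mean. The profiling step is written in C++ only to keep one language here; the real tool uses a script.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of "what normal looks like", produced offline from
// a recording of normal operation (e.g. motor current samples).
struct Baseline {
    float mean;  // mean of the recorded signal
    float mad;   // mean absolute deviation around that mean
};

// Offline profiling step (runs on a host, not the MCU): computes the mean
// and the mean absolute deviation of the recorded samples.
Baseline profile(const std::vector<float>& samples) {
    assert(!samples.empty());
    double sum = 0.0;
    for (float s : samples) sum += s;
    const double mean = sum / static_cast<double>(samples.size());
    double abs_dev = 0.0;
    for (float s : samples) abs_dev += std::fabs(s - mean);
    return {static_cast<float>(mean),
            static_cast<float>(abs_dev / static_cast<double>(samples.size()))};
}

// Hypothetical runtime filter: constant work per call, no heap allocation.
class SafetyFilter {
public:
    // alpha: EMA smoothing factor in (0, 1]; k: how many MADs of drift count
    // as anomalous; safe_limit: bound applied to actions while anomalous.
    SafetyFilter(Baseline base, float alpha, float k, float safe_limit)
        : base_(base), alpha_(alpha), k_(k), safe_limit_(safe_limit), ema_(base.mean) {}

    // Smooth the latest sensor reading. If the smoothed signal is still near
    // the profiled mean, pass the agent's action through. Otherwise clamp it.
    float filter(float sensor, float action) {
        ema_ += alpha_ * (sensor - ema_);
        const bool drifted = std::fabs(ema_ - base_.mean) > k_ * base_.mad;
        if (!drifted) return action;
        if (action > safe_limit_) return safe_limit_;
        if (action < -safe_limit_) return -safe_limit_;
        return action;
    }

private:
    Baseline base_;
    float alpha_, k_, safe_limit_;
    float ema_;  // the only value that changes at runtime
};
```

For example, profiling the samples {1, 2, 3} gives a mean of 2 and a MAD of 2/3. With `k = 3`, the smoothed signal must move more than 2 units from the mean before actions are clamped. On a device, the profiled numbers would typically be baked in as constants rather than computed from a `std::vector`.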

Supported platforms include STM32, ESP32, and Arduino (where latency rises to 5-6 microseconds at Arduino's slower 16 MHz clock). On the software side, it integrates with standard RL frameworks such as Stable Baselines3 and Gymnasium for testing during training.
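The two latency figures can be cross-checked with one relation: cycles = execution time × clock frequency. The helper names below (`cycles`, `whole_cycles`, `scaled_latency_us`) are illustrative and are not part of MicroSafe-RL.

```cpp
#include <cassert>
#include <cmath>

// Clock cycles spent = execution time (seconds) x clock frequency (Hz).
constexpr double cycles(double seconds, double hz) { return seconds * hz; }

// Same quantity rounded to whole cycles for readability.
long whole_cycles(double seconds, double hz) {
    assert(seconds >= 0.0 && hz > 0.0);
    return std::lround(cycles(seconds, hz));
}

// Latency expected if the same cycle count ran at a different clock.
constexpr double scaled_latency_us(double latency_us, double from_hz, double to_hz) {
    return latency_us * (from_hz / to_hz);
}
```

Here `whole_cycles(1.18e-6, 72e6)` returns 85, which matches the figure above. `whole_cycles(5e-6, 16e6)` returns 80 and `whole_cycles(6e-6, 16e6)` returns 96. Scaling 1.18 microseconds by 72/16 with `scaled_latency_us(1.18, 72e6, 16e6)` gives about 5.31 microseconds, which falls inside the quoted 5-6 microsecond band. The Arduino figure is therefore consistent with a broadly similar cycle count. This is back-of-envelope arithmetic only and ignores architectural differences between the two cores.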

This is very early-stage. The repository has 8 commits, zero stars, and no academic paper backing the approach. The license is commercial/proprietary, not open source, and Kretski is actively looking for pilot partners in robotics and industrial automation.

For the niche it targets, embedded RL on resource-constrained hardware, the resource efficiency is striking. A 20-byte RAM footprint and deterministic O(1) execution with no dynamic memory allocation are exactly the kind of constraint-driven engineering that microcontroller developers care about. Whether the statistical detection approach holds up across diverse real-world failure modes remains the open question.
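Constraints like these can be checked by the compiler. The following is a minimal sketch under assumed names (`ShieldState`, `shield_step`) that are not MicroSafe-RL's API. It packs a hypothetical per-channel shield into five 32-bit floats. `static_assert` verifies the 20-byte footprint at compile time. The step function performs a fixed amount of arithmetic with no loops and no heap allocation. The adaptive EMA/MAD drift rule is an illustrative assumption, not the library's method.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>

// Hypothetical per-channel shield state: five 32-bit floats, 20 bytes total.
// This layout is an illustration, not MicroSafe-RL's actual data structure.
struct ShieldState {
    float ema;    // exponential moving average of the monitored signal
    float mad;    // moving estimate of the mean absolute deviation
    float alpha;  // smoothing factor in (0, 1]
    float k;      // drift threshold, in multiples of mad
    float limit;  // actions are clamped to [-limit, limit] while drifting
};

// Compile-time guards: the memory budget is enforced by the compiler.
static_assert(sizeof(float) == 4, "assumes 32-bit float");
static_assert(sizeof(ShieldState) == 20, "shield state must fit in 20 bytes");
static_assert(std::is_trivially_copyable<ShieldState>::value,
              "plain data only; owning members such as std::vector would break this");

// One control step: a fixed sequence of arithmetic, no loops, no allocation,
// so the work per call is O(1) and bounded.
inline float shield_step(ShieldState& s, float sensor, float action) {
    assert(s.alpha > 0.0f && s.alpha <= 1.0f);
    const float dev = sensor - s.ema;
    const float abs_dev = dev < 0.0f ? -dev : dev;
    // Judge drift against the current estimate before absorbing this sample.
    const bool drifting = abs_dev > s.k * s.mad;
    s.ema += s.alpha * dev;
    s.mad += s.alpha * (abs_dev - s.mad);
    return drifting ? std::min(std::max(action, -s.limit), s.limit) : action;
}
```

Because `ema` and `mad` keep adapting, a slow drift can gradually be absorbed into the estimate of normal. Freezing both at profiled values would trade that adaptability for a fixed reference.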
