How We Achieved 100% Accuracy in 97 KB on an ESP32
A deep dive into deploying a predictive maintenance model on a $4 microcontroller — no cloud, no GPU, no compromise.
The Challenge
Predictive maintenance is one of the most impactful applications of machine learning in industry. Detect a failing motor before it fails, and you save thousands in downtime. But the standard approach — streaming sensor data to the cloud for inference — introduces latency, connectivity dependencies, and recurring costs that make it impractical for most deployments.
We asked ourselves: can we run the entire inference pipeline on a $4 ESP32, using less than 100 KB of flash?
The Dataset
We used vibration sensor data from industrial motors — accelerometer readings sampled at 1 kHz across three axes. The task: classify the motor state into one of four categories:
- Normal — healthy operation
- Imbalance — rotor weight distribution issue
- Misalignment — shaft coupling problem
- Bearing fault — early-stage bearing degradation
The dataset contained 12,000 labeled windows of 256 samples each.
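To make the input format concrete, here is one way a labeled window could be laid out on-device. The struct, the field names, and the int16_t sample type are illustrative assumptions rather than the dataset's exact format:

```cpp
#include <cstdint>

// Illustrative only: one labeled vibration window as it might be
// represented on-device. Field names and types are assumptions.
enum class MotorState : uint8_t {
    Normal = 0,      // healthy operation
    Imbalance,       // rotor weight distribution issue
    Misalignment,    // shaft coupling problem
    BearingFault     // early-stage bearing degradation
};

struct VibrationWindow {
    // 256 consecutive samples per axis at 1 kHz -> 256 ms of signal
    int16_t samples[256][3];   // x, y, z accelerometer readings
    MotorState label;          // ground-truth class (training data only)
};
```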
Why Not Just Use TensorFlow Lite?
TensorFlow Lite for Microcontrollers (TFLite Micro) is a solid framework, but it comes with trade-offs. The runtime alone occupies 50-100 KB of flash depending on which operators you include. For a model that needs to fit in 97 KB total — runtime included — we needed a different approach.
Luviner uses a proprietary inference engine optimized for quantized arithmetic on ARM Cortex-M and Xtensa (ESP32) architectures. The runtime overhead is under 8 KB, leaving the remaining space entirely for model weights and application logic.
The Model Architecture
Instead of a conventional CNN or LSTM, we used an ultra-compact, proprietary neural network. This architecture offers three advantages on microcontrollers:
- Fixed memory footprint — no hidden states that grow with sequence length
- Temporal dynamics — the network adapts to input timing naturally, ideal for sensor data
- Quantization-friendly — the weight distributions are well-suited for integer arithmetic
The final model combines this compact backbone with a single classification head and a softmax output over the four motor states.
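Only the classification head is generic enough to sketch here: it turns four logits into class probabilities with a softmax and picks the most likely motor state. Below is a minimal float version for illustration; the deployed model runs an int8 equivalent inside the proprietary runtime.

```cpp
#include <cmath>
#include <cstddef>

constexpr size_t kNumClasses = 4;  // Normal, Imbalance, Misalignment, BearingFault

// Plain float softmax over the four class logits, followed by argmax.
// Illustrative only; not the quantized kernel used on-device.
size_t classify(const float logits[kNumClasses], float probs[kNumClasses]) {
    // Subtract the max logit for numerical stability before exponentiating.
    float max_logit = logits[0];
    for (size_t i = 1; i < kNumClasses; ++i) {
        if (logits[i] > max_logit) max_logit = logits[i];
    }
    float sum = 0.0f;
    for (size_t i = 0; i < kNumClasses; ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    // Normalize and track the most probable class.
    size_t best = 0;
    for (size_t i = 0; i < kNumClasses; ++i) {
        probs[i] /= sum;
        if (probs[i] > probs[best]) best = i;
    }
    return best;
}
```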
Quantization: From Float32 to Int8
The float32 model achieved 100% accuracy on the test set. The question was: would quantization destroy that accuracy?
We applied post-training quantization with calibration on 500 representative samples. The key insight: we quantize not just the weights, but also the activations, using per-channel scale factors that minimize the quantization error at each layer.
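As an illustration of per-channel scaling, the sketch below quantizes a weight matrix symmetrically to int8 with one scale per output channel. It is a simplification: the real calibration step also derives activation scales from the 500 representative samples, which is omitted here.

```cpp
#include <cmath>
#include <cstdint>

// Symmetric per-channel int8 quantization of a weight matrix.
// Each output channel gets its own scale so that its largest-magnitude
// weight maps to +/-127, keeping per-channel rounding error small.
void quantize_weights_per_channel(const float* weights,   // [channels * per_channel]
                                  int channels, int per_channel,
                                  int8_t* q_weights, float* scales) {
    for (int c = 0; c < channels; ++c) {
        const float* row = weights + c * per_channel;

        // Find the largest absolute weight in this channel.
        float max_abs = 1e-8f;  // avoid division by zero for all-zero channels
        for (int i = 0; i < per_channel; ++i) {
            max_abs = std::fmax(max_abs, std::fabs(row[i]));
        }
        scales[c] = max_abs / 127.0f;

        // Round each weight to the nearest int8 step and clamp.
        for (int i = 0; i < per_channel; ++i) {
            int q = static_cast<int>(std::lround(row[i] / scales[c]));
            if (q > 127) q = 127;
            if (q < -127) q = -127;
            q_weights[c * per_channel + i] = static_cast<int8_t>(q);
        }
    }
}
```

At inference time, each channel's output is rescaled by its stored scale factor, so the extra cost of per-channel quantization is a small table of floats rather than any additional arithmetic per weight.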
Result: 100% accuracy preserved after quantization. The model size dropped from 380 KB (float32) to 89 KB (int8) — a 4.3x reduction with zero accuracy loss.
Memory Layout on ESP32
Here is how the 97 KB breaks down on the ESP32:
- Inference runtime: 7.8 KB
- Model weights (int8): 89.2 KB
- Total flash: 97 KB
- RAM at inference: 4.1 KB (activations buffer)
The ESP32 has 4 MB of flash and 520 KB of SRAM. Our model uses 2.4% of the available flash and 0.8% of the RAM. This leaves plenty of room for the application firmware, connectivity stack, and sensor drivers.
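In practice this split falls out of normal ESP-IDF linking: const data is memory-mapped from flash, and a single statically allocated scratch buffer covers the inference-time RAM. The sketch below uses placeholder sizes and symbol names; the real weight blob would be emitted by the model conversion step.

```cpp
#include <cstdint>
#include <cstddef>

// Placeholder sizes matching the budget above (assumptions, not exact values).
constexpr size_t kWeightBytes     = 89 * 1024;        // int8 model weights
constexpr size_t kActivationBytes = 4 * 1024 + 128;   // ~4.1 KB scratch

// On ESP32, const data is placed in the flash-mapped .rodata segment,
// so the weights never occupy SRAM. In a real build the initializer
// comes from the model converter instead of being zero-filled.
const int8_t model_weights[kWeightBytes] = {};

// One statically allocated activation buffer, reused by every layer,
// accounts for the entire inference-time RAM footprint.
static int8_t activation_buffer[kActivationBytes];
```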
Inference Performance
On an ESP32 running at 240 MHz:
- Inference time: 1.2 ms per window
- Throughput: 833 predictions per second
- Power consumption: 12 mW during inference
For comparison, sending the same data to a cloud endpoint would take 50-200 ms (depending on connectivity), consume 100-500 mW for the radio transmission, and require a persistent internet connection.
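The on-device latency is straightforward to reproduce with ESP-IDF's microsecond timer (esp_timer_get_time()). In the sketch below, run_inference() is a stub standing in for whatever entry point your inference runtime exposes:

```cpp
#include <cstdint>
#include <cstdio>
#include "esp_timer.h"   // ESP-IDF microsecond timer

// Placeholder for the runtime's actual entry point; replace with the real call.
static int run_inference(const int16_t* /*window*/) { return 0; }

void benchmark_window(const int16_t* window) {
    int64_t start_us = esp_timer_get_time();
    int predicted_class = run_inference(window);
    int64_t elapsed_us = esp_timer_get_time() - start_us;

    // At 1.2 ms per window this prints roughly "class=... in 1200 us".
    printf("class=%d in %lld us\n", predicted_class,
           static_cast<long long>(elapsed_us));
}
```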
Try It Yourself
We have published an interactive simulation of this exact model running on a virtual ESP32. You can see the sensor readings, the inference results, and the classification output in real time.
Key Takeaways
- You do not need a GPU or cloud connection to run ML inference. A $4 microcontroller is enough.
- Quantization does not have to mean accuracy loss — with proper calibration, int8 models can match float32.
- The inference runtime matters as much as the model itself: a lean runtime leaves more flash for weights.
- Edge AI is not a future technology. It is production-ready today.
If you are building a product that needs on-device intelligence, Luviner can help. We handle the hard parts — model optimization, quantization, and deployment — so you can focus on your application.