LUVINER
18 Nov 2025

How We Achieved 100% Accuracy in 97 KB on an ESP32

A deep dive into deploying a predictive maintenance model on a $4 microcontroller — no cloud, no GPU, no compromise.

The Challenge

Predictive maintenance is one of the most impactful applications of machine learning in industry. Detect a failing motor before it fails, and you save thousands in downtime. But the standard approach — streaming sensor data to the cloud for inference — introduces latency, connectivity dependencies, and recurring costs that make it impractical for most deployments.

We asked ourselves: can we run the entire inference pipeline on a $4 ESP32, using less than 100 KB of flash?

The Dataset

We used vibration sensor data from industrial motors — accelerometer readings sampled at 1 kHz across three axes. The task: classify the motor state into one of four categories:

  • Normal — healthy operation
  • Imbalance — rotor weight distribution issue
  • Misalignment — shaft coupling problem
  • Bearing fault — early-stage bearing degradation

The dataset contained 12,000 labeled windows of 256 samples each.
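The post doesn't show the preprocessing, but slicing a continuous 1 kHz trace into fixed 256-sample windows can be sketched as follows. This is a minimal illustration: the `make_windows` helper and the non-overlapping hop are assumptions, not Luviner's actual pipeline.

```python
import numpy as np

def make_windows(signal, window=256, hop=256):
    """Split a (n_samples, 3) accelerometer trace into fixed-size windows."""
    n = (len(signal) - window) // hop + 1
    return np.stack([signal[i * hop : i * hop + window] for i in range(n)])

trace = np.random.randn(1024, 3)   # ~1 second of 3-axis readings at 1 kHz
windows = make_windows(trace)      # shape (4, 256, 3): 4 labeled windows
```

With a fixed hop, every window has an identical memory footprint, which is what makes this layout friendly to a microcontroller's static buffers.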

Why Not Just Use TensorFlow Lite?

TFLite Micro is a solid framework, but it comes with trade-offs. The runtime alone occupies 50-100 KB of flash depending on which operators you include. For a model that needs to fit in 97 KB total — runtime included — we needed a different approach.

Luviner uses a proprietary inference engine optimized for quantized arithmetic on ARM Cortex-M and Xtensa (ESP32) architectures. The runtime overhead is under 8 KB, leaving the remaining space entirely for model weights and application logic.

The Model Architecture

Instead of a conventional CNN or LSTM, we used an ultra-compact neural network. This architecture offers three advantages on microcontrollers:

  • Fixed memory footprint — no hidden states that grow with sequence length
  • Temporal dynamics — the network adapts to input timing naturally, ideal for sensor data
  • Quantization-friendly — the weight distributions are well-suited for integer arithmetic

The final model pairs this ultra-compact proprietary backbone with a single classification head and softmax output.
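The internals of the proprietary engine aren't public, but the integer arithmetic a quantized classification head typically performs can be sketched generically: int8 inputs and weights, int32 accumulation, then a float rescale before softmax. All names and values here are hypothetical.

```python
import numpy as np

def int8_dense(x_q, w_q, x_scale, w_scales):
    """Dense layer on int8 tensors: integer MACs, per-channel float rescale."""
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T   # int32 accumulators
    return acc * (x_scale * w_scales)                     # dequantized logits

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical 4-class head on a 2-element quantized feature vector.
features_q = np.array([10, -5], dtype=np.int8)
head_w_q = np.array([[1, 2], [3, 4], [-1, 0], [0, 1]], dtype=np.int8)
logits = int8_dense(features_q, head_w_q, x_scale=0.1, w_scales=np.full(4, 0.25))
probs = softmax(logits)   # one probability per motor state, summing to 1
```

On the device the multiply-accumulate loop runs entirely in integer registers; only the final rescale and softmax touch floating point, which is why int8 inference is so cheap on Cortex-M and Xtensa cores.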

Quantization: From Float32 to Int8

The float32 model achieved 100% accuracy on the test set. The question was: would quantization destroy that accuracy?

We applied post-training quantization with calibration on 500 representative samples. The key insight: we quantize not just the weights, but also the activations, using per-channel scale factors that minimize the quantization error at each layer.

Result: 100% accuracy preserved after quantization. The model size dropped from 380 KB (float32) to 89 KB (int8) — a 4.3x reduction with zero accuracy loss.
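The exact calibration procedure is Luviner's own, but symmetric per-channel int8 quantization of a weight matrix can be sketched in a few lines; the `quantize_per_channel` helper below is an illustrative assumption, with one scale factor per output channel as described above.

```python
import numpy as np

def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per output channel (row)."""
    # The channel's largest absolute weight maps to +/-127.
    amax = np.max(np.abs(weights), axis=1, keepdims=True)
    scales = amax / 127.0
    q = np.clip(np.round(weights / scales), -127, 127).astype(np.int8)
    return q, scales.squeeze()

w = np.array([[0.5, -1.0],
              [0.02, 0.01]])          # two channels with very different ranges
q, s = quantize_per_channel(w)
# Dequantizing (q * scale) recovers each weight to within one quantization step.
```

Per-channel scales are what make the second row survive quantization: with a single per-tensor scale, its tiny weights would collapse to zero next to the first row's.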

Memory Layout on ESP32

Here is how the 97 KB breaks down on the ESP32:

  • Inference runtime: 7.8 KB
  • Model weights (int8): 89.2 KB
  • Total flash: 97 KB
  • RAM at inference: 4.1 KB (activations buffer)

The ESP32 has 4 MB of flash and 520 KB of SRAM. Our model uses 2.4% of the available flash and 0.8% of the RAM. This leaves plenty of room for the application firmware, connectivity stack, and sensor drivers.
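The budget above checks out arithmetically; a quick verification of the flash and RAM shares, using the figures quoted in this section:

```python
flash_total_kb = 4 * 1024   # ESP32: 4 MB flash
sram_total_kb = 520         # ESP32: 520 KB SRAM
runtime_kb, weights_kb, ram_kb = 7.8, 89.2, 4.1

model_flash_kb = runtime_kb + weights_kb        # 97.0 KB total footprint
flash_share = model_flash_kb / flash_total_kb   # ~2.4% of flash
ram_share = ram_kb / sram_total_kb              # ~0.8% of SRAM
```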

Inference Performance

On an ESP32 running at 240 MHz:

  • Inference time: 1.2 ms per window
  • Throughput: 833 predictions per second
  • Power consumption: 12 mW during inference

For comparison, sending the same data to a cloud endpoint would take 50-200 ms (depending on connectivity), consume 100-500 mW for the radio transmission, and require a persistent internet connection.
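Multiplying power by latency gives energy per prediction, which makes the gap concrete. The cloud-side figures below take the midpoints of the ranges quoted above; they are back-of-envelope assumptions, not measurements.

```python
edge_latency_s = 1.2e-3
edge_power_w = 12e-3
edge_energy_j = edge_power_w * edge_latency_s     # 14.4 uJ per prediction

cloud_latency_s = 0.125   # midpoint of 50-200 ms (assumption)
cloud_power_w = 0.300     # midpoint of 100-500 mW radio draw (assumption)
cloud_energy_j = cloud_power_w * cloud_latency_s  # 37.5 mJ per prediction

ratio = cloud_energy_j / edge_energy_j            # roughly 2600x more energy
```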

Try It Yourself

We have published an interactive simulation of this exact model running on a virtual ESP32. You can watch the sensor readings and the classification output update in real time.

Try the live demo →

Key Takeaways

  • You do not need a GPU or cloud connection to run ML inference. A $4 microcontroller is enough.
  • Quantization does not have to mean accuracy loss — with proper calibration, int8 models can match float32.
  • The inference runtime matters as much as the model. A lean runtime leaves more space for your model.
  • Edge AI is not a future technology. It is production-ready today.

If you are building a product that needs on-device intelligence, Luviner can help. We handle the hard parts — model optimization, quantization, and deployment — so you can focus on your application.

