Engineering Excellence

Ingenuity

Ten engineering solutions where deep understanding met real constraints. Not theoretical exercises — production systems running in the field. Each one a moment where the "impossible" requirement met the engineer willing to break it down to the DNA level.

Embedded Systems: 5

Software & Pipelines: 3

Architecture & Cost: 2

The Approach

How These Solutions Happen

These aren't accidents. They emerge from a consistent approach to problem-solving.

01

Understand the DNA

Break every problem down to its fundamental components. The elegant solution reveals itself when you see the actual constraints, not the assumed ones. What looks impossible at the surface level often becomes obvious at the atomic level.

02

Question Every Component

That specialty IC? Maybe firmware can do it. That expensive MCU? Maybe you're using features you don't need. Every component earns its place on the BOM or gets eliminated. Nothing is sacred except the requirement itself.

03

Learn What's Needed

No CMake experience? Learn it. No C++ background? Learn it. No Nordic SDK knowledge? Learn it. The solution dictates the skills required, not the other way around. "I don't know how" is the starting point, not the conclusion.

The Work

10 Engineering Innovations

01

Embedded Architecture

Dual SPI Bus Display Architecture

30 seven-segment displays controlled with just 6 GPIOs

The GPIO Problem

Controlling 30 seven-segment displays using conventional methods requires 38 GPIOs minimum — 8 for segment lines, 30 for digit selection. No sub-$5 microcontroller has that many pins available. The typical solution is either a more expensive MCU with enough pins, or dedicated display driver ICs that add cost and complexity to the BOM. For the 3-phase power monitor project, neither option was acceptable. The system needed to show voltage, current, power, frequency, and phase information across 30 digits while keeping costs low enough for production viability.

The Realization

SPI is just synchronized serial data. Most MCUs have multiple SPI peripherals. What if two SPI buses shared the same clock timing but carried completely different data streams? One bus could handle segment data (which segments to light), while the other handles digit selection (which digit is currently active). Both operations need to happen simultaneously during multiplexing — and with shared timing, they do.

The Execution

  • First SPI bus drives a single 74HC595 shift register for the 8 segment lines (a-g plus decimal point)
  • Second SPI bus drives 5 cascaded 74HC595 shift registers for 30 digit selection lines, plus additional outputs for status LEDs
  • Both buses share clock timing — when segment data shifts out, digit selection shifts out simultaneously
  • The MCU's native SPI hardware handles all timing — firmware just loads the data and triggers the transfer
  • Multiplexing runs fast enough that all 30 digits appear continuously lit with no visible flicker
  • Total additional components: 6 shift registers at ~$0.50 each
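The steps above can be modeled in a few lines of Python. This is a sketch, not the production firmware: the segment encoding table and the byte-packing order are illustrative assumptions. It shows how a single multiplex tick yields two synchronized payloads, one segment byte for the first bus and five cascaded select bytes for the second.

```python
# Model of one multiplexing tick in the dual-bus scheme: the same tick
# emits one segment byte (bus A) and five cascaded 74HC595 bytes (bus B)
# that one-hot select the active digit. Encoding table is illustrative.

SEGMENTS = {  # 7-segment patterns, bit0 = 'a' ... bit7 = decimal point
    "0": 0b00111111, "1": 0b00000110, "2": 0b01011011, "3": 0b01001111,
    "4": 0b01100110, "5": 0b01101101, "6": 0b01111101, "7": 0b00000111,
    "8": 0b01111111, "9": 0b01101111,
}

def tick(display_text: str, active_digit: int):
    """Return (segment_byte, digit_select_bytes) for one multiplex step."""
    seg = SEGMENTS[display_text[active_digit]]       # bus A payload
    select = 1 << active_digit                       # one-hot over 30 lines
    # 5 cascaded 74HC595s = 40 output bits, split into 5 bytes
    sel_bytes = [(select >> (8 * i)) & 0xFF for i in range(5)]  # bus B
    return seg, sel_bytes

seg, sel = tick("0123456789" * 3, active_digit=9)
```

Cycling `active_digit` from 0 to 29 at a few kilohertz reproduces the flicker-free multiplexing described above.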

Impact

6

GPIO Pins Required

vs 38 with direct drive

<$2

MCU Cost

ESP32 or entry-level STM32

~$3

Added BOM Cost

6× 74HC595 shift registers

Many

Production Units

Deployed in 3-phase power monitors

Technologies

ESP32 · 74HC595 · SPI Protocol · Hardware Multiplexing · Shift Register Cascade

Outcome

This architecture is now the standard for all WURS10 multi-digit display products. The same 6-GPIO footprint scales from simple displays to complex instrumentation panels.

02

Resource Optimization

The Mega-Multiplexer

8 analog inputs, 3-digit display, 3 buttons, 12 AC dimmers — all on an ATmega16

The Pin Count Impossibility

A professional lighting dimmer controller needed: 8 potentiometers for level control, a 3-digit seven-segment display for channel feedback, 3 navigation buttons, 12 independent AC dimmer outputs (each capable of 7kW), zero-crossing detection for each phase, and UART for DMX communication. Adding up the requirements: 8 ADC pins for pots, 11 GPIOs for the display (8 segments + 3 digits), 3 for buttons, 12 for dimmer gate drives, 3 for zero-crossing inputs, 2 for UART. That's 39 pins minimum. The ATmega16 has 32 I/O pins total. The math doesn't work. A larger MCU would break the cost target.

The Decoder Topology

A 74LS138 is a 3-to-8 decoder — 3 input pins select which of 8 outputs goes active. A 4051 is an 8-channel analog multiplexer — the same 3 select pins choose which of 8 analog inputs connects to a single output. What if both chips shared the same 3 select lines? At any given moment, the decoder activates one digit of the display while the multiplexer connects one potentiometer to the ADC. The same 3 GPIOs control both operations simultaneously. Time-slice through all 8 positions, and you've read 8 pots and refreshed 3 display digits using just 3 select pins + 1 ADC + 8 segment lines. The buttons? They're wired to decoded outputs too — scan them during the same time-slicing. No dedicated pins needed.

The Execution

  • 3 GPIO pins drive both the 74LS138 decoder and 4051 analog mux simultaneously
  • 8 potentiometers read through a single ADC channel — the mux routes each pot to the ADC in sequence
  • Display multiplexing happens on the same timing — as the mux cycles, so does digit selection
  • Buttons are scanned by checking their state only when their corresponding select line is active
  • 12 channels of software PWM synchronized to 50Hz/60Hz zero-crossing detection
  • Each dimmer channel uses forward-phase control with proper gate drive timing
  • Flicker-free operation through precise interrupt timing aligned to AC line frequency
  • Full 7kW capacity per channel — 84kW total system capacity
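The shared-select time slice can be sketched in Python (hardware access is stubbed with a list and a lambda; the function names are invented for illustration). The point is that one 3-bit counter drives the 4051 channel, the decoded digit line, and the button scan at once:

```python
# One pass over the 8 select states: the same 3-bit value picks the 4051
# analog channel and the 74LS138 decoded line, so pots, digits, and
# buttons all get serviced from 3 GPIOs.

def scan_cycle(adc_read, button_pressed, digits=3):
    """adc_read(sel) -> pot value routed through the 4051 for channel sel.
    button_pressed[sel] -> True if the button on decoded line sel is held.
    Returns (pot_values, lit_digit_sequence, pressed_buttons)."""
    pot_values, lit, pressed = [], [], []
    for sel in range(8):                  # the 3 select GPIOs count 0..7
        pot_values.append(adc_read(sel))  # mux output -> single ADC pin
        if sel < digits:
            lit.append(sel)               # decoder lights digit 'sel'
        if sel < 3 and button_pressed[sel]:
            pressed.append(sel)           # button sampled on its slice
    return pot_values, lit, pressed

pots = [10, 20, 30, 40, 50, 60, 70, 80]
vals, lit, pressed = scan_cycle(lambda s: pots[s], [False, True, False])
```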

Impact

~20

GPIOs Used

Performing the work of 39+

ATmega16

MCU

16KB flash, 1KB RAM, ~$1.50

84kW

AC Capacity

12 channels × 7kW each

2010

In Production Since

15+ years running

Technologies

ATmega16 · 74LS138 Decoder · 4051 Analog Mux · Zero-Cross Detection · Phase-Control Dimming · Soft PWM

Outcome

This dimmer design has been in continuous production since 2010. Units deployed over a decade ago are still running. The architecture proves that "impossible" pin counts are just unsolved topology problems.

03

Protocol Engineering

Bulletproof Edge Logger

Guaranteed data delivery when MQTT's QoS isn't enough

The Reliability Gap

An industrial pay-per-use machine contained proprietary technology that needed protection from tampering. Every access, every anomaly, every sensor reading had to be logged and transmitted to a central server — with zero data loss tolerance. Standard MQTT QoS levels weren't sufficient:

  • QoS 0: Fire and forget. No delivery guarantee at all.
  • QoS 1: At-least-once delivery — but only while connected. WiFi drops? Message lost. Broker unreachable? Lost. Power cycle? Lost.
  • QoS 2: Exactly once — same connectivity dependency.

The machine operated in industrial environments with unreliable connectivity. A sophisticated operator could potentially disconnect the network, tamper with the machine, reconnect, and leave no trace. Unacceptable.

Application-Layer Guarantee

MQTT's QoS is a transport-layer guarantee — it only works while the connection exists. True guaranteed delivery requires application-layer acknowledgment. The solution: store locally first, transmit second, delete only after explicit confirmation. The device becomes the source of truth, not the broker. Network goes down? Keep logging locally. Network comes back? Sync everything with sequence numbers. Server confirms receipt? Only then delete from local storage. Add heartbeat monitoring so the server knows when a device goes silent — silence itself becomes a logged event.

The Execution

  • All events written to SPIFFS filesystem first — survives power loss, network outages, and reboots
  • Custom log format optimized for size: timestamps, event codes, and sensor data compressed to maximize storage
  • Nested queue system: events queue for transmission, transmitted events wait for ACK, only confirmed events get deleted
  • Heartbeat protocol: device sends periodic "I'm alive" messages; server flags any gaps as potential tampering
  • Custom acknowledgment layer on top of MQTT: server explicitly confirms each log batch with sequence numbers
  • WiFiManager integration for field configuration — no hardcoded credentials
  • Custom ESP32 partition table: minimized code partition, maximized SPIFFS for log storage
  • RS485 interface for industrial communication with the protected machine
  • Battery backup with charge monitoring — logs the power state too
  • Modular design: core logger firmware reusable across different machine types
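The store-first, delete-on-ACK discipline at the heart of the design can be sketched in a few lines. This is a Python model with names invented for illustration; the real firmware persists to SPIFFS, while this uses an in-memory dict:

```python
# Minimal model of the application-layer guarantee: every event is
# persisted before transmission, and only an explicit server ACK for its
# sequence number deletes it from local storage.

import itertools

class GuaranteedLog:
    def __init__(self):
        self._seq = itertools.count(1)
        self._store = {}            # seq -> event; the source of truth

    def log(self, event):
        """Always persist locally first; transmission happens later."""
        seq = next(self._seq)
        self._store[seq] = event
        return seq

    def pending(self):
        """Everything not yet confirmed by the server, oldest first."""
        return sorted(self._store.items())

    def ack(self, seq):
        """Server confirmed this sequence number: only now delete."""
        self._store.pop(seq, None)

log = GuaranteedLog()
a = log.log("door_open")
b = log.log("vibration")
log.ack(a)        # delivered and confirmed; event b stays queued
```

A network outage simply means `pending()` grows until connectivity returns, which is exactly the behavior described above.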

Impact

500K-1M

Log Capacity

16MB ESP32, optimized format

0%

Data Loss

By design, not by luck

>QoS 2

Delivery Guarantee

Application-layer verification

2020

Running Since

5+ years continuous operation

Technologies

ESP32 · SPIFFS · MQTT · RS485 · WiFiManager · Custom ACK Protocol · Battery Management

Outcome

The system has been protecting industrial equipment since 2020. In five years of operation, zero data loss events. Every tamper attempt, every anomaly, every access — logged and verified.

04

Performance Engineering

The OpenCV Evolution

From laggy Python script to real-time C++ with GPU acceleration

The Latency Wall

A burst-fire detection system needed to track projectile impacts at extreme speeds — identifying shot placement with sub-5mm accuracy while rounds were still landing. The detection algorithm worked, but the implementation was too slow. Initial Python + OpenCV implementation processing a single 1280×960 @ 30fps RTSP camera feed had noticeable lag. When the system needed to support a second camera for stereoscopic analysis, the lag became unusable. Frames were dropping, detection was missing shots, and the entire premise of real-time analysis was failing. The conventional advice: "Buy a more powerful computer." But the algorithm itself wasn't the bottleneck — the infrastructure around it was.

Peel the Onion

Performance problems are layered. You don't solve them by throwing hardware at the symptom — you solve them by identifying each layer's bottleneck and eliminating it.

  • Layer 1 — RTSP handling: OpenCV's default VideoCapture uses FFmpeg with settings optimized for compatibility, not latency.
  • Layer 2 — Python's GIL: even with "threading," Python can't truly parallelize CPU-bound work.
  • Layer 3 — Python itself: interpreter overhead, memory management, function call overhead — death by a thousand cuts.
  • Layer 4 — CPU vs GPU: image processing is embarrassingly parallel — exactly what GPUs excel at.

Each layer needed to be addressed in sequence.

The Execution

  • Stage 1 — GStreamer Integration: Discovered OpenCV's default RTSP handling was the first bottleneck. GStreamer offers hardware-accelerated decoding and lower-latency pipelines. But OpenCV doesn't ship with GStreamer support by default.
  • Stage 2 — Building from Source: No prior experience with CMake or building complex projects from source. Learned the entire toolchain to compile OpenCV with GStreamer support. First major lag reduction achieved.
  • Stage 3 — Multithreading: Second camera reintroduced lag. Diagnosis: capture and processing were serialized. Implemented proper threading to decouple frame capture from frame processing. Each camera gets its own capture thread; processing runs independently.
  • Stage 4 — C++ Rewrite: Python overhead became the new ceiling. No C++ experience prior to this project. Rewrote the entire application in C++ anyway. Function call overhead eliminated. Memory management explicit and controlled. Processing time dropped dramatically.
  • Stage 5 — GPU Acceleration: Final optimization — moved applicable OpenCV operations to GPU using CUDA. Matrix operations, filtering, and detection algorithms now run on dedicated hardware.
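Stage 3's decoupling pattern is worth making concrete. A hedged Python sketch (the production version is C++; here the camera is simulated by a range and the queue names are illustrative): each capture thread keeps only the newest frame, so slow processing never backs up the stream.

```python
# Capture/processing decoupling: a bounded queue of depth 1 where the
# producer discards the stale frame, so the consumer always processes
# the most recent capture instead of an ever-growing backlog.

import threading, queue

def capture_loop(frames, latest: queue.Queue, done: threading.Event):
    """Producer: push frames, dropping the unconsumed one if present."""
    for frame in frames:
        try:
            latest.get_nowait()          # discard stale frame
        except queue.Empty:
            pass
        latest.put(frame)
    done.set()

latest = queue.Queue(maxsize=1)
done = threading.Event()
t = threading.Thread(target=capture_loop,
                     args=(range(100), latest, done))
t.start()
t.join()
newest = latest.get()    # processing always sees the freshest frame
```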

Impact

Near-zero

Final Latency

Real-time burst tracking achieved

2+

Cameras Supported

Concurrent multi-stream processing

<5mm

Accuracy Achieved

At burst-fire speeds

5+

Skills Acquired

GStreamer, CMake, Threading, C++, CUDA

Technologies

C++ · OpenCV · GStreamer · CUDA · Multithreading · RTSP · CMake

Outcome

The system now tracks projectile impacts in real-time, even during burst fire. What started as "I don't know C++" became a production-grade computer vision pipeline. The lesson: the solution dictates the skills, not the other way around.

05

Pipeline Design

Universal Target Scoring Pipeline

Any target shape, any scoring zones — solved with data, not code

The Geometry Limitation

An Unreal Engine shooting simulator needed to score hits on various target types. The initial requirement was simple: bullseye targets with concentric rings. Distance from center determines score. Basic geometry. Then came silhouette targets. Human-shaped outlines with anatomically-defined scoring zones. Head shots score differently than torso hits. Limbs have their own values. Distance-from-center math is useless — the zones aren't circular. Then came irregular targets, asymmetric shapes, targets with holes, targets with non-contiguous scoring regions. Each new target type would require custom collision geometry, custom scoring logic, custom Blueprint code. This doesn't scale. Every new target becomes a development project.

The Image Already Knows

Every target image inherently contains its zone information — zones are just regions of pixels. What if the scoring system didn't define zones in code, but extracted them from the image itself? Split a target image into separate PNGs — one per scoring zone. Keep the canvas size and origin point identical across all images. When stacked, they recreate the original target perfectly. Now each zone is just a collection of pixel coordinates. Export those coordinates. At runtime, a hit is just a coordinate lookup: "Which zone contains this X,Y position?" No geometry. No collision meshes. No code per target. Just data.

The Execution

  • Target images split into individual zone PNGs using any image editing software (Photoshop, GIMP, etc.)
  • Critical: all zone PNGs maintain identical canvas dimensions and origin point (0,0). When overlaid, they perfectly recreate the original target.
  • Python script processes each zone PNG: reads every pixel, identifies non-transparent pixels (where the zone exists), and exports their X,Y coordinates to CSV
  • CSV files become Unreal Engine data assets — loaded at runtime, not compiled into code
  • Hit detection in Blueprint: get UV coordinates of hit location, iterate through zone data, find which zone contains that coordinate
  • Scoring logic reads zone values from data table — changing point values requires no code changes
  • New target workflow: split image → run script → drop CSV into project → assign to target actor. Done.
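A toy version of the whole pipeline fits in a few lines of Python (no image libraries; a 2D array stands in for each zone PNG, and the names are illustrative). The hit test is exactly the lookup the Blueprint performs against the CSV data:

```python
# Zones as aligned pixel layers: export non-zero pixels of each layer as
# (x, y) coordinates, then score a hit by simple set membership.

def zone_to_coords(mask):
    """Export every non-transparent pixel of a zone layer as (x, y)."""
    return {(x, y)
            for y, row in enumerate(mask)
            for x, px in enumerate(row) if px}

def score_hit(x, y, zones):
    """zones: {name: coord_set}. Return the zone containing the hit."""
    for name, coords in zones.items():
        if (x, y) in coords:
            return name
    return "miss"

head  = [[0, 1, 1, 0],   # both 4x4 canvases share dimensions and origin,
         [0, 1, 1, 0],   # so stacking them recreates the target
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
torso = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
zones = {"head": zone_to_coords(head), "torso": zone_to_coords(torso)}
```

In the real pipeline the coordinate sets come from CSV data assets rather than inline arrays, but the runtime question is the same: which zone contains this X,Y?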

Impact

Minutes

New Target Setup

Split, run script, import CSV

Zero

Code Changes for New Target

Purely data-driven

8+

Target Types Created

And counting — no limit

100%

Reusability

Same pipeline, any shape

Technologies

Unreal Engine · Blueprint · Python · CSV Data Assets · UV Coordinate Mapping · Image Processing

Outcome

The shooting simulator now supports any target type the training program requires. New targets are created by artists using familiar image editing tools, not by engineers writing code. The pipeline that solved one target problem solved all target problems.

06

Asset Reuse

Pipeline Reuse: Zone Visualization

Same solution, different application — from scoring to visual feedback

The Training Interface

The range control application needed visual feedback for instructors. When selecting a scoring zone from a dropdown or list, the corresponding region on the target should highlight. Instructors needed to see exactly which zone they were configuring — "Zone 3" means nothing without visual context. Building a new zone highlighting system from scratch seemed necessary. Different application (PySide6 desktop app vs Unreal Engine game), different runtime, different rendering approach.

Assets Are Universal

The target scoring pipeline had already solved the hard problem: defining zones as separate image layers with aligned coordinates. Those zone PNGs exist. They're already created. They already perfectly align with the target image. The same assets that define scoring zones can define visual overlays. Select "Zone 3" → display the Zone 3 PNG as a semi-transparent overlay on the target image. No new zone definition work. No coordinate remapping. Just image compositing. The DNA-level solution (zones as aligned image layers) manifests wherever needed.

The Execution

  • Existing zone PNG assets from the scoring pipeline imported directly into the PySide6 project
  • Target image displayed as base layer in Qt widget
  • Zone selection (dropdown, list, or click) triggers overlay display
  • Selected zone PNG rendered as semi-transparent overlay on the target
  • Multiple zones can be highlighted simultaneously for complex configurations
  • Same asset update workflow: if zone boundaries change, update the PNG — both applications automatically reflect the change
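The Qt side reduces to alpha compositing the zone layer over the base target image. A single-pixel "over" operator in plain Python shows the arithmetic QPainter applies per pixel when the overlay pixmap is drawn (the 50% red highlight is an assumed example, not the product's styling):

```python
# Standard "source over" compositing for one pixel: the overlay's alpha
# blends its color with whatever is underneath.

def over(src, dst):
    """Composite RGBA src over RGB dst; channels 0-255, alpha in src."""
    r, g, b, a = src
    alpha = a / 255.0
    return tuple(round(alpha * s + (1 - alpha) * d)
                 for s, d in zip((r, g, b), dst))

# 50%-opaque red zone highlight over a white target pixel
highlight = over((255, 0, 0, 128), (255, 255, 255))
```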

Impact

Hours

Development Time

Not days — assets existed

100%

Asset Reuse

Identical zone PNGs

Single source

Maintenance Burden

Update once, applies everywhere

2

Frameworks Spanned

Unreal Engine + PySide6

Technologies

PySide6 · Python · Qt Image Compositing · Shared Asset Pipeline

Outcome

The zone overlay feature in the range control application was implemented in hours, not days. The lesson: when you solve a problem at the fundamental level, the solution travels with you to every project that shares that fundamental structure.

07

Supply Chain Independence

Eliminating the ADE9000

When the metering IC price exploded, firmware became the replacement

The Supply Chain Crisis

A 3-phase energy meter product used the ADE9000 — a dedicated power metering IC from Analog Devices. It handles voltage and current measurement, true RMS calculation, power factor computation, energy accumulation, and harmonic analysis. One chip does everything. The product was in production. Clients were happy. Then global supply chain disruptions hit. The ADE9000 price didn't just increase — it became unpredictable, unavailable, and when available, prohibitively expensive. The BOM cost made the product unviable. The conventional response: find an equivalent metering IC from another vendor, redesign the hardware, requalify the product. Still dependent on specialty silicon. Still vulnerable to supply chain.

The IC Is Just Doing Math

What does the ADE9000 actually do? It samples voltage and current waveforms at high speed, performs digital signal processing to calculate RMS values, multiplies V×I to get power, tracks phase relationships for power factor, and integrates power over time for energy. That's not magic. That's math. Math that any sufficiently capable MCU can perform — if it has fast enough ADCs and enough processing headroom. The STM32H7 series has 16-bit ADCs with simultaneous sampling, DSP instructions, and significant processing headroom. What if the metering IC was replaced with firmware running on a more capable general-purpose MCU?

The Execution

  • MCU upgrade: STM32G072 (original host processor) → STM32H7A3 (marginally higher cost, dramatically more capability)
  • Analog frontend design: precision resistor dividers for voltage sensing, CT (current transformer) inputs with proper burden resistors, bias circuits for single-supply ADC operation
  • Dual ADC configuration: simultaneous sampling of voltage and current channels eliminates phase error from sequential sampling
  • Internal voltage reference used for calibration stability — no external reference IC needed
  • Firmware DSP implementation: true RMS calculation using sum-of-squares integration over AC cycle, optimized for the Cortex-M7 DSP instructions
  • Power calculations: active power (real), reactive power, apparent power, all derived from properly synchronized V and I samples
  • Power factor calculated from phase relationship between voltage and current zero-crossings
  • Frequency measurement from voltage zero-crossing period
  • Phase sequence detection for 3-phase systems
  • Calibration mechanism: gain and offset correction stored in flash, adjustable per-channel
  • Noise floor characterization and removal for accuracy at low signal levels
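The per-cycle math can be sketched numerically. This is a floating-point Python model with synthetic waveforms; the real firmware runs optimized fixed-point/DSP code on the Cortex-M7, but the formulas are the same:

```python
# True RMS, active power, apparent power, and power factor over one AC
# cycle of n simultaneous voltage/current samples.

import math

def metering(v_samples, i_samples, n):
    v_rms = math.sqrt(sum(v * v for v in v_samples) / n)
    i_rms = math.sqrt(sum(i * i for i in i_samples) / n)
    p_active = sum(v * i for v, i in zip(v_samples, i_samples)) / n
    s_apparent = v_rms * i_rms
    return v_rms, i_rms, p_active, s_apparent, p_active / s_apparent

N = 1000
phase = math.radians(60)                 # current lags voltage by 60 deg
v = [325.0 * math.sin(2 * math.pi * k / N) for k in range(N)]
i = [14.14 * math.sin(2 * math.pi * k / N - phase) for k in range(N)]
v_rms, i_rms, p, s, pf = metering(v, i, N)
# 325 V peak -> ~230 V RMS; pure sinusoids -> pf equals cos(60 deg) = 0.5
```

This is also why simultaneous sampling matters: any skew between the V and I sample instants shows up directly as a phase error in `p_active` and the power factor.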

Impact

Zero

Specialty IC Dependency

ADE9000 completely eliminated

Minimal

Supply Chain Risk

STM32H7 is widely available

3 weeks

Development Time

Hardware + firmware + validation

Active

Production Status

Currently shipping

Technologies

STM32H7A3 · Analog Frontend Design · DSP · True RMS Algorithms · Calibration Systems · Simultaneous Sampling ADC

Outcome

The product is back in production with no specialty metering IC. The design is arguably better — full control over algorithms, calibration, and update capability via firmware. A supply chain crisis became an opportunity to build a more robust product.

08

Cost Engineering

F103 → G030: Same Features, Quarter Resources

COVID pricing forced a migration that proved we never needed those resources

The Price Shock

An SPI pixel LED controller drove 2048+ pixels with DMX input, master/slave operation for daisy-chaining multiple units, and flexible output configuration. Built on the STM32F103 — a capable Cortex-M3 with 64KB flash, 20KB RAM, and a comfortable development experience. Then COVID-era chip shortages hit. The STM32F103 price jumped from ~$2 to $10-15+ when available at all. Lead times stretched to months. The product was dead at that cost point. The STM32G030 was available at reasonable prices: ~$1-2 even during the shortage. But it's a Cortex-M0+ with 32KB flash, 8KB RAM, fewer peripherals, and a lower clock speed. On paper, it couldn't run the existing firmware.

Comfort vs Necessity

The F103 had headroom. The firmware used 20KB RAM because it was available, not because it was required. Variables were sized for convenience. Buffers were generous. Code was written for clarity over density. The real question wasn't "can the G030 run this?" but "what do we actually need to run this?" 2048 pixels × 3 bytes (RGB) = 6KB for the frame buffer. DMX is 512 bytes. Protocol overhead is minimal. The math could work — but only if every byte earned its place.
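The arithmetic above works out as a quick budget check (the frame buffer and DMX figures are from the text; the working-state figure is an assumed placeholder):

```python
# RAM budget for the G030 port, using the numbers in the text.

PIXELS = 2048
frame_buffer = PIXELS * 3        # 6144 bytes: one RGB byte triple per pixel
dmx_universe = 512               # one full DMX512 frame
misc_state = 256                 # parser state, counters (assumed figure)

required = frame_buffer + dmx_universe + misc_state
g030_ram = 8 * 1024
headroom = g030_ram - required   # what's left for stack and everything else

# Note: a naive second full 6144-byte buffer (12288 B total) would overflow
# 8 KB RAM outright, which is why the double-buffering needed the careful
# memory layout the audit describes.
```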

The Execution

  • Complete firmware audit: identified every buffer, every variable, every lookup table. Questioned whether each one was necessary at its current size.
  • Frame buffer optimization: implemented double-buffering with careful memory layout to minimize waste. DMA transfers directly from buffer to SPI peripheral.
  • DMX parsing rewrite: zero-copy parsing where possible, minimal intermediate state.
  • Code density improvements: M0+ lacks some M3 instructions, but careful coding minimizes the penalty. Avoided constructs that compile inefficiently on Cortex-M0+.
  • DMA utilization maximized: CPU sets up transfers, DMA executes them. The CPU is free to handle DMX input while pixel data streams out.
  • Timing recalibration: lower clock speed required tighter timing analysis for pixel protocols (WS2812, SK6812, etc.). Verified timing margins were still met.
  • Same hardware interface: PCB unchanged. Same pinout, same connectors. Only the MCU changed.
  • Full feature parity verified: every feature from the F103 version tested and confirmed working.

Impact

>$10/unit

BOM Cost Reduction

During shortage pricing

8KB

RAM Utilization

Down from 20KB "required"

~28KB

Flash Utilization

Down from 64KB available

Zero

Features Removed

Complete feature parity

Technologies

STM32G030 · Cortex-M0+ · DMA · SPI Pixel Protocols · Memory Optimization · DMX512

Outcome

The G030-based controller shipped throughout the chip shortage while competitors struggled. The lesson: we were using resources because they were available, not because we needed them. Constraints revealed the true requirements.

09

System Architecture

$5/Month IoT Infrastructure

Hundreds of millions of messages, zero per-message fees

The Cost Spiral

End-user devices needed remote monitoring. Each device sends status updates once per second. As the deployment scales, the math becomes terrifying: 100 devices × 1 message/second × 86,400 seconds/day × 30 days = 259,200,000 messages per month. AWS IoT Core pricing: $1.00 per million messages for the first billion. That's ~$259/month just for message delivery — before storage, processing, dashboards, or any other infrastructure. Traditional architecture: devices → cloud broker → database → web app → user. Every component adds cost. Every component adds complexity. Every component is another thing to maintain, secure, and pay for.
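The cost math above, worked through (the ~1 KB average message size in the transfer check is an assumption for illustration):

```python
# Monthly message volume and what usage-based pricing would charge for it.

devices = 100
msgs_per_month = devices * 1 * 86_400 * 30      # 1 msg/s, 30 days

iot_core_usd = msgs_per_month / 1_000_000 * 1.00  # $1 per million messages
lightsail_usd = 5.00                              # flat, volume-independent

# Transfer sanity check at an assumed ~1 KB per message:
transfer_gb = msgs_per_month * 1024 / 1e9         # well under the 2 TB cap
```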

Relocate the Intelligence

Why does the cloud need to do heavy lifting? Modern smartphones have multi-core processors, gigabytes of RAM, and gigabytes of storage. They're more powerful than the servers that ran entire companies a decade ago. What if the cloud did the minimum: route messages and enforce access control. Everything else — storage, visualization, historical analysis — happens on the client device. The MQTT broker becomes a relay, not a database. Messages flow through, not into. The phone stores its own data. The user's existing cloud backup (Google Drive, iCloud) handles persistence. The broker just needs to route messages and not fall over.

The Execution

  • AWS Lightsail instance: $5/month for a static IP, 1GB RAM, 1 vCPU, 2TB data transfer. More than enough for a message broker.
  • Mosquitto MQTT broker installed and configured: lightweight, battle-tested, handles thousands of connections trivially.
  • TLS encryption: proper certificates, encrypted connections. Security isn't optional.
  • ACL (Access Control List) configuration: each client gets a unique ID. ACL rules ensure clients only see their own devices' data. No cross-contamination, no data leaks.
  • Client authentication: unique credentials per deployment, tied to ACL rules.
  • Android application architecture: receives MQTT messages, stores locally in SQLite, renders dashboards from local data.
  • Backup strategy: follows the WhatsApp model — chat history stored locally, backed up to user's cloud (Google Drive). Server doesn't hold historical data.
  • One-day buffer on broker: retained messages provide the last 24 hours of data for new connections or reconnections.
  • Message size allowance: up to 128KB per message — plenty for embedded telemetry, even for complex status reports.
  • Horizontal scaling plan: when load approaches limits, spin up another $5 instance. Linear, predictable cost growth.
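The routing-plus-isolation core of this setup is mostly configuration. A hedged sketch of the relevant Mosquitto pieces (file paths and the topic layout are assumptions; the directives and the `%u` substitution are standard Mosquitto syntax):

```conf
# /etc/mosquitto/mosquitto.conf (excerpt; paths illustrative)
listener 8883
cafile   /etc/mosquitto/certs/ca.crt
certfile /etc/mosquitto/certs/server.crt
keyfile  /etc/mosquitto/certs/server.key
password_file /etc/mosquitto/passwd
acl_file      /etc/mosquitto/acl

# /etc/mosquitto/acl
# %u expands to the authenticated username, so each client can only
# publish and subscribe under its own device subtree.
pattern readwrite devices/%u/#
```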

Impact

$5

Monthly Cost

Fixed, regardless of message volume

~260M/month

Message Capacity

Well under 2TB transfer limit

$0

Per-Message Cost

No usage-based pricing

+$5

Scaling Cost

Per additional instance

Technologies

AWS Lightsail · Mosquitto MQTT · TLS/SSL · ACL · Android · SQLite · Flask API

Outcome

The infrastructure has supported deployments at a fraction of what cloud IoT services would cost. When you question where computation and storage should live, the answer often isn't "the cloud."

10

Full-Stack Embedded

LMS Clicker: POC to Mass Production

Three chip migrations, bare metal firmware, and a bridge for the software team

The Production Gap

An LMS (Learning Management System) needed physical clicker devices — students click responses, the system collects and analyzes. The proof-of-concept worked: ESP32 for the MCU, separate nRF24L01 module for radio, individual GPIO per button, I2C OLED display. Functional, but not production-ready. The issues:

  • ESP32 is power-hungry. Battery life in weeks, not years.
  • ESP32 + nRF24L01 = two expensive components where one could suffice.
  • Individual button GPIOs waste pins on a pin-constrained design.
  • Arduino framework adds overhead and limits optimization.
  • The price point required for mass production couldn't absorb the BOM cost.

Oh, and the software team building the LMS couldn't interface with serial ports. They needed a REST API.

Every Component Is Negotiable

The nRF52 series combines a Cortex-M4 MCU with a 2.4GHz radio — the ESP32's MCU and the nRF24L01's radio in a single, ultra-low-power package. One chip replaces two. But the nRF52 has fewer GPIOs. Individual button pins won't work. Button matrix scanning is the standard solution — N×M buttons using N+M pins. But matrices typically need a dedicated wake pin for deep sleep. What if the matrix itself could trigger wake-up? That requires understanding the GPIO hardware at the register level, not the Arduino abstraction level. That means bare metal. And the software team needs an API? Then build them one. The device outputs serial data; a simple Python application can read that serial stream and expose a REST interface. The embedded developer solves the embedded problem and the software integration problem.

The Execution

  • Phase 1 — Platform Migration: ESP32 + nRF24L01 → nRF52840. Single-chip solution with integrated radio. Massive power reduction from eliminating WiFi stack overhead.
  • Phase 2 — GPIO Optimization: Individual buttons → matrix configuration. Implemented matrix scanning that works with deep sleep wake-up. This required bare-metal GPIO manipulation to configure wake-on-any-key without a dedicated interrupt pin.
  • Phase 3 — Framework Migration: Arduino/ESP-IDF → Nordic SDK with bare metal approach. Complete environment change. Learned the Nordic toolchain, SDK structure, and peripheral APIs from scratch.
  • Phase 4 — Peripheral Drivers: Wrote custom I2C display driver. Wrote custom button matrix driver with debouncing. Wrote custom radio protocol with acknowledgment and auto-pairing.
  • Phase 5 — Software Bridge: Software team couldn't handle serial port communication. Built a Python system tray application that reads serial data from the receiver, buffers it, and exposes a Flask REST API. The LMS queries the API; the Python app handles the hardware interface.
  • Phase 6 — Cost Reduction: nRF52840 at $2-3 still too expensive for target BOM. Migrated to nRF52810 at ~$1.16. Same firmware architecture, same functionality, tighter memory constraints.
  • Receiver design: handles pairing with unlimited devices (address space allows it), forwards data to host via serial, simple and reliable.
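The matrix scan at the center of Phase 2 can be modeled in Python (a stand-in for the bare-metal GPIO code; the stubbed key position is an example). For wake-on-any-key, the bare-metal trick is to drive all rows active at once so any press changes some column's level and fires the port-level sense interrupt, with no dedicated wake pin:

```python
# Row-by-row matrix scan: drive one row at a time, read the columns,
# and report every pressed (row, col) key. Hardware is stubbed.

def scan_matrix(read_cols, rows=4, cols=4):
    """read_cols(active_row) -> column levels (True = pressed).
    Returns (row, col) of every pressed key."""
    pressed = []
    for r in range(rows):                     # drive one row at a time
        for c, level in enumerate(read_cols(r)):
            if level:
                pressed.append((r, c))
    return pressed

# Stub hardware: the key at row 2, column 1 is held down
def read_cols(active_row, down={(2, 1)}):
    return [(active_row, c) in down for c in range(4)]

keys = scan_matrix(read_cols)
```

A 4×4 matrix like this reads 16 buttons on 8 pins, which is the N+M economy the text describes.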

Impact

~$1.16

Final Chip Cost

nRF52810 at volume

~2 years

Battery Life

Calculated and verified

Matrix

Wake Capability

No dedicated wake GPIO needed

REST API

Software Integration

Via Python bridge

Technologies

nRF52840 · nRF52810 · Nordic SDK · Bare Metal C · Python · Flask · Button Matrix · Deep Sleep Optimization

Outcome

The clicker went from proof-of-concept to mass production. The embedded developer didn't just design hardware — they delivered firmware, receiver firmware, and a software bridge that let the application team integrate without learning embedded systems. The full stack, owned end-to-end.

10

Engineering Innovations

15+

Years of Experience

5+

Years Longest Deployment

0

Data Loss Events

Got an "Impossible" Problem?

Every innovation on this page was called impossible, too expensive, or "can't be done" at some point. They just needed someone willing to break them down to the DNA level.

Every claim on this page is backed by production systems. Not concepts. Not prototypes. Working systems deployed in the field.