Administrative analytics for LLM gateways has an uncomfortable systems requirement mix: the operator wants exact billing-adjacent numbers, “today” must stay fresh, range queries span from hours to months, and the underlying workload is an append-heavy log stream with latency, quota, cache, and user-activity semantics. This paper presents TokenRouter Analytics, a hierarchical rollup architecture that materializes exact closed buckets at hour, day, month, and year granularity and fuses them with a live raw-log tail at query time. The design stores additive aggregates, exact latency histograms, exact distinct-user period sets, and build-state rows that gate summary visibility. Queries are planned by largest aligned buckets first, while missing closed buckets are built on demand and recently closed buckets are refreshed in the background. Unlike a purely conceptual position paper, this work is grounded in a running implementation and a reproducible local benchmark. On a schema-compatible synthetic workload executed against the current implementation on SQLite 3.38.2, the benchmark dataset contains 360,000 log rows, 18,000 top-up rows, 2,400 users, and 32 channels over a 180-day horizon. Build-state timestamps show that the full closed-history materialization completed in 414 seconds on a local Apple M4 machine with 24 GB memory. For SQL-core query latency, median latency improves from 11.35 ms to 6.52 ms for a 1-day window, from 72.26 ms to 35.09 ms for 7 days, from 398.21 ms to 227.35 ms for 30 days, and from 3556.82 ms to 2072.15 ms for 180 days. The speedup is therefore consistent but not magical: exact distinct-user support still dominates some paths, while pre-aggregation removes expensive repeated group-by work over wide raw logs. The paper argues that this combination of hierarchical rollups, live-tail fusion, and exactness-oriented summary design is a practical middle ground between raw-scan dashboards and a separate heavyweight warehouse.
Min Wei (Mon,) studied this question.