AI & Data Infrastructure Architect · Distributed Systems · Cloud Platforms · Security Analytics · Scalable ML
Make AI, cloud, and data infrastructure ROI visible.
I help teams reduce infrastructure waste, control observability and data-platform costs, and build systems whose costs stay predictable as they scale.

The real cost of invisible infrastructure
Most waste is invisible until it becomes a budget problem.
Five infrastructure layers, all flowing into the same bill. Width shows current cost share. Animation speed reflects growth rate. Most teams know the total. Few know which layer is accelerating.
Telemetry — mostly noise
~85% of ingested telemetry is never operationally queried.
AI runtime — the model isn't the cost
Context, retrieval, traces, and storage usually exceed model inference spend.
model inference = 12% of total AI cost
Cloud bill — can't attribute most of it
Most teams can't map infrastructure spend to workloads, features, or teams.
Data pipelines — low-value event load
Retries, heartbeats, and debug events compound storage and compute cost.
<25% of pipeline volume is actual product data
System scale — cost visibility lags behind
By the time the bill is a problem, the architecture is expensive to change.
Focus areas
Where systems get expensive, noisy, or hard to control.
Observability cost & ingestion design
Sampling, cardinality reduction, retention tiers. Reduce ingestion without losing operational coverage.
AI infrastructure overhead
Context management, retrieval cost, trace volume, embedding storage. Most of it isn't the model.
Data platform efficiency
Kafka, Flink, Spark, ClickHouse — ingestion design, backpressure, storage cost, event value.
Infrastructure cost attribution
Map cloud, Kubernetes, data, and AI spend to workloads, teams, and features.
Security telemetry correlation
Normalize and correlate findings so teams prioritize real risk rather than ingesting alert volume.
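To make the cardinality part of the observability focus area concrete, here is a minimal sketch of how a high-cardinality label can be spotted and its effect estimated. It assumes telemetry can be exported as (metric name, labels) pairs. The metric names, label keys, and helper functions are hypothetical and not tied to any particular vendor or library.

```python
from collections import defaultdict

# Hypothetical sample export: (metric name, labels) pairs.
SAMPLES = [
    ("http_requests", {"route": "/cart", "status": "200", "user_id": "u1"}),
    ("http_requests", {"route": "/cart", "status": "200", "user_id": "u2"}),
    ("http_requests", {"route": "/cart", "status": "500", "user_id": "u3"}),
    ("http_requests", {"route": "/pay", "status": "200", "user_id": "u4"}),
    ("queue_depth", {"queue": "orders"}),
]


def series_cardinality(samples):
    """Count distinct label sets (time series) per metric name."""
    series = defaultdict(set)
    for name, labels in samples:
        series[name].add(frozenset(labels.items()))
    return {name: len(keys) for name, keys in series.items()}


def label_cardinality(samples, metric):
    """Count distinct values per label key for a single metric."""
    values = defaultdict(set)
    for name, labels in samples:
        if name == metric:
            for key, value in labels.items():
                values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}


def drop_labels(samples, metric, to_drop):
    """Return samples with the given label keys removed from one metric."""
    result = []
    for name, labels in samples:
        if name == metric:
            labels = {k: v for k, v in labels.items() if k not in to_drop}
        result.append((name, labels))
    return result


before = series_cardinality(SAMPLES)
per_label = label_cardinality(SAMPLES, "http_requests")
after = series_cardinality(drop_labels(SAMPLES, "http_requests", {"user_id"}))

print(before)     # series per metric with all labels kept
print(per_label)  # distinct values per label on http_requests
print(after)      # series per metric once user_id is dropped
```

In this toy data the per-user label accounts for most of the distinct series on `http_requests`. Whether a label like that can be dropped or moved to logs or traces depends on whether anyone queries by it, which is a separate question from how much it costs to ingest.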
Cost & architecture
Same pattern, different system.
Most of these problems are predictable once you've seen them a few times.
Most teams collect far more telemetry than they operationally use.
The expensive part usually isn't the model. It's the context, traces, and retry loops around it.
Retention decisions get deferred until storage is already a cost problem.
Observability is a data architecture problem, not a tooling choice.
Many teams can scale infrastructure faster than they can explain the bill.
If you can't trace cost to a workload, you're guessing at what to fix.
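A minimal sketch of what tracing cost to a workload can look like, assuming billing data is available as line items that carry optional ownership tags. The field names (`cost`, `tags`, `team`, `workload`) and the sample figures are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical billing export: each line item has a cost and optional tags.
LINE_ITEMS = [
    {"service": "compute", "cost": 400.0, "tags": {"team": "search", "workload": "indexer"}},
    {"service": "storage", "cost": 250.0, "tags": {"team": "ml", "workload": "embeddings"}},
    {"service": "compute", "cost": 200.0, "tags": {}},
    {"service": "network", "cost": 150.0},
]


def attribute_costs(line_items, key):
    """Sum cost per value of the given tag key; untagged spend is grouped separately."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(key) or "unattributed"
        totals[owner] += item["cost"]
    return dict(totals)


def attributed_share(totals):
    """Fraction of total spend that maps to a known owner."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return (total - totals.get("unattributed", 0.0)) / total


by_team = attribute_costs(LINE_ITEMS, "team")
share = attributed_share(by_team)

print(by_team)
print(f"{share:.0%} of spend is attributed to a team")
```

The size of the `unattributed` bucket is usually the first number worth tracking: until it shrinks, any per-team or per-workload breakdown only describes part of the bill.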
Stack
Tools, not the strategy.
The value is owning the telemetry model and cost structure — not the tools.
Representative outcomes
What this work looks like.
No invented clients, no inflated numbers.
Identified high-cardinality telemetry causing disproportionate ingestion cost — reduced volume without operational coverage gaps.
Redesigned pipeline ingestion and backpressure — improved throughput while reducing compute and storage overhead.
Built workload-level cost attribution across cloud and Kubernetes infrastructure.
Mapped AI runtime cost across context, retrieval, traces, and storage.
Improved signal correlation across operational, security, and infrastructure data.
Reduced agent orchestration overhead by redesigning context management and tool-call patterns.
About
Daniel Brener
AI & Data Infrastructure Architect focused on distributed systems, cloud platforms, security analytics, and scalable ML infrastructure. Most of my work involves making infrastructure economics readable: finding where systems become expensive, noisy, or hard to control, and designing toward something clearer.
Primary domains
Infrastructure tends to get expensive in specific, predictable ways.
I'm interested in practical conversations around cloud cost visibility, observability economics, AI infrastructure overhead, and data-platform efficiency. No pitch — just an exchange about the specific problem you're working on.