Insurance SaaS teams face a double squeeze: under pressure to cut cloud costs and scale efficiently, they must simultaneously meet stricter resilience and data security standards like SOC 2, ISO 27001, GDPR, and DORA.
Kubernetes adoption has doubled since 2024 and now reaches 61% of enterprises. Meanwhile, 32% of cloud spend remains wasted, even though FinOps-driven optimisation already saves teams an average of 23%.
For engineering leaders, the challenge is modernizing infrastructure for speed and savings without compromising compliance or operational stability.
Kubernetes provides the elastic infrastructure needed to scale dynamically during insurance traffic spikes while enforcing standardized controls. Its ephemeral, distributed nature, however, requires a precise architecture so that cost optimization and strict compliance don’t bottleneck engineering velocity.
In this article, we’ll explore how insurance SaaS companies can use Kubernetes to reduce infrastructure waste, improve operational resilience, and build audit-ready environments – without sacrificing the regulatory controls that modern insurance platforms depend on.
Key summary
- Kubernetes helps insurance SaaS platforms reduce infrastructure costs through dynamic scaling and better resource utilisation.
- Insurance companies must balance cost optimization with strict compliance requirements like SOC 2, ISO 27001, GDPR, and DORA.
- Building audit-ready Kubernetes environments requires strong observability, policy enforcement, runtime security, and supply chain controls.
- Successful adoption depends on designing for both high availability and compliance from the start – not treating them as trade-offs.
Why Kubernetes Is a Strong Fit for Insurance SaaS Platforms
Insurance SaaS is rarely a clean-slate build. Most platforms are layered systems – a modern API layer (typically Go or Node.js) sitting atop calculation engines that may be decades old, written in C++ or Java, and deeply embedded in pricing and underwriting logic. Beneath that, Python-based data science pipelines handle risk modelling, fraud detection, and actuarial computation.
Each layer has different runtime requirements, scaling characteristics, and release cadences.
Native Support for Heterogeneous Workloads. Rather than managing separate infrastructure for each type of component, Kubernetes provides a single orchestration layer across the entire stack. Teams define how each service should run, what resources it needs, and how it should behave under load – Kubernetes handles the scheduling. This maps cleanly onto insurance platforms where polyglot architectures are the norm, not the exception.
Scaling That Matches Insurance Traffic Patterns. Insurance workloads are inherently uneven. Traffic spikes sharply during open enrollment, after catastrophic weather events trigger claims surges, and during scheduled batch runs like nightly actuarial modelling. Traditional VM infrastructure handles this poorly – provisioning takes minutes, which historically pushed teams toward over-provisioning and the ongoing cost of idle capacity. Containers start in seconds, and Kubernetes can scale them automatically against real-time demand signals, so infrastructure expenditure tracks actual usage rather than worst-case estimates.
Consistent Deployment Across Environments. Insurance SaaS teams frequently operate across multiple environments – development, staging, client-specific production tenants, and regulated cloud regions. Kubernetes provides a consistent deployment model across all of them, reducing the surface area for environment-specific bugs and simplifying compliance demonstrations.
The Ephemerality Trade-off. The same property that makes Kubernetes operationally efficient – containers are short-lived and replaceable – creates a genuine compliance challenge. Auditors accustomed to persistent servers with stable IPs and durable audit logs find the idea of constantly recycling infrastructure unsettling, and legitimately so. Addressing this, both technically through centralised logging and immutable audit trails, and organisationally through auditor education, is a real cost of adopting Kubernetes in regulated insurance environments. It’s manageable, but it shouldn’t be understated.
Why Insurance SaaS Teams Are Caught Between Cost Pressure and Compliance Mandates
Infrastructure budgets are under significant scrutiny. Cloud spend that once went unquestioned is now being reviewed line by line, and engineering leaders are being asked to justify every compute dollar. This pressure is not abstract – according to Flexera’s 2025 State of the Cloud Report, 84% of organisations cite managing cloud spend as their top cloud challenge, and estimated wasted cloud spend on idle or oversized resources remains at 27% of total cloud expenditure.
For insurance SaaS teams, that waste has a specific shape: over-provisioned infrastructure kept running to absorb traffic spikes (enrollment surges, catastrophic weather events, nightly actuarial batch runs) that may materialise for hours and then disappear entirely.
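To make the shape of that waste concrete, here is a rough sketch comparing capacity sized for the daily peak with capacity that follows demand. The demand profile, the 20% headroom, and the price per replica-hour are all invented for illustration; none of them come from a real platform.

```python
import math

# Hypothetical replicas needed per hour across one day: a quiet night,
# a normal working day, a two-hour claims surge, then an evening tail.
HOURLY_DEMAND = [4] * 8 + [6] * 10 + [20] * 2 + [6] * 4

HEADROOM_PCT = 20              # assumed safety buffer when tracking demand
PRICE_PER_REPLICA_HOUR = 0.10  # assumed price, not a real quote


def peak_replica_hours(demand):
    """Replica-hours paid for when capacity is fixed at the daily peak."""
    return max(demand) * len(demand)


def elastic_replica_hours(demand, headroom_pct):
    """Replica-hours paid for when each hour is sized to demand plus headroom."""
    return sum(math.ceil(d * (100 + headroom_pct) / 100) for d in demand)


peak = peak_replica_hours(HOURLY_DEMAND)
elastic = elastic_replica_hours(HOURLY_DEMAND, HEADROOM_PCT)
print(f"peak-sized:   {peak} replica-hours, {peak * PRICE_PER_REPLICA_HOUR:.2f} per day")
print(f"demand-sized: {elastic} replica-hours, {elastic * PRICE_PER_REPLICA_HOUR:.2f} per day")
```

With these made-up inputs, peak sizing pays for 480 replica-hours a day against 200 when capacity follows demand. Real savings depend on how quickly a workload can scale and how spiky its traffic actually is.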
At the same time, insurance and financial services operate in one of the most heavily regulated sectors in the world. The compliance burden is not optional, and it doesn’t move at the speed of infrastructure modernisation.
SOC 2 Type II – continuous evidence, not a point-in-time snapshot
SOC 2 Type II requires organisations to demonstrate continuous operational security and data protection over a defined audit period — typically six to twelve months. In a containerised environment, this creates a specific challenge: audit logs must be durable and tamper-resistant even after the containers that generated them have been decommissioned. Ephemeral infrastructure and persistent evidence requirements pull in opposite directions, and bridging that gap requires deliberate architectural decisions.
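One common way to make a log tamper-evident is to chain entries together by hash, so that altering any earlier record invalidates everything after it. The sketch below is a minimal illustration of that idea, not a description of any particular logging product; the function names and event fields are hypothetical, and a real deployment would also ship the entries to storage outside the cluster.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain, event):
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(chain):
    """Return True only if every entry still matches its recorded hash and link."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical events emitted by short-lived pods.
log = []
append_event(log, {"pod": "quote-api-7f9c", "actor": "svc-pricing", "action": "read_policy"})
append_event(log, {"pod": "quote-api-7f9c", "actor": "svc-pricing", "action": "update_quote"})
print(verify_chain(log))
```

A hash chain only shows that something was changed; it does not stop the change. That is why the entries still need to live in a store the workloads themselves cannot rewrite.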
ISO 27001 – systematic risk management at the infrastructure layer
ISO 27001 is less prescriptive about specific technical controls and more concerned with whether your organisation has a systematic, documented approach to identifying, assessing, and treating information security risks. For Kubernetes environments, this means clear ownership of cluster configuration, documented vulnerability management processes, and demonstrable access controls — all areas where dynamic infrastructure demands more rigour, not less.
GDPR and CCPA – data that doesn’t stay where you put it
Both frameworks impose strict requirements around data residency, encryption, and the right to erasure. In containerised environments, personal data can proliferate across application logs, persistent volumes, backups, and the in-memory state of running containers. A 2023 Securiti report found that 61% of organisations struggle to locate all instances of personal data across their infrastructure, a figure that rises sharply in dynamic, distributed architectures. “The right to be forgotten” requires not just a deletion mechanism but a comprehensive data classification and lineage strategy.
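Application logs are one of the easier places to limit that spread, because personal data can be masked before a log line ever leaves the process. The sketch below uses Python's standard `logging.Filter` to do this. The `POL-` policy-number format and the placeholder strings are assumptions made for the example, and pattern matching of this kind is a backstop, not a substitute for data classification.

```python
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
POLICY_RE = re.compile(r"\bPOL-\d{6,}\b")  # hypothetical policy-number format


def redact(text):
    """Mask email addresses and (assumed-format) policy numbers in a string."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return POLICY_RE.sub("[REDACTED_POLICY]", text)


class RedactingFilter(logging.Filter):
    """Rewrite each record's final message so handlers only see the masked text."""

    def filter(self, record):
        record.msg = redact(record.getMessage())  # format first, then mask
        record.args = ()                          # args are already merged into msg
        return True


logger = logging.getLogger("claims")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Claim opened by %s for %s", "jane.doe@example.com", "POL-1234567")
```

Attaching the filter to the handler means every record written through that handler is masked, regardless of which part of the application produced it.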
DORA – operational resilience as a regulatory requirement
The Digital Operational Resilience Act is increasingly relevant for any entity operating in or serving the EU financial sector. It mandates demonstrable operational resilience, defined recovery time objectives, and rigorous management of third-party technology risk. For insurtech platforms that rely on managed cloud services, this last requirement demands careful vendor dependency mapping and credible exit planning – not documentation that sits in a drawer, but controls that can be demonstrated to regulators.
These frameworks were largely written with traditional, persistent infrastructure in mind. Applying them to a dynamic, containerised environment requires interpretation, additional documentation, and deliberate design choices that many teams make too late.
Gartner has consistently noted that through 2025, the overwhelming majority of cloud security failures trace back not to the infrastructure itself, but to how organisations configure and govern it – a process and documentation problem, not a technology one.
The compliance challenge is entirely solvable, but only if it’s treated as an architectural concern from the start, not a layer to be added before audit season.
How Your Kubernetes Setup Shapes Cost and Compliance
Before addressing compliance architecture, engineering teams running Kubernetes for insurance workloads face a foundational decision that directly determines both their cost structure and their compliance burden: run it on a managed cloud service, on-premise, or in a hybrid configuration?
The managed services — AWS EKS, Google GKE, Azure AKS — are the right starting point for most insurtech teams. They handle control plane management, reduce the operational overhead of cluster maintenance, and integrate natively with the cloud security and monitoring tooling that compliance programmes depend on. For most organisations, the question is not whether to use managed Kubernetes, but which provider aligns with their existing cloud footprint and regulatory obligations.
That said, legitimate reasons exist for on-premise or hybrid configurations in insurance. Data sovereignty regulations in certain jurisdictions require that policyholder data never leaves a defined geographic boundary — and while major providers offer region-specific deployments, some organisations prefer the certainty of physical control. Legacy mainframe integrations with strict latency requirements can also push specific workloads toward on-premise or colocation infrastructure.
Seven criteria determine which path makes sense, and each carries a direct cost or compliance implication:

- Data residency requirements are often the deciding factor before anything else. If your regulatory environment mandates that policyholder data stays within a specific country or region, your infrastructure decision starts and may end there. Getting this wrong is not a configuration problem — it’s a regulatory breach.
- Team capacity and Kubernetes expertise determine how much of your engineering budget goes toward infrastructure rather than product. On-premise Kubernetes is a significant operational undertaking. Managed services reduce that burden substantially but still require skilled practitioners: the cost shifts from hardware and maintenance to talent and configuration.
- Compliance tier influences how much control you need over the underlying infrastructure. Organisations subject to the most stringent regulatory regimes — particularly those with DORA obligations or strict SOC 2 Type II requirements — may find that shared cloud infrastructure requires additional compensating controls, adding cost and complexity.
- Legacy system integration — particularly mainframe or on-premise data warehouse dependencies — can constrain your architecture in ways that make purely cloud-native deployments expensive. Connectivity between cloud Kubernetes workloads and on-premise systems introduces latency, data transfer costs, and additional security controls that need to be factored into the total cost picture.
- Geographic distribution across multiple jurisdictions may require multi-region or multi-cloud configurations. Each additional region adds compliance scope — more data residency considerations, more audit surface, and more infrastructure to secure and monitor.
- Recovery time objectives (RTO) should inform your resilience architecture directly. Managed services typically provide stronger baseline availability than most teams can achieve running their own clusters, which matters both for operational continuity and for demonstrating DORA compliance around recovery commitments.
- Three-year total cost of ownership is often the tie-breaker. On-premise infrastructure may appear cheaper on a per-compute-hour basis, but the true cost includes engineering time, hardware refresh cycles, and the opportunity cost of your team managing infrastructure rather than building product. Managed services trade operational overhead for higher per-unit cost — for most insurtech organisations, that trade is favourable.
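The three-year comparison in the last criterion can be roughed out in a few lines. Every figure below (monthly run cost, hardware outlay, staffing levels, cost per engineer) is a placeholder chosen only to show the structure of the calculation, and the model deliberately ignores hardware refresh cycles and opportunity cost.

```python
from dataclasses import dataclass

ENGINEER_COST_PER_YEAR = 150_000  # assumed fully loaded cost, illustrative only


@dataclass
class Option:
    name: str
    monthly_run_cost: int      # cloud bill, or power/colocation/support for on-premise
    upfront_hardware: int      # one-off purchase; zero for a managed service
    platform_engineers: float  # full-time equivalents spent operating the clusters


def three_year_tco(option, engineer_cost=ENGINEER_COST_PER_YEAR):
    """Upfront spend plus 36 months of run cost plus three years of staffing."""
    staffing = 3 * option.platform_engineers * engineer_cost
    return option.upfront_hardware + 36 * option.monthly_run_cost + staffing


# Hypothetical inputs for the two paths.
managed = Option("managed service", monthly_run_cost=30_000, upfront_hardware=0, platform_engineers=1.0)
on_prem = Option("on-premise", monthly_run_cost=12_000, upfront_hardware=400_000, platform_engineers=3.0)

for option in (managed, on_prem):
    print(f"{option.name}: {three_year_tco(option):,.0f}")
```

With these invented inputs the on-premise option has the lower run cost but the higher total, because staffing dominates. Changing the staffing assumption can flip the result, which is why the inputs deserve more scrutiny than the formula.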
For most insurtech teams, the default answer is a managed service, with on-premise components reserved for workloads where data sovereignty or latency requirements genuinely demand it. That default also tends to produce a more defensible compliance posture from the start — managed services come with audit logging, identity integration, and security tooling that would otherwise need to be built and maintained independently.
A Four-Layer Compliance Architecture
The core concern about Kubernetes in regulated environments is ephemerality – if containers are constantly being created and destroyed, how do you maintain a durable audit trail? The answer is a layered observability architecture that doesn’t depend on any individual container surviving.
Layer 1. Runtime security monitors what’s actually happening inside running containers. Tools like Falco watch kernel-level system calls in real time, alerting on anomalous behaviour such as unexpected privilege escalation or credential file access. Events must stream to a tamper-resistant log store external to the container.
Layer 2. Control plane audit logs capture every change to cluster state through the Kubernetes API server – deployments, policy changes, access modifications. Policy enforcement tools (OPA, Kyverno) sit at this layer, rejecting non-compliant workloads before they run. Both the changes and the rejections become part of your audit trail.
Layer 3. Application observability addresses what auditors actually ask about: who accessed which data, when, and why. Applications must emit structured logs to a centralised store. A service mesh like Istio adds automatic mutual TLS and per-call telemetry across all service-to-service traffic — encryption and evidence, handled at the infrastructure layer.
Layer 4. Supply chain security governs what code runs in the first place. This means image scanning in CI/CD, a private registry with strict access controls, image signing for provenance, and admission policies that block images from unapproved sources. For SOC 2, a documented and enforced deployment pipeline, with evidence that only approved, scanned images reach production, constitutes a meaningful control.
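The admission-time checks mentioned in Layers 2 and 4 boil down to inspecting a workload's manifest and refusing it when a rule fails. In a real cluster these rules would be written in the policy engine's own language (OPA or Kyverno); the Python below only illustrates the logic. The registry name and the choice of rules are assumptions made for the example.

```python
APPROVED_REGISTRIES = ("registry.example.internal/",)  # hypothetical private registry


def check_pod(pod):
    """Return a list of policy violations for a pod manifest; empty means admit."""
    violations = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")

        if not image.startswith(APPROVED_REGISTRIES):
            violations.append(f"{name}: image is not from an approved registry")
        if "@sha256:" not in image:
            violations.append(f"{name}: image is not pinned by digest")
        if container.get("securityContext", {}).get("privileged", False):
            violations.append(f"{name}: privileged containers are not allowed")

        limits = container.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            violations.append(f"{name}: cpu and memory limits are required")
    return violations


# A hypothetical manifest that breaks every rule above.
bad_pod = {
    "spec": {
        "containers": [
            {
                "name": "rating-engine",
                "image": "docker.io/someone/rating-engine:latest",
                "securityContext": {"privileged": True},
            }
        ]
    }
}

for violation in check_pod(bad_pod):
    print("REJECTED:", violation)
```

Recording each rejection alongside the manifest that triggered it is what turns a policy into audit evidence: it shows the control was not only defined but enforced.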
How Sun* Cut a Banking Client’s GKE Spend by 45%
A recent engagement in the digital banking sector involved optimizing a cloud-native transaction platform running on Google Kubernetes Engine (GKE). Our client faced rising infrastructure costs despite relatively stable traffic volumes. The cause was a combination of memory leaks and inefficient autoscaling behavior within the Kubernetes clusters. The Horizontal Pod Autoscaler (HPA) continuously scaled workloads based on inflated memory metrics, resulting in unnecessary compute consumption and runaway cloud spend.
The engineering team applied a structured Kubernetes optimization approach combining workload profiling, load testing, and code-level remediation. By fixing resource lifecycle issues, reducing object churn through Singleton patterns, and recalibrating autoscaling thresholds, the platform was able to return Pods to baseline memory usage after traffic spikes subsided. This stabilized cluster behavior without requiring a major architectural rebuild.
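To see why leaked memory translates directly into cost, it helps to look at how the HPA decides on a replica count. The Kubernetes documentation gives the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with scaling skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that core rule; the utilisation figures and replica bounds are invented, since the engagement's actual thresholds are not given here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Core HPA rule: scale in proportion to current/target, clamped to the bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


TARGET_MEMORY_PCT = 70  # assumed target average memory utilisation

# After a spike, healthy pods release memory and the HPA scales in.
healthy = desired_replicas(6, current_metric=35, target_metric=TARGET_MEMORY_PCT)

# Leaky pods stay above target, so the HPA keeps adding replicas instead.
leaky = desired_replicas(6, current_metric=90, target_metric=TARGET_MEMORY_PCT)

print(f"healthy pods: 6 -> {healthy} replicas")
print(f"leaky pods:   6 -> {leaky} replicas")
```

In this example the same six replicas scale in to three when memory falls back to baseline and grow to eight when it does not, even though traffic is identical. That is the pattern the remediation above addressed: once pods return to baseline after a spike, the autoscaler can scale in again.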
As a result, the banking platform achieved an approximately 45% reduction in monthly GKE infrastructure costs while improving operational stability and scalability. The project also established a repeatable FinOps-oriented optimization methodology for future cloud governance and Kubernetes cost control initiatives.
Read our full case study: GKE Cost Optimization for Banking: How We Slashed Cloud Spend by Fixing Cluster Leaks
Getting Kubernetes for Insurance Right in Regulated Environments
Kubernetes is not a magic solution to the cost and compliance challenges facing insurance SaaS teams. It is a powerful tool that, when implemented with intention and expertise, can genuinely help organisations build more efficient and more resilient platforms.
The organisations that get the most out of Kubernetes in regulated environments are those that treat compliance as an architectural concern from the start – not a layer to be added after the fact. The four-layer framework described above is not a checklist to complete before going live; it’s a set of ongoing practices that generate the evidence base your compliance programme depends on.
The organisations that struggle are those that adopt Kubernetes primarily for its operational benefits and then discover, often during their first SOC 2 audit, that ephemerality is the enemy of the evidence trail auditors expect. Retrofitting compliance controls onto a running cluster is harder, more expensive, and more disruptive than building them in from the beginning.
We help insurance engineering teams design and ship Kubernetes environments that hold up under real compliance scrutiny.


