Wayfair, a leading e-commerce platform in the home goods market, recently migrated the policy engine in its Kubernetes environment from OPA (Open Policy Agent) to Kyverno. With around 14,000 employees, 2,000 engineers, and a substantial presence on Google Kubernetes Engine (GKE), Wayfair processes approximately 15,000 production deploys each month, which gives a sense of the scale and complexity of its operations.
In a recent presentation, Zach Swanson of Wayfair shared key insights about the company’s Kubernetes infrastructure and its Kyverno adoption journey. Wayfair runs large multi-tenant clusters to accommodate its extensive developer community, treating each developer group as an isolated tenant. Kyverno admission policies have become integral to this setup, with around 56 validate rules and 20 mutate policies managed across the clusters. These policies protect the platform by preventing issues such as misrouted traffic, insecure ingress configurations, and inadvertent resource mismanagement.
Wayfair’s use of Kyverno falls into two broad categories. First, Kyverno protects the platform: beyond standard pod security, it blocks unauthorized changes to ingress hosts and TLS declarations, as well as the enabling of features that could complicate issue tracking. Second, Kyverno helps the platform evolve without requiring developers to make extensive changes. It automatically adjusts deprecated configurations, handles image registry failovers, and improves resource efficiency, resulting in significant cost savings.
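To make the two categories concrete, here is a minimal Python sketch of what a validate rule and a mutate rule each do to an incoming manifest. It is an illustration only, not Wayfair’s policies and not Kyverno’s implementation (real Kyverno policies are declared in YAML). The tenant-to-host mapping, the registry names, and both function names are hypothetical.

```python
import copy

# Hypothetical mapping of tenant namespace to the ingress hosts it may claim.
ALLOWED_HOSTS = {"team-a": {"a.example.com"}}

# Hypothetical registry names used to illustrate a failover rewrite.
PRIMARY_REGISTRY = "registry.example.com"
FAILOVER_REGISTRY = "mirror.example.com"


def validate_ingress(ingress):
    """Validate-style check: return violation messages; an empty list means admit."""
    namespace = ingress["metadata"]["namespace"]
    allowed = ALLOWED_HOSTS.get(namespace, set())
    violations = []
    for rule in ingress.get("spec", {}).get("rules", []):
        host = rule.get("host")
        if host not in allowed:
            violations.append(
                f"host {host!r} is not allowed in namespace {namespace!r}"
            )
    return violations


def mutate_pod(pod):
    """Mutate-style rewrite: return a copy with primary-registry images
    pointed at the failover registry. The input is left unchanged."""
    patched = copy.deepcopy(pod)
    for container in patched["spec"]["containers"]:
        image = container["image"]
        if image.startswith(PRIMARY_REGISTRY + "/"):
            container["image"] = FAILOVER_REGISTRY + image[len(PRIMARY_REGISTRY):]
    return patched
```

The validate function only accepts or rejects a request, while the mutate function changes the request on the developer’s behalf. That difference is why the second category can evolve the platform without developers editing their manifests.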
Wayfair’s decision to migrate from OPA to Kyverno was driven by several factors. OPA’s Rego language, while powerful, proved more complex to work with than Kyverno. Documentation gaps and subtle differences between Gatekeeper (which is built on OPA) and OPA itself further contributed to the decision. Wayfair also lacked a centralized policy team, and Kyverno’s versatility allowed it to adopt a more streamlined approach. The Kyverno community’s responsiveness and its extensive public policy library strengthened the case for migration.
The migration at Wayfair followed a structured, methodical process. It began with a concept demo showcasing Kyverno’s ability to handle complex constraints. Gatekeeper constraints were then systematically rewritten as Kyverno policies, which were deployed in parallel with Gatekeeper while testing utilities built confidence in them. Once the Kyverno policies were shown to align with the existing Gatekeeper policies, they were moved from audit mode to enforce mode. Finally, the Gatekeeper constraints were gradually disabled, completing the migration and showing that the transition from OPA to Kyverno can be straightforward.
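The audit-then-enforce step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Wayfair’s tooling. The policy is a plain dict shaped like a Kyverno ClusterPolicy, using the `spec.validationFailureAction` field from the `kyverno.io/v1` API; newer Kyverno releases may expect this setting elsewhere, so check the documentation for your version. The rule `require-team-label`, the decision callables, and all function names are hypothetical.

```python
import copy


def make_policy(name, action="Audit"):
    """Build a minimal Kyverno-style ClusterPolicy as a plain dict (illustrative)."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": name},
        "spec": {
            # "Audit" only reports violations; "Enforce" blocks the request.
            "validationFailureAction": action,
            "rules": [{
                "name": "require-team-label",  # hypothetical example rule
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "validate": {
                    "message": "The label `team` is required.",
                    "pattern": {"metadata": {"labels": {"team": "?*"}}},
                },
            }],
        },
    }


def disagreements(samples, old_decision, new_decision):
    """Return the sample resources on which the two engines disagree."""
    return [s for s in samples if old_decision(s) != new_decision(s)]


def promote_if_parity(policy, samples, old_decision, new_decision):
    """Return an Enforce-mode copy only if both engines agree on every sample;
    otherwise return the policy unchanged."""
    if disagreements(samples, old_decision, new_decision):
        return policy
    promoted = copy.deepcopy(policy)
    promoted["spec"]["validationFailureAction"] = "Enforce"
    return promoted
```

Here `old_decision` and `new_decision` stand in for the verdicts recorded from the existing Gatekeeper constraint and from the Kyverno policy running in audit mode. A policy is promoted only when the two agree on every sample resource.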
Wayfair’s migration from OPA to Kyverno reflects a strategic move to make its Kubernetes policy management simpler, more manageable, and more responsive. The shift not only addressed the challenges Wayfair faced with OPA but also enabled it to adapt its platform with minimal disruption to developers, safeguard against potential issues, and improve resource efficiency. This case study offers valuable insight for organizations considering a similar transition, highlighting the benefits of Kyverno in managing Kubernetes policies at scale.
Are you interested in learning more about how to secure your Kubernetes clusters using Kyverno? Check out this ebook: Securing Kubernetes Using Policy-as-code powered by Kyverno