Holistic AI Launches Open-Source Library to Advance Responsible AI

Holistic AI OSL is the most comprehensive library available today for mitigating bias and improving explainability in AI systems

Oct. 22, 2024 – San Jose – (Business Wire) – Holistic AI, the leading AI governance platform for the enterprise, today announced the launch of Holistic AI OSL, an optimized open-source library designed to help developers build fair and responsible AI systems. AI architects and developers can now access the library, which provides advanced tools for eliminating bias and improving explainability. Holistic AI OSL empowers teams to create more transparent and trustworthy AI applications from the ground up, fostering a safer environment of innovation and experimentation to benefit society. For more information, visit the Holistic AI blog or download the library for Python, which is available today free of charge without any licensing requirements.

Organizations increasingly rely on AI systems in critical areas such as recruitment and onboarding, healthcare, loan approval, and credit scoring, where fairness is paramount. It is essential that algorithms do not inadvertently discriminate and that they treat demographic groups and individuals equally. While AI has made significant advances in prediction accuracy, recent studies indicate that 65% of AI researchers and developers still identify bias as a major issue.

Holistic AI OSL tackles this challenge by providing tools that address the five key technical risks associated with AI systems, ensuring greater accountability. Specifically, OSL offers:

  • Bias Mitigation: Introduces more than 35 bias metrics across five machine learning tasks and provides 30 mitigation strategies to help developers reduce bias in their systems (see the sketch after this list).
  • Explainability: Clarifies the system’s behavior by revealing how models make decisions and predictions, fostering transparency and building trust.
  • Robustness: Ensures models perform consistently, even when faced with challenges like adversarial attacks or variations to input data.
  • Security: Provides safeguards for user privacy through anonymization and defends against risks like attribute inference attacks, enhancing overall security.
  • Efficacy: Ensures models are not only accurate but also maintain fairness, robustness, and security under various conditions, balancing these factors through detailed testing in real-world scenarios.
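To give a flavor of the checks the library supports, the minimal, library-agnostic sketch below computes one classic bias metric, statistical parity difference, by hand on toy predictions. The function and variable names are illustrative only and are not the Holistic AI OSL API; consult the library’s documentation for its actual interface.

    # Minimal, library-agnostic sketch of one classic bias metric:
    # statistical parity difference = P(y_hat = 1 | group A) - P(y_hat = 1 | group B).
    # Names below are illustrative; they are not the Holistic AI OSL API.
    import numpy as np

    def statistical_parity_difference(y_pred, group_a, group_b):
        """Difference in positive-prediction rates between two demographic groups."""
        rate_a = y_pred[group_a].mean()  # selection rate for group A
        rate_b = y_pred[group_b].mean()  # selection rate for group B
        return rate_a - rate_b

    # Toy example: predictions for 8 applicants split evenly into two groups.
    y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    group_a = np.array([True, True, True, True, False, False, False, False])
    group_b = ~group_a

    print(statistical_parity_difference(y_pred, group_a, group_b))  # 0.75 - 0.25 = 0.5

A value near zero indicates similar selection rates across groups; Holistic AI OSL packages this kind of metric, along with dozens of others and accompanying mitigation strategies, for use across the machine learning lifecycle.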

“Our new library equips organizations with tools for all AI risks, including explainability, robustness, and bias. It supports measurement, reporting, and mitigation at every stage of the AI lifecycle, offering one of the most advanced solutions for improving quality in AI applications today,” said Adriano Koshiyama, Co-CEO of Holistic AI.

“Our goal is to help AI realize its full potential. Whether through this open-source library or our comprehensive AI governance platform, we are committed to empowering businesses to accelerate AI innovation across their enterprise—enabling them to complete more projects successfully without facing risks, compliance issues, or bias, all while tracking against the expected ROI.”

As one of the top global insurers, operating in almost 40 countries across five continents and serving over 30 million customers worldwide, MAPFRE is leveraging AI as part of an innovation strategy focused on continuous improvement of its customer experiences, processes, and operations. Holistic AI OSL, as well as the full Holistic AI Governance Platform, is part of MAPFRE’s technology lineup.

“What sets this library apart is its depth—it’s not just about identifying AI risks but actively addressing them with proven, industry-ready mitigation techniques, making it an essential part of any ethical AI development toolkit,” said César Ortega, Expert Data Scientist at MAPFRE.

About Holistic AI

Founded in 2020, Holistic AI is on a mission to empower enterprises to adopt and scale AI with confidence. Our purpose-built AI governance platform helps companies accelerate AI transformation across the organization – transparently, responsibly, and with ROI accountability for the C-Suite. With Holistic AI, businesses can increase visibility and control of AI projects, eliminate communication bottlenecks across teams, and significantly reduce AI risk at enterprise scale. Holistic AI is part of Microsoft’s Founders’ Hub, Pegasus Program, and Nvidia’s Inception program. Holistic AI’s founders are active members of the NIST AI Safety Institute, experts on the UN AI Advisory Body, members of the OECD’s Network of Experts on AI, advisors on the EU AI Act, and collaborators with the Alan Turing Institute.

For more information, see www.holisticai.com.

CoreStack Makes the Inc. 5000 List of America’s Fastest Growing Private Companies Two Years Running

NextGen Cloud Governance Company Places 12th Among the Fastest Growing Companies in the Seattle Area

BELLEVUE, WA — August 13, 2024 — CoreStack, a global multi-cloud governance provider, is proud to announce it has made the coveted Inc. 5000 list for the second year in a row. This list recognizes CoreStack as one of the fastest-growing private companies in the U.S., reflecting its dramatic growth and outsized influence within the cloud industry.

The Inc. 5000 class of 2024 represents companies that have driven rapid revenue growth despite strong economic headwinds. Of the 5,000 fastest-growing private companies on Inc.’s 2024 list, CoreStack ranks as No. 1013. The company ranks No. 121 in the Software category and No. 12 in the Seattle area.

Inc.’s annual ranking provides a data-driven look at the most successful companies within the economy’s most dynamic segment—its independent, entrepreneurial businesses. This year’s Inc. 5000 companies have added 874,458 jobs to the economy over the past three years.

“One of the greatest joys of my job is going through the Inc. 5000 list,” says Mike Hofman, who recently joined Inc. as editor-in-chief. “Congratulations to this year’s honorees for growing their businesses fast despite the economic disruption we all faced over the past three years.”

For complete results of the Inc. 5000, including company profiles and an interactive database that can be sorted by industry, location, and other criteria, go to www.inc.com/inc5000. The top 500 companies are featured in the September issue of Inc. magazine, available on newsstands beginning Tuesday, August 20.

“It is indeed an honor to make Inc.’s list of fastest-growing companies two years in a row,” says CoreStack’s CEO, Ezhilarasan (Ez) Natarajan. “This recognition reflects not only our robust growth but also the transformative value our cloud governance technology continues to deliver to partners and customers.”

Earlier this year, Inc. revealed that CoreStack ranked No. 41 on the Inc. 5000 Regionals: Pacific list, the most prestigious ranking of the fastest-growing private companies in the Pacific region, including California, Oregon, Washington, Alaska, and Hawaii. CoreStack ranked No. 42 on the Regionals: Pacific list in 2023. Inc. has also recognized the company as a Best Workplace for the last three years.

CoreStack offers a suite of NextGen Cloud Governance modules that leverage AI to provide continuous and autonomous multi-cloud governance through a unified dashboard for FinOps, SecOps, and CloudOps. NextGen Cloud Governance helps enterprises mitigate risk, accelerate delivery, optimize performance, and innovate faster. In addition, CoreStack offers assessments based on Well-Architected and custom frameworks. This solution streamlines the process of evaluating, improving, and maintaining cloud workloads across all environments.

What is DataOps

Data workflows today have grown increasingly intricate, diverse, and interconnected. Leaders in data and analytics (D&A) are looking for tools that streamline operations and minimize the reliance on custom solutions and manual steps in managing data pipelines.

DataOps is a framework that brings together data engineering and data science teams to address an organization’s data requirements. It adopts an automation-driven approach to the development and scaling of data products. This approach also streamlines the work of data engineering teams, enabling them to provide other stakeholders with dependable data for informed decision-making.

Initially pioneered by data-driven companies that applied CI/CD principles and even built open-source tools to improve their data teams, DataOps has steadily gained traction. Today, data teams of all sizes increasingly rely on DataOps as a framework for quickly deploying data pipelines while ensuring the data remains reliable and readily accessible.

Gartner defines DataOps as “a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.”

Why DataOps is Important

Manual data management tasks can be both time-consuming and inefficient, especially as businesses evolve and demand greater flexibility. A streamlined approach to data management, from collection through to delivery, allows organizations to adapt quickly while handling growing data volumes and building data products.

DataOps tackles these challenges by bridging the gap between data producers (upstream) and consumers (downstream). By integrating data across departments, DataOps promotes collaboration, giving teams the ability to access and analyze data to meet their unique needs. This approach improves data speed, reliability, quality, and governance, leading to more insightful and timely analysis.

In a DataOps model, cross-functional teams—including data scientists, engineers, analysts, IT, and business stakeholders—work together to achieve business objectives.

DataOps vs DevOps: Is There a Difference?

Although DevOps and DataOps sound similar, they serve distinct functions within organizations. While both emphasize collaboration and automation, their focus areas are different: DevOps is centered around optimizing software development and deployment, whereas DataOps focuses on ensuring data quality and accessibility throughout its lifecycle.

The DataOps Framework

The DataOps framework brings together different methodologies and practices to improve data management and analytics workflows within organizations. It consists of five key components:

1. Data Orchestration

Data orchestration automates the arrangement and management of data processes, ensuring seamless collection, processing, and delivery across systems. Key elements include:

  • Workflow automation: Automates scheduling and execution of data tasks to enhance efficiency.
  • Data integration: Combines data from diverse sources into a unified view for consistency and accessibility.
  • Error handling: Detects and resolves errors during data processing to maintain integrity.
  • Scalability: Adapts to increasing data volumes and complexity without compromising performance.
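As a concrete illustration of these elements, the short sketch below automates a two-source workflow in dependency order, integrates the sources into a unified view, and logs failures so they can be handled; the task names and sample records are hypothetical placeholders, not a specific orchestration product.

    # Minimal orchestration sketch: workflow automation, data integration,
    # and error handling. Task names and sample records are placeholders.
    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("pipeline")

    def extract_orders():
        return [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 75.5}]

    def extract_customers():
        return [{"order_id": 1, "customer": "acme"}, {"order_id": 2, "customer": "globex"}]

    def integrate(orders, customers):
        # Combine the two sources into a single, unified view keyed by order_id.
        by_id = {c["order_id"]: c["customer"] for c in customers}
        return [{**o, "customer": by_id.get(o["order_id"])} for o in orders]

    def run_pipeline():
        # Tasks run in dependency order; failures are logged and re-raised
        # so a scheduler or on-call engineer can react.
        try:
            orders = extract_orders()
            customers = extract_customers()
            unified = integrate(orders, customers)
            log.info("loaded %d unified records", len(unified))
            return unified
        except Exception:
            log.exception("pipeline failed")
            raise

    if __name__ == "__main__":
        run_pipeline()

In practice a dedicated orchestrator, rather than a hand-rolled script, would schedule these tasks, retry failures, and scale out as data volumes grow.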

2. Data Governance

Data governance establishes policies and standards that guarantee the accuracy, quality, and security of data, facilitating effective management of structured data assets. Key elements include:

  • Data quality management: Ensures data is accurate, complete, and reliable.
  • Data security: Protects data from unauthorized access and breaches through various measures.
  • Data lineage: Tracks the origin and transformation of data for transparency.
  • Compliance: Ensures adherence to regulatory requirements and industry standards, such as GDPR and HIPAA.
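The sketch below illustrates two of these elements, data quality management and data lineage, with plain pandas; the schema, source label, and checks are illustrative assumptions rather than a prescribed governance standard.

    # Governance sketch: basic data quality checks plus simple lineage tagging.
    # Column names, the source label, and the checks are illustrative.
    import pandas as pd
    from datetime import datetime, timezone

    def quality_report(df: pd.DataFrame) -> dict:
        """Completeness and validity checks a governance policy might require."""
        return {
            "row_count": len(df),
            "null_customer_ids": int(df["customer_id"].isna().sum()),
            "negative_amounts": int((df["amount"] < 0).sum()),
            "duplicate_rows": int(df.duplicated().sum()),
        }

    def add_lineage(df: pd.DataFrame, source: str) -> pd.DataFrame:
        # Record where each row came from and when it was ingested.
        tagged = df.copy()
        tagged["_source"] = source
        tagged["_ingested_at"] = datetime.now(timezone.utc).isoformat()
        return tagged

    raw = pd.DataFrame({"customer_id": [101, 102, None], "amount": [250.0, -10.0, 99.9]})
    curated = add_lineage(raw, source="crm_export_v2")
    print(quality_report(curated))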

3. Continuous Integration and Continuous Deployment (CI/CD)

CI/CD practices automate the testing, integration, and deployment of data applications, enhancing responsiveness. Key elements include:

  • Continuous integration: Merges code changes into a shared repository with automated testing for early issue detection.
  • Continuous deployment: Automates deployment of tested code to production environments.
  • Automated testing: Includes various tests to ensure the correctness of data applications.
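To make the automated-testing element tangible, here is a small pytest-style sketch: a data transformation plus assertions that a CI pipeline could run on every merge; the transformation and column names are hypothetical.

    # CI sketch: a transformation under test and the automated checks that
    # guard it. The column names and business rules are illustrative.
    import pandas as pd

    def normalize_revenue(df: pd.DataFrame) -> pd.DataFrame:
        """Transformation under test: convert cents to dollars and drop bad rows."""
        out = df.dropna(subset=["revenue_cents"]).copy()
        out["revenue_usd"] = out["revenue_cents"] / 100.0
        return out.drop(columns=["revenue_cents"])

    def test_normalize_revenue_drops_nulls_and_converts():
        df = pd.DataFrame({"revenue_cents": [1000, None, 250]})
        result = normalize_revenue(df)
        assert len(result) == 2                       # null row dropped
        assert list(result["revenue_usd"]) == [10.0, 2.5]
        assert "revenue_cents" not in result.columns  # raw column removed

    if __name__ == "__main__":
        test_normalize_revenue_drops_nulls_and_converts()
        print("all checks passed")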

4. Data Observability

Data observability involves ongoing monitoring and analysis of data systems to proactively detect and address issues, delivering visibility into data workflows. Key elements include:

  • Monitoring: Tracks the health and performance of data pipelines and applications.
  • Alerting: Notifies teams of anomalies or performance issues in real time.
  • Metrics and dashboards: Provide visual insights into key performance indicators (KPIs).
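The sketch below shows what minimal observability can look like in code: it tracks two pipeline health metrics (row count and data freshness) and raises alerts when they cross thresholds; the thresholds and the notification mechanism are illustrative placeholders.

    # Observability sketch: track pipeline metrics and alert on anomalies.
    # Thresholds and the notification channel are illustrative.
    from datetime import datetime, timedelta, timezone

    def check_pipeline_health(row_count, last_loaded_at,
                              expected_min_rows=1000,
                              max_staleness=timedelta(hours=2)):
        """Return a list of alert messages; an empty list means the pipeline looks healthy."""
        alerts = []
        if row_count < expected_min_rows:
            alerts.append(f"row count {row_count} below expected minimum {expected_min_rows}")
        staleness = datetime.now(timezone.utc) - last_loaded_at
        if staleness > max_staleness:
            alerts.append(f"data is stale: last load {staleness} ago")
        return alerts

    def notify(alerts):
        # In production this might page on-call staff or post to a chat channel.
        for message in alerts:
            print(f"ALERT: {message}")

    notify(check_pipeline_health(
        row_count=420,
        last_loaded_at=datetime.now(timezone.utc) - timedelta(hours=3),
    ))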

5. Automation

Automation minimizes manual intervention by utilizing tools and scripts to perform repetitive tasks, enhancing efficiency and accuracy in data processing. Key elements include:

  • Task automation: Automates routine tasks like ETL and reporting.
  • Workflow automation: Streamlines complex workflows using dependencies and scheduling.
  • Self-service: Enables users to access and analyze data independently through user-friendly interfaces.
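As a small example of task automation, the sketch below turns a routine report into a repeatable job using only the standard library; in practice a scheduler or orchestrator would own the timing, and the report contents and output path here are placeholders.

    # Task-automation sketch: a recurring report generated without manual steps.
    # The report data, output path, and interval are illustrative.
    import csv
    import time
    from datetime import date
    from pathlib import Path

    def build_daily_report(out_dir=Path("reports")):
        """A routine reporting task a team might otherwise run by hand."""
        out_dir.mkdir(exist_ok=True)
        target = out_dir / f"sales_{date.today().isoformat()}.csv"
        rows = [("region", "orders"), ("north", 132), ("south", 98)]  # placeholder data
        with target.open("w", newline="") as fh:
            csv.writer(fh).writerows(rows)
        return target

    def run_on_schedule(interval_seconds=24 * 60 * 60):
        # Repeat the task on a fixed interval; a real deployment would use
        # cron or an orchestrator instead of a sleep loop.
        while True:
            print(f"wrote {build_daily_report()}")
            time.sleep(interval_seconds)

    if __name__ == "__main__":
        print(f"wrote {build_daily_report()}")  # run once for demonstration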

How Does DataOps Work

DataOps primarily consists of the following four processes:

  1. Data Integration: This process aims to create a cohesive view of fragmented and distributed organizational data through seamless, automated, and scalable data pipelines. The objective is to efficiently locate and integrate the appropriate data without sacrificing context or accuracy.
  2. Data Management: This involves automating and optimizing data processes and workflows from creation to distribution, throughout the entire data lifecycle. Agility and responsiveness are essential for effective DataOps.
  3. Data Analytics Development: This process facilitates rapid and scalable data insights by utilizing optimal, reusable analytics models, user-friendly data visualizations, and continuous innovation to enhance data models over time.
  4. Data Delivery: The goal here is to ensure that all business users can access data when it is most needed. This extends beyond just efficient storage; it emphasizes timely data access with democratized self-service options for users.

In practice, the key phases in a DataOps lifecycle include:

  • Planning: Collaborating with teams to set KPIs and SLAs for data quality and availability.
  • Development: Building data products and machine learning models.
  • Integration: Incorporating code or data products into existing systems.
  • Testing: Verifying data against business logic and operational thresholds.
  • Release: Deploying data into a test environment.
  • Deployment: Merging data into production.
  • Operate: Running data in applications to fuel ML models.
  • Monitor: Continuously checking for anomalies in data.

This iterative cycle promotes collaboration, enabling data teams to effectively identify and prevent data quality issues by applying DevOps principles to data pipelines.

Who Owns DataOps

DataOps teams usually incorporate temporary stakeholders throughout the sprint process. However, each DataOps team relies on a core group of permanent data professionals, which typically includes:

  1. The Executive (CDO, CTO, etc.): This leader guides the team in delivering business-ready data for consumers and leadership. They ensure the security, quality, governance, and lifecycle management of all data products.
  2. The Data Steward: Responsible for establishing a data governance framework within the organization, the data steward manages data ingestion, storage, processing, and transmission. This framework serves as the foundation of the DataOps initiative.
  3. The Data Quality Analyst: Focused on enhancing the quality and reliability of data, the data quality analyst ensures that higher data quality leads to improved results and decision-making for consumers.
  4. The Data Engineer: The data engineer constructs, deploys, and maintains the organization’s data infrastructure, which includes all data pipelines and SQL transformations. This infrastructure is crucial for ingesting, transforming, and delivering data from source systems to the appropriate stakeholders.
  5. The Data/BI Analyst: This role involves manipulating, modeling, and visualizing data for consumers. The data/BI analyst interprets data to help stakeholders make informed strategic business decisions.
  6. The Data Scientist: Tasked with producing advanced analytics and predictive insights, the data scientist enables stakeholders to enhance their decision-making processes through enriched insights.

Benefits of DataOps

Adopting a DataOps solution offers numerous benefits:

1. Improved Data Quality

DataOps enhances data quality by automating traditionally manual and error-prone tasks like cleansing, transformation, and enrichment. This is crucial in industries where accurate data is vital for decision-making. By providing visibility throughout the data lifecycle, DataOps helps identify issues early, enabling organizations to make faster, more confident decisions.

2. Faster Analytics Deployment

Successful DataOps implementation can significantly decrease the frequency of late data analytics product deliveries. DataOps accelerates analytics deployment by automating provisioning, configuration, and deployment tasks, which reduces the need for manual coding. This allows data engineers and analysts to quickly iterate solutions, resulting in faster application rollouts and a competitive edge.

3. Enhanced Communication and Collaboration

DataOps fosters better communication and collaboration among teams by centralizing data access. This facilitates cross-team collaboration and improves the efficiency of releasing new analytics developments. By automating data-related tasks, teams can focus on higher-level activities, such as innovation and collaboration, leading to better utilization of data resources.

4. More Reliable and Efficient Data Pipeline

DataOps creates a more robust and faster data pipeline by automating data ingestion, warehousing, and processing tasks, which reduces human error. It improves pipeline efficiency by providing tools for management and monitoring, allowing engineers to proactively address issues.

5. Easier Access to Archived Data

DataOps simplifies access to archived data through a centralized repository, making it easy to query data compliantly and automating the archiving process to enhance efficiency and reduce costs. DataOps also promotes data democratization by making vetted, governed data accessible to a broader range of users, optimizing operations and improving customer experiences.

Best Practices for DataOps

Data and analytics (D&A) leaders should adopt DataOps practices to overcome the technical and organizational barriers that slow down data delivery across their organizations. As businesses evolve rapidly, there is an increasing need for reliable data among various consumer personas, such as data scientists and business leaders. This has heightened the demand for trusted, decision-quality data.

DataOps begins with cleaning raw data and establishing a technology infrastructure to make it accessible. Once implemented, collaboration between business and data teams becomes essential. DataOps fosters open communication and encourages agile methodologies by breaking down data processes into smaller, manageable tasks. Automation streamlines data pipelines, minimizing human error.

Building a data-driven culture is also vital. Investing in data literacy empowers users to leverage data effectively, creating a continuous feedback loop that enhances data quality and prioritizes infrastructure improvements. Treating data as a product requires stakeholder involvement to align on KPIs and develop service level agreements (SLAs) early in the process. This ensures focus on what constitutes good data quality within the organization.

To successfully implement DataOps, keep the following best practices in mind:

  1. Define data standards early: Establish clear semantic rules for data and metadata.
  2. Assemble a diverse team: Build a team with various technical skills.
  3. Automate for efficiency: Use data science and BI tools to automate processing.
  4. Break silos: Encourage communication and utilize integration tools.
  5. Design for scalability: Create a data pipeline that adapts to growing data volumes.
  6. Build in validation: Continuously validate data quality through feedback loops.
  7. Experiment safely: Use disposable environments for safe testing.
  8. Embrace continuous improvement: Focus on ongoing efficiency enhancements.
  9. Measure progress: Establish benchmarks and track performance throughout the data lifecycle.

By treating data like a product, organizations can ensure accurate, reliable insights to drive decision-making.

Conclusion

By automating tasks, enhancing communication and collaboration, establishing more reliable and efficient data pipelines, and facilitating easier access to archived data, DataOps can significantly improve an organization’s overall performance.

However, it’s important to note that DataOps is not a one-size-fits-all solution; it won’t automatically resolve all data-related challenges within an organization.

Nevertheless, when implemented effectively, a DataOps solution can enhance your organization’s performance and help sustain its competitive advantage.

SQL Server lifecycle and considerations for enterprises

SQL Server is one of the most versatile databases that enterprises trust for their database workloads. It is a traditional Online Transactional Processing (OLTP) database, and over the years enterprises across industry verticals such as financial services, healthcare, media and entertainment, manufacturing, and insurance have built a plethora of applications on SQL Server. Every few years, Microsoft releases a new version of SQL Server (the 2014, 2016, 2017, 2019, and 2022 editions, for example) with feature enhancements that make the product more secure and more compliant, with a performant database engine that keeps pace with the growing needs of enterprise data. I’ve spent several years in the core SQL Server product team and can proudly vouch for the rigorous testing done on the product prior to any release. The SQL Server engineering and product teams have been known across the industry for decades of engineering excellence in delivering a robust engine that serves millions of customers worldwide.

Each version of SQL Server is backed by a minimum of 10 years of support: five years of mainstream support (functional, performance, scalability, and security updates) and five years of extended support (security updates only). Customers nearing the end of those 10 years on a particular version typically either migrate to the cloud, to Azure SQL or to an Azure Virtual Machine for free extended security updates, upgrade to a more recent version of SQL Server, or purchase an Extended Security Updates subscription from Microsoft. Enterprise customers usually stay on the n-1 or n-2 version of the product (n being the latest version) and, before the 10-year end of life, must choose one of the options above. Many enterprise customers need to keep critical workloads on-premises for business reasons and cannot move to the cloud; they are instead tasked with migrating to the latest version of SQL Server and upgrading their physical hardware along the way. July 9, 2024, marked the end of support for SQL Server 2014, so on-premises customers will need to move to a recent version of SQL Server and upgrade the necessary hardware to meet the system requirements. This involves significant cost and planning for enterprises.

Customers have built applications on SQL Server, and most of these applications demand some form of reporting and machine learning capability on the data stored in SQL Server. Customers use SQL Server Machine Learning Services, launched in SQL Server 2016 with R support and in SQL Server 2017 with Python support, to run ML workloads inside their SQL Server database instances. However, when using Machine Learning Services, the R or Python code is wrapped in the sp_execute_external_script stored procedure in T-SQL, so customers lose IntelliSense and debugging capabilities. I’ve seen instances where data scientists query SQL Server, pull the data out to create their ML models, store those models back in SQL Server as binary objects, and then score against them. In this approach, the moment the data is pulled outside SQL Server its trust boundary is lost, and customer data is potentially exposed to a larger attack surface.
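For readers unfamiliar with the pattern, here is a minimal sketch of invoking Machine Learning Services from a client application: the Python code runs inside the database engine via sp_execute_external_script, so the data never leaves SQL Server. The connection string, table, and column names are hypothetical placeholders, and the instance must have Machine Learning Services installed with external scripts enabled.

    # Sketch: run Python inside SQL Server via sp_execute_external_script,
    # called here from a client with pyodbc. Server, database, table, and
    # column names are illustrative placeholders.
    import pyodbc

    TSQL = """
    EXEC sp_execute_external_script
        @language = N'Python',
        @script = N'
    import pandas as pd
    # InputDataSet arrives as a pandas DataFrame; the aggregation stays in-database
    OutputDataSet = InputDataSet.groupby("region", as_index=False)["amount"].mean()
    ',
        @input_data_1 = N'SELECT region, amount FROM dbo.Sales';
    """

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;"
        "DATABASE=mydb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    cursor.execute(TSQL)
    for row in cursor.fetchall():
        print(row)
    conn.close()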

Now, in 2024, we see a new wave of workloads in which enterprise customers are trying to enable GenAI capabilities over their databases. Enterprises want either to help their customers find information more efficiently or to improve the overall experience of their applications. For outward-facing use cases, customers want capabilities like enterprise search over their data, replacing the drop-downs and filters in their applications with a simple search experience that lets their customers ask questions in natural language and get responses from their databases.

From my time at both Microsoft and Amazon, I’ve seen BI teams pulled in every direction by the constant questions leadership asks about the data; each question spawns a new ad-hoc report, and enterprises end up creating hundreds of reports, wasting both time and resources. For internal-facing use cases, customers want to ask ad-hoc questions over their database instances in natural language, replacing manually created SSRS and Power BI reports over their SQL instances. Imagine if enterprises had a natural-language search bar that let leadership ask questions of their database instances and see results across thousands of tables.

At Tursio, we are turning SQL Server into a GenAI machine. Enterprise customers running SQL Server instances anywhere, on-premises (yes, you heard right!) or in the cloud, can get an in-situ GenAI solution using Tursio. Tursio can be deployed entirely on-premises (without any cloud connectivity), so enterprise customers can ask questions in natural language and get responses from within their databases. All the data modeling happens inside the SQL Server instances and there is zero data movement; none of the data ever leaves your SQL Server. Tursio understands the ontology of the data, and as the underlying data changes the models are continuously refreshed, giving customers accurate, up-to-date results whenever a question is asked. Enterprises can invoke the same search bar from within their applications through a simple REST API endpoint. Tursio also looks beyond just answering the questions enterprises ask to the value they seek once they have the answer: Are customers trying to predict demand? Find anomalies? Forecast? Classify? Customers using the Tursio platform get predictive insights from their data, allowing them to make business decisions faster and improve time to value, all within 3 seconds. Customers can define their own KPIs, and Tursio continuously learns and fine-tunes the data models, providing accurate results from the models it creates.

If you are a SQL Server customer and want to turbocharge your applications with GenAI capabilities without your data ever leaving SQL Server, feel free to drop a note below. In addition to SQL Server and Azure SQL, the Tursio platform also supports additional databases and data warehouses such as Microsoft Fabric, AWS Redshift, Snowflake, Google BigQuery, Teradata, PostgreSQL, and MySQL. Here are some teaser screenshots of bringing generative AI to your data:

Example 1. Enterprise Search Questions using Tursio

Example 2. Understanding business KPIs using Tursio

Example 3. Analytical Questions using Tursio

Why & How Wayfair Migrated from OPA to Kyverno

Wayfair, a leading e-commerce platform in the Home Goods market, recently undertook a significant migration in its Kubernetes environment, transitioning from OPA (Open Policy Agent) to Kyverno. With around 14,000 employees, 2,000 engineers, and a substantial presence on Google Kubernetes Engine (GKE), Wayfair processes approximately 15,000 production deploys each month, emphasizing the scale and complexity of its operations.

In a recent presentation, Zach Swanson of Wayfair shared key insights about the company’s Kubernetes infrastructure and its Kyverno adoption journey. Wayfair runs large multi-tenant clusters to accommodate its extensive developer community, treating each developer group as an isolated tenant. Kyverno admission policies have become integral, with around 56 validate rules and 20 mutate policies managed across the clusters. This approach allows Wayfair to protect its platform, preventing potential issues like misrouted traffic, insecure ingress configurations, and inadvertent resource mismanagement.

Kyverno Use Cases

The utilization of Kyverno at Wayfair falls into two broad categories. Firstly, Kyverno is employed to protect the platform. Beyond standard pod security, it is used to prevent various scenarios, such as unauthorized changes to ingress hosts, TLS declarations, and the enabling of features that could complicate issue tracking. Secondly, Kyverno is instrumental in seamlessly evolving the platform without requiring developers to make extensive changes. This involves the automatic adjustment of deprecated configurations, image registry failovers, and enhancements to resource efficiency, resulting in significant cost savings.

Reasons for Migrating to Kyverno

Wayfair’s decision to migrate from OPA to Kyverno was driven by several compelling factors. OPA’s Rego language, while powerful, posed challenges in terms of complexity, especially in comparison to Kyverno. Documentation gaps and subtle differences between Gatekeeper (OPA-based) and OPA itself further contributed to the decision. Notably, Wayfair lacked a centralized policy team, and the versatility of Kyverno allowed them to adopt a more streamlined approach. The Kyverno community’s responsiveness, coupled with an extensive public policy library, further solidified the benefits of the migration.

Migration Process

The migration process at Wayfair was a well-structured and methodical approach. It began with a crucial concept demo, showcasing Kyverno’s ability to handle complex constraints. Subsequently, Gatekeeper constraints were systematically retooled into Kyverno policies, with parallel deployment and confidence-building through testing utilities. Policies were transitioned from auditing to enforcing mode, ensuring alignment with existing Gatekeeper policies. The gradual disabling of Gatekeeper constraints marked the successful completion of the migration, emphasizing the straightforward nature of transitioning from OPA to Kyverno.

Summary

Wayfair’s migration from OPA to Kyverno reflects a strategic move to enhance the manageability, simplicity, and responsiveness of its Kubernetes environment. The shift not only addressed challenges associated with OPA but also empowered Wayfair to seamlessly adapt its platform, safeguard against potential issues, and significantly improve resource efficiency. This case study serves as valuable insight for organizations considering a similar transition, highlighting the benefits of Kyverno in managing Kubernetes policies at scale.

Are you interested in learning more about how to secure your Kubernetes clusters using Kyverno? Check out this ebook: Securing Kubernetes Using Policy-as-code powered by Kyverno