
See it, Hack It, Sort It: How Open Source Software Protects Our AI Enablers

Published on February 24, 2025
Author Marcus Tenorio, ControlPlane

GPU Security in Cloud Native Environments: Protecting AI Infrastructure

GPUs have become the cornerstone of AI infrastructure, powering 80% of modern AI workloads and transforming how we approach machine learning applications. However, with this critical role comes increased security challenges and potential vulnerabilities. In a recent presentation, Marcus Tenorio from ControlPlane demonstrated real-world GPU attack scenarios and defence strategies in cloud native environments. This blog post will summarise key lessons from his “See it, Hack It, Sort It” talk, focusing on model poisoning, memory exploitation, and practical security measures to protect GPU resources in modern cloud infrastructure.

From Gaming to AI: A GPU Security Journey

Picture this: you’re a kid, playing games on a GPU with just 16 megabytes of memory. Fast forward to today, and those same GPUs are powering 80% of the world’s AI workloads. That’s where our story begins - a journey from simple graphics processing to the complex world of AI security.

Why Should You Care About GPU Security?

GPUs were once associated mainly with gaming; now that they underpin 80% of AI workloads, their security is a mission-critical concern. While we’re all excited about AI and machine learning, there’s a darker side we need to talk about. Imagine running a healthcare system that detects cancer - what happens if an attacker manipulates the model to misclassify tumours? The consequences could be catastrophic, leading to misdiagnoses and loss of trust in AI-driven healthcare. That’s why GPU security isn’t just about protecting hardware - it’s about protecting lives.

Key Considerations

Effective GPU security requires a layered approach, combining resource management, real-time monitoring, and access control. By integrating GPUs with Kubernetes extended resources and enforcing strict quotas, organisations can prevent resource overuse. Regular security audits ensure no unauthorised processes exploit GPU workloads.

In addition to resource control, monitoring plays a critical role. By deploying specialised GPU metrics collectors and integrating them with existing observability stacks, teams can track anomalies in real time and catch stealthy attacks before they escalate.

Scenario 1: The Healthcare System Attack

The first scenario demonstrates how attackers can manipulate AI models in critical systems, showcasing the real-world impact of GPU security breaches.

Attack Breakdown

  • Attackers identified training windows through exposed monitoring systems
  • Model poisoning was achieved through carefully crafted malicious inputs
  • Cancer detection systems were compromised, leading to potential misdiagnosis
  • The attack remained undetected through multiple training cycles

Protection Strategy

  1. Training Process Security

    • Implementation of strict data validation protocols
    • Continuous monitoring of training patterns
    • Establishment of trusted data sources
    • Regular model validation checks
  2. Infrastructure Protection

    • Deployment of comprehensive monitoring solutions
    • Implementation of network segmentation
    • Regular security audits of GPU resources
    • Integration with existing security frameworks
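As one illustration of the data validation and trusted data source items above, the sketch below checks every file in a training directory against a manifest of known-good SHA-256 digests before the data reaches a training job. The manifest format and the names `sha256_of` and `validate_dataset` are hypothetical; the talk does not prescribe a specific mechanism, and a real pipeline would typically add content-level checks as well.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_dataset(data_dir: Path, trusted: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the data matches the manifest.

    `trusted` maps file names to expected SHA-256 digests (hypothetical format).
    """
    problems = []
    present = {p.name for p in data_dir.iterdir() if p.is_file()}
    # Files that nobody vouched for are rejected rather than silently trained on.
    for name in sorted(present - set(trusted)):
        problems.append(f"unexpected file: {name}")
    for name, expected in sorted(trusted.items()):
        if name not in present:
            problems.append(f"missing file: {name}")
        elif sha256_of(data_dir / name) != expected:
            problems.append(f"digest mismatch: {name}")
    return problems
```

A training job could refuse to start whenever `validate_dataset` returns a non-empty list, which turns silent tampering with the training set into a visible failure.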

Scenario 2: Memory Exploitation and Resource Theft

This scenario illustrates how attackers leverage GPU memory architecture for unauthorised cryptocurrency mining and resource theft.

Attack Methodology

  • Exploitation of unified memory architecture vulnerabilities
  • Buffer overflow attacks leading to system compromise
  • Resource hijacking for cryptocurrency mining operations
  • Manipulation of memory tables for persistent access

Defence Framework

  1. Memory Security

    • Strict access control implementations
    • Regular memory usage auditing
    • Buffer overflow protection mechanisms
    • Real-time memory pattern analysis
  2. Resource Monitoring

    • Power consumption baseline establishment
    • Continuous performance metrics tracking
    • Anomaly detection system deployment
    • Automated response protocols
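The power baseline and anomaly detection items above can be sketched with a simple statistical check. This is a minimal example, assuming power readings in watts are already being collected by whatever GPU metrics collector is in place; the function names, the three-standard-deviation threshold, and the sample numbers are illustrative assumptions, not values from the talk.

```python
from statistics import mean, stdev


def build_baseline(samples_watts: list[float]) -> tuple[float, float]:
    """Return (mean, standard deviation) of power draw seen during normal operation."""
    if len(samples_watts) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples_watts), stdev(samples_watts)


def is_anomalous(reading_watts: float, baseline: tuple[float, float],
                 threshold: float = 3.0) -> bool:
    """Flag a reading more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return reading_watts != mu
    return abs(reading_watts - mu) / sigma > threshold


# Illustrative numbers only, not measurements from the talk.
normal_power = [62.0, 60.5, 61.2, 63.1, 59.8, 61.7]
baseline = build_baseline(normal_power)
print(is_anomalous(61.0, baseline))   # a reading close to the baseline
print(is_anomalous(240.0, baseline))  # a sharp jump, such as an unexpected mining job
```

A fixed threshold like this is deliberately crude. Legitimate training runs also raise power draw, so in practice the baseline would likely need to be kept per workload type or per schedule window.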

Essential Protection Strategies

The most effective GPU security strategies combine traditional security principles with GPU-specific protections:

Infrastructure Security

  1. Resource Management

    • Integration with Kubernetes extended resources
    • Implementation of strict resource quotas
    • Regular resource allocation audits
    • Comprehensive access control policies
  2. Monitoring and Detection

    • Deployment of specialised GPU metrics collection
    • Integration with existing monitoring systems
    • Real-time alerting on anomalies
    • Historical pattern analysis
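To make the quota item above concrete, here is a minimal sketch that builds a Kubernetes `ResourceQuota` capping GPU requests in a single namespace. It assumes GPUs are exposed as the extended resource `nvidia.com/gpu`, the name advertised by the NVIDIA device plugin; the namespace, quota name, and limit are hypothetical.

```python
import json


def gpu_quota_manifest(namespace: str, max_gpus: int) -> dict:
    """Build a ResourceQuota that caps total GPU requests in one namespace."""
    if max_gpus < 0:
        raise ValueError("max_gpus must be non-negative")
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "gpu-quota", "namespace": namespace},
        "spec": {
            # Quotas on extended resources use the "requests." prefix.
            "hard": {"requests.nvidia.com/gpu": str(max_gpus)},
        },
    }


# Hypothetical namespace and limit, printed as JSON.
print(json.dumps(gpu_quota_manifest("ml-training", 4), indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed manifest can be applied directly. With the quota in place, pods in that namespace that would push total GPU requests past the limit are rejected at admission.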

Policy Implementation

  1. Security Policies

    • Development of GPU-specific security policies
    • Integration with Falco for runtime security
    • Implementation of network policies through Cilium
    • Regular policy effectiveness reviews
  2. Access Control

    • Strict authentication requirements
    • Regular permission audits
    • Role-based access control implementation
    • Continuous access monitoring
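As a sketch of the Cilium item above, the example below builds a `CiliumNetworkPolicy` that lets only designated scraper pods reach a GPU metrics exporter, the kind of exposed monitoring endpoint the first scenario relied on. The labels, policy name, namespace, and port are placeholders chosen for illustration, and the policy assumes the scraper runs in the same namespace as the exporter.

```python
import json


def metrics_ingress_policy(namespace: str, exporter_labels: dict,
                           scraper_labels: dict, port: int) -> dict:
    """Build a CiliumNetworkPolicy allowing only scraper pods to reach the exporter port."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": "restrict-gpu-metrics", "namespace": namespace},
        "spec": {
            # The pods this policy protects.
            "endpointSelector": {"matchLabels": exporter_labels},
            "ingress": [
                {
                    # Only endpoints carrying these labels may connect...
                    "fromEndpoints": [{"matchLabels": scraper_labels}],
                    # ...and only on this TCP port.
                    "toPorts": [{"ports": [{"port": str(port), "protocol": "TCP"}]}],
                }
            ],
        },
    }


# All values below are hypothetical placeholders.
policy = metrics_ingress_policy(
    namespace="gpu-monitoring",
    exporter_labels={"app": "gpu-metrics-exporter"},
    scraper_labels={"app": "prometheus"},
    port=9400,
)
print(json.dumps(policy, indent=2))
```

Once an ingress policy selects the exporter pods, traffic that does not match an allow rule is dropped, so training schedules and utilisation data are no longer readable by arbitrary workloads in the cluster.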

Real-World Applications

Recent developments, including NVIDIA’s disclosure of seven new CVEs, highlight how quickly the GPU threat landscape changes. Organisations must stay vigilant and adapt their security measures accordingly.

Implementation Steps

Begin with a thorough assessment phase involving comprehensive resource inventory, security posture evaluation, vulnerability identification, and risk prioritisation. Following assessment, deployment should proceed in phases, integrating monitoring solutions while establishing both policy enforcement mechanisms and response protocols for when incidents occur.

Looking Forward

As GPU utilisation continues to grow in cloud native environments, security strategies must evolve to address emerging attack vectors, new GPU architectures, integration challenges, and performance considerations. The rapid advancement of GPU technology demands equally sophisticated security approaches that can anticipate threats before they materialise.

Comprehensive Takeaways

A new security approach forms the foundation of effective GPU protection. Rather than treating GPU security as an isolated concern, forward-thinking organisations are weaving protection measures into their existing security frameworks. This integration creates overlapping layers of defence that reinforce one another, significantly raising the bar for potential attackers. Regular security assessments reveal blind spots before they can be exploited, whilst continuous improvement processes ensure that security measures evolve alongside both threats and GPU technology.

The implementation of GPU security requires thoughtful execution. Organisations finding the most success are leveraging the rich ecosystem of open source security tools, many of which now offer GPU-specific capabilities. Comprehensive monitoring serves as both a detection system and a source of valuable insights for ongoing security refinement. Effective metrics provide early warning of suspicious activities, often revealing attack patterns before damage occurs. The most resilient systems combine automated responses with human oversight, creating security systems that can adapt to novel threats in real time.

Future preparedness remains essential in the rapidly evolving GPU landscape. Security awareness must be cultivated across development, operations, and leadership teams to create a security-conscious culture. As attack methodologies evolve, protection strategies must transform in response, sometimes anticipating threats before they manifest. Building security resilience—the ability to withstand, recover from, and learn from security incidents—creates an organisation capable of maintaining operations even when facing sophisticated attacks.

By embracing these principles and implementing appropriate security measures, organisations can protect their GPU resources whilst maintaining operational efficiency in cloud native environments. The future of GPU security isn’t about choosing between performance and protection—it’s about designing systems where security enables innovation by providing a foundation of trust.

To watch this presentation, click here. If you’re looking to secure your AI infrastructure, ControlPlane offers penetration testing, threat modelling, and workshops to help you build a secure-by-design infrastructure. Contact us to learn more!