bug bounty – TechKranti
https://techkranti.com
CyberSecurity Revolution
Thu, 26 Nov 2020 05:37:53 +0000

[Video] HTTP Request Smuggling Explained: Part 1
https://techkranti.com/http-request-smuggling-part-1/
Thu, 15 Oct 2020 04:44:16 +0000

In this video, I have tried my best to explain the Request Smuggling attack by first explaining how a server handles HTTP requests based on the Content-Length and Transfer-Encoding headers.
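For readers who prefer code to video, here is a minimal Python sketch of the core idea: two parsers that delimit the same request body differently. It is not taken from the video. The function names and the sample bytes are assumptions made up for illustration, and real servers handle many more edge cases (trailers, chunk extensions, header validation).

```python
def split_by_content_length(stream: bytes, content_length: int):
    """Body framing as seen by a parser that honours Content-Length."""
    return stream[:content_length], stream[content_length:]


def split_by_chunked(stream: bytes):
    """Body framing as seen by a parser that honours Transfer-Encoding: chunked.

    Simplified: no chunk extensions and no trailers.
    """
    pos = 0
    body = b""
    while True:
        line_end = stream.index(b"\r\n", pos)
        size = int(stream[pos:line_end], 16)  # chunk size is hexadecimal
        pos = line_end + 2
        if size == 0:
            pos += 2  # skip the CRLF that ends the last chunk
            return body, stream[pos:]
        body += stream[pos:pos + size]
        pos += size + 2  # skip chunk data and its trailing CRLF


# Bytes that follow the headers of a request carrying BOTH
# "Content-Length: 6" and "Transfer-Encoding: chunked".
after_headers = b"0\r\n\r\nG"

print(split_by_content_length(after_headers, 6))
print(split_by_chunked(after_headers))
```

The first parser consumes all six bytes as the body. The second one stops at the zero-length chunk and leaves the trailing byte on the connection, where it is treated as the start of the next request. That disagreement between a front-end and a back-end server is what the attack abuses.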

I will soon follow up with another video on HTTP Request Smuggling where I will attempt to solve a PortSwigger Academy Lab on Request Smuggling.

I hope you learn something new from this. If something isn’t clear from the video, please leave a comment and I will try to get back to you with an answer.

Happy Learning!

Delete IDOR on a Fashion eCommerce Website
https://techkranti.com/delete-idor-on-a-fashion-ecommerce-website/
Wed, 26 Aug 2020 17:21:37 +0000

Yes, another IDOR. I find a lot of IDORs. It’s my favourite class of bugs. This is a story of an IDOR I reported on an Asian fashion eCommerce website’s private program. Let’s get on with the story:

The Feature:

Like most eCommerce websites, this website provided a feature to store addresses in the customer’s account. The Account section had an option to manage addresses. The address management API calls looked like these:

Note: Requests have been URL decoded for better readability

Create An Address:
POST /customer/address/create/?ajax=true&address_id= HTTP/1.1

AddressForm[first_name]=Amey&AddressForm[last_name]=Test&AddressForm[cell_phone]=2418174&phone_prefix=8,9&phone_len=7&AddressForm[phone]=&AddressForm[address1]=2418174&AddressForm[address2]=&AddressForm[postcode]=241817&AddressForm[fk_customer_address_region]=&AddressForm[city]=Singapore&AddressForm[fk_country]=198&AddressForm[id_customer_address]=
Edit An Address:
POST /customer/address/edit/?ajax=true&address_id=2417909 HTTP/1.1

AddressForm[first_name]=Name&AddressForm[last_name]=Two&AddressForm[cell_phone]=98989898&phone_prefix=8,9&phone_len=7&AddressForm[phone]=&AddressForm[address1]=Lkb 122, asdsaasd sad Rd&AddressForm[address2]=12-321&AddressForm[postcode]=423122&AddressForm[fk_customer_address_region]=&AddressForm[city]=Singapore&AddressForm[fk_country]=198&AddressForm[id_customer_address]=2417909
Delete An Address:
GET /customer/address/delete/?address_id=2418179&YII_CSRF_TOKEN=token HTTP/1.1

Note that the create and edit requests each have two injection points for triggering an IDOR: one in the URL query parameter and another in the POST body parameter. The delete request has only the query parameter. I tried every way I could think of to affect an address that I did not own by changing the IDs passed in the request: parameter pollution, passing multiple IDs in an array, and using different values in the query parameter and the body parameter. None of them worked.

While hunting further on the application, I noticed that it provided an option to add an address in the checkout flow as well. The thing that caught my attention was that the endpoint called to create an address from the checkout flow was different than the one we saw above. The create address request from checkout flow looked like this:

POST /checkout/shipment/saveaddress/ HTTP/1.1

AddressForm[first_name]=Amey&AddressForm[last_name]=Test&AddressForm[cell_phone]=2418174&phone_prefix=8,9&phone_len=7&AddressForm[phone]=&AddressForm[address1]=2418174&AddressForm[address2]=&AddressForm[postcode]=241817&AddressForm[id_customer_address]=

I immediately tried manipulating the value of the “AddressForm[id_customer_address]” parameter and set it to an address ID belonging to another test account I had created. I checked the ID of the newly created address and the ID returned from the server matched the ID I had entered when creating the request. Feeling elated, I headed over to the other test account to check if the address was deleted from there. I refreshed the page, but somehow the address with the same ID was present there as well and the information contained in the address was unaffected. The manipulated request had caused no visible impact to the victim account’s address.

While I was still logged into the victim’s account, I attempted to edit the target address. To my surprise, as soon as I tried to edit it, the address disappeared from the list of addresses in the victim’s account. I realized that something had happened because of my manipulated request.

I used the same technique again to delete another address from the victim’s account. This time, to verify the change, I logged out and logged in again to the victim’s account. When I visited the Address Management page, the target address was no longer present in the victim’s account. This proved that I was able to delete addresses from other users’ accounts by manipulating the “AddressForm[id_customer_address]” parameter.
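I never saw the server-side code, but the behaviour is consistent with a save handler that trusts the client-supplied address ID. Below is a minimal Python sketch of that kind of flaw and of the ownership check that would prevent it. Everything in it is an assumption for illustration: the in-memory ADDRESSES store, the function names and the record layout are hypothetical and not the website's actual code.

```python
# Hypothetical in-memory address store: address_id -> record.
ADDRESSES = {
    2417909: {"owner": "victim", "city": "Singapore"},
}


def addresses_of(user):
    """Return the address IDs that the given user sees in their account."""
    return sorted(aid for aid, rec in ADDRESSES.items() if rec["owner"] == user)


def save_address_vulnerable(session_user, form):
    """Trusts the client-supplied ID and overwrites whatever it points at."""
    address_id = int(form["id_customer_address"])
    ADDRESSES[address_id] = {"owner": session_user, "city": form["city"]}
    return address_id


def save_address_checked(session_user, form):
    """Same handler, but refuses IDs that belong to another user."""
    address_id = int(form["id_customer_address"])
    existing = ADDRESSES.get(address_id)
    if existing is not None and existing["owner"] != session_user:
        raise PermissionError("address does not belong to the current user")
    ADDRESSES[address_id] = {"owner": session_user, "city": form["city"]}
    return address_id


if __name__ == "__main__":
    form = {"id_customer_address": "2417909", "city": "Singapore"}
    print("victim before:", addresses_of("victim"))
    save_address_vulnerable("attacker", form)
    print("victim after: ", addresses_of("victim"))
```

In this model the attacker's request re-assigns the record to the attacker, so the address silently vanishes from the victim's list, which matches what I observed from the outside. The sketch only covers the case where an ID is supplied.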

I came to the conclusion that a logout was required to see the change because the request was probably being served by a cache server and hence the address was still visible in the victim’s account after deletion. This is true for a lot of websites. Any unintended changes to a user’s information are reflected when a new session is created.

If you have any feedback or would like to ask any questions related to hacking, hit me up on twitter @ameyanekar. My DMs are open.

What is “Content-Type: application/x-protobuf”: Protobuf Explained For Hackers
https://techkranti.com/what-is-protobuf-explained-for-hackers/
Wed, 01 Apr 2020 08:45:29 +0000

As security researchers, we are required to look under the hood of various applications. The normal user looks at the UI of the application and ignores whatever happens on the backend, but security researchers always concentrate on what is happening on the backend.

So, this happened today: A friend wrote to me saying that she and her friends had deleted their HouseParty accounts because it was reported in the news that the application is basically a Trojan Horse capable of compromising other sensitive applications installed on the device, such as PayPal and Netflix. To any sane power user, this claim is outrageous in itself.

Nonetheless, I decided to look under the hood to see what’s happening. I wasn’t much interested in finding out whether the app really attempts to compromise other applications, because no serious startup would try to do that when they have an awesome product and are getting really good traction.

Hello Weird Protocol!

I had to set up my laptop again to bypass the SSL pinning implemented by HouseParty and intercept the traffic in BurpSuite. When I started looking at the pattern of the traffic being transmitted to and from the app, it did not make sense to me at first. Following is an example of a response the server returns when searching all users with the string “amey”:

[Screenshot: Protocol Buffer Response]

The formatting looked weird. I could read most of the text elements being transmitted, but the traffic also contained strange Unicode characters, and overall there wasn’t much visible structure to the data. I couldn’t understand how the server made sense of this data, or how the app made sense of the data it received in the same format.

Whatever limited sense I could make of the data was this:

  1. The 24-character hexadecimal strings are probably MongoDB Object IDs representing the user’s unique identifier in the database
  2. It’s followed with what looks like a username
  3. Followed by the user’s full name
  4. Next line has the word “relationship” which I couldn’t make much sense of
  5. Some users also have a string with the words “stell-prod-up”. I figured out from Burp logs that these are URLs to the user’s profile image.
  6. The structure ends with the same Object ID it started with, followed by the next object.

Well, let’s take a look at the “Content-Type” header on the response.

Content-Type: application/x-protobuf

What’s that? I had never come across it before. So, I turned to Google for answers and it turned out Google had all the answers. I mean not Google as a search engine, but Google as the technology giant. “protobuf” stands for Protocol Buffers. Google created this format back in 2001 for internal use and released it for public use in 2008. I started asking myself: what’s wrong with JSON? I always felt that JSON is the most beautiful data structure out there. It really is able to depict the real world through its format of nested relationships. Then why are developers turning to protobuf when JSON does all that an application needs for data transfer?

Well, there are two key answers: Performance and Seamless Backward Compatibility in case of future schema changes.

Now, I am not a developer to comment on performance and the backward compatibility. As a security researcher, it is enough for me at this point to understand how this data structure works and how I can attack an application using this format. All the answers are present at the below link, but I will try to summarize what I understood out of my study of this format.

https://developers.google.com/protocol-buffers

Protocol Buffers are a language-independent, platform-independent mechanism for serializing structured data. If you want to understand what data serialization means, read the Wiki.

How It Works?

Protocol Buffers require a schema for the data they represent. Developers define this schema in a .proto file. The file uses a construct called a “message” to define data objects. Messages contain attributes related to the object. These attributes can be scalar types such as int32, float or string, enums, or other messages, which provides nesting and user-defined types within the .proto.

An example .proto file pulled straight from the documentation is as follows:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

Understanding The .proto:

If you have ever worked with a schema definition library like Mongoose, this will be easy to understand.

Here we see a “Person” message being defined with four fields (name, id, email and phone) and two nested type definitions (PhoneType and PhoneNumber), described below:

| Attribute Name | Attribute Type | Attribute Properties |
|---|---|---|
| name | string | required |
| id | int32 | required |
| email | string | optional |
| PhoneType | enum | Enum options: MOBILE, HOME, WORK |
| PhoneNumber | message | Nested attributes: number (string, required); type (PhoneType, optional, defaults to the value HOME) |
| phone | user-defined type: PhoneNumber | repeated, meaning there can be zero or more instances of this attribute |

What about the numbers and the equal sign that follow the attributes?

As per the Protocol Buffer spec, every field in a message needs a number that is unique within that message. These field numbers can range from 1 to 2^29 - 1 (536,870,911), and the numbers 19000 to 19999 are reserved for the protobuf implementation. So, in the example .proto above, Person has 4 numbered fields and PhoneNumber has 2.

The values you see after the enum options are called enumerator constants. They are separate from the field numbers explained above, and in this example they start at 0. Multiple options within an enum can share the same constant to provide aliasing, as long as the enum sets the allow_alias option.
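The field numbers matter to us because they, and not the field names, are what travels on the wire. Here is a minimal hand-rolled Python sketch of the documented wire format for two fields of the Person message. It deliberately avoids the protobuf library. The helper names are my own and the value 150 is an arbitrary assumption for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field key is (field_number << 3) | wire_type, stored as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0: varint."""
    return encode_tag(field_number, 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2: length-delimited (length as a varint, then the bytes)."""
    data = text.encode("utf-8")
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data


# Person { name = "amey" (field 1), id = 150 (field 2) }
person = encode_string_field(1, "amey") + encode_int_field(2, 150)
print(person.hex())  # prints 0a04616d6579109601
```

The string bytes are readable as-is, while the keys and lengths are small binary values. That is why the intercepted traffic looked like plain text sprinkled with strange characters.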

So, you have defined your .proto. What next?

Compiling .proto

This .proto file is then given to the protocol buffer compiler (protoc), which generates code for your language of choice. The generated code contains classes and functions for you to include in your own code. In the C++ API, for example, these include setters such as person.set_name(), person.set_id() and person.set_email(), and serializers such as person.SerializeToOstream(). These can be used to create and manipulate objects and convert them to the serialized format for storage or transmission. In the other direction, the compiler generates deserializing functions such as person.ParseFromIstream(), which convert an incoming protocol buffer stream back into an in-memory object, and getters such as person.name(), person.email() and person.id().

The compiler does its job; now all the developers have to do is use these functions in their code.

How Can You Attack Protobuf?

We are security researchers. We are more interested in breaking things than building them. So, once I understood protobuf this far, I asked myself: what kind of vulnerabilities can be introduced to a platform through the use of protocol buffers?

The first and most obvious answer was Insecure Deserialization (A8 – OWASP Top 10, 2017). Yes, that’s right. If an attacker is able to modify the content of an input protocol buffer, and this input stream is not validated before deserialization and gets used in the code, really bad things can happen. Worst case: a Remote Code Execution. More info on deserialization can be found HERE.

I tried googling “Protobuf Deserialisation Attack” to see if anything had been written on this topic and the very first result was this awesome blog:

https://medium.com/@marin_m/how-i-found-a-5-000-google-maps-xss-by-fiddling-with-protobuf-963ee0d9caff

Security researcher Marin Moulinier was able to manipulate parameters serialized in the protobuf format for Google Maps to trigger an XSS in the scope of google.com. His writeup is very detailed and, to be frank, I did not read every minute detail in it. But if you want to dig into the details, it is a great read.

Insecure deserialization is just a special case of missing input validation, which is the basic defence against all web-request-based attack types. Once you have understood how the data is encoded using protobuf for the platform you are testing, you can modify and re-encode the parameters in the request to see if the backend misses the validation of any critical parameters. This opens the pathway to all sorts of bugs: SQLi, XSS, SSRF, SSTI and Command Injection.
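As a hacker you usually do not have the .proto file, so the first step is decoding a captured message without a schema. Below is a minimal Python sketch of such a decoder for the top level of a message, based on the documented wire format. The function names and the sample bytes are assumptions for illustration. It only handles wire types 0, 1, 2 and 5 and does not recurse into nested messages.

```python
def read_varint(buf: bytes, pos: int):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf: bytes):
    """Return a list of (field_number, wire_type, raw_value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Hypothetical captured body: field 1 = "amey", field 2 = 150.
sample = bytes.fromhex("0a04616d6579109601")
for field in decode_fields(sample):
    print(field)
```

The protoc compiler offers the same kind of schema-less dump through its --decode_raw option. Once you know which field number carries which value, you can change a value, re-encode the message and replay it to test the backend's validation.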

Why Protobuf, not JSON?

My thirst for understanding protobuf was quenched and I realized I had learned something awesome and valuable today. Then I got thinking: what are the advantages of protobuf over JSON? So, I turned back to my browser and googled “protobuf vs JSON” and I was directed to this beautiful article by Anna Jones at bizety.com:

https://www.bizety.com/2018/11/12/protocol-buffers-vs-json/

It turns out that through multiple real world tests conducted by some renowned tech companies, protobuf was observed to be twice as fast compared to JSON. Google says that protobuf is 20 to 100 times faster than XML.

But does that mean that everyone should right away drop JSON and start using protobuf for data transfer between the frontend and the backend? For me as a security researcher, that would be bad news, because JSON helps you understand the data being sent to and from the server, which aids your understanding of the application at hand.

This blog on codeclimate.com provides six reasons why JSON sometimes makes better sense for developers as well. I am quoting them here verbatim:

“There do remain times when JSON is a better fit than something like Protocol Buffers, including situations where:

  • You need or want data to be human readable
  • Data from the service is directly consumed by a web browser
  • Your server side application is written in JavaScript
  • You aren’t prepared to tie the data model to a schema
  • You don’t have the bandwidth to add another tool to your arsenal
  • The operational burden of running a different kind of network service is too great”

It Helps To Be Armed With Knowledge

As security researchers, it is important to know these bits and pieces of so many different technologies. You never know: the next platform you pick for bounty hunting or pentesting may very well be using protobuf, and if you have taken time out in the past to understand this protocol, you can jump right into the exploitation phase and skip the learning curve.

Fin.

Side note: I wrote this blog because I realized that teaching a subject to others is the only way you can know whether you truly understand it. I love to learn, but I am bad at retaining stuff in my brain. Writing about the subject I just learned helps me retain the information longer.

How I Reported a DoS Vulnerability to AWS
https://techkranti.com/how-i-reported-a-dos-vulnerability-to-aws/
Wed, 11 Mar 2020 00:01:09 +0000

As much as I love reading bug hunting stories, I enjoy writing them too. This story was waiting long in my drafts pending disclosure approval from two companies involved with this bug. In my bug hunting stories, I intend to document my thought process, which concentrates on what didn’t work as much as on what worked. So let’s get on with the story…

Chapter 1: Strange Behaviour Encountered
While hunting for bugs on a private program, I came across a weird endpoint. I had been hunting on this platform for quite a few days by then. When I was going through the Sitemap on Burp, I noticed an endpoint: ‘/administrator’. In all those days of testing, I had never come across it. This endpoint must have been referenced somewhere without an actual call ever being made to it, and Burp picked it up through scraping. Excited on reading the word ‘administrator’, I jumped right into my browser and requested http://platform.com/administrator. In response, I got a JSON reading:

{"message": "[X.X.X.X] Thanks for the visit."}

where X.X.X.X was my IP address. “Hmm.. strange behaviour”, I thought at first. When I checked my Burp logs, I realized that this endpoint was redirecting to:

some-random-id.execute-api.eu-west-1.amazonaws.com/ProdStage/administrator

And this was the actual endpoint that was sending the above mentioned response. I tried to directory brute force it, but I got the same result on all requests. Not able to figure out what was going on, I decided to keep this aside and focus on finding bugs on the visible attack surface of the platform.

Chapter 2: It Got Stranger:
But guess what, I was no longer able to access the platform. Whenever I tried to, I was welcomed with a CloudFront 403 Forbidden page:

[Screenshot: CloudFront 403 Forbidden]

Strange. Time for a hypothesis:

Hypothesis #1: My IP was blocked for visiting the /administrator endpoint

To test this hypothesis, I tried changing my IP address through a VPN and was able to access the platform again. Then I tried visiting the /administrator endpoint through the VPN. Now, when I tried accessing the site again: “Blocked”. So, Hypothesis #1 was proven. Visiting /administrator gets your IP into some sort of a blacklist. My immediate intuition was that there must be something very important at this endpoint for the platform to have it configured in such a restrictive way. So, I had to probe further.

Chapter 3: I thought I was on a winning trail
Excited to probe further, I built my next hypothesis.
Hypothesis #2: Something super critical is hidden behind this endpoint

I wanted to understand what was being used to build the blacklist. So, I did the following Google search:

execute-api "Thanks for the visit"

And the very first result was this SANS document: https://www.sans.org/cyber-security-summit/archives/file/summit-archive-1519149524.pdf.

On searching within the document for the term “Thanks for the visit”, I landed on some source code on Page 27 of the document, an excerpt of which can be seen below.

[Screenshot: Excerpt From The Source Code]

On reading the heading, I was like: “WTF! A honeypot? I was lured by a honeypot and got myself blacklisted. Damn!”. So, Hypothesis #2 was disproven. There was nothing critical behind this endpoint.

Chapter 4: About To Drop Testing
I wasn’t finding anything interesting behind this endpoint and thought that I should drop testing it and figure out a way to get out of the blacklist by contacting the site support. As I was about to give up on this endpoint, I noticed this line of code in the SANS document:

source_ip = event['headers']['X-Forwarded-For'].encode('utf8').split(',')[0].strip()

“X-Forwarded-For, Interesting!”, I said to myself. So, the endpoint was using the XFF header to determine the IP address to be blacklisted. I had read bug reports earlier where the target server behaved differently when XFF header was sent by the client. Another hypothesis came to my mind.

Hypothesis #3: If I passed my own XFF header along with the request, I can fool the endpoint to pick the IP address in the header instead of picking my actual IP address.

Now, this was a far-fetched hypothesis which I did not expect to work, but I was going to test it anyway. So, let’s see what happened when I tested this:

[Screenshot: The Reflection]

Well, well! My hypothesis was half proven. The endpoint responds with the IP address in the XFF header instead of returning my actual IP address. I say half-proven because I wanted to ensure that I am also able to get that IP address in the XFF header blocked. The test for this was simple.

Chapter 5: The Litmus Test

  • Installed a VPN extension, called Betternet, for Chrome and enabled the VPN.
  • Verified that I was able to access the site through the VPN.
  • Checked my VPN public IP address by googling ‘What is my IP’.
  • Sent the below request through Burp repeater where X.X.X.X was the VPN IP: (this request will not go through the VPN because the VPN is just running as an extension on Chrome)
GET /ProdStage/administrator HTTP/1.1
Host: some-random-id.execute-api.eu-west-1.amazonaws.com
X-Forwarded-For: X.X.X.X
Connection: close
  • The response reflects back the IP address from the XFF header as expected. Now, it was time to check whether the VPN IP was actually blacklisted.
  • I tried accessing the site through the same VPN and guess what! I was welcomed with a sweet CloudFront 403 Forbidden page. Hypothesis #3 proved!

Let me explain the impact of this vulnerability. An attacker who passes random IP addresses through the XFF header to this endpoint will end up blacklisting legitimate users from the platform. With enough brute force, the attacker can end up adding the entire public IP range to the blacklist effectively causing a massive Denial of Service. Also, an attacker can affect a platform’s Google ranking by blacklisting Google crawler IP addresses.

Chapter 6: Digging Further

Before reporting this right away to the program, my mind started to prepare another hypothesis. I asked myself, “Could this possibly be a problem with AWS?”.

Hypothesis #4: Since this endpoint is hosted on AWS, maybe this honeypot is part of an AWS service offering and a lot of people may be using it.

I tried to think of a Google Dork that could help me search for all such endpoints. But then I realized that site admins would not let Google crawl this endpoint, because that would blacklist the crawler and keep the entire site from being indexed. To see if I was right, I headed to the robots.txt of the site I was testing, and there the very first entry under the Disallow directive was: /administrator. So, I knew right away that Google Dorking would be futile.

To know more about this endpoint, I started my research by googling “Bad Bot Parser Function” and somehow ended up on this Github Repository:

https://github.com/awslabs/aws-waf-security-automations

Specifically, on this file:

https://github.com/awslabs/aws-waf-security-automations/blob/master/source/access-handler/access-handler.py

The code in this file looked almost identical to the one I found on Page 27 of the SANS deck above. This repository was owned by “Amazon Web Services – Labs” and it was a verified account, so this proved my Hypothesis #4. This endpoint was offered by AWS as a feature for blocking automated bots and spiders from scraping the website.

Also, on reading the code I realized that this is an AWS Lambda function. Actually, I should have concluded this on seeing the “execute-api” endpoint itself. But, we learn from experience. If I had to report this to AWS, I should have concrete evidence that my hypothesis stands correct for this code as well and is not isolated to the program I was testing. Again, this can easily be proven by just looking at the handler function code. But, the auditor inside me said: “We need to find blood on the floor” to make a strong report.

Chapter 7: Demonstrating our hypothesis

Well, I followed the instructions on the repo to set up my own AWS honeypot. The AWS CloudFormation template made it trivial to set up within 15 minutes. Once I had this set up and got the unique honeypot URL, it was time to test it out. I followed the same steps that I did for testing the program’s /administrator endpoint and saw that my Lambda function reflected back the IP address passed in the XFF header. So, the behaviour from Hypothesis #3 reproduced on my own deployment of the AWS code, not just on the program I was testing.

Before reporting this bug to AWS, I wanted to get to the root cause of why this was happening. I headed over to AWS CloudWatch to check my Lambda instance logs. To give you some background, Lambda passes an ‘event’ JSON object to the function, which can be used in the handler code. As can be seen from the access-handler.py source code, the source_ip is extracted as follows:

source_ip = event['headers']['X-Forwarded-For'].split(',')[0].strip()

So, I had to check the value of event['headers']['X-Forwarded-For'] in my logs. Below is a screenshot of one such log:

[Screenshot: CloudWatch Logs]

In the screenshot, 15.76.50.0 was the IP I passed through the XFF header and the masked IP was my actual IP.

I concluded from the logs that a reverse proxy was handling the request first and then setting the XFF header before forwarding it to our Lambda function. But instead of replacing the XFF header passed by the client, it appended IP addresses to it. So, an IP address passed by the user in the XFF header will always end up in the first position of the comma-separated list. When the above line of code splits the value by commas and picks the 0th element of the resulting array, it will always end up with the user-supplied IP address.
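The behaviour can be reproduced offline with a few lines of Python. This is a sketch under assumptions: proxy_append_xff is a made-up stand-in for whatever sits in front of the Lambda function, and the addresses other than 15.76.50.0 are documentation-range placeholders. Only pick_source_ip mirrors the line quoted from access-handler.py.

```python
def proxy_append_xff(client_header, real_client_ip, proxy_ip):
    """Mimic a proxy chain that appends to X-Forwarded-For instead of replacing it."""
    hops = []
    if client_header:
        hops = [hop.strip() for hop in client_header.split(",")]
    hops += [real_client_ip, proxy_ip]
    return ", ".join(hops)


def pick_source_ip(event):
    """The extraction used by the handler: first element of the header."""
    return event["headers"]["X-Forwarded-For"].split(",")[0].strip()


REAL_IP = "203.0.113.7"   # placeholder for the attacker's actual address
PROXY_IP = "192.0.2.10"   # placeholder for the reverse proxy

honest = {"headers": {"X-Forwarded-For": proxy_append_xff(None, REAL_IP, PROXY_IP)}}
spoofed = {"headers": {"X-Forwarded-For": proxy_append_xff("15.76.50.0", REAL_IP, PROXY_IP)}}

print(pick_source_ip(honest))
print(pick_source_ip(spoofed))
```

Without a client-supplied header the handler sees the real address. With one, it sees whatever the client typed, and that is the address that gets blacklisted.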

Chapter 8: The Report

I reported this to AWS and also to the H1 program through which I found this. The engineers at the program company were quick to fix it and they also pointed me to the repo which holds the vulnerable code. They suggested that I open a public GitHub issue to have the repo fix it. But since this had security implications, I decided against opening a public issue at that point.

After exchanging a few emails with the AWS Security Team, they acknowledged the issue and suggested that I open up a pull request to get the attention of the repository owners. This got me a bit excited because I always wanted to contribute to open source projects and had never got a good use case where I could contribute.

Chapter 9: The Fix

I contributed one line of code to the repository to fix this vulnerability and I created my first ever GitHub pull request:

https://github.com/awslabs/aws-waf-security-automations/pull/123

Note the screenshot of the CloudWatch log above. The actual IP address of the client can be seen in two places:
1. event['headers']['X-Forwarded-For']
2. event['requestContext']['identity']['sourceIp']

For event['headers']['X-Forwarded-For'], I could see that the actual IP address is always the second-to-last element in the comma-separated list of IP addresses in the XFF header. So, instead of picking the 0th element of the comma-separated array, the code could pick the element at index len-2. That would ensure that the correct IP address gets picked today. However, this solution is not future-proof. Say a change to the network architecture adds or removes a component between the reverse proxy server and the endpoint, and the new component appends its own IP address to the list. The reverse proxy server’s IP would then become the second-to-last element, and the endpoint would again end up blacklisting the wrong IP address.

event['requestContext']['identity']['sourceIp'], however, was quite reliable and free of any ambiguity. The actual IP address is just a string which can simply be sent to the WAF for blacklisting, avoiding the unnecessary logic of splitting and picking the source IP address. Here’s the simple code diff for my fix:

-        source_ip = event['headers']['X-Forwarded-For'].split(',')[0].strip()
+        source_ip = event['requestContext']['identity']['sourceIp']
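To compare the three options side by side, here is a small Python sketch run against hand-written events. The event layout follows the two keys seen in the CloudWatch log. The helper names, the make_event function and all addresses except 15.76.50.0 are assumptions for illustration.

```python
def first_hop(event):
    """Original code: trusts the first (client-controlled) XFF element."""
    return event["headers"]["X-Forwarded-For"].split(",")[0].strip()


def second_to_last_hop(event):
    """Alternative considered: works only while the hop count stays the same."""
    hops = [hop.strip() for hop in event["headers"]["X-Forwarded-For"].split(",")]
    return hops[-2]


def request_context_ip(event):
    """The fix: the source address recorded in the request context."""
    return event["requestContext"]["identity"]["sourceIp"]


def make_event(xff, source_ip):
    """Build a minimal event with only the keys used above."""
    return {
        "headers": {"X-Forwarded-For": xff},
        "requestContext": {"identity": {"sourceIp": source_ip}},
    }


# Spoofed header, with the hop layout seen in the logs.
today = make_event("15.76.50.0, 203.0.113.7, 192.0.2.10", "203.0.113.7")
# Same request after a hypothetical extra component is added to the chain.
later = make_event("15.76.50.0, 203.0.113.7, 192.0.2.10, 192.0.2.11", "203.0.113.7")

for name, event in (("today", today), ("later", later)):
    print(name, first_hop(event), second_to_last_hop(event), request_context_ip(event))
```

Only the request-context value stays correct in both cases. The first-hop pick is always attacker-controlled, and the second-to-last pick breaks as soon as the number of hops changes.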

This bug hunting experience was a great learning experience for me. It was fun as it allowed me to learn a lot of new things and really made me think from the defensive side as well.

Credits

Kudos to Dan and Zack from AWS Security for being supportive throughout the process.

Something about the format of my bounty hunting stories:

Thank you for reading so far. When trying to read bug hunting articles, I always try to understand the thought process of the hacker behind the find. Just concentrating on the PoC won’t take us far. Understanding and adapting your thought process is key to bounty hunting. Hence, instead of just writing what worked, I focus on writing what did not work on the way to the final step that led to the attack.

Happy Hacking!

How I discovered an SSRF leading to AWS Metadata Leakage
https://techkranti.com/ssrf-aws-metadata-leakage/
Mon, 10 Feb 2020 18:21:44 +0000

I am writing this story to share an experience I had recently discovering an SSRF vulnerability for a private program on H1. Since the program is private, I won’t divulge much information related to the platform, rather I will discuss my thought process when testing for this vulnerability.

The Vulnerable Endpoint

This platform allowed users to upload a bunch of filetypes. To be precise about the filetypes the platform supported, I am just going to post the file input HTML tag straight from the page:

<input type="file" class="button button--invert" onclick="this.value = null;" ngf-accept="file_input_accept_string || utils.FILE_INPUT_ACCEPT_STRING" ngf-select="onFileSelect($files)" accept=".jpg,.jpeg,.png,.tiff,.eps,.ico,.gif,.bmp,.txt,.md,.pdf,.html,.htm,.rtf,.doc,.docx,.xls,.xlsx,.ppt,.pps,.ppsx,.pptx,.dot,.dotx,.xlt,.xltx,.odt,.oth,.odg,.odp,.ods,.odi,.oxt,.csv,audio/*,text/*,application/msword*,application/vnd.ms-*,application/vnd.oasis.opendocument.*,application/pdf">

Whatever filetype you upload to the platform, it converts the file to a PDF soon after upload. One filetype that caught my eye from the above list was “.html”. I started mentally reverse engineering the server backend responsible for generating the PDF from the HTML file.

Hypothesis #1:
My initial guess was that the server must be using some sort of a headless browser to render the HTML page and then exporting the page to PDF. My best guess was that it would be a headless Chromium running via Puppeteer. It was time to verify my hypothesis. I put together a basic HTML file and checked Burp for the response body when downloading the PDF file. Turned out, it wasn’t using headless Chrome. PDFs created using headless Chrome carry a Producer metadata field with a value such as “Skia/PDF m77”. Skia is a graphics library created and maintained by Google. It is responsible for the ‘Save As PDF’ feature you use on Chrome. Hypothesis #1 disproven.
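Instead of eyeballing the response in Burp, the generator hints can be pulled out of a downloaded PDF with a few lines of Python. This is a rough sketch under assumptions: it only finds Creator and Producer entries stored as plain literal strings in an uncompressed part of the file, and the sample bytes are invented for illustration rather than taken from the target.

```python
import re

# Matches "/Creator (...)" and "/Producer (...)" literal strings.
# Escaped characters inside the parentheses are tolerated; nested
# unescaped parentheses and hex strings are not handled.
INFO_FIELD = re.compile(rb"/(Creator|Producer)\s*\(((?:\\.|[^\\)])*)\)")


def pdf_generator_hints(pdf_bytes: bytes) -> dict:
    """Return any Creator/Producer values found in the raw PDF bytes."""
    hints = {}
    for key, value in INFO_FIELD.findall(pdf_bytes):
        hints[key.decode("ascii")] = value.decode("latin-1")
    return hints


# Invented sample resembling the info dictionary of a Chrome-generated PDF.
sample = b"1 0 obj\n<< /Creator (Chromium) /Producer (Skia/PDF m77) >>\nendobj\n"
print(pdf_generator_hints(sample))
```

If the function returns nothing, the metadata may be compressed or encoded differently, so an empty result does not rule anything out.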

I had no other hypothesis to explain how this PDF was generated. Now was the time to probe deeper. I wanted to understand whether the rendering engine used by the backend attempted to access external resources referenced in the webpage.

Question #1: Does the renderer access external resources?
I put together this simple HTML to test if external resources are accessed.

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Attack</title>
  </head>
  <body>
    <img src="https://ssl.gstatic.com/ui/v1/icons/mail/rfr/logo_gmail_lockup_default_2x.png"/>
  </body>
</html>

Let’s see what the server produces for us:

The answer is YES. The server does download external resources. This is where I realized that I may find a wonderful SSRF here. Moving ahead to the next question that came to my mind.

Question #2: What does the HTTP request look like when the server fetches external resources?
To check for SSRF, most people like to use Burp Collaborator. I, on the other hand, like to use https://postb.in. So, I created a postbin and generated the following HTML file:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Attack</title>
  </head>
  <body>
    <img src="https://postb.in/1581356305957-2842711033299?hello=world"/>
  </body>
</html>

Now, let’s see what the server gives out. I uploaded this file, waited for the PDF generation to complete and headed over to postb.in to check the fetch log. This is what postb.in had to show:

Sweet!
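When repeating this test with different callback URLs, it helps to generate the probe file instead of editing it by hand. Here is a minimal Python sketch that rebuilds the same HTML for any URL. The function name and the output filename are assumptions for illustration, and the bin URL is the one from the test above.

```python
from html import escape


def build_probe_html(resource_url: str) -> str:
    """Return an HTML page whose only job is to make the renderer fetch resource_url."""
    src = escape(resource_url, quote=True)  # keep the attribute well-formed
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "  <head>\n"
        "    <title>Attack</title>\n"
        "  </head>\n"
        "  <body>\n"
        f'    <img src="{src}"/>\n'
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    page = build_probe_html("https://postb.in/1581356305957-2842711033299?hello=world")
    with open("probe.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

A unique query string per upload makes it easy to match each hit in the bin's log to the file that caused it.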

Large programs will accept and award such a report because it exposes the real IP address of the requesting server. But I did not stop here. I wanted real impact to show.

On further analyzing the HTTP request, I noticed this weird looking User-Agent string: wkhtmltopdf. The name itself reveals half the story. This is probably some library used by the server to convert HTML files to PDF. I googled it and was directed to https://wkhtmltopdf.org/. OK, so my original question was answered: this is how the server converted HTML to PDF. This also partially proves Hypothesis #1, because wkhtmltopdf is indeed a headless renderer (it is built on the Qt WebKit engine), just not headless Chrome.

Now that I had a confirmed SSRF, it was time to try accessing sensitive endpoints or attempt to access internal files. Through initial recon, I had learned that this server was hosted on AWS. So the most sensitive endpoint to test would be the AWS Instance Metadata Service. More about this here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html

Hypothesis #2:
‘wkhtmltopdf’ will render sensitive data on the renderer before processing the output to a PDF. To test this hypothesis, I used the following payload:

  <body>
    <iframe
      src="http://169.254.169.254/latest/meta-data/"
      width="500"
      height="1000"
    ></iframe>
  </body>

And here’s what the server gives us:

Voila!

Hypothesis Proven. That’s impact right there. It feels great to report such critical findings to programs. I reported this vulnerability at night. It was triaged the next morning by H1 staff and rated Critical with a score of 10.0. Sweet!

As a side note, most of the SSRF PoCs will have an attempt to access the internal network. That is true for traditional on-premise hosted servers. On the cloud, that’s not true, unless the company is running a VPC.

Something about the format of my bounty hunting stories:

If you have read to this point, you are probably thinking that this article does not deserve this length, and that it would have been easier for the reader if the main PoC was summed up in the first or second para. Yes, you are right. But I write for myself. When trying to read bug bounty articles, I always try to look for the thought process of the hacker behind the find. Just concentrating on the PoC won’t take you far. Understanding and adapting your thought process is key to bounty hunting. Hence, instead of just writing what worked, I focus on writing how I reached the final step that leads to the attack.

Happy Hacking!
