What is “Content-Type: application/x-protobuf”: Protobuf Explained For Hackers

As security researchers, we are required to look under the hood of various applications. The normal user looks at the UI of the application and ignores whatever happens on the backend, but security researchers always concentrate on what is happening on the backend.

So, this happened today: A friend wrote to me saying that she and her friends have deleted their HouseParty accounts because it was reported in the news that the application is basically a Trojan Horse capable of compromising other sensitive applications installed on the device, such as PayPal and Netflix. To any sane power user, this claim itself is outrageous.

Nonetheless, I decided to look under the hood to see what’s happening. I wasn’t much interested in finding out whether the app really attempts to compromise other applications, because no serious startup would try to do that when they have an awesome product and are getting really good traction.

Hello Weird Protocol!

I had to set up my laptop again to bypass the SSL pinning implemented by HouseParty and intercept the traffic in Burp Suite. When I started looking at the pattern of the traffic going to and from the app, it did not make sense to me at first. The following is an example of a response the server returns when searching all users for the string “amey”:

Protocol Buffer Response

The formatting looked weird. I could read most of the text elements being transmitted, but the traffic also contained strange Unicode characters, and overall there wasn’t much visible structure to the data. I couldn’t understand how the server would make sense of this data, or how the app was making sense of the data it received in the same format.

Whatever limited sense I could make of the data was this:

  1. The 24-character hexadecimal strings are probably MongoDB Object IDs representing each user’s unique identifier in the database
  2. Each is followed by what looks like a username
  3. Followed by the user’s full name
  4. Next line has the word “relationship” which I couldn’t make much sense of
  5. Some users also have a string with the words “stell-prod-up”. I figured out from Burp logs that these are URLs to the user’s profile image.
  6. The structure ends with the same Object ID it started with, followed by the next object.
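
This kind of first-pass triage can be sketched in a few lines of Python. The snippet below is only an illustration: the `sample` bytes are made up to resemble the response described above (they are not real HouseParty data), and the helper names are my own. It pulls printable runs out of a binary body, flags the ones that look like MongoDB Object IDs, and reads the creation time that the first four bytes of an Object ID encode.

```python
import re
from datetime import datetime, timezone
from typing import List

# Hypothetical bytes standing in for a captured response body (not real data).
sample = b"\n\x185e8a1f2b9d3c4a0012345678\x12\x06amey01\x1a\x08Amey Doe"

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")   # 4+ printable ASCII bytes
OBJECT_ID = re.compile(r"^[0-9a-f]{24}$")          # 12 bytes written as hex


def printable_runs(blob: bytes) -> List[str]:
    """Return runs of printable ASCII, similar to the Unix `strings` tool."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(blob)]


def object_id_candidates(blob: bytes) -> List[str]:
    """Return printable runs that have the shape of a MongoDB Object ID."""
    return [run for run in printable_runs(blob) if OBJECT_ID.match(run)]


def object_id_timestamp(object_id: str) -> datetime:
    """The first 4 bytes of an Object ID are a Unix timestamp in seconds."""
    seconds = int(object_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(printable_runs(sample))
    for oid in object_id_candidates(sample):
        print(oid, "created around", object_id_timestamp(oid).isoformat())
```

If the timestamps you recover look like plausible account-creation dates, that supports the Object ID guess.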

Well, let’s take a look at the “Content-Type” header on the response.

Content-Type: application/x-protobuf

What’s that? I had never come across that before. So, I turned to Google for answers, and it turned out Google had all the answers. I mean not Google the search engine, but Google the technology giant. “protobuf” stands for Protocol Buffers; Google created this format back in 2001 for internal use and released it for public use in 2008. I started asking myself: what’s wrong with JSON? I always felt that JSON is the most beautiful data format out there. It really is able to depict the real world through its nested relationships. Then why are developers turning to protobuf when JSON does everything an application needs for data transfer?

Well, there are two key answers: Performance and Seamless Backward Compatibility in case of future schema changes.

Now, I am not a developer, so I won’t comment on performance or backward compatibility. As a security researcher, it is enough for me at this point to understand how this data format works and how I can attack an application that uses it. All the answers are at the link below, but I will try to summarize what I understood from my study of this format.

https://developers.google.com/protocol-buffers

Protocol Buffers are a language-independent, platform-independent mechanism for serializing structured data. If you want to understand what data serialization means, read the Wiki.

How It Works?

Protocol Buffers require a schema for the data they represent. Developers define this schema in a .proto file. The file uses a construct called a “message” to define data objects. Messages contain attributes related to the object. These attributes can be scalar types such as int32, float and string, enums, or other messages, which provide nesting and user-defined types within the .proto.

An example .proto file pulled straight from the documentation is as follows:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

Understanding The .proto:

If you have ever worked with a schema definition library like Mongoose, this will be easy to understand.

Here we see a “Person” object being defined with six members (four fields and two nested type definitions), described below:

| Attribute Name | Attribute Type | Attribute Properties |
|---|---|---|
| name | string | required |
| id | int32 | required |
| email | string | optional |
| PhoneType | enum | Enum options: MOBILE, HOME, WORK |
| PhoneNumber | message | Nested attributes: number (string, required); type (PhoneType, optional, defaults to the value HOME) |
| phone | user-defined type: PhoneNumber | repeated, meaning there can be zero or more instances of this attribute |

What about the numbers and the equal sign that follow the attributes?

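As per the Protocol Buffer spec, every field (not just scalar ones) needs a number that is unique within its message. These numbers, rather than the field names, identify the fields in the serialized data. They can range from 1 to 2^29 - 1 (536,870,911), except for 19000 through 19999, which are reserved for the protobuf implementation. So, in the example .proto above, Person has four numbered fields and PhoneNumber has two.

The values assigned to the enum options are called enumerator constants. In this example they start at 0, and they are separate from the field numbers explained above. Multiple options within an enum can share the same constant to provide aliasing, as long as the enum sets option allow_alias = true.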
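
To see where the field numbers end up, here is a minimal Python sketch of how the protobuf wire format builds the key that precedes every field: the field number shifted left by three bits, OR-ed with a wire type, and written as a base-128 varint. The function and constant names are my own, not part of any library.

```python
WIRE_VARINT = 0   # int32, int64, bool, enum, ...
WIRE_LEN = 2      # string, bytes, nested messages, packed repeated fields
MAX_FIELD_NUMBER = 2**29 - 1


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Build the key that prefixes a field on the wire."""
    if not 1 <= field_number <= MAX_FIELD_NUMBER:
        raise ValueError("field number out of range")
    return encode_varint((field_number << 3) | wire_type)


if __name__ == "__main__":
    print(field_key(1, WIRE_LEN).hex())      # key for `name = 1`  -> 0a
    print(field_key(2, WIRE_VARINT).hex())   # key for `id = 2`    -> 10
    print(field_key(16, WIRE_VARINT).hex())  # two bytes           -> 8001
```

Field numbers 1 through 15 fit in a single key byte, which is why those stray control characters in the intercepted traffic tend to sit right before each readable string.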

So, you have defined your .proto. What next?

Compiling .proto

This .proto file is then given to the protocol buffer compiler (protoc), which generates code for your language of choice. The generated code contains the classes and functions for you to include in your own code. In the C++ API, for example, these functions include setters such as person.set_name(), person.set_id() and person.set_email(), and serializers such as person.SerializeToOstream(). These can be used to create and manipulate objects and convert them to the serialized format for storage or transmission. On the other side, the compiler generates deserializing functions such as person.ParseFromIstream(), which parse an input protocol buffer stream back into an in-memory object, and getters such as person.name(), person.email() and person.id().

Once the compiler has done its job, all the developers have to do is use these functions in their code.
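
To get a feel for what those generated serializers produce, here is a hand-rolled Python sketch that encodes the name, id and email fields of the Person message above. This is not the generated code and not the protobuf library; the helper names are hypothetical, and it only covers non-negative ids and skips the repeated phone field. For these simple fields, the bytes should match what a real serializer emits.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """key (wire type 2) + byte length + UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def encode_int_field(field_number: int, value: int) -> bytes:
    """key (wire type 0) + varint value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_person(name: str, person_id: int, email: Optional[str] = None) -> bytes:
    """Serialize name = 1, id = 2 and (if given) email = 3 of Person."""
    out = encode_string_field(1, name) + encode_int_field(2, person_id)
    if email is not None:
        out += encode_string_field(3, email)
    return out


if __name__ == "__main__":
    print(encode_person("Amey", 42).hex())  # 0a04416d6579102a
```

Notice that the field names never appear in the output. Only the field numbers, lengths and values are sent, which is why the intercepted traffic shows readable strings surrounded by what looks like noise.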

How Can You Attack Protobuf?

We are security researchers. We are more interested in breaking things than building them. So, once I understood this much about protobuf, I asked myself: what kind of vulnerabilities can be introduced to a platform through the use of protocol buffers?

The first and most obvious answer was Insecure Deserialization (A8 – OWASP Top 10, 2017). Yes, that’s right. If an attacker is able to modify the content of an input protocol buffer, and if this input stream is not validated before deserialization and gets used in the code, really bad things can happen. Worst case: a Remote Code Execution. More info on deserialization can be found HERE.

I tried googling “Protobuf Deserialisation Attack” to see if anything had been written on this topic and the very first result was this awesome blog:

https://medium.com/@marin_m/how-i-found-a-5-000-google-maps-xss-by-fiddling-with-protobuf-963ee0d9caff

Security researcher Marin Moulinier was able to manipulate parameters serialized in the protobuf format for Google Maps to trigger an XSS in the scope of google.com. His writeup is very detailed and, to be frank, I did not read every minute detail in it. But if you are interested in jumping into the details, it is a great read.

Insecure deserialization is just a special case of missing input validation, which is the basic defence against all web-request-based attack types. Once you have understood how the data is encoded using protobuf for the platform you are testing, you can modify and re-encode the parameters in the request to see if the backend misses the validation of any critical parameters. This opens the pathway to all sorts of bugs: SQLi, XSS, SSRF, SSTI and Command Injection.
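
The decode, modify and re-encode loop can be sketched without the target’s .proto file, because the wire format is self-describing enough to split a message into (field number, wire type, value) triples. The Python below is a minimal illustration under my own assumptions: the function names are hypothetical, the `original` bytes are a made-up message, and it does not recurse into nested messages or handle the deprecated group wire types. For real work, `protoc --decode_raw` gives a similar schema-less view.

```python
from typing import List, Tuple, Union

Field = Tuple[int, int, Union[int, bytes]]  # (field number, wire type, value)


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> List[Field]:
    """Split a serialized message into fields without knowing its schema."""
    fields: List[Field] = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                     # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type in (1, 5):                # fixed 64-bit / 32-bit
            size = 8 if wire_type == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields


def encode_fields(fields: List[Field]) -> bytes:
    """Re-serialize fields, recomputing length prefixes."""
    out = bytearray()
    for number, wire_type, value in fields:
        out += encode_varint((number << 3) | wire_type)
        if wire_type == 0:
            out += encode_varint(value)
        elif wire_type == 2:
            out += encode_varint(len(value)) + value
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    original = b"\x0a\x04Amey\x10\x2a"           # made-up message
    fields = decode_fields(original)
    # Swap the string in field 1 for a test payload; field 2 is untouched.
    fields[0] = (1, 2, b"amey'--")
    print(encode_fields(fields))
```

The important detail is that the length prefix is recomputed on re-encoding. Editing a string in place in a proxy without fixing its length byte usually just produces a parse error on the server.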

Why Protobuf, not JSON?

My thirst for understanding protobuf had been quenched, and I realized I had learned something awesome and valuable today. Then I got thinking: what are the advantages of protobuf over JSON? So, I turned back to my browser and googled “protobuf vs JSON”, and I was directed to this beautiful article by Anna Jones at bizety.com:

https://www.bizety.com/2018/11/12/protocol-buffers-vs-json/

It turns out that, in multiple real-world tests conducted by some renowned tech companies, protobuf was observed to be twice as fast as JSON. Google says that protobuf is 20 to 100 times faster than XML.

But does that mean that everyone should drop JSON right away and start using protobuf for data transfer between the frontend and the backend? For me as a security researcher, that would be bad news, because JSON helps you understand the data that is being sent to and from the server, which aids your understanding of the application at hand.

This blog on codeclimate.com lists six situations where JSON makes better sense for developers as well. I am quoting them here verbatim:

“There do remain times when JSON is a better fit than something like Protocol Buffers, including situations where:

  • You need or want data to be human readable
  • Data from the service is directly consumed by a web browser
  • Your server side application is written in JavaScript
  • You aren’t prepared to tie the data model to a schema
  • You don’t have the bandwidth to add another tool to your arsenal
  • The operational burden of running a different kind of network service is too great”

It Helps To Be Armed With Knowledge

As security researchers, it is important to know these bits and pieces of so many different technologies. You never know: the next platform you pick for bounty hunting or pentesting may very well be using protobuf, and if you have taken time out in the past to understand this protocol, you can jump right into the exploitation phase and skip the learning curve.

Fin.

Side note: I wrote this blog because I realized that teaching a subject to others is the only way you can know whether you truly understand it. I love to learn, but I am bad at retaining stuff in my brain. Writing about the subject I just learned helps me retain the information longer.