What is “Content-Type: application/x-protobuf”: Protobuf Explained For Hackers
https://techkranti.com/what-is-protobuf-explained-for-hackers/ (Wed, 01 Apr 2020)

As security researchers, we are required to look under the hood of various applications. A normal user looks at the UI of an application and ignores whatever happens on the backend; security researchers concentrate on exactly what is happening there.

So, this happened today: a friend wrote to me saying that she and her friends had deleted their HouseParty accounts because it was reported in the news that the application is basically a Trojan horse capable of compromising other sensitive applications installed on the device, such as PayPal and Netflix. To any sane power user, this claim is itself outrageous.

Nonetheless, I decided to look under the hood to see what’s happening. I wasn’t much interested in finding out whether the app really attempts to compromise other applications, because no serious startup would try to do that when they have an awesome product and are getting really good traction.

Hello Weird Protocol!

I had to set up my laptop again to bypass the SSL pinning implemented by HouseParty and intercept the traffic in Burp Suite. When I started looking at the pattern of the traffic being transmitted to and from the app, it did not make sense to me at first. The following is an example of a response the server returns when searching for users with the string “amey”:

[Image: Protocol Buffer response]

The formatting looked weird. I could read most of the text elements being transmitted, but the traffic also contained these strange unicode characters, and overall there wasn’t much visible structure to the data. I couldn’t understand how the server would make sense of this data, or how the app was making sense of data it received in the same format.

Whatever limited sense I could make of the data was this:

  1. The 24-character hexadecimal strings are probably MongoDB Object IDs representing each user’s unique identifier in the database
  2. It’s followed by what looks like a username
  3. Followed by the user’s full name
  4. The next line has the word “relationship”, which I couldn’t make much sense of
  5. Some users also have a string with the words “stell-prod-up”. I figured out from Burp logs that these are URLs to the user’s profile image.
  6. The structure ends with the same Object ID it started with, followed by the next object.

Well, let’s take a look at the “Content-Type” header on the response.

Content-Type: application/x-protobuf

What’s that? I had never come across it before. So, I turned to Google for answers, and it turned out Google had all the answers. I mean not Google the search engine, but Google the technology giant. “protobuf” stands for Protocol Buffers; Google created this format back in 2001 for internal use and released it for public use in 2008. I started asking myself: what’s wrong with JSON? I always felt that JSON is the most beautiful data structure out there; it really is able to depict the real world through its nested relationships. Then why are developers turning to protobuf when JSON does all that an application needs for data transfer?

Well, there are two key answers: Performance and Seamless Backward Compatibility in case of future schema changes.

Now, I am not a developer, so I won’t comment on performance and backward compatibility. As a security researcher, it is enough for me at this point to understand how this data structure works and how I can attack an application that uses this format. All the answers are present at the link below, but I will try to summarize what I understood from my study of the format.

https://developers.google.com/protocol-buffers

Protocol Buffers are a language-independent, platform-independent mechanism for serializing structured data. If you want to understand what data serialization means, read the Wikipedia article.

How Does It Work?

Protocol Buffers require the definition of a schema for the data being represented. This schema is defined by the developers in a .proto file. The file uses a construct called a “message” to define data objects. Messages contain attributes related to the object. These attributes can be of scalar types such as int32, float or string, enums, other messages (to provide nesting), or user-defined types declared within the .proto.

An example .proto file pulled straight from the documentation is as follows:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

Understanding The .proto:

If you have ever worked with a schema definition library like Mongoose, this will be easy to understand.

Here we see a “Person” object being defined with six attributes described below:

| Attribute Name | Attribute Type | Attribute Properties |
| --- | --- | --- |
| name | string | required |
| id | int32 | required |
| email | string | optional |
| PhoneType | enum | Enum options: MOBILE, HOME, WORK |
| PhoneNumber | message | Nested attributes: number (string, required); type (PhoneType, optional, defaults to HOME) |
| phone | user-defined type: PhoneNumber | repeated: there can be zero or more instances of this attribute |

What about the numbers and the equal sign that follow the attributes?

As per the Protocol Buffer spec, every field needs a number that is unique within its message. These numbers can range from 1 to 2^29 - 1 (with 19000 to 19999 reserved for internal use). So, in the example .proto above, Person has four numbered fields and PhoneNumber has two.

The values you see in front of the enum options are called enumerator constants. They start at 0 in this example and are separate from the field numbers explained above. Multiple options within an enum can share the same constant to provide aliasing (if the allow_alias option is enabled).

So, you have defined your .proto. What next?

Compiling .proto

This .proto file is then given to the protocol buffer compiler for your language of choice. The compiler generates code containing the classes and functions for you to include in your own code. These functions include setters such as person.set_name(), person.set_id() and person.set_email(), and serializers such as person.SerializeToOstream(), which can be used to create and manipulate objects and convert them to the serialized format for storage or transmission. In the other direction, the compiler generates deserializing functions such as person.ParseFromIstream(), which parse an incoming protocol buffer stream back into an in-memory object, and getters such as person.name(), person.email() and person.id().

The compiler does its job, now all the developers have to do is utilize these functions in their code.
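To make this concrete, here is a minimal sketch of what using the generated code can look like. The method names quoted above come from the C++ bindings; this sketch uses the official Python bindings instead (where fields are plain attributes and the serializer is SerializeToString), and it assumes the compiler was invoked as protoc --python_out=. person.proto to produce a person_pb2 module. The field values are, of course, made up.

import person_pb2  # generated by: protoc --python_out=. person.proto

# Build a Person object using the generated class
person = person_pb2.Person()
person.name = "Amey"
person.id = 1234
person.email = "amey@example.com"

# phone is a repeated field of nested PhoneNumber messages; add() appends one
phone = person.phone.add()
phone.number = "+91-0000000000"
phone.type = person_pb2.Person.HOME

# Serialize to the compact binary wire format
wire_bytes = person.SerializeToString()

# ...transmit or store wire_bytes, then parse it back on the other side...
received = person_pb2.Person()
received.ParseFromString(wire_bytes)
print(received.name, received.id, [p.number for p in received.phone])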

How Can You Attack Protobuf?

We are security researchers. We are more interested in breaking things than building them. So, once I understood protobuf, I asked myself: what kind of vulnerabilities can be introduced to a platform through the use of Protocol Buffers?

The first and most obvious answer was Insecure Deserialization (A8 – OWASP Top 10, 2017). Yes, that’s right. If an attacker is able to modify the content of an input protocol buffer, and if this input is not validated before deserialization and gets used in the code, really bad things can happen. Worst case: Remote Code Execution. More info on deserialization can be found HERE.

I tried googling “Protobuf Deserialisation Attack” to see if anything had been written on this topic and the very first result was this awesome blog:

https://medium.com/@marin_m/how-i-found-a-5-000-google-maps-xss-by-fiddling-with-protobuf-963ee0d9caff

Security researcher Marin Moulinier was able to manipulate parameters serialized in the protobuf format for Google Maps to trigger an XSS in the scope of google.com. His writeup is very detailed and, to be frank, I did not read every minute detail in it. But if you are interested in jumping into the details, it is a great read.

Insecure deserialization is just a special case of missing input validation, which is the basic defence against all web-request-based attack types. Once you have understood how the data is encoded using protobuf for the platform you are testing, you can modify and re-encode the parameters in the request to see if the backend misses the validation of any critical parameters. This opens the pathway to all sorts of bugs: SQLi, XSS, SSRF, SSTI and command injection.
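As a rough illustration of that workflow, assuming you have reconstructed or guessed the .proto for the endpoint you are testing (I am reusing the hypothetical person_pb2 module from earlier), a tampering helper might look like this:

import person_pb2  # hypothetical module generated from a reconstructed .proto

def tamper(raw_body: bytes) -> bytes:
    """Decode an intercepted protobuf body, modify fields and re-encode it for replay."""
    msg = person_pb2.Person()
    msg.ParseFromString(raw_body)

    # Mutate whichever fields you want to test the backend's validation of
    msg.name = "<script>alert(1)</script>"   # e.g. probing for XSS
    msg.id = 1337                            # e.g. probing for IDOR or logic flaws

    return msg.SerializeToString()

If you have no schema at all, protoc’s --decode_raw option can dump the field numbers and raw values from a captured body, which is usually enough to start reconstructing one.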

Why Protobuf, not JSON?

My thirst for understanding protobuf had been quenched, and I realized I had learned something awesome and valuable. Then I got thinking: what are the advantages of protobuf over JSON? So, I turned back to my browser, googled “protobuf vs JSON” and was directed to this article by Anna Jones at bizety.com:

https://www.bizety.com/2018/11/12/protocol-buffers-vs-json/

It turns out that in multiple real-world tests conducted by some renowned tech companies, protobuf was observed to be twice as fast as JSON. Google says that protobuf is 20 to 100 times faster than XML.

But does that mean everyone should drop JSON right away and start using protobuf for data transfer between the frontend and the backend? For me as a security researcher, that would be bad news, because JSON helps you understand the data being sent to and from the server, aiding your understanding of the application at hand.

This blog on codeclimate.com lists situations where JSON makes better sense for developers as well. I am quoting them here verbatim:

“There do remain times when JSON is a better fit than something like Protocol Buffers, including situations where:

  • You need or want data to be human readable
  • Data from the service is directly consumed by a web browser
  • Your server side application is written in JavaScript
  • You aren’t prepared to tie the data model to a schema
  • You don’t have the bandwidth to add another tool to your arsenal
  • The operational burden of running a different kind of network service is too great”

It Helps To Be Armed With Knowledge

As security researchers, it is important for us to know these bits and pieces of so many different technologies. You never know: the next platform you pick for bounty hunting or pentesting may very well be using protobuf, and if you have taken the time to understand this format, you can jump right into the exploitation phase and skip the learning curve.

Fin.

Side note: I wrote this blog because I realized that teaching a subject to others is the only way you can know whether you truly understand it. I love to learn, but I am bad at retaining stuff in my brain. Writing about the subject I just learned helps me retain the information longer.

What The Heck is CVSS – Part I
https://techkranti.com/what-is-cvss-part-1/ (Sun, 24 Nov 2019)

We have all seen the CVSS score for vulnerabilities listed on the National Vulnerability Database (NVD) when researching vulnerabilities. It is a numeric value between 0 and 10 and comes with a qualitative description of the score such as Low, Medium, High or Critical. Because of that qualitative description, I never bothered to dig deeper into how the score was calculated.

TL;DR:

  • The Common Vulnerability Scoring System (CVSS) captures the principal technical characteristics of software, hardware and firmware vulnerabilities.
  • Its outputs include numerical scores indicating the severity of a vulnerability relative to other vulnerabilities
  • The rest is mathematics. You cannot tl;dr mathematics.

So, what exactly is CVSS?

The organization maintaining the specification for CVSS is the Forum of Incident Response and Security Teams (FIRST). FIRST defines CVSS as follows:

The Common Vulnerability Scoring System (CVSS) captures the principal technical characteristics of software, hardware and firmware vulnerabilities. Its outputs include numerical scores indicating the severity of a vulnerability relative to other vulnerabilities

first.org

Corporations, organisations and individuals interacting with technology deal with vulnerabilities on a daily basis. With so many different minds looking at vulnerabilities subjectively, there are bound to be inconsistencies in how the severity of a vulnerability is perceived. CVSS eliminates this subjectivity and defines clear, unambiguous parameters for quantifying the severity of a vulnerability that all can agree upon. Besides providing a framework to calculate the intrinsic severity of a vulnerability (Base Metrics), it also provides optional scoring to gauge how severity changes with time (Temporal Metrics) and with the environment of the organisation assessing the vulnerability (Environmental Metrics).

Why Should You Understand CVSS?

Well, what’s the practical use of understanding the methodology behind the CVSS score? I, too, thought there wasn’t much use until I read the actual specification myself and came to appreciate the beauty of this scoring system. I reckon that every security researcher, information security professional or IT professional should understand this scoring system well, because it not only helps you understand the intrinsic severity of a vulnerability but also provides a way to adjust the severity level for your specific environment. Based on your organisation or application environment, the score might go higher or lower than the intrinsic score. This will help you better prioritise remediation actions.

Let’s Dig Deeper into CVSS

CVSS parameters are broadly classified in the following three groups:

Base Metrics: represent the intrinsic characteristics of a vulnerability that are constant over time and across user environments. The score derived from these metrics is called the Base Score and this is the score displayed on vulnerability publishing websites such as NVD.

Temporal Metrics: reflect the characteristics of a vulnerability that may change over time but not across user environments. The score derived from these metrics is called the Temporal Score.

Environmental Metrics: represent the characteristics of a vulnerability that are relevant and unique to a particular user’s environment. The score derived from these metrics is called the Environmental Score.

I am going to cover Base Metrics in this part and leave Temporal and Environment Metrics for another day.

Understanding Base Metrics:

The Base Metric group is further classified into three sub-groups:

  1. Exploitability
  2. Impact
  3. Scope

The four metrics available within the Exploitability sub-group are:

1. Attack Vector (abbreviated AV)

This metric describes how remote or local an attacker needs to be in order to exploit the vulnerability.

The table below lists the possible values for this metric, their abbreviations, descriptions and the numeric value that is used while calculating the CVSS score.

| Value | Abbreviation | Description | Numeric Value |
| --- | --- | --- | --- |
| Network | N | Can be exploited over the network and beyond the local area network | 0.85 |
| Adjacent | A | Can be exploited over the local area network | 0.62 |
| Local | L | Can be exploited only if the attacker has logical access to the vulnerable system or component | 0.55 |
| Physical | P | Can be exploited only if the attacker has physical access to the vulnerable system | 0.2 |

2. Attack Complexity (abbreviated AC)

This metric describes the conditions beyond the attacker’s control that must exist in order to exploit the vulnerability.

| Value | Abbreviation | Description | Numeric Value |
| --- | --- | --- | --- |
| Low | L | Specialized conditions are not required to exist, or the attacker can exploit a default installation of the component or system | 0.77 |
| High | H | One or more special conditions must be met for the attacker to exploit this vulnerability | 0.44 |

3. Privileges Required (abbreviated PR)

This metric describes the level of privileges an attacker must possess before successfully exploiting the vulnerability.

| Value | Abbreviation | Description | Numeric Value |
| --- | --- | --- | --- |
| None | N | Attacker requires no privileges on the vulnerable system or component | 0.85 |
| Low | L | Attacker requires non-administrative user capabilities on the vulnerable system or component | 0.62 |
| High | H | Attacker requires administrative privileges over the vulnerable system or component | 0.27 |

4. User Interaction (abbreviated UI)

This metric describes whether a victim or any benign user other than the attacker must interact with the system for the vulnerability to be successfully exploited.

| Value | Abbreviation | Description | Numeric Value |
| --- | --- | --- | --- |
| None | N | The vulnerable system can be exploited without interaction from any user | 0.85 |
| Required | R | Successful exploitation requires a user to take some action before the vulnerability can be exploited | 0.62 |

The Exploitability sub-group provides an Exploitability sub-score (ESS) which is calculated as follows:

ESS = 8.22 × Attack Vector × Attack Complexity × Privileges Required × User Interaction

Next, we move on to the Impact sub-group.

The three metrics available within the Impact sub-group are:

1. Confidentiality (abbreviated C)

This metric measures the impact to the confidentiality of the information resources managed by the vulnerable component due to a successfully exploited vulnerability.

| Value | Abbreviation | Description | Numeric Value |
| --- | --- | --- | --- |
| High | H | All information is disclosed, and/or some of the disclosed information is critical, and/or the attacker has complete control over what information can be disclosed | 0.56 |
| Low | L | Some information is disclosed to the attacker, and/or the disclosed information is not critical, and/or the attacker has no control over what information can be disclosed | 0.22 |
| None | N | No information is disclosed to the attacker | 0 |

2. Integrity (abbreviated I)

This metric measures the impact to integrity of information that resides on a successfully exploited system.

| Value | Abbreviation | Description | Numeric Value |
| --- | --- | --- | --- |
| High | H | There is a total loss of integrity, or a complete loss of protection | 0.56 |
| Low | L | Modification of data is possible, but the attacker does not have control over the consequence of a modification, or the amount of modification is limited | 0.22 |
| None | N | No information can be modified by the attacker | 0 |

3. Availability (abbreviated A)

This metric measures the impact to the availability of the impacted component resulting from a successfully exploited vulnerability

| Value | Abbreviation | Description | Numeric Value |
| --- | --- | --- | --- |
| High | H | There is a total loss of availability, resulting in the attacker being able to fully deny access to resources in the impacted component | 0.56 |
| Low | L | Performance is reduced or there are interruptions in resource availability; the attacker does not have the ability to completely deny service to legitimate users | 0.22 |
| None | N | There is no impact to availability within the impacted component | 0 |

The Impact sub-group provides an Impact sub-score (ISS) which is calculated as follows:

ISS = 1 - [(1 - Confidentiality) × (1 - Integrity) × (1 - Availability)]

Scope Metric (Abbreviated S)

Scope is not a sub-group but a metric in itself with no numeric value.

Scope describes whether the attacker can access or impact resources, components or systems beyond the vulnerable component or system after successful exploitation of the vulnerability.

| Value | Abbreviation | Description | Numeric Value |
| --- | --- | --- | --- |
| Unchanged | U | An exploited vulnerability can only affect resources managed by the vulnerable component or system | See below |
| Changed | C | An exploited vulnerability can affect resources beyond those managed by the vulnerable component or system | See below |

The Scope metric does not have a direct numeric value. Rather, it causes changes to the Impact and Exploitability sub-scores as follows:

  1. Changes to ESS
    • If Scope is defined as Changed, the ‘Privileges Required’ metric’s numeric values change as follows:
      • Low: 0.62 → 0.68
      • High: 0.27 → 0.5
    • No changes are made to ESS if Scope is Unchanged
  2. Changes to ISS
    • If Scope is defined as Changed:
      Updated ISS = 7.52 × (ISS - 0.029) - 3.25 × (ISS - 0.02)^15
    • If Scope is defined as Unchanged:
      Updated ISS = 6.42 × ISS

Now that we have got the ESS and the updated ISS, let’s understand the Base Score calculation. If the updated ISS works out to zero or less, the Base Score is simply 0. Otherwise, the formula differs based on the Scope value, as follows:

If Scope is Unchanged:

Base Score = Roundup(Minimum[(Updated ISS + ESS), 10])

If Scope is Changed:

Base Score = Roundup(Minimum[1.08 × (Updated ISS + ESS), 10])

where Roundup( ) returns the smallest number, specified to 1 decimal place, that is equal to or higher than its input. For example, Roundup (4.02) returns 4.1; and Roundup (4.00) returns 4.0.

As you can see, a lot of information goes into calculating the Base Score, so there needs to be a concise way of representing it all. That’s where the vector string notation comes into play.

Vector String Notation:

[Image: Excerpt from NVD]

Let us look at an example of a CVSS vector string to understand it:

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

A vector string starts with the word ‘CVSS’, followed by a colon and the version number of the specification, the latest being 3.1.

Following this are tuples containing the metric abbreviation for all base metrics and their corresponding value abbreviation separated by colons ( : ). Tuples are separated by slashes (/). For a vector string to be valid, it should contain all the base metrics. So, let’s decode the above string based on what we learnt earlier about the Base Metrics.

CVSS:3.1 = CVSS version is 3.1
AV:N = Attack Vector is Network
AC:L = Attack Complexity is Low
PR:N = Privileges Required is None
UI:N = User Interaction is None
S:U = Scope is Unchanged
C:H = Confidentiality Impact is High
I:H = Integrity Impact is High
A:H = Availability Impact is High
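Since the notation is completely mechanical, it is also trivial to parse programmatically. A minimal Python sketch (the function name is my own) could look like this:

def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS v3.x vector string into a {metric: value} mapping."""
    parts = vector.split("/")
    metrics = dict(p.split(":", 1) for p in parts[1:])
    return {"version": parts[0], **metrics}

print(parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
# {'version': 'CVSS:3.1', 'AV': 'N', 'AC': 'L', 'PR': 'N', 'UI': 'N',
#  'S': 'U', 'C': 'H', 'I': 'H', 'A': 'H'}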

Let’s Calculate The Base Score For The Above Vector String

First, The Exploitability Sub-score:

ESS = 8.22 × Attack Vector × Attack Complexity × Privileges Required × User Interaction

First, let’s replace the variables with the metric value tuples from the vector string:
ESS = 8.22 × AV:N × AC:L × PR:N × UI:N

Now, let’s enter the numeric values for these metrics:
ESS = 8.22 × 0.85 × 0.77 × 0.85 × 0.85

ESS = 3.887042775

Now, for The Impact Sub-score:

ISS = 1 - [(1 - Confidentiality) × (1 - Integrity) × (1 - Availability)]

Replacing the variables:
ISS = 1 - [(1 - C:H) × (1 - I:H) × (1 - A:H)]

Entering the numeric values:
ISS = 1 - [(1 - 0.56) × (1 - 0.56) × (1 - 0.56)]

ISS = 0.914816

Now, let’s go for scope adjustment:

Scope = Unchanged
Hence, ESS remains the same.

And the updated ISS
= 6.42 × ISS
= 6.42 × 0.914816
= 5.87311872

Now, finally, the Base Score:

Since Scope = Unchanged,

Base Score
= Roundup(Minimum[(Updated ISS + ESS), 10])
= Roundup(Minimum[(5.87311872 + 3.887042775), 10])
= Roundup(Minimum[9.760161495, 10])
= Roundup(9.760161495)
= 9.8

So, that’s it. We have got the CVSS Base Score for our vulnerability.

Base Score = 9.8

The Vector String mentioned above was taken from CVE-2019-15107. Check out this NVD link to see if our numbers match.
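If you would rather script the arithmetic than crunch it by hand, here is a minimal Python sketch of the Base Score calculation described above. The numeric weights are hardcoded from the tables earlier in this article, the function and variable names are my own, and the Roundup is a simplified version of the one defined in the specification. Running it with the metrics from our example vector string reproduces the 9.8:

import math

# Numeric weights from the Base Metric tables above
AV  = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC  = {"L": 0.77, "H": 0.44}
PR  = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},   # Scope Unchanged
       "C": {"N": 0.85, "L": 0.68, "H": 0.5}}    # Scope Changed
UI  = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(x: float) -> float:
    # Smallest number, to 1 decimal place, that is >= x (the spec uses a small
    # fixed-point correction here to avoid floating-point artifacts)
    return math.ceil(x * 10) / 10

def base_score(av, ac, pr, ui, s, c, i, a) -> float:
    ess = 8.22 * AV[av] * AC[ac] * PR[s][pr] * UI[ui]
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if s == "U":
        impact = 6.42 * iss
    else:  # Scope Changed
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    if impact <= 0:
        return 0.0
    if s == "U":
        return roundup(min(impact + ess, 10))
    return roundup(min(1.08 * (impact + ess), 10))

# CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
print(base_score("N", "L", "N", "N", "U", "H", "H", "H"))   # 9.8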

Understanding The Qualitative Severity Rating Scale

CVSS provides the following qualitative rating scale to have a textual representation of the numeric values:

| CVSS Score | Qualitative Rating |
| --- | --- |
| 0.0 | None |
| 0.1 – 3.9 | Low |
| 4.0 – 6.9 | Medium |
| 7.0 – 8.9 | High |
| 9.0 – 10.0 | Critical |
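Expressed in code, the mapping is just a couple of comparisons (a trivial sketch; the function name is my own):

def qualitative_rating(score: float) -> str:
    """Map a CVSS v3.x score (0.0 to 10.0, one decimal place) to its rating."""
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(qualitative_rating(9.8))   # Critical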

Now, you may be wondering how on earth you are going to perform this calculation manually for every vulnerability you assess. Don’t worry: the FIRST team has put up an awesome calculator on their website for you: https://www.first.org/cvss/calculator/3.1

Go ahead and give it a try.

The next time you look up a vulnerability on NVD, you will be better equipped to understand its severity and to gauge which vulnerabilities should be a priority for you.

I will cover the Temporal Metrics & Environmental Metrics in a later article.
