What Are Homoglyphs?
Homoglyphs are characters from different alphabets or Unicode blocks that look visually similar or identical to common characters. For example, the Cyrillic 'о' looks just like the Latin 'o', but they're different characters with different Unicode values.
In code, homoglyphs can be strategically substituted for operators, variable names, or other syntax elements, altering program behavior while maintaining a visually correct appearance.
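To make the difference concrete, the following sketch prints the code point and Unicode name of each character using Python's standard unicodedata module. The helper name describe is only for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_o = "\u006f"     # Latin small letter o
cyrillic_o = "\u043e"  # Cyrillic small letter o

print(describe(latin_o))      # U+006F LATIN SMALL LETTER O
print(describe(cyrillic_o))   # U+043E CYRILLIC SMALL LETTER O
print(latin_o == cyrillic_o)  # False
```

The two strings render the same on screen, yet they compare as unequal because the comparison works on code points, not on glyphs.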
Real-World Examples
The Not-Equal Operator Switch
Let's look at a classic example:
// What you think you see:
if (environment != ENV_PROD) {
    // Enable development features
    enableDevMode();
}
// What's actually in the code:
if (environmentǃ = ENV_PROD) {
    // This code ALWAYS runs!
    enableDevMode();
}
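In the second example, the exclamation mark is actually Unicode character U+01C3 (LATIN LETTER RETROFLEX CLICK, "ǃ", a letter used to write click consonants), not the standard ASCII exclamation mark (U+0021). Because Unicode classifies U+01C3 as a letter, JavaScript accepts it as part of an identifier. What appears to be the comparison environment != ENV_PROD is therefore parsed as the assignment environmentǃ = ENV_PROD, where environmentǃ is a separate variable from environment, followed by a truthiness check of that assignment's result.
Since an assignment expression evaluates to the assigned value, and ENV_PROD is presumably truthy, the condition is always satisfied, enabling development features in all environments, potentially including production! Note that in strict mode the assignment throws a ReferenceError unless environmentǃ has been declared somewhere, so the trick depends on sloppy-mode code or a planted declaration.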
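The reason a parser accepts this at all is that U+01C3 is a letter (general category Lo), and languages that allow Unicode identifiers generally accept letters in names. The sketch below checks this with Python's standard library as a stand-in. Python's identifier rules are not identical to JavaScript's, so treat it as an illustration rather than a statement about any particular JavaScript engine.

```python
import unicodedata

click = "\u01c3"  # the look-alike "exclamation mark"

print(unicodedata.name(click))      # LATIN LETTER RETROFLEX CLICK
print(unicodedata.category(click))  # Lo (Letter, other)
print(unicodedata.category("!"))    # Po (Punctuation, other)

# A name ending in the click letter is a valid identifier;
# a name ending in a real exclamation mark is not.
print(("environment" + click).isidentifier())  # True
print("environment!".isidentifier())           # False
```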
Variable Name Confusion
Consider this Python code:
username = get_authenticated_user()
usernаme = "admin"  # Notice anything?
if check_admin_privileges(username):
    grant_admin_access()
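The second variable uses a Cyrillic 'а' (U+0430) instead of Latin 'a' (U+0061). They look identical in most fonts, but they're different characters, so Python treats username and usernаme as two unrelated variables. The privileges check here still uses the legitimate username variable, but a malicious actor has planted a look-alike variable usernаme that holds the value "admin". Any later line that reads the look-alike name, whether added by the attacker or pasted in by an unsuspecting developer, silently gets "admin" instead of the authenticated user.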
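A quick way to confirm that the two names are distinct is to run both assignments and inspect the resulting namespace. This sketch passes a small string to exec so that the Cyrillic letter can be written as an explicit \u0430 escape. The value "alice" is a placeholder for whatever get_authenticated_user() would return.

```python
source = (
    'username = "alice"\n'       # all-Latin name (placeholder value)
    'usern\u0430me = "admin"\n'  # same spelling, but with Cyrillic U+0430
)

namespace = {}
exec(source, namespace)

# Collect the variables the snippet defined, skipping Python's own entries.
defined = {k: v for k, v in namespace.items() if not k.startswith("__")}

print(len(defined))              # 2
print(defined["username"])       # alice
print(defined["usern\u0430me"])  # admin
```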
Function Hijacking
function validatePassword(password) {
    // Legitimate security checks
    return password.length >= 8 && /[A-Z]/.test(password) && /[0-9]/.test(password);
}
function vаlidatePassword(password) {
    // Malicious backdoor function with Cyrillic 'а'
    return true;
}
// Later in the code:
if (vаlidatePassword(userInput)) { // Using the backdoor function!
    grantAccess();
}
The second function uses a Cyrillic 'а' instead of a Latin 'a'. If a developer uses this function (perhaps through copy-paste or by mistake), it bypasses all security checks.
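One way to catch a planted look-alike is to reduce every identifier to a "skeleton" in which known homoglyphs are replaced by their ASCII counterparts, and then report any skeleton shared by more than one name. The sketch below is a minimal version of that idea. The CONFUSABLES map is a tiny hand-written sample covering characters discussed in this post, not a complete confusables table, and the function names are hypothetical.

```python
# Hand-picked sample of look-alikes (an assumption, far from exhaustive).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u01c3": "!",  # LATIN LETTER RETROFLEX CLICK
}


def skeleton(name: str) -> str:
    """Replace each known homoglyph with the ASCII character it imitates."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name)


def find_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group distinct names that share a skeleton, i.e. that look the same."""
    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(skeleton(name), set()).add(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


names = ["validatePassword", "v\u0430lidatePassword", "grantAccess"]
collisions = find_collisions(names)
print(list(collisions))                     # ['validatePassword']
print(len(collisions["validatePassword"]))  # 2
```

In practice the list of names would come from the language's own parser or tokenizer rather than a hard-coded list.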
Mathematical Operator Substitution
// What appears to be subtraction:
let total = price - discount;
// What's actually happening:
let total = price − discount; // Using U+2212 (MINUS SIGN) instead of hyphen-minus
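JavaScript does not recognize U+2212 as an operator, so the second line fails to parse with a SyntaxError instead of performing a subtraction. That makes it a poor hiding place for a backdoor, but it can still break a build and send a reviewer hunting for a bug in a line that looks correct. Other languages and tools may handle the character differently, so the effect depends on the context.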
Equals Operator Confusion
// What you think you're looking at:
if (userRole == "admin") {
    // Grant admin privileges
}
// What's actually happening:
if (userRole ⩵ "admin") { // Using U+2A75 (TWO CONSECUTIVE EQUALS SIGNS)
    // In JavaScript this is a SyntaxError, not a comparison!
}
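Whether a look-alike operator is accepted, rejected, or reinterpreted depends on the language, so it is worth testing against a real parser. The sketch below does this with Python's built-in compile(), used here as a stand-in parser. The helper name parses is hypothetical, and other languages may treat these characters differently.

```python
def parses(source: str) -> bool:
    """Return True if Python can compile the source, False on a SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")  # compiles only, does not run
        return True
    except SyntaxError:
        return False


print(parses("total = price - discount"))       # True  (ASCII hyphen-minus)
print(parses("total = price \u2212 discount"))  # False (U+2212 MINUS SIGN)
print(parses('ok = role == "admin"'))           # True  (ASCII ==)
print(parses('ok = role \u2a75 "admin"'))       # False (U+2A75)
```

A check like this only shows what one parser does. It says nothing about how the same bytes are displayed in a diff or code review tool.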
Common Homoglyphs in Programming
Here are some frequently used homoglyphs in malicious code:
Normal Character | Homoglyph | Unicode | Name |
---|---|---|---|
! | ǃ | U+01C3 | LATIN LETTER RETROFLEX CLICK |
a | а | U+0430 | CYRILLIC SMALL LETTER A |
o | о | U+043E | CYRILLIC SMALL LETTER O |
e | е | U+0435 | CYRILLIC SMALL LETTER IE |
- | − | U+2212 | MINUS SIGN |
+ | ＋ | U+FF0B | FULLWIDTH PLUS SIGN |
== | ⩵ | U+2A75 | TWO CONSECUTIVE EQUALS SIGNS |
/ | ／ | U+FF0F | FULLWIDTH SOLIDUS |
* | ∗ | U+2217 | ASTERISK OPERATOR |
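Unicode compatibility normalization (NFKC) folds some of these characters back to ASCII but leaves others untouched, so normalizing source text is not a complete defense on its own. The sketch below checks each homoglyph from the table with Python's unicodedata.normalize. The names HOMOGLYPHS and folds_to_ascii are illustrative.

```python
import unicodedata

# Code points of the homoglyphs listed in the table above.
HOMOGLYPHS = [
    "\u01c3", "\u0430", "\u043e", "\u0435", "\u2212",
    "\uff0b", "\u2a75", "\uff0f", "\u2217",
]


def folds_to_ascii(ch: str) -> bool:
    """True if NFKC normalization turns the character into pure ASCII."""
    return unicodedata.normalize("NFKC", ch).isascii()


for ch in HOMOGLYPHS:
    status = "folds to ASCII" if folds_to_ascii(ch) else "survives NFKC"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}")

folded = [f"U+{ord(ch):04X}" for ch in HOMOGLYPHS if folds_to_ascii(ch)]
print(folded)  # ['U+FF0B', 'U+2A75', 'U+FF0F']
```

The Cyrillic letters, the click letter, the minus sign, and the asterisk operator all survive normalization, which is why detection usually relies on explicit character checks instead.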
Defending Against Homoglyph Attacks
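- Use linters and static analysis tools: Many modern tools can flag non-ASCII characters in source code, which catches a look-alike operator or identifier before it is merged.
- Enable syntax highlighting: A look-alike operator is typically colored as part of an identifier or as an invalid token rather than as an operator, which makes the substitution easier to spot.
- Check your code checksums: A homoglyph is encoded with different bytes than the ASCII character it imitates, so an unexpected change in a file's checksum or a byte-level diff on a line that looks unchanged can indicate tampering.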
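As a minimal sketch of the first defense, the function below reports every non-ASCII character in a piece of source text along with its position, code point, and Unicode name. It is deliberately blunt: it also flags legitimate non-ASCII text in strings and comments, which a real linter rule would need to allow for. The function name find_non_ascii is hypothetical.

```python
import unicodedata


def find_non_ascii(source: str) -> list[tuple[int, int, str, str]]:
    """Return (line, column, code point, name) for each non-ASCII character."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((lineno, col, f"U+{ord(ch):04X}", name))
    return findings


# The look-alike condition from earlier, with U+01C3 written as an escape.
sample = "if (environment\u01c3 = ENV_PROD) {\n    enableDevMode();\n}\n"

for finding in find_non_ascii(sample):
    print(finding)  # (1, 16, 'U+01C3', 'LATIN LETTER RETROFLEX CLICK')
```

Running a check like this in a pre-commit hook or CI job turns an invisible substitution into an explicit finding that a reviewer has to acknowledge.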
Homoglyph attacks are particularly interesting because they exploit human perception rather than technical vulnerabilities. They remind us that security isn't always about defending against sophisticated attacks; sometimes the threat is hiding in plain sight.
References:
- My other blog – Creating an Invisible Unicode Hangul Filler Backdoor in web applications
- Wolfgang Ettlinger (Certitude) – The Invisible JavaScript Backdoor
- PortSwigger (Daily Swig) – Smuggling hidden backdoors into JavaScript with homoglyphs and invisible Unicode characters
- 2coffee.dev – Invisible Character Attacks and Homoglyph Attacks in JavaScript