A Creative Utility for Transforming URLs into "oooooooooo"
The idea behind this tool resembles a short-link platform in reverse: instead of shortening a URL, it stretches it into a long string that looks like <code>ooooooooo</code>. Curious how this is done, I read through the source code, and the approach turns out to be both simple and clever.
1. Prerequisite Knowledge
To begin, the string is converted into an array of UTF-8 bytes. Each character is encoded as one to four bytes, and ASCII characters take exactly one byte. For instance, the UTF-8 array for the string "http" is <code>[104, 116, 116, 112]</code>.
function toUTF8Array(str) {
    var utf8 = [];
    for (var i = 0; i < str.length; i++) {
        var charcode = str.charCodeAt(i);
        // 1 byte: ASCII
        if (charcode < 0x80) utf8.push(charcode);
        // 2 bytes: U+0080 to U+07FF
        else if (charcode < 0x800) {
            utf8.push(0xc0 | (charcode >> 6),
                      0x80 | (charcode & 0x3f));
        }
        // 3 bytes: the rest of the BMP, excluding surrogates
        else if (charcode < 0xd800 || charcode >= 0xe000) {
            utf8.push(0xe0 | (charcode >> 12),
                      0x80 | ((charcode >> 6) & 0x3f),
                      0x80 | (charcode & 0x3f));
        }
        // 4 bytes: a surrogate pair
        else {
            i++;
            // Combine the high and low surrogates, then add 0x10000
            // to get the actual code point.
            charcode = 0x10000 + (((charcode & 0x3ff) << 10) |
                                  (str.charCodeAt(i) & 0x3ff));
            utf8.push(0xf0 | (charcode >> 18),
                      0x80 | ((charcode >> 12) & 0x3f),
                      0x80 | ((charcode >> 6) & 0x3f),
                      0x80 | (charcode & 0x3f));
        }
    }
    console.log(utf8, 'utf8');
    return utf8;
}
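As a quick cross-check, modern browsers and Node.js ship a built-in `TextEncoder` that produces UTF-8 bytes. The sketch below is not part of the tool's source, and the variable names are illustrative only.

```javascript
// Cross-check with the built-in TextEncoder (always encodes to UTF-8).
const encoder = new TextEncoder();

// ASCII characters take one byte each.
const httpBytes = Array.from(encoder.encode("http"));
console.log(httpBytes); // [104, 116, 116, 112]

// Non-ASCII characters take more: Greek omicron (U+03BF) needs two bytes.
const omicronBytes = Array.from(encoder.encode("\u03bf"));
console.log(omicronBytes); // [206, 191]
```

If the hand-written function and `TextEncoder` disagree on some input, that input is worth investigating.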
The code above deals with encoding, while the following code handles decoding, transforming the UTF-8 array back into a string. For example, the UTF-8 array <code>[99, 111, 109]</code> corresponds to the string <code>com</code>.
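function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3, char4;
    out = "";
    len = array.length;
    i = 0;
    while (i < len) {
        c = array[i++];
        // The top four bits of the lead byte tell us the sequence length
        switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx: single-byte (ASCII)
                out += String.fromCharCode(c);
                break;
            case 12: case 13:
                // 110xxxxx 10xxxxxx: two bytes
                char2 = array[i++];
                out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                break;
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx: three bytes
                char2 = array[i++];
                char3 = array[i++];
                out += String.fromCharCode(((c & 0x0F) << 12) |
                                           ((char2 & 0x3F) << 6) |
                                           ((char3 & 0x3F) << 0));
                break;
            case 15:
                // 11110xxx + three continuation bytes: four bytes.
                // Added here so characters outside the BMP round-trip.
                char2 = array[i++];
                char3 = array[i++];
                char4 = array[i++];
                out += String.fromCodePoint(((c & 0x07) << 18) |
                                            ((char2 & 0x3F) << 12) |
                                            ((char3 & 0x3F) << 6) |
                                            (char4 & 0x3F));
                break;
        }
    }
    return out;
}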
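For comparison, the built-in `TextDecoder` performs the same byte-to-string conversion. This is a minimal sketch using standard APIs rather than the tool's own code; the variable names are made up for illustration.

```javascript
// TextDecoder defaults to UTF-8.
const decoder = new TextDecoder();

// The example from the text: three ASCII bytes become "com".
const comText = decoder.decode(new Uint8Array([99, 111, 109]));
console.log(comText); // "com"

// Round trip with a character outside the BMP (four UTF-8 bytes).
const original = "o\u{1F600}";
const roundTrip = decoder.decode(new TextEncoder().encode(original));
console.log(roundTrip === original); // true
```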
A Number can be converted to a string with <code>toString</code>. The method is used all the time, but rarely with an argument. Its optional parameter, the radix, specifies the base for the conversion and can range from <code>2 to 36</code>. If omitted, the default is base 10.
n.toString(4)
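A few concrete values make the radix parameter easier to see. The snippet is only an illustration; `code` and `base4` are names chosen for this example.

```javascript
const code = 104; // the byte value of "h"

const base4 = code.toString(4);
console.log(base4);            // "1220"  (1*64 + 2*16 + 2*4 + 0)
console.log(code.toString(2)); // "1101000"
console.log(code.toString());  // "104"   (radix defaults to 10)

// parseInt with the same radix reverses the conversion.
const parsedBack = parseInt(base4, 4);
console.log(parsedBack);       // 104
```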
You can pad specified characters to the left of a string until it reaches a designated length. The syntax is <code>str.padStart(targetLength [, padString])</code>.
targetLength: required, the minimum length of the resulting string. If the current string is shorter, it is padded on the left with padString until that length is reached; if it is already long enough, it is returned unchanged.
padString: optional, the string used for padding; the default is a <code>space</code>.
str.padStart(4, '0')
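The following lines sketch how `padStart` behaves on short, exact-length, and default-padded inputs. The values are arbitrary examples.

```javascript
console.log("12".padStart(4, "0"));   // "0012"
console.log("1220".padStart(4, "0")); // "1220" (already 4 characters, unchanged)
console.log("5".padStart(3));         // "  5"  (default pad string is a space)

// Combined with toString(4): 10 is "22" in base 4, padded to "0022".
const padded = (10).toString(4).padStart(4, "0");
console.log(padded); // "0022"
```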
2. URL Encoding/Decoding
Next, we dive into the core logic behind URL encoding. The steps are as follows:
Convert to a UTF-8 array
Convert each byte to base 4, padding on the left with <code>0</code> to 4 digits
Split into a string array and map to different forms of <code>o</code>
Reassemble into a string, resulting in the converted URL
let unversioned = this.toUTF8Array(url)
.map(n => n.toString(4).padStart(4, "0"))
.join("").split("")
.map(x => this.enc[parseInt(x)])
.join("");
Two points need clarification. First, "mapping to different forms of <code>o</code>" means that the output is not built from a single letter but from four distinct characters that look almost identical. Passing them through <code>encodeURI</code> reveals the difference:
encodeURI('o-ο-о-ᴏ')
// o-%CE%BF-%D0%BE-%E1%B4%8F
Second, this explains why base <code>4</code> is used and why each value is zero-padded to four digits. There are only four variants of <code>“o”</code>, and base 4 yields only the digits <code>0, 1, 2, 3</code>, so each digit maps directly to one variant. A byte ranges from 0 to 255, which needs at most four base-4 digits (4^4 = 256), and padding every byte to exactly four digits lets the decoder split the string into fixed groups of four. The <code>this.enc</code> defined in the code is as follows:
enc = ["o", "ο", "о", "ᴏ"]
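To confirm that these really are four different characters, one can print their code points. The sketch below writes them as Unicode escapes so they cannot be confused; the code points are inferred from the `encodeURI` output shown earlier, and the `variants` name is illustrative.

```javascript
// Latin o, Greek omicron, Cyrillic o, Latin small capital O.
const variants = ["o", "\u03bf", "\u043e", "\u1d0f"];

for (const ch of variants) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${hex} -> ${encodeURIComponent(ch)}`);
}
// U+006F -> o
// U+03BF -> %CE%BF
// U+043E -> %D0%BE
// U+1D0F -> %E1%B4%8F

// All four are distinct strings even though they render alike.
const distinctCount = new Set(variants).size;
console.log(distinctCount); // 4
```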
The following example walks through the conversion of the string "http":
- Convert to UTF-8 array: <code>[ 104, 116, 116, 112 ]</code>
- Convert to base 4 and pad with zeros: <code>['1220', '1310', '1310', '1300']</code>
- Split into a string array: <code>['1', '2', '2', '0', '1', '3', '1', '0', '1', '3', '1', '0', '1', '3', '0', '0']</code>
- Map to different forms of o: <code>['ο', 'о', 'о', 'o', 'ο', 'ᴏ', 'ο', 'o', 'ο', 'ᴏ', 'ο', 'o', 'ο', 'ᴏ', 'o', 'o']</code>
- Reassemble into the final string, yielding: <code>οооoοᴏοoοᴏοoοᴏoo</code>
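The walkthrough above can be reproduced with a small standalone function. This is a sketch under assumptions, not the tool's exact source: it uses the built-in `TextEncoder` in place of `toUTF8Array`, and `ENC` and `encodeToOoo` are hypothetical names.

```javascript
// Look-alike letters indexed by base-4 digit (written as escapes for clarity).
const ENC = ["o", "\u03bf", "\u043e", "\u1d0f"];

function encodeToOoo(text) {
  return Array.from(new TextEncoder().encode(text)) // UTF-8 bytes
    .map((byte) => byte.toString(4).padStart(4, "0")) // 4 base-4 digits per byte
    .join("")
    .split("")                                        // one digit per element
    .map((digit) => ENC[parseInt(digit, 4)])          // digit -> look-alike "o"
    .join("");
}

const encodedHttp = encodeToOoo("http");
console.log(encodedHttp);
console.log(encodedHttp.length); // 16 (4 bytes, 4 characters each)
```

Every input byte becomes exactly four output characters, so the encoded string is four times as long as the UTF-8 byte length of the input.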
That completes the encoding process. Quite an impressive design, isn't it? Decoding simply reverses the previous steps to restore the original URL: each "o" is mapped back to its base-4 digit, and the digits are then parsed four at a time with <code>parseInt</code> in base 4, so that each group of four yields one byte.
let b4str = ooo.split("").map(x => this.dec[x]).join("");
let utf8arr = [];
for (let i = 0; i < b4str.length; i += 4)
    utf8arr.push(parseInt(b4str.substring(i, i + 4), 4));
return this.Utf8ArrayToStr(utf8arr);
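The article does not show how `this.dec` is defined, so the sketch below assumes it is simply the inverse of the `enc` table, mapping each look-alike letter back to its digit. `OOO_CHARS`, `DEC`, and `decodeFromOoo` are hypothetical names, and `TextDecoder` stands in for `Utf8ArrayToStr`.

```javascript
const OOO_CHARS = ["o", "\u03bf", "\u043e", "\u1d0f"];

// Assumed inverse lookup: each look-alike letter maps to "0", "1", "2" or "3".
const DEC = Object.fromEntries(OOO_CHARS.map((ch, digit) => [ch, String(digit)]));

function decodeFromOoo(ooo) {
  // Turn every letter back into its base-4 digit.
  const base4 = ooo.split("").map((ch) => DEC[ch]).join("");
  // Parse the digits four at a time; each group is one byte.
  const bytes = [];
  for (let i = 0; i < base4.length; i += 4) {
    bytes.push(parseInt(base4.substring(i, i + 4), 4));
  }
  return new TextDecoder().decode(new Uint8Array(bytes));
}

const encodedSample = "\u03bf\u043e\u043eo\u03bf\u1d0f\u03bfo\u03bf\u1d0f\u03bfo\u03bf\u1d0foo";
const decodedHttp = decodeFromOoo(encodedSample);
console.log(decodedHttp); // "http"
```

This sketch does no validation: a character outside the four variants would map to `undefined` and corrupt the digit string, so a real decoder would likely want to reject such input first.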
Finally
Now that you've gained insight, it's time to put this knowledge to use.
Thank you for reading. I look forward to your continued engagement and exploring more quality articles.