Different programming languages use different methods of “escaping” such characters so that they are interpreted as data rather than carrying their usual syntactic meaning. And therein lies a problem for programmers. If you are dealing with multiple programming languages or systems, you sometimes need a way of encoding text so that ALL of the disparate systems it passes through will treat it as data and none of them will mistake an embedded character for a signal to terminate the string or do something else unintended.
Here were my requirements:
- Must be able to faithfully encode ANY text that’s valid in Unicode and then decode it to be identical to the original.
- After being encoded, the string must consist of nothing but the ASCII alphabetic letters (a-z, A-Z) and digits (0-9), plus a single “escape” character.
- An application must be able to designate ANY Unicode character code as the escape (except the ASCII letters and digits), so as to avoid collisions with whatever languages or systems it needs to be compatible with.
- It should not matter if the chosen escape character happens to occur in the text that is to be encoded.
- Any program must be able to decode the text without having prior knowledge of what character was used as the escape during encoding.
- The encoded text should be basically readable by a human.
I came up with a method that satisfies all of these requirements. I call it ZvvyCode. (That’s two v’s in “Zvvy”. I pronounce it “zivy”.)
The method is actually pretty simple. An encoded string consists only of the ASCII letters and digits (a-z, A-Z, 0-9) plus one special escape character that is designated at the time of encoding and may be anything other than the ASCII letters and digits. An encoded string can be decoded without advance knowledge of the escape character, because the escape character is read from the beginning of the encoded string.
The original ASCII letters and digits are kept in their original form. ALL other characters, including spaces, punctuation, extended ASCII codes, Unicode characters, etc. are converted to their hexadecimal character codes bounded on each end by the designated escape character. The encoder function accepts the original text and a numeric character code to be used as the escape. The encoded string begins with an 8-character sequence consisting of two instances of the escape character, the letters “zvvy”, and two more instances of the escape character.
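Here is a minimal Python sketch of the encoder described above. The names `zvvy_encode` and `SAFE` are hypothetical. The sketch assumes each escaped character is written as its Unicode code point in lowercase hex with no leading zeros, which is consistent with the slash becoming `/2f/` in the example.

```python
import string

# Hypothetical name: the only characters that pass through unchanged.
SAFE = frozenset(string.ascii_letters + string.digits)


def zvvy_encode(text: str, escape_code: int) -> str:
    """Encode text as ZvvyCode, using chr(escape_code) as the escape.

    Assumption: escaped characters are written as their Unicode code point
    in lowercase hex without leading zeros, e.g. "/" -> "2f".
    """
    esc = chr(escape_code)
    if esc in SAFE:
        raise ValueError("escape must not be an ASCII letter or digit")
    # 8-character header: two escapes, "zvvy", two escapes.
    parts = [esc * 2 + "zvvy" + esc * 2]
    for ch in text:
        if ch in SAFE:
            parts.append(ch)  # ASCII letters and digits are kept as-is
        else:
            # Everything else becomes its hex code bounded by the escape.
            parts.append(f"{esc}{ord(ch):x}{esc}")
    return "".join(parts)
```

For instance, `zvvy_encode("3/4 full!", ord("/"))` returns `//zvvy//3/2f/4/20/full/21/`. Because the sketch works with code points rather than UTF-16 code units, a character such as an emoji becomes a single token like `/1f600/`; that choice is an assumption, since the description above does not specify it.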
The decoder only needs to be passed the encoded text. It looks at the first eight characters to determine (1) whether the string is, in fact, a ZvvyCode string, and (2) what character to interpret as the escape character for the decoding process. If the first eight characters do not match the ZvvyCode pattern, then the string is considered to not be ZvvyCode and is returned unchanged. Otherwise, the remaining string is decoded into its original form.
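A matching decoder might look like the sketch below. Again, `zvvy_decode` and `SAFE` are hypothetical names, and the same lowercase-hex assumption applies. It reads the escape from the first character, checks the full 8-character header, and returns anything that doesn't match unchanged. Since the escape can never be an ASCII letter or digit, and hex codes consist only of those, splitting the body on the escape character yields literal runs at even positions and hex codes at odd positions.

```python
import string

# Hypothetical name: characters that are never escaped.
SAFE = frozenset(string.ascii_letters + string.digits)


def zvvy_decode(encoded: str) -> str:
    """Decode a ZvvyCode string; non-ZvvyCode input is returned unchanged."""
    if len(encoded) < 8:
        return encoded
    esc = encoded[0]  # the escape is read from the header itself
    if esc in SAFE or encoded[:8] != esc * 2 + "zvvy" + esc * 2:
        return encoded
    pieces = encoded[8:].split(esc)
    # A well-formed body has an even number of escapes, hence an odd
    # number of pieces; anything else means an unterminated hex code.
    if len(pieces) % 2 == 0:
        raise ValueError("unterminated escape sequence")
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            out.append(piece)  # literal run of letters and digits
        else:
            out.append(chr(int(piece, 16)))  # hex code point
    return "".join(out)
```

Invalid hex between two escapes raises `ValueError` from `int`, which seems preferable to silently producing wrong text.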
Example: Suppose you choose a slash as the escape character to encode this text:
I said, “Joe’s bag is 3/4 full!”
The ZvvyCode encoding would be:
Note that it doesn’t matter that the original text has a slash in it, even though the slash is to be used as the escape character. The slash is simply converted to /2f/, like any other character that isn’t an ASCII letter or digit.
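To see why the slash in the text causes no trouble, the sketch below encodes the example sentence with `/` as the escape and then decodes it by splitting on `/`. The compact `encode` helper is hypothetical and uses the same lowercase, unpadded hex assumption. The text’s own slash survives only as the hex code `2f`, so every `/` after the header is a delimiter.

```python
import string

SAFE = frozenset(string.ascii_letters + string.digits)


def encode(text: str, esc: str) -> str:
    # Hypothetical helper: header, then letters/digits as-is and
    # everything else as esc + lowercase hex code point + esc.
    body = "".join(ch if ch in SAFE else f"{esc}{ord(ch):x}{esc}" for ch in text)
    return esc * 2 + "zvvy" + esc * 2 + body


original = "I said, \u201cJoe\u2019s bag is 3/4 full!\u201d"
encoded = encode(original, "/")
print(encoded)

# After the 8-character header, splitting on "/" alternates literal runs
# (even indices) with hex codes (odd indices); the text's slash is "2f".
pieces = encoded[8:].split("/")
assert "2f" in pieces[1::2]
decoded = "".join(p if i % 2 == 0 else chr(int(p, 16)) for i, p in enumerate(pieces))
assert decoded == original
```

Assuming the curly quotation marks shown in the example, this prints `//zvvy//I/20/said/2c//20//201c/Joe/2019/s/20/bag/20/is/20/3/2f/4/20/full/21//201d/`.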
I use the characters “zvvy” simply because it’s a short sequence of letters that is very unlikely to appear in natural text. It’s even more unlikely that this sequence would naturally appear bounded on each end with two copies of some other character. Hence it’s a pretty safe bet that anything that begins with the 8-character ZvvyCode starting pattern is, in fact, a ZvvyCode string.