Creating a ROT13 Encoder/Decoder in Rust

Nov 1st, 2020

Programming

Dipping my toes into Rust by making a CLI that will encode and decode ROT13 ciphers.

I've had my eye on Rust for a while, ever since it was a very new language. However, I haven't had much reason to use it. As a web dev, I haven't found a project that I would need to use Rust for over JavaScript. I've been watching WebAssembly closely, but again, I still needed a project.

One of my favorite time-wasting Google searches is "What to build in x language", and "Best projects for x language". Someone on Reddit mentioned building command-line apps in Rust and since I've been building some random little utility CLIs for myself in Python and JavaScript, I thought I'd try one in Rust. I'd also read that one of the biggest advantages to Rust is that you can write a CLI and then compile it to WebAssembly to use as part of a web interface. I love that idea.

An author I follow, Chuck Wendig, likes to put spoilers for shows in ROT13, and while there are websites that will do the deciphering, I thought I'd make my own little command-line ROT13 parser in Rust.

I'm using the "Command Line Applications in Rust" book as a jumping-off point for my CLI app. Also, I've had Rust installed on my computer for a while now, so I'm not going over how to do that.

Getting Started

First, I'm going to create my new project by running cargo new rot13.

I know I'm going to use structopt to parse the text I want to encode/decode, so I add it to the dependencies in my Cargo.toml file.

[dependencies]
structopt = "0.3"

I add structopt to my main.rs file, as well as a struct for my command-line parameters. Right now I just want to take the string that I want to encode or decode.

use structopt::StructOpt;

#[derive(StructOpt)]
struct Cli {
    message: String
}

Next, I replace the default code in main() to parse the argument and print it.

fn main() {
    let message = Cli::from_args().message;
    println!("{}", message);
}

Running the code now with cargo run -- test should print out "test". For a full sentence, the string should be surrounded by quotes: cargo run -- "this is a test".

I'm printing out everything along the way as I go, since I'm still very new to the Rust language. This probably isn't super necessary since Rust will tell me where my errors are, but I like to go step-by-step to be sure I'm learning it properly, and don't end up with frustrating errors that cause me to quit.

My next step is making a new function that will accept the provided string and return the ROT13 of the string. First, I'm going to create the function and just have it print out the string again.

fn main() {
    let message = Cli::from_args().message;
    println!("{}", rot13(message));
}

fn rot13(message: String) -> String {
    message
}

That works. Now it's time to write the part that actually does the encoding and decoding.

I write a for loop to iterate over and print the characters in the message, and return the full message at the end.

fn rot13(message: String) -> String {
    for c in message.chars() {
        println!("{}", c);
    }
    message
}

And to print the Unicode character code of each character instead, I change the println! line to:

println!("{}", c as u32);

In this algorithm, I take the character code of the current character, subtract the character code for "a", add 13, take the result modulo 26, and then add back the character code for "a". That gives the correct lowercase character code for the cipher.

let a_code = 'a' as u32;
let rotcode = ((charcode - a_code + 13) % 26) + a_code;

Then, I get the character from the rotated character code. At the top of the file, I've declared use std::char;.

char::from_u32(rotcode).unwrap();
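
To sanity-check the arithmetic, here is a minimal sketch that wraps the lines above in a helper. The name rotate_lower is hypothetical and not part of my final program, and it assumes the input is a lowercase ASCII letter. It also assumes a toolchain where from_u32 can be called directly on char (Rust 1.52 or later); on older versions, the use std::char; import covers it.

```rust
// Hypothetical helper for illustration: rotate one lowercase ASCII letter by 13.
// Assumes `c` is in 'a'..='z'; anything below 'a' would underflow the subtraction.
fn rotate_lower(c: char) -> char {
    let a_code = 'a' as u32;
    let charcode = c as u32;
    // Shift into 0..26, add 13, wrap around with modulo, then shift back.
    let rotcode = ((charcode - a_code + 13) % 26) + a_code;
    char::from_u32(rotcode).unwrap()
}

fn main() {
    // 'a' is offset 0: (0 + 13) % 26 = 13, which is 'n'.
    println!("{}", rotate_lower('a'));
    // 'z' is offset 25: (25 + 13) % 26 = 12, which is 'm'.
    println!("{}", rotate_lower('z'));
}
```

The modulo is what makes letters in the second half of the alphabet wrap back around to the first half.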

My ROT13 algorithm

Here, I've added if cases for both lowercase and uppercase characters, as well as other characters that will simply pass through the current character. I've also added a mutable string at the top of the function. After each charcode rotation, we push the new character to the string, which is returned back to main(). This function works to encode and decode ROT13 strings.

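fn rot13(message: String) -> String {
    let mut coded_string = String::new();
    for c in message.chars() {
        let charcode = c as u32;
        // Only rotate ASCII letters; is_lowercase() would also match
        // letters like 'é' and push them through the a-z arithmetic.
        if c.is_ascii_lowercase() {
            let a_code = 'a' as u32;
            // Shift into 0..26, rotate by 13, wrap around, shift back.
            let rotcode = ((charcode - a_code + 13) % 26) + a_code;
            coded_string.push(char::from_u32(rotcode).unwrap());
        } else if c.is_ascii_uppercase() {
            let a_code = 'A' as u32;
            let rotcode = ((charcode - a_code + 13) % 26) + a_code;
            coded_string.push(char::from_u32(rotcode).unwrap());
        } else {
            // Digits, punctuation, and non-English letters pass through unchanged.
            coded_string.push(c);
        }
    }
    coded_string
}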
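
Because 13 is half of 26, running the function twice should give back the original text, which is why one function can both encode and decode. The sketch below checks that on a sample string. It is a condensed variant of the same arithmetic, not a replacement for the function above, and the choice of is_ascii_lowercase() and is_ascii_uppercase() is an assumption of this sketch so that only the 26 English letters are rotated.

```rust
// Sketch only: same arithmetic as above, restricted to ASCII letters.
fn rot13(message: String) -> String {
    let mut coded_string = String::new();
    for c in message.chars() {
        // Pick the base letter for this character's case, or pass it through.
        let a_code = if c.is_ascii_lowercase() {
            'a' as u32
        } else if c.is_ascii_uppercase() {
            'A' as u32
        } else {
            coded_string.push(c);
            continue;
        };
        let rotcode = (((c as u32) - a_code + 13) % 26) + a_code;
        coded_string.push(char::from_u32(rotcode).unwrap());
    }
    coded_string
}

fn main() {
    let encoded = rot13(String::from("Hello, World!"));
    let decoded = rot13(encoded.clone());
    println!("{}", encoded); // Uryyb, Jbeyq!
    println!("{}", decoded); // Hello, World!
}
```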

While my code works, I want to see if there is a better way of going about this. I found a "more rustacean way" of implementing the cipher on Rosetta Code.

A Rosetta Code example

fn rot13(string: String) -> String {
     let alphabet = [
         'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
         'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
     ];
     let upper_alphabet: Vec<_> = alphabet.iter().map(|c| c.to_ascii_uppercase()).collect();
 
     string.chars()
           .map(|c| *alphabet.iter()
                             .chain(alphabet.iter())
                             .chain(upper_alphabet.iter())
                             .chain(upper_alphabet.iter())
                             .skip_while(|&x| *x != c)
                             .nth(13)
                             .unwrap_or(&c))
           .collect()
}

I understand some of this, but not all of it. I want to pick this apart so I fully understand how it works before I implement it.

alphabet is an array of the lowercase alphabet. upper_alphabet is a vector created from the alphabet array. I had to look up what Vec<_> means. It asks for a vector and lets the compiler infer what the type in the vector should be. .iter() creates an iterator over the array, and .map() transforms each item that the iterator yields.

The closure inside of map says that for every alphabet character "c", map will convert it to its uppercase counterpart. .collect() transforms the iterator into a collection. This is something that I've used in some sample projects but never really studied. Here is the documentation.

Now we have the array alphabet of lowercase letters, and the vector upper_alphabet of uppercase letters.
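
Here is a small sketch of the Vec<_> and .collect() pattern on its own, using a three-letter array so the result is easy to see. The function name uppercase_all is hypothetical and only exists for this example.

```rust
// Hypothetical helper: uppercase every letter in a slice.
fn uppercase_all(letters: &[char]) -> Vec<char> {
    // `Vec<_>` asks for a Vec and leaves the element type to the compiler.
    // It infers `char` because that is what the closure returns.
    let upper: Vec<_> = letters.iter().map(|c| c.to_ascii_uppercase()).collect();
    upper
}

fn main() {
    let alphabet = ['a', 'b', 'c'];
    println!("{:?}", uppercase_all(&alphabet)); // ['A', 'B', 'C']

    // The same iterator can be collected into a different collection
    // just by changing the annotation.
    let as_string: String = alphabet.iter().map(|c| c.to_ascii_uppercase()).collect();
    println!("{}", as_string); // ABC
}
```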

The code uses string.chars(), the same as my code, but then calls .map() on the chars iterator. I should've done that in my code instead of using for c in message.chars().

Next, we have the map function, iterating over each string character "c".

At this point, I've spent hours trying to understand why *alphabet exists. I understand that it's dereferencing either alphabet or the result of the entire function chain after it, but I don't know why it requires dereferencing. I have an inkling that it's possibly dereferencing because of the &c in .unwrap_or(&c)), but I'm not 100% sure. I found a cheatsheet of "Rust Converting Bytes, Chars, and Strings" that has a similar syntax, but no real explanation.
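
A minimal sketch of what the * appears to be doing. In Rust's operator precedence, method calls bind more tightly than a leading unary *, so the * applies to the result of the whole chain rather than to alphabet itself. The chain produces a &char (because .iter() yields references, and .unwrap_or(&c) has to supply a fallback of the same type), and the * turns that reference into a plain char. The helper name first_match_or_self is hypothetical, and .find() stands in for the longer chain to keep the example short.

```rust
// Hypothetical helper: look `c` up in `letters`, falling back to `c` itself.
fn first_match_or_self(letters: &[char], c: char) -> char {
    // Without the leading `*`, this expression has type `&char`.
    // With it, the reference is dereferenced and a `char` is returned by copy.
    *letters.iter().find(|&x| *x == c).unwrap_or(&c)
}

fn main() {
    let alphabet = ['a', 'b', 'c'];

    // The un-dereferenced form needs a `&char` annotation to compile.
    let c = 'b';
    let by_ref: &char = alphabet.iter().find(|&x| *x == c).unwrap_or(&c);
    println!("{}", by_ref);

    // The dereferenced form gives an owned `char`.
    println!("{}", first_match_or_self(&alphabet, 'b'));
    println!("{}", first_match_or_self(&alphabet, '!'));
}
```

An owned char matters here because the outer .map() has to hand chars, not references, to .collect() when the target is a String built from the closure's return values.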

Skipping over the part that I've been researching, we take the alphabet iterable and chain onto it another alphabet iterable, followed by two upper_alphabet iterables. At this point I realized what this is doing. It's making a long iterable, searching for the first instance of the character, and then skipping forward 13 places. This way, there's no complicated math using modulo and offsets.

.skip_while(|&x| *x != c) goes through the iterator until it finds the first case where x == c. According to the skip_while documentation,

skip_while() takes a closure as an argument. It will call this closure on each element of the iterator, and ignore elements until it returns false.

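Then, it returns the nth element of what remains of the iterator. Because .nth() counts from zero, .nth(13) returns the element 13 places after the matching character, not the matching character itself. I also studied the documentation for .nth():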

Note that all preceding elements, as well as the returned element, will be consumed from the iterator. That means that the preceding elements will be discarded, and also that calling nth(0) multiple times on the same iterator will return different elements.
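
To see how .skip_while() and .nth() work together, here is a sketch on a short list of numbers instead of the alphabet. The function after_match is hypothetical; it mirrors the "find the first match, then step forward n places" idea from the Rosetta Code example.

```rust
// Hypothetical helper: find `target`, then return the element `n` places after it.
fn after_match(items: &[i32], target: i32, n: usize) -> Option<i32> {
    items
        .iter()
        // Drop elements until the closure returns false, i.e. until we hit `target`.
        .skip_while(|&x| *x != target)
        // nth is zero-indexed: nth(0) is the match itself, nth(1) the one after it.
        .nth(n)
        // Turn Option<&i32> into Option<i32>.
        .copied()
}

fn main() {
    let numbers = [10, 20, 30, 40, 50];
    println!("{:?}", after_match(&numbers, 30, 0)); // Some(30)
    println!("{:?}", after_match(&numbers, 30, 2)); // Some(50)
    println!("{:?}", after_match(&numbers, 30, 3)); // None, ran off the end
    println!("{:?}", after_match(&numbers, 99, 0)); // None, never matched
}
```

The last case is the one ROT13 relies on for punctuation and digits: if the character never matches, .skip_while() consumes everything and .nth() returns None.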

Finally, the function calls .unwrap_or(&c). .nth(13) returns an Option: Some with the rotated character if the original character was found in the iterator, or None if it's not part of the English alphabet. .unwrap_or(&c) unwraps the Some value, or falls back to the original character for None. Then, .collect() turns the iterator into a collection. Usually with .collect() you need to specify the type of the collection, but that doesn't happen here. I wonder if that's because the return type of the function specifies the type.
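
The guess about .collect() can be checked with a small sketch. When .collect() is the final expression of a function, the compiler uses the declared return type to decide which collection to build, so no extra annotation is needed. The function names shout and letters are hypothetical.

```rust
// Hypothetical example: the return type `String` tells collect() what to build.
fn shout(message: &str) -> String {
    message.chars().map(|c| c.to_ascii_uppercase()).collect()
}

// Same chars() iterator, different return type, different collection.
fn letters(message: &str) -> Vec<char> {
    message.chars().collect()
}

fn main() {
    println!("{}", shout("abc")); // ABC
    println!("{:?}", letters("abc")); // ['a', 'b', 'c']
}
```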

Adding my own twist

I decided that I didn't like the alphabet array at the top of the rot13 function, so I've rewritten it to iterate over the range of lowercase character codes and map them to the correct chars. I've also done the same with the upper_alphabet vector. This is my new rot13 function.

fn rot13(message: String) -> String {
    let alphabet: Vec<char> = (97..123).map(|n| char::from_u32(n).unwrap()).collect();
    let upper_alphabet: Vec<char> = (65..91).map(|n| char::from_u32(n).unwrap()).collect();

    message.chars()
        .map(|c| *alphabet.iter()
            .chain(alphabet.iter())
            .chain(upper_alphabet.iter())
            .chain(upper_alphabet.iter())
            .skip_while(|&x| *x != c)
            .nth(13)
            .unwrap_or(&c))
        .collect()
}
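
As a possible alternative to the numeric codes, ranges of char can be iterated directly in Rust 1.45 and later, which avoids the 97..123 and 65..91 magic numbers and the unwrap() calls. This is a sketch of that variant under that version assumption, not the version described above.

```rust
// Sketch of a variant: build the alphabets from char ranges (assumes Rust 1.45+).
fn rot13(message: String) -> String {
    let alphabet: Vec<char> = ('a'..='z').collect();
    let upper_alphabet: Vec<char> = ('A'..='Z').collect();

    message.chars()
        .map(|c| *alphabet.iter()
            .chain(alphabet.iter())
            .chain(upper_alphabet.iter())
            .chain(upper_alphabet.iter())
            // Skip ahead to the first occurrence of `c`...
            .skip_while(|&x| *x != c)
            // ...then step 13 places past it (nth is zero-indexed).
            .nth(13)
            // Characters outside a-z and A-Z fall through unchanged.
            .unwrap_or(&c))
        .collect()
}

fn main() {
    println!("{}", rot13(String::from("Hello, World!"))); // Uryyb, Jbeyq!
}
```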

Finishing up

I haven't written any tests or anything yet, but I'm very impatient to build my first Rust program! I run cargo build --release because I want to see what a release version of my CLI app would be like. It takes surprisingly long for such a small program. Once it's finished compiling, I navigate to target/release and can run my program by simply typing ./rot13 "Whatever I want to encode!". It works!

I can also install it on my Ubuntu system by running sudo install ./rot13 /usr/bin/ on the command line. Now, I can run rot13 "Encode this" from any directory!

Now that I have a working program I'd like to add a few tests, and I'd like to try compiling to WASM and creating a web-based ROT13 converter. But for now, I've spent an entire Sunday on this project and football is on, so I'm going to call it quits for today.