Server-side Rendering and API Calls in Rust

Nov 10th, 2020

12 min read

Programming

Making one of my useless websites in Rust, with server-side rendering and parsing JSON from external APIs.

My first intentionally useless website is Jvxvcrqvn, which grabs a random Wikipedia entry and encodes it in ROT13. I had earlier made a ROT13 encoder in Rust, and wanted to make a full-fledged site out of it.

I had a few guidelines for myself when I first came up with the project. First, it had to be rendered server-side instead of doing the encryption in the browser. Second, it had to use an external API to get the data. And third, I wanted to use Rust for the whole thing. It would've been much easier to do this in Node, but I wanted to see if I could do it.

So my general plan was:

Have a main page that explains the site. Of course, it'll all be in ROT13, but if you take the time to translate it, it'll explain everything.
Have a second route that will pull a random Wikipedia article's text and display it encoded.

In hindsight, I could've done it all on one page, but I really wanted to see how routing would work in Rust.

If you want to see the code for the entire project, you can find it on GitHub.

Index

Setting up routes with Iron
Querying the Wikipedia API and parsing JSON with reqwest and serde
Removing HTML tags with scraper and regex
Adding in the ROT13 module
HTML templating with Tera
Deploying on Heroku

Setting up routes with Iron

First off, I should say that if I were doing this again, I probably wouldn't use Iron, for a few reasons. It seems to be largely abandoned: the repo hasn't been updated in over a year, and its main website is broken. I didn't notice until I was well underway that r/rust considers it unmaintained. There's also very little written about it, so I ended up relying heavily on the documentation to get through this.

Ultimately, Iron was usable for this project, but in the future I would almost certainly go with Rocket. In fact, I'll probably make another intentionally useless website with Rocket soon.

If you do wish to flail with Iron, here's how I set up some routes.

In Cargo.toml:

[dependencies]
iron = "0.6.*"
router = "*"

In main.rs:

use std::env;
use std::path::Path;
use iron::prelude::*;
use iron::status;
use router::Router;

fn main() {
    // Map each GET path to its handler; the last argument is a route ID.
    let mut router = Router::new();
    router.get("/", handler, "index");
    router.get("/enaqbz", random_handler, "random");

    // Heroku supplies the port in the PORT environment variable;
    // fall back to 3000 for local development.
    let port = env::var("PORT").unwrap_or_else(|_| "3000".to_string());
    let addr = format!("0.0.0.0:{}", port);

    Iron::new(router).http(addr).unwrap();
}

I set up the router with two GET routes: one for "/", and one for my random page, "/enaqbz" ("random" in ROT13). Each route calls a handler function that returns a simple response at first, and eventually a rendered HTML template.

Originally, I had addr hard-coded to "127.0.0.1:3000", but it turns out that Heroku expects apps to bind to "0.0.0.0" on a port it assigns dynamically, so it makes sense to do it this way from the start. We read the port from Heroku's PORT environment variable, fall back to port 3000 if it isn't set, format it into the address string, and hand the router to Iron.
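The port fallback is easy to check in isolation if it's pulled into a small function. A minimal sketch, assuming a hypothetical helper named listen_addr (the post keeps this logic inline in main()), that takes an Option<String> so it can be tested without touching real environment variables:

```rust
/// Builds the bind address from an optional port, defaulting to 3000.
/// Hypothetical helper: in the post this logic lives inline in main().
fn listen_addr(port: Option<String>) -> String {
    let port = port.unwrap_or_else(|| "3000".to_string());
    format!("0.0.0.0:{}", port)
}
```

In main(), the call would be listen_addr(env::var("PORT").ok()), since .ok() turns the error returned for an unset variable into None.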

This is basically how I set up my early routes to check that they were working at all. I had a button on a simple index.html page that linked to /enaqbz. Eventually, index.html will be replaced with a template.

fn handler(_: &mut Request) -> IronResult<Response> {
    Ok(Response::with((status::Ok, Path::new("pages/index.html"))))
}

fn random_handler(_: &mut Request) -> IronResult<Response> {
    Ok(Response::with((status::Ok, "Hello!")))
}

Note that std::path::Path won't be necessary once we start templating, but it is necessary for grabbing a static file now.

This bit of code should be enough to route between two pages, one displaying a static HTML file, and one responding with "Hello!".

Querying the Wikipedia API and parsing JSON with reqwest and serde

To handle the API calls, I went with reqwest. I think that was the right choice, since it seems to be one of the most popular HTTP clients for Rust.

I needed to add both reqwest and serde_json to my Cargo.toml dependencies:

reqwest = { version = "0.10", features = ["json", "blocking"] }
serde_json = "1.0"

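Then I added use serde_json::Value; to the top of main.rs. We need the Value type because reqwest's .json() is generic over the type it deserializes into: annotating the result as Value tells Rust to parse the response into serde_json's general-purpose JSON tree, without having to define structs that mirror the response.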

fn random_handler(_: &mut Request) -> IronResult<Response> {
    let text = get_json().ok().unwrap();
    Ok(Response::with((status::Ok, text)))
}

fn get_json() -> Result<String, reqwest::Error> {
    // Get a random Wikipedia article.
    let url = "https://en.wikipedia.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=1&format=json";
    let json: Value = reqwest::blocking::get(url)?.json()?;
    Ok(json.to_string())
}

Here, I've added a new function, get_json, that does the first JSON fetch from Wikipedia. Since reqwest's main API is async, I enabled the blocking feature so I wouldn't have to deal with async calls. We annotate json with the type serde_json::Value, which will be important in the next step. Finally, we convert the Value back into a String with json.to_string() and wrap it in Ok, since the function's Result expects a String.

In random_handler, I've called get_json and unwrapped the result, which should be the entire JSON string. Then, I replaced "Hello!" with the result from the API call. Now, the /enaqbz route returns the JSON from Wikipedia.

Getting a random article from Wikipedia is a two-step process, at least as far as my research led me. This first API call returns the title and page ID of a random article, but not the article text, which is what I actually want. Now that we have the JSON as a serde Value, pulling those two fields out of the structure is straightforward.

fn get_json() -> Result<String, reqwest::Error> {
    // Get a random Wikipedia article.
    let url = "https://en.wikipedia.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=1&format=json";
    let json: Value = reqwest::blocking::get(url)?.json()?;
    let title = json["query"]["random"][0]["title"].as_str().unwrap();
    let page_id = json["query"]["random"][0]["id"].as_u64().unwrap();
    
    Ok(json.to_string())
}

The Wikipedia API query for getting the article extract takes a "titles" query parameter, so I needed to percent-encode that title string.

Yet another Cargo.toml dependency, for percent-encoding:

percent-encoding = "2.1"

And added to the top of main.rs:

use percent_encoding::{utf8_percent_encode, AsciiSet, CONTROLS};

And here's what I added to get_json() to percent-encode the title:

    const FRAGMENT: &AsciiSet = &CONTROLS.add(b' ').add(b'(').add(b')');
    let encoded_title = utf8_percent_encode(title, FRAGMENT).to_string();
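The FRAGMENT set above only escapes control characters, spaces, and parentheses. A title containing &, #, or + (say, "AT&T" or "C++") would still go out raw: the server would read & as the start of a new query parameter and + as a space, and # would be treated as the start of a URL fragment. The percent_encoding crate ships a NON_ALPHANUMERIC set that escapes everything except ASCII letters and digits. The std-only function below, with the hypothetical name encode_title, mimics that behavior so it can be run without the crate:

```rust
/// Percent-encodes every byte that is not an ASCII letter or digit,
/// similar in spirit to percent_encoding's NON_ALPHANUMERIC set.
/// Non-ASCII characters are encoded byte by byte from their UTF-8 form.
fn encode_title(title: &str) -> String {
    let mut out = String::with_capacity(title.len());
    for &byte in title.as_bytes() {
        if byte.is_ascii_alphanumeric() {
            out.push(byte as char);
        } else {
            // Uppercase hex, e.g. b'&' becomes "%26".
            out.push_str(&format!("%{:02X}", byte));
        }
    }
    out
}
```

With the crate itself, the equivalent would be utf8_percent_encode(title, NON_ALPHANUMERIC).to_string(), after adding NON_ALPHANUMERIC to the percent_encoding import.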

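Now we have everything we need to get the extracted article text from Wikipedia. I used this endpoint, with &prop=extracts, because it returns text with minimal HTML markup. There is still some markup that I had to remove, but I couldn't find an endpoint that would just return plain text. (The TextExtracts extension that provides prop=extracts does accept an explaintext parameter that returns plain text, which would likely make the tag stripping below unnecessary.)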

fn get_json() -> Result<String, reqwest::Error> {
    // Get a random Wikipedia article.
    let url = "https://en.wikipedia.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=1&format=json";
    let json: Value = reqwest::blocking::get(url)?.json()?;
    let title = json["query"]["random"][0]["title"].as_str().unwrap();
    let page_id = json["query"]["random"][0]["id"].as_u64().unwrap();

    const FRAGMENT: &AsciiSet = &CONTROLS.add(b' ').add(b'(').add(b')');
    let encoded_title = utf8_percent_encode(title, FRAGMENT).to_string();

    // Take the random Wikipedia article info and get the actual article text.
    let page_url = "https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&titles=".to_owned() + &encoded_title;
    let page_json: Value = reqwest::blocking::get(&page_url)?.json()?;
    let html = page_json["query"]["pages"][page_id.to_string()]["extract"].as_str().unwrap();

    Ok(html.to_string())
}

This is very close to the final form of this function. I've added the second API call, and fetched the JSON in the same way with reqwest. Finally, I grab the article HTML by traversing the JSON structure. It got a little confusing having to use the page ID for traversal, but since it's available through the first API call, it gets set to the variable page_id and used here.
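Since the first response already includes the numeric page ID, one alternative (not what the post does) is to request the extract by ID: the MediaWiki query module accepts a pageids parameter in place of titles, which avoids percent-encoding the title at all. A minimal sketch, using a hypothetical helper named extract_url:

```rust
/// Builds the TextExtracts query URL for a single page ID.
/// Hypothetical helper; the post builds its URL from the encoded title instead.
fn extract_url(page_id: u64) -> String {
    format!(
        "https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&pageids={}",
        page_id
    )
}
```

The response is still keyed by page ID under query.pages, so the existing page_json["query"]["pages"][page_id.to_string()] lookup should work unchanged.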

Now, visiting /enaqbz should show a Wikipedia article as plain text, HTML tags and all. It shows up as text because we aren't setting a content type on the response yet, but we'll do that later.

Removing HTML tags with scraper and regex

The problem with leaving HTML tags in the returned text is that my ROT13 function would encode the tags too, leaving behind a mess. Luckily, scraper turned out to be a good HTML parser for walking the DOM nodes. I'm also going to need regex.

Cargo.toml:

scraper = "0.12"
regex = "1"

And the top of main.rs:

use scraper::Html;
use regex::Regex;

Here's the basic idea of what I want to do: Because I want to wrap each paragraph in a <p> tag in my template, I'm going to want a vector of strings, with each string being one paragraph.

This next part is really hard to explain, and if you're not used to traversing DOM nodes in JavaScript this might be somewhat hard to follow. I'm going to make comments in the code to try to go through it step-by-step.

This is the finished parse_text function.

fn parse_text(html: String) -> Vec<String> {
    // Use scraper::Html to parse the incoming string.
    let scraped_text = Html::parse_fragment(&html);

    // Compile the regex once, outside the loop. It matches text nodes
    // that begin with "\n", which indicates a new paragraph.
    let re = Regex::new(r"^\n").unwrap();

    // An empty vector to hold the parsed paragraphs.
    let mut vec: Vec<String> = Vec::new();

    // A temporary string for building up each paragraph.
    // Because we're dealing with a lot of inline HTML tags,
    // we only push it to the vector when we hit a newline.
    let mut paragraph = String::new();

    // Iterate over the DOM node tree.
    for node in scraped_text.tree {

        // Nodes can be text, elements (HTML tags), comments, and so on.
        // We just want text nodes.
        if let scraper::node::Node::Text(text) = node {

            // Copy the text node's contents into a String.
            let text_node = text.text.to_string();

            // If the regex matches, it's a new paragraph,
            // so push the temp string to the vector
            // and start over with an empty string.
            if re.is_match(&text_node) {

                // Here's where I'm calling my rot13 function,
                // which will be added next.
                vec.push(rot13(paragraph));
                paragraph = String::new();
            } else {

                // If there's no newline found, concatenate
                // onto the current temp string.
                paragraph += &text_node;
            }
        }
    }

    // One last push of the remaining temp string contents to the vec.
    vec.push(rot13(paragraph));

    // Return the vector.
    vec
}
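The paragraph-grouping rule is the subtle part of parse_text, so here it is on its own, with scraper taken out of the picture. This is a std-only sketch: the hypothetical group_paragraphs takes the text-node strings the scraper loop would see, in document order, and applies the same rule (a fragment starting with "\n" closes the current paragraph and is itself dropped). It uses str::starts_with in place of the ^\n regex, which performs the same check, and leaves out the rot13 call.

```rust
/// Groups text fragments into paragraphs, starting a new paragraph
/// whenever a fragment begins with '\n'. The newline fragment itself
/// is dropped. Hypothetical std-only stand-in for the scraper loop.
fn group_paragraphs(fragments: &[&str]) -> Vec<String> {
    let mut paragraphs = Vec::new();
    let mut current = String::new();
    for fragment in fragments {
        if fragment.starts_with('\n') {
            // Close off the paragraph built so far and start a fresh one.
            paragraphs.push(std::mem::take(&mut current));
        } else {
            current.push_str(fragment);
        }
    }
    // Flush whatever is left after the last newline.
    paragraphs.push(current);
    paragraphs
}
```

One side effect carries over from parse_text: if the very first fragment starts with a newline, an empty paragraph is pushed, which the template would render as an empty <p>. Filtering out empty strings before rendering would avoid that, if it matters.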

Since parse_text returns a vector of strings instead of a single string, the return type of get_json has to change once it calls parse_text.

fn get_json() -> Result<Vec<String>, reqwest::Error> {
    ...
    Ok(parse_text(html.to_string()))
}

If you remove the calls to rot13() in parse_text, joining the vector in random_handler should print out the article text without any HTML tags.

fn random_handler(_: &mut Request) -> IronResult<Response> {
    let text = get_json().ok().unwrap().join("\n\n");
    Ok(Response::with((status::Ok, text)))
}

Adding in the ROT13 module

I already had the ROT13 command-line utility I'd written earlier. I ended up copying just the rot13 function into a new file in the src directory. I called the file cipher.rs because at one point I thought I'd add another function that would run a Caesar cipher with a random shift, so the random Wikipedia articles would each come back in a random encoding. I ended up dropping that idea... for now.

cipher.rs:

use std::char;

pub fn rot13(message: String) -> String {
    let alphabet: Vec<char> = (97..123).map(|n| char::from_u32(n).unwrap()).collect();
    let upper_alphabet: Vec<char> = (65..91).map(|n| char::from_u32(n).unwrap()).collect();

    message.chars()
        .map(|c| *alphabet.iter()
            .chain(alphabet.iter())
            .chain(upper_alphabet.iter())
            .chain(upper_alphabet.iter())
            .skip_while(|&x| *x != c)
            .nth(13)
            .unwrap_or(&c))
        .collect()
}
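The iterator version above scans the chained alphabets for every input character. For comparison, ROT13 can also be done with plain arithmetic on ASCII letters, which does a constant amount of work per character. A minimal sketch under a hypothetical name, rot13_arith:

```rust
/// ROT13 by arithmetic on ASCII letters; every other character passes through.
/// Hypothetical alternative to the iterator-chain version in cipher.rs.
fn rot13_arith(message: &str) -> String {
    message
        .chars()
        .map(|c| match c {
            // Shift within a-z, wrapping around with % 26.
            'a'..='z' => (((c as u8 - b'a' + 13) % 26) + b'a') as char,
            // Same for A-Z.
            'A'..='Z' => (((c as u8 - b'A' + 13) % 26) + b'A') as char,
            _ => c,
        })
        .collect()
}
```

It also confirms the site's naming: rot13_arith("Wikipedia") gives "Jvxvcrqvn" and rot13_arith("random") gives "enaqbz". Taking &str instead of String is a design choice; call sites would pass a borrowed string rather than moving one in.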

And then add to main.rs:

mod cipher;
use cipher::rot13;

Now, if you run cargo run with the rot13() calls restored in the parsing function, you should see enciphered text.

HTML templating with Tera

For templating, I decided to use Tera, one of the template engines Rocket supports. At this point, I was really regretting not using Rocket, but Tera was easy to use for this project. Tera has great documentation on its site.

This is my last addition to Cargo.toml, so here's what all my dependencies look like:

[dependencies]
iron = "0.6.*"
router = "*"
reqwest = { version = "0.10", features = ["json", "blocking"] }
serde_json = "1.0"
percent-encoding = "2.1"
scraper = "0.12"
tera = "1.5"
regex = "1"

Ditto for the top of main.rs. Here's what that looks like:

mod cipher;

use std::env;
use iron::prelude::*;
use iron::status;
use iron::headers::ContentType;
use router::Router;
use serde_json::Value;
use percent_encoding::{utf8_percent_encode, AsciiSet, CONTROLS};
use scraper::Html;
use tera::{Context, Tera};
use regex::Regex;
use cipher::rot13;

I've added a templates directory to the root of my project and created index.html and random.html inside it. Here's an abridged random.html; check out the GitHub repo for the full files.

<head>
    <title>{{ title }}</title>
</head>
<body>
    <main>
        <button onclick="location.href='/';">{{ back_button }}</button>
        <button onclick="location.href='/enaqbz'">{{ refresh_button }}</button>
        <h2>{{ title }}</h2>
        {% for para in content %}
        <p>{{ para }}</p>
        {% endfor %}
        <button onclick="location.href='/';">{{ back_button }}</button>
        <button onclick="location.href='/enaqbz'">{{ refresh_button }}</button>
    </main>
</body>

I have variables for title, back_button (to the front page), refresh_button, and content. I'm looping over the content variable, which is that vector of strings, which puts each paragraph into its own <p> tag. Pretty simple.
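For reference, the {% for %} loop above produces one <p> element per paragraph. Tera autoescapes templates ending in .html by default, so any stray <, >, or & left in the article text is rendered as text rather than markup. The std-only sketch below, with a hypothetical render_paragraphs function, approximates that output. It is not Tera, and it escapes only &, <, and >, whereas Tera's escaping covers more characters (quotes, for instance).

```rust
/// Wraps each paragraph in a <p> tag after escaping &, < and >.
/// Illustration only: approximates the {% for %} loop's output,
/// not Tera's actual rendering or escaping rules.
fn render_paragraphs(content: &[String]) -> String {
    let mut html = String::new();
    for para in content {
        // Escape & first so the entities added next aren't double-escaped.
        let escaped = para
            .replace('&', "&amp;")
            .replace('<', "&lt;")
            .replace('>', "&gt;");
        html.push_str("<p>");
        html.push_str(&escaped);
        html.push_str("</p>\n");
    }
    html
}
```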

One of the last things to do is add Tera to the random_handler function and send the template variables to this HTML file.

First, get_json() needs to return the encoded title along with the vector of paragraph strings, which means updating the return type in the Result.

fn get_json() -> Result<(String, Vec<String>), reqwest::Error> {
    ...
    Ok((rot13(title.to_string()), parse_text(html.to_string())))
}
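Because get_json() now returns a tuple, a caller can destructure it in a single binding instead of indexing with .0 and .1, as random_handler does further down. The sketch below uses a hypothetical fake_get_json stand-in with the same return shape (and a String error in place of reqwest::Error) so it runs without network access. The ? operator hands an error back to the caller instead of panicking the way .ok().unwrap() does.

```rust
/// Stand-in for get_json() with the same Ok shape:
/// (encoded title, encoded paragraphs). Hypothetical; no network call.
fn fake_get_json() -> Result<(String, Vec<String>), String> {
    Ok(("Jvxvcrqvn".to_string(), vec!["Cnentencu bar".to_string()]))
}

/// Destructures the tuple in one binding; ? propagates any error.
fn title_and_count() -> Result<(String, usize), String> {
    let (title, body) = fake_get_json()?;
    Ok((title, body.len()))
}
```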

Now, we can remove the .join("\n\n") from the text in random_handler(), because we'll be sending along the vector to the template.

We're also using ContentType::html().0 from Iron to set the correct MIME type, so the browser actually renders the response as an HTML page.

fn random_handler(_: &mut Request) -> IronResult<Response> {
    let text = get_json().ok().unwrap();
    let title = text.0;
    let body = text.1;

    // Set the mime type for the page.
    let content_type = ContentType::html().0;
    let tera = Tera::new("templates/**/*").unwrap();

    // Template strings.
    let mut context = Context::new();
    context.insert("title", &title);
    context.insert("content", &body);
    context.insert("back_button", &rot13("Go back to the main page".to_owned()));
    context.insert("refresh_button", &rot13("Get new article".to_owned()));
    let rendered = tera.render("random.html", &context).expect("Failed to render template.");

    Ok(Response::with((content_type, status::Ok, rendered)))
}

Most of the Tera code comes straight from the documentation. It loads the templates from the templates directory, creates a new context, and inserts each variable into it. I wrote the back button and refresh button text in regular English and encode it at render time. I did the same thing in the handler() function that returns the index page.

fn handler(_: &mut Request) -> IronResult<Response> {
    // Set the mime type for the page.
    let content_type = ContentType::html().0;
    let tera = Tera::new("templates/**/*").unwrap();

    // Template strings.
    let mut context = Context::new();
    context.insert("title", &rot13("ROT13 Wikipedia".to_owned()));
    context.insert("description", &rot13("Wikipedia articles returned encoded in ROT13. Click the button below for a random article. This site is meant to be nonsense.".to_owned()));
    context.insert("article_button", &rot13("Get a random Wikipedia article".to_owned()));
    context.insert("my_button", &rot13("Check out my site".to_owned()));
    let rendered = tera.render("index.html", &context).expect("Failed to render template.");

    Ok(Response::with((content_type, status::Ok, rendered)))
}

Much easier to update than hard-coding the encoded text into the page.
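Both handlers call Tera::new("templates/**/*") on every request, which re-reads and re-parses every template each time. One way to do that work only once is a process-wide std::sync::OnceLock (stable since Rust 1.70, so newer than this post; at the time, the lazy_static crate filled the same role). This sketch substitutes a String for the Tera instance, and the names load_templates and templates are hypothetical, so it runs without the crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the "expensive" loader actually runs.
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for Tera::new("templates/**/*"); a real version would
/// read and parse every template file here.
fn load_templates() -> String {
    LOADS.fetch_add(1, Ordering::SeqCst);
    "parsed templates".to_string()
}

/// Returns the shared templates, loading them on the first call only.
fn templates() -> &'static String {
    static TEMPLATES: OnceLock<String> = OnceLock::new();
    TEMPLATES.get_or_init(load_templates)
}
```

With a Tera value in place of the String, each handler would presumably call render on templates() instead of building a new Tera per request.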

That's basically the end of this project!

Deploying on Heroku

I'd never used Heroku before. I've been a longtime fan of controlling my own DigitalOcean droplets, but I've let those lapse for a while. I can imagine that doing all of this manually on DigitalOcean would've been more difficult than Heroku's automated build process. While there's something to be said for knowing how to sysadmin your own servers, I'm glad I just deployed this to Heroku this time.

It's a pretty straightforward process, but since there isn't native support for Rust on Heroku, I needed to use a buildpack. I used heroku-buildpack-rust, which seems to be the most widely used. The buildpack can be added on the settings tab.

I also had to add a Procfile to let Heroku know where my compiled binary would be:

web: ./target/release/jvxvcrqvn

I put my code on GitHub, connected it to Heroku, and deployed. I had a bunch of failed deploys because I didn't realize the port needed to be dynamic, but after fixing that, everything worked!