Build Log: Send-to-Kindle Service + EPUB Conversion

Friday, 30 October 2020

Introduction

Amazon provides a Send to Kindle by E-mail service that is a great way to get non-Amazon ebooks onto the device wirelessly. It doesn’t work very well for EPUBs, though, which is unfortunately the format that many DRM-free ebook providers default to. Amazon does offer a conversion service:

PDFs can be converted to the Kindle format so you can take advantage of functionality such as variable font size, annotations, and Whispersync. To have a document converted to Kindle format (.azw), the subject line should be “convert” when e-mailing a personal document to your Send-to-Kindle address

In practice, the conversion is pretty shoddy, with typographical issues and no real chapter detection. My typical workflow, then, is:

  • Buy a DRM-free EPUB
  • Convert it to MOBI locally with (the excellent) Calibre:
    # Convert all EPUBs in the current directory to MOBI, 4 at a time.
    $ ls *.epub | xargs -I{} -P4 /Applications/calibre.app/Contents/MacOS/ebook-convert '{}' '{}'.mobi
    
  • Email the MOBI to Amazon’s email service

This isn’t terrible, but it’s pretty frustrating when I’m away from my computer, say on an iPad or phone. I couldn’t find a hosted solution that provided this workflow (or something similar), so I ended up building it myself.

Implementation

I considered doing this in a few different languages (going as far as a tiny PoC in Rust), but eventually settled on Elixir. My main priority here was developer productivity: I wanted to get this done in ~2 days, so I didn’t want to spend too much time on “simple” things like parameter parsing and file uploads:

  • Rust: Much lower-level than I wanted. Things like handling multipart form uploads are extremely boilerplate-y.
  • Go: Webserver handlers are still fairly low-level. The module system was an immediate turn-off. No preemptive scheduler, which is a (minor, at this scale) downside here: long-running goroutines could back the runtime up, especially on the single-core VPS I intend to put this app on.
  • Rails: This seemed like a much better fit for whipping the project up quickly; Rails is heavily optimized for developer productivity. However, the story for background processes hasn’t changed very much, so to run the Calibre converter I’d have to set up a background queue with workers reading off it, which is much too heavy for this project. Having everything run in a single process would be a significant improvement.
  • Elixir/Plug: This is pretty perfect. All the niceties of Ruby/Rails, but the process model allows me to orchestrate the Calibre conversion from the webserver process, so I’m responsible for fewer moving parts. The observability/hot-reloading capabilities are a nice bonus, as is the BEAM’s preemptive scheduler.

I’ve briefly used Elixir in the past (mainly to build a custom RSS server), so I wasn’t coming to it entirely fresh. This project has no real need for a database, so Phoenix seemed a bit much. I started off with a raw Plug app. I’m a fan of the amount of magic that Plug provides (which is to say, not much); my initial attempt at a solution was simply:

defmodule Ebook.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {Plug.Cowboy, scheme: :http, plug: Ebook.Webserver, options: [port: 4002]}
    ]

    opts = [strategy: :one_for_one, name: Ebook.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

defmodule Ebook.Webserver do
  use Plug.Builder

  plug Plug.Logger
  plug Ebook.Router
end

defmodule Ebook.Router do
  use Plug.Router
  import Plug.Conn
  require EEx

  plug :match
  plug Plug.Parsers, parsers: [:multipart]
  plug :dispatch

  EEx.function_from_file(:defp, :home_view, "template/home.eex", [])

  get "/" do
    send_resp(conn, 200, home_view())
  end

  post "/submit" do
    %{"who" => _who, "file" => file} = conn.params

    format = cond do
      String.match?(file.filename, ~r/epub$/iu) -> :epub
      String.match?(file.filename, ~r/mobi$/iu) -> :mobi
      true -> :none
    end

    conn = put_resp_content_type(conn, "application/json")

    if format == :none do
      send_resp(conn, 400, Jason.encode!(%{error: true, message: "Don't know how to handle files of this type"}))
    else
      # Note: the uploaded bytes live at the temp path in `file.path`;
      # `file.filename` is only the client-supplied name.
      {:ok, _pid} = Task.start(fn ->
        if format == :epub do
          System.cmd("/Applications/calibre.app/Contents/MacOS/ebook-convert", [file.path, "#{file.path}.mobi"])
        end
      end)

      response = %{message: "Conversion started!"}
      send_resp(conn, 202, Jason.encode!(response))
    end
  end
end
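
For completeness, this initial version needs only two dependencies; the relevant part of mix.exs looks something like this (version requirements are illustrative):

defp deps do
  [
    # Plug itself, plus the Cowboy adapter used in Ebook.Application
    {:plug_cowboy, "~> 2.0"},
    # JSON encoding for the API responses
    {:jason, "~> 1.2"}
  ]
end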

The router above is succinct and gets the initial job done nicely. As usual, there is a long tail of smaller concerns to handle:

  • Email: I wanted to email the resulting MOBI to Amazon using my own Gmail account via SMTP. I fought with this for a long time before I realized that Google doesn’t really allow SMTP access with your Gmail username/password, even with the Less secure apps setting enabled. I eventually fixed this by creating an app password and using that instead (see the configuration sketch after this list).
  • Authentication: Überauth works well out of the box, although the documentation is a bit lacking/out-of-date. I’ve used CloudFront+Lambda@Edge to authenticate personal projects in the past (using cloudfront-auth), but there are benefits to having this app manage its own auth, like figuring out which Kindle email to send an upload to based on the logged-in user. This required the Plug session store and a couple of plugs to set up a Rails-like current_user mechanism.
  • Frontend: I very quickly outgrew the static HTML frontend. The very nature of this app is to click “Submit” and wait for 30-60 seconds while the conversion occurs, so something dynamic helps fill that time better. I could’ve gotten away with something simpler, but I used create-react-app with TypeScript, mainly to learn. A “Live View”-esque approach would’ve worked here, too. Annoyingly, the React development server takes 10 seconds to start, while the Elixir server starts up in a second or so.
  • File handling: Plug’s multipart handler deletes uploaded files when the request process dies (when the server finishes processing a request, essentially). This isn’t good enough for us, because we want the file around after the initial request has completed, so we have to make a copy and delete it when we’re done (see the copy sketch after this list).
  • Progress notification: How does the frontend know when conversion has completed? Elixir’s Registry is probably a good way to solve this at a larger scale, but I went with something far more rudimentary. The file upload handler extracts the PID of the conversion process and returns it. The frontend uses a different (polling) endpoint to periodically check if that process is still alive or not; once that process doesn’t exist anymore, the job is done. The incantation to convert a string PID back to an actual PID was a bit tricky to find, but here it is:
    get "/poll/:pid" do
      pid = :erlang.list_to_pid('<#{pid}>')
    
      if Process.info(pid) do
        send_resp(conn, 200, Jason.encode!(%{exists: true}))
      else
        send_resp(conn, 200, Jason.encode!(%{exists: false}))
      end
    end
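
Back to the email bullet: the Gmail SMTP configuration that eventually worked looks roughly like this (a sketch: it assumes a Swoosh-backed Ebook.Mailer in an :ebook OTP app, both of which are my guesses at the naming):

config :ebook, Ebook.Mailer,
  adapter: Swoosh.Adapters.SMTP,
  relay: "smtp.gmail.com",
  port: 587,
  username: "me@gmail.com",
  # An app password; Google rejects the plain account password over SMTP.
  password: System.get_env("GMAIL_APP_PASSWORD"),
  tls: :always,
  auth: :always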
    

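And for the file-handling bullet, the copy itself is a one-liner from the upload’s temp path to a path we control (a sketch; the destination directory is arbitrary):

# `file` is the %Plug.Upload{} from conn.params. Copy it out of Plug's
# temp directory before the request process exits, since Plug deletes
# the temp file at that point.
dest = Path.join(System.tmp_dir!(), file.filename)
File.cp!(file.path, dest)
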
Error Handling / Resilience

At this point this service is passable, but has one glaring flaw: there’s pretty much no error handling. The conversion process could have crashed, but the frontend will happily assume that the absence of the process indicates success. How do we disambiguate these end states better?

Elixir provides a GenServer behaviour that seemed like a good fit. From the docs:

A behaviour module for implementing the server of a client-server relation. A GenServer is a process like any other Elixir process and it can be used to keep state, execute code asynchronously and so on. The advantage of using a generic server process (GenServer) implemented using this module is that it will have a standard set of interface functions and include functionality for tracing and error reporting.

There are a couple of different ways this could be structured; here’s the approach I went for:

  • Every incoming request spawns a new cowboy process.
  • The /submit file upload handler then starts an instance of the JobServer GenServer, passing in a function that does the actual work (convert to MOBI, email); a sketch of this wiring follows the list.
  • The JobServer internally spawns a new (raw) process that executes this “job function”.
  • The JobServer passes a reference to itself to the job function, and exposes a few functions that allow the job to check in about its progress. In this way, the JobServer is notified when the job function reaches various checkpoints, and can collect log output.
  • When the job function is spawned, we call Process.monitor(pid) on it so the JobServer is notified when the job finishes (either successfully or because of a crash) with a :DOWN message.
  • The JobServer maintains the status of the running job function as internal state, and passes this state back to any /poll requests that come in.
  • Once the job function is done, the JobServer sets a timeout to shut itself down in 1 minute (we poll every second, so this is reasonable).
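
Concretely, the /submit handler’s side of this looks something like the following sketch (job_fn stands in for the job function shown further down; the string form of the PID is what the poll endpoint parses):

# Start a JobServer wrapped around the job function.
{:ok, pid} = JobServer.start(job_fn)

# Turn #PID<0.123.0> into "0.123.0" so the frontend can hit /poll/0.123.0.
pid_string =
  pid
  |> :erlang.pid_to_list()
  |> to_string()
  |> String.trim_leading("<")
  |> String.trim_trailing(">")

send_resp(conn, 202, Jason.encode!(%{pid: pid_string}))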

With that out of the way, here’s what the JobServer looks like (a quick read of the GenServer docs is probably a prerequisite to understanding this; it’s a bit cryptic at first glance):

defmodule JobServer do
  use GenServer

  # Client: the process that sets up and starts this server
  def start(task_fn), do: GenServer.start(__MODULE__, task_fn)
  def poll(pid), do: GenServer.call(pid, :poll)

  # Job: the job process/function that this server is executing
  def mark_checkpoint(pid, checkpoint), do: GenServer.cast(pid, {:checkpoint, checkpoint})
  def record_log(pid, log), do: GenServer.cast(pid, {:log, log})

  # Server: callbacks
  @impl true
  def init(task_fn) do
    {:ok, task_fn, {:continue, :start_task}}
  end

  @impl true
  def handle_continue(:start_task, task_fn) do
    parent = self()
    pid = spawn(fn -> task_fn.(parent) end)
    Process.monitor(pid)

    {:noreply, %{logs: []}}
  end

  @impl true
  def handle_call(:poll, _from, state = %{done: true}), do: {:reply, state, state, 60000}
  def handle_call(:poll, _from, state = %{crashed: {true, _}}), do: {:reply, state, state, 60000}
  def handle_call(:poll, _from, state), do: {:reply, state, state}

  @impl true
  def handle_cast({:log, l}, state = %{logs: logs}), do: {:noreply, Map.put(state, :logs, [l | logs])}
  def handle_cast({:checkpoint, c}, state), do: {:noreply, Map.put(state, :checkpoint, c)}

  @impl true
  def handle_info({:DOWN, _ref, :process, _object, :normal}, state), do: {:noreply, Map.put(state, :done, true), 60000}
  def handle_info({:DOWN, _ref, :process, _object, reason}, state), do: {:noreply, Map.put(state, :crashed, {true, reason}), 60000}
  def handle_info(:timeout, state), do: {:stop, :shutdown, state}
end

And the “job function” looks like:

fn (pid) ->
  JobServer.mark_checkpoint(pid, :started)

  # Rebind to `mobi_path`: the path of the file we actually email.
  mobi_path = if format == :epub do
    JobServer.mark_checkpoint(pid, :conversion)
    JobServer.record_log(pid, "Attempting to convert #{filename} to MOBI")
    {:ok, path} = Ebook.Conversion.convert(filename)
    JobServer.record_log(pid, "Converted #{filename} to #{path}")
    path
  else
    JobServer.record_log(pid, "Received MOBI file; no conversion necessary.")
    filename
  end

  JobServer.mark_checkpoint(pid, :email)
  JobServer.record_log(pid, "Attempting to send MOBI file via email")
  email = Ebook.Email.SendToKindle.personal_document(who, mobi_path)

  Ebook.Mailer.deliver!(email)
  JobServer.record_log(pid, "Email sent!")

  JobServer.mark_checkpoint(pid, :cleanup)
  JobServer.record_log(pid, "Cleaning up")
  File.rm(mobi_path)
end

This is now far more robust. The frontend now has access to fine-grained state about the status of the running job function: did it crash? did it complete successfully? if it’s still running, how far has it gotten?
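
The poll endpoint grows accordingly; something along these lines (a sketch, and note that the {true, reason} tuple isn’t JSON-encodable as-is):

get "/poll/:pid" do
  pid = :erlang.list_to_pid('<#{pid}>')

  state =
    if Process.alive?(pid) do
      JobServer.poll(pid)
    else
      # The JobServer already timed itself out; a terminal state was
      # available on earlier polls.
      %{exists: false}
    end

  # Flatten the crash reason before encoding, since Jason can't handle tuples.
  state =
    case state do
      %{crashed: {true, reason}} -> %{state | crashed: inspect(reason)}
      _ -> state
    end

  send_resp(conn, 200, Jason.encode!(state))
end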

Streaming Logs

One last improvement I wanted to make was to have the frontend display raw logs from the System.cmd command. Calibre’s ebook-convert program is fairly verbose about what it’s doing, and having this stream scroll past in the frontend UI would be a nice touch. This was a bit more involved than I expected, though.

System.cmd allows the caller to pass an :into value, which “injects the result into the given collectable”. I had to look this up: Collectable is an Elixir protocol, described as “A protocol to traverse data structures”. This is a bit vague, but the documentation continues:

The Collectable module was designed to fill the gap left by the Enumerable protocol. Collectable.into/1 can be seen as the opposite of Enumerable.reduce/3. If the functions in Enumerable are about taking values out, then Collectable.into/1 is about collecting those values into a structure.

If I’m reading that correctly, all we need to do is write a Collectable implementation that calls JobServer.record_log on every chunk of output it receives:

defmodule SystemCmdStreamingLogger do
  defstruct [:pid]

  defimpl Collectable do
    def into(logger) do
      collector = fn
        acc, {:cont, term} ->
          # Each chunk of converter output becomes one log entry.
          term = term |> String.trim() |> String.replace("\n", ", ")
          JobServer.record_log(logger.pid, term)
          acc

        acc, :done ->
          # Return the final collectable; System.cmd hands it back to the caller.
          acc

        _acc, :halt ->
          :ok
      end

      {logger, collector}
    end
  end
end

and pass that to System.cmd:

System.cmd(
  "/Applications/calibre.app/Contents/MacOS/ebook-convert",
  [filename, "#{filename}.mobi"],
  stderr_to_stdout: true,
  # `pid` is the process ID of the `JobServer` instance this conversion is running under
  into: %SystemCmdStreamingLogger{pid: pid}
)
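
One addition worth making here (not in the original snippet): System.cmd returns the exit status alongside the collectable, so the job function can raise on a non-zero status and let the JobServer’s :DOWN handling surface the crash:

{_logger, exit_status} =
  System.cmd(
    "/Applications/calibre.app/Contents/MacOS/ebook-convert",
    [filename, "#{filename}.mobi"],
    stderr_to_stdout: true,
    into: %SystemCmdStreamingLogger{pid: pid}
  )

# A non-zero exit crashes the job process; the JobServer sees it as a
# {:DOWN, ..., reason} message and reports it via /poll.
if exit_status != 0 do
  raise "ebook-convert exited with status #{exit_status}"
end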

Conclusion

This feels like a good stopping point; this implementation is likely to require minimal supervision/babysitting while doing its thing, which is an important (if not the most important) concern for weekend projects. On the whole, I’m very happy with the choice of Elixir here.
