Escaping the Garden (Qt through premake)

I am generally not a fan of frameworks. I tend to prefer tools that maximize my control, which results in a love of things like C++, Arch Linux, and toolkits. For a recent side project at work, I needed to create a GUI. In the past I had used wxWidgets, but I decided it was time to expand my horizons and try something else, so I tried Qt, and I have to say it's been amazing. Everything seems to flow so easily when writing GUIs in Qt. Unfortunately, Qt is very much a framework: abandon your freedom of choice, never venture outside the walled garden, and you will be happy.

I, however, prefer to take the red pill. One of the first things you'll notice when moving to Qt is that it has its own makefile system called qmake. This build system ensures you have your Qt dependencies and invokes Qt's Meta Object Compiler (moc). (That's right, Qt has its own preprocessor.) My target in sight, I decided to claim a small victory against the garden and continue to use my build system of choice, premake.

Premake is a build system built on top of Lua. This allows for great flexibility, since it's a full scripting language rather than a DSL like CMake's. To accomplish my task, I needed to override some functions in premake's gmake generator. It is worth noting that I am doing this in premake4; premake5 is currently in alpha and has standardized facilities for overriding functions.

The Qt preprocessor parses C++ header files and generates source files that need to be compiled into the final binary. For my purposes, the preprocessor was handling the Q_OBJECT macro. This means I must first identify the files needing to be preprocessed. I decided my general structure would be:

src/
    Original source code
moc/
    Folder filled with generated code from the moc output

So first, I have a utility function that simply prints a Lua table to the console, which helps with debugging:

function print_table(t)
   for key,value in pairs(t) do print(key,value) end
end

and a function that translates a file's path from src/*.h to moc/*moc.cpp (the moc suffix prevents name collisions):

function translate_file(path)
   -- "^" and "$" anchor the matches, and "%." escapes the dot (which would
   -- otherwise match any character in a Lua pattern)
   return (string.gsub(string.gsub(path, "^src/", "moc/"), "%.h$", "moc.cpp"))
end

Now I'll need to identify the files that need to be preprocessed, so I iterate over each file line-by-line looking for Q_OBJECT as a substring. Since I am not actually parsing the C++ code, there is a chance of a false positive from comments or blocks removed by the real C preprocessor, but for my purposes I haven't run into the issue:

function needs_preprocessing(path)
   for line in io.lines(path) do
      if string.find(line, "Q_OBJECT") ~= nil then
	 return true
      end
   end
   return false
end

Now, due to the structure of premake and the order in which it executes functions, I will need to do the actual transformation in two steps:

  1. Generate prebuild commands that run the header files through moc
  2. Add the generated files to the list of source files for compilation

It would be much cleaner if I could have done this all in one step, but that would involve editing premake.

For the first step, I will overload gmake_cpp_config(). First I need to save the current gmake_cpp_config() function, then replace it with my own function that invokes the saved one.

old_gmake_cpp_config = premake.gmake_cpp_config
-- Modify prebuild commands
premake.gmake_cpp_config = function(proj, cc)
   for i,k in pairs(os.matchfiles("src/**.h")) do
      if needs_preprocessing(k) then
         local processed_file = translate_file(k)
         table.insert(proj.prebuildcommands, "mkdir -p $$(dirname " .. processed_file .. ") && moc -o " .. processed_file .. " " .. k)
      end
   end
   old_gmake_cpp_config(proj, cc)
end

As you can see, I am iterating over the src/ directory looking for header files and then executing moc on them if they need preprocessing. Now that we have the output files, we need to add them to the list of files to be compiled. This involves overriding make_cpp() in much the same way:

old_make_cpp = premake.make_cpp
-- Modify files
premake.make_cpp = function(proj)
   for i,k in pairs(os.matchfiles("src/**.h")) do
      if needs_preprocessing(k) then
         local processed_file = translate_file(k)
         table.insert(proj.files, processed_file)
      end
   end
   old_make_cpp(proj)
end

After all that, I just need to make sure I link the proper Qt libraries to my project, and build! Yay freedom.

Adventures in Battery Life on Arch Linux

Recently, I purchased an ASUS UX303LN, which is advertised as having an 8 hour battery life. I promptly wiped the drive and installed Arch Linux, which has only been giving me about two hours of battery life. I did some research and found suggestions for improving battery life, so I am going to go through each one and measure the power usage to see how each individually affects battery life. To measure battery usage, I am going to run powerstat at boot while idling, and when I get down to 30% battery I am going to use uptime to measure how long the battery lasted. The uptime tests won't be "scientific" in that I will be using my laptop while the power is draining, but I'll try to keep my usage as consistent as possible (no YouTube, no video games).

Establish a baseline

First I will go through the motions for my unmodified system.

Running for 300.0 seconds (30 samples at 10.0 second intervals).
ACPI battery power measurements will start in 180 seconds time.

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   0.2   0.0   0.0  99.8   0.0  1.0   85.4   22.5  0.0  0.0  0.4  13.10
  StdDev   0.1   0.0   0.0   0.1   0.0  0.0    4.9    1.8  0.2  0.0  1.7   0.09
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.1   0.0   0.0  99.7   0.0  1.0   76.3   19.8  0.0  0.0  0.0  13.03
 Maximum   0.3   0.0   0.1  99.9   0.0  1.0   98.0   28.6  1.0  0.0  9.0  13.48
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
 13.10 Watts on Average with Standard Deviation 0.09

And my uptime at 30% battery:

15:53:59 up  1:59,  0 users,  load average: 0.51, 0.37, 0.24

Assuming linear battery consumption, that extrapolates to a theoretical 2 hours and 50 minutes of battery life.
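The extrapolation behind these numbers is simple: if that much uptime consumed 70% of the battery, scale up to a full charge. A quick Python sketch of the arithmetic (1:59 of uptime is 119 minutes):

```python
def extrapolate(minutes_elapsed, fraction_used):
    """Scale runtime at a given consumed battery fraction to a full-charge estimate."""
    total = minutes_elapsed / fraction_used
    return int(total // 60), int(total % 60)

# 1:59 of uptime with 70% of the battery consumed
print(extrapolate(119, 0.70))  # → (2, 50), i.e. 2 hours 50 minutes
```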

Enabling power saving on the Intel graphics

To enable power saving on the Intel graphics all I need to do is create /etc/modprobe.d/i915.conf with these contents:

options i915 enable_rc6=1 enable_fbc=1 lvds_downclock=1
Running for 300.0 seconds (30 samples at 10.0 second intervals).
ACPI battery power measurements will start in 180 seconds time.

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   0.2   0.0   0.0  99.8   0.0  1.1   85.9   23.9  0.0  0.0  0.4  12.46
  StdDev   0.1   0.0   0.0   0.1   0.0  0.2    9.1    6.8  0.2  0.0  1.5   0.09
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.1   0.0   0.0  99.6   0.0  1.0   73.8   17.4  0.0  0.0  0.0  12.37
 Maximum   0.3   0.0   0.1  99.9   0.0  2.0  119.2   55.1  1.0  0.0  8.0  12.82
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
 12.46 Watts on Average with Standard Deviation 0.09

And my uptime at 30% battery:

21:33:49 up  2:05,  0 users,  load average: 0.05, 0.08, 0.12

Assuming linear battery consumption, that extrapolates to a theoretical 2 hours and 58 minutes of battery life. This change slightly improved battery life, but not significantly.

Enabling PCIe Force ASPM

Next I added pcie_aspm=force to the kernel boot line.

Running for 300.0 seconds (30 samples at 10.0 second intervals).
ACPI battery power measurements will start in 180 seconds time.

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   0.2   0.0   0.0  99.7   0.0  1.0   89.5   26.3  0.1  0.0  0.5  11.76
  StdDev   0.1   0.0   0.0   0.1   0.0  0.0   15.1    9.9  0.2  0.0  1.7   0.06
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.1   0.0   0.0  99.5   0.0  1.0   75.0   19.2  0.0  0.0  0.0  11.69
 Maximum   0.4   0.0   0.1  99.8   0.0  1.0  149.5   69.6  1.0  0.0  9.0  11.94
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
 11.76 Watts on Average with Standard Deviation 0.06

And my uptime at 30% battery

03:54:08 up  2:12,  0 users,  load average: 0.02, 0.09, 0.10

Assuming linear battery consumption, that extrapolates to a theoretical 3 hours and 8 minutes of battery life. This change slightly improved battery life, but not significantly.

Install powerdown

Next I installed powerdown-git from the AUR. For this step I also kept the backlight brightness at max.

Running for 300.0 seconds (30 samples at 10.0 second intervals).
ACPI battery power measurements will start in 180 seconds time.

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   0.2   0.0   0.0  99.7   0.0  1.1   95.2   26.7  0.0  0.0  0.3  11.11
  StdDev   0.1   0.0   0.0   0.1   0.0  0.4   32.8   16.3  0.2  0.0  1.5   0.10
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.1   0.0   0.0  99.6   0.0  1.0   78.3   19.3  0.0  0.0  0.0  11.00
 Maximum   0.3   0.0   0.1  99.8   0.1  3.0  265.3  112.9  1.0  0.0  8.0  11.45
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
 11.11 Watts on Average with Standard Deviation 0.10

And my uptime at 30% battery

22:41:50 up  2:00,  0 users,  load average: 0.29, 0.30, 0.25

Oddly enough, my power usage went down but the battery life did too. I suspect it's just a symptom of my non-scientific test; the resting watts are really the only "scientific" part of this post.

Use Bumblebee to turn off the NVIDIA card

Running for 300.0 seconds (30 samples at 10.0 second intervals).
ACPI battery power measurements will start in 180 seconds time.

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   0.3   0.0   0.0  99.7   0.0  1.0   90.8   57.0  0.2  0.1  0.1   8.32
  StdDev   0.1   0.0   0.0   0.1   0.0  0.2    9.5    5.5  0.7  0.4  0.5   0.12
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.2   0.0   0.0  99.4   0.0  1.0   78.2   50.3  0.0  0.0  0.0   8.26
 Maximum   0.4   0.0   0.1  99.8   0.0  2.0  120.7   71.1  4.0  2.0  2.0   8.74
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
  8.32 Watts on Average with Standard Deviation 0.12

My uptime at 30% battery:

18:31:11 up  2:57,  0 users,  load average: 0.27, 0.39, 0.24

Which would get us to 4 hours and 12 minutes total.

Enable PSR (Panel Self Refresh)

In /etc/modprobe.d/i915.conf I added enable_psr=1. Check out http://blog.vivi.eng.br/?p=187 for details. You can check its status with:

sudo cat /sys/kernel/debug/dri/0/i915_edp_psr_status
Running for 300.0 seconds (30 samples at 10.0 second intervals).
ACPI battery power measurements will start in 180 seconds time.

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   0.3   0.0   0.0  99.7   0.0  1.0   89.7   60.7  0.0  0.0  0.0   8.04
  StdDev   0.1   0.0   0.0   0.1   0.0  0.0    9.0   11.5  0.2  0.0  0.0   0.97
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.2   0.0   0.0  99.5   0.0  1.0   79.2   50.5  0.0  0.0  0.0   7.35
 Maximum   0.4   0.0   0.1  99.8   0.0  1.0  114.6   87.9  1.0  0.0  0.0  11.62
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
  8.04 Watts on Average with Standard Deviation 0.97

My uptime at 30% battery:

00:42:13 up  3:49,  0 users,  load average: 0.00, 0.03, 0.09

Which would get us to 5 hours and 27 minutes.

Bonus round (changing brightness)

So far, all of my changes have been imperceptible to my experience of using the laptop. For funsies, I decided to see what power usage I'm at when I let powerdown reduce my backlight (which it does automatically on battery power, but I've been turning the backlight back up to max).

$ bc -l <<< "$(cat /sys/class/backlight/intel_backlight/brightness) / $(cat /sys/class/backlight/intel_backlight/max_brightness) * 100"
20.00000000000000000000

Looks like powerdown sets my brightness to 20% when I unplug my laptop. And now the powerstat:

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   0.3   0.0   0.1  99.6   0.0  1.1  127.0   38.6  0.1  0.0  0.3   4.43
  StdDev   0.0   0.0   0.0   0.1   0.0  0.3   11.7    6.9  0.2  0.0  1.6   0.01
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.2   0.0   0.0  99.5   0.0  1.0  114.1   34.3  0.0  0.0  0.0   4.41
 Maximum   0.4   0.0   0.1  99.7   0.1  2.0  172.4   70.8  1.0  0.0  9.0   4.44
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
  4.43 Watts on Average with Standard Deviation 0.01

So it seems I could get a considerable boost to my battery life if I allow my brightness to be turned down, perhaps exceeding the advertised 8 hours of battery life that I am supposed to get on Windows.

Anything you zcat I zcat faster (under certain conditions)

Pleasant Surprises, or how I learned to stop worrying and benchmark the code

It all started with a flight to see my parents for Christmas. During normal weeks I often find talks about programming that I wish to watch but can't find the time for, so when I fly, I download the videos to my phone and go through them. This has the side effect that every time I travel I come out of it with a strong desire to be a better programmer. In this case, the video that got to me was "Plain Threads are the GOTO of today's computing". At my work, I frequently have to delve into thousands of gzipped logs, each about 20MB compressed. In order to make sense of it all, I had naively written a log parser with some pretty bad design decisions that I had to fix.

Starting a full rewrite of my log parser, I first had to set up the decompressor. Once I had a decompressor implemented, I remembered the age-old rule that performance is not always intuitive, so code should be benchmarked. During my benchmarking, I stumbled upon an interesting finding: I was beating all the standard tools.

The Benchmark Script

Upon discovering my unexpected performance, I decided it was time to do extensive benchmarking to ensure it wasn't just a fluke. First I separated out the decompression code from the rest of the repository and put it up on github at https://github.com/tomalexander/fzcat. I needed a command line interface for the decompressor, so I decided to just treat all command-line args as paths to files. This is a slight deviation from zcat, which supports four command-line flags:

Flag           Action                                                        Reason ignored
-f, --force    Force decompression                                           We're going to try decompressing regardless
-h, --help     Display a help screen and quit                                We have no command-line flags, so help isn't necessary
-L, --license  Display the gzip license and quit                             The code is distributed under the Unlicense
-V, --version  Display the version number and compilation options, then quit The project isn't versioned

The benchmark script contains two phases: warm and cold cache. We test both to ensure there are no corner cases that would lead to poor performance. We have five tests:

  1. zcat
  2. GNU parallel + zcat
  3. pigz
  4. GNU parallel + pigz
  5. fzcat (The program this blog post is about)

I wrote the benchmark script in Python to keep things simple. All it does is invoke the five tests with warm and cold caches on all the files (and files inside folders) passed in as command-line arguments.
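The core of such a harness is just timing each command over the file list. A minimal Python sketch of that idea (the command lists and the cache-dropping step are illustrative, not the actual script):

```python
import subprocess
import time

def time_command(cmd, files):
    """Run a decompression command over all files, discard stdout, and
    return the wall-clock seconds it took."""
    start = time.monotonic()
    subprocess.run(cmd + files, stdout=subprocess.DEVNULL, check=True)
    return time.monotonic() - start

def drop_caches():
    """Flush the page cache before a cold-cache run (requires root)."""
    subprocess.run(["sh", "-c", "sync; echo 3 > /proc/sys/vm/drop_caches"],
                   check=True)

# e.g. compare time_command(["zcat"], logs) against time_command(["./fzcat"], logs)
```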

I also wrote a validation script that just does zcat and fzcat to ensure that the output of the two scripts is identical.
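Such a validation boils down to checking that two commands produce byte-identical output; a hedged Python sketch of that comparison (the actual script invokes zcat and fzcat, shown here as a generic helper):

```python
import hashlib
import subprocess

def output_digest(cmd):
    """Run a command and return a SHA-256 digest of its stdout."""
    out = subprocess.run(cmd, check=True, capture_output=True).stdout
    return hashlib.sha256(out).hexdigest()

def outputs_match(cmd_a, cmd_b):
    """True when both commands produce byte-identical stdout."""
    return output_digest(cmd_a) == output_digest(cmd_b)

# e.g. outputs_match(["zcat", "log.gz"], ["./fzcat", "log.gz"])
```

Hashing rather than holding both outputs in memory keeps the comparison cheap even for large decompressed logs.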

Results

I ran the test against 189 files, each around 20MB when compressed. The full results are on the github page, but there wasn't a significant difference between cold and warm caches, so I'll use the warm cache numbers for the rest of this post:

Test Seconds
zcat 107.95278143882751
parallel zcat 42.61844491958618
pigz 55.39036321640015
parallel pigz 42.56399941444397
fzcat 28.799103498458862

Recommendations

fzcat will only be faster under certain conditions:

  1. You must be decompressing multiple files at once
  2. You must have a multicore machine
  3. You must have enough RAM to fit 4 of the decompressed files entirely in memory

The third restriction exists to ensure that files are output to stdout in order. If you don't meet those requirements, I recommend pigz for single files, or GNU Parallel + zcat for multiple files.

Theories

Those figures are all a practical person would need, but I can't let this go without properly investigating why we are getting these results. My theories as to why my performance was superior are the following:

  1. Using mmap instead of read means less copying of memory around, so zcat and pigz are probably using read.
  2. zcat and pigz process a single file at a time, rather than buffering the decompressed output in RAM like I am doing. This enables them to be used on machines without piles of RAM.
  3. miniz is a superb implementation of zlib

Testing the theories

mmap vs read

In order to test this theory, I need to write a fork of the decompressor that uses read instead of mmap.

Type Time
read 27.672048330307007
mmap 27.314319849014282

This is interesting in that mmap didn't yield as significant an improvement as I expected; I must not be I/O bound. The code for this change (which won't be merged into master; we will keep mmap) is available at https://github.com/tomalexander/fzcat/tree/read_instead_of_mmap.

single vs multiple files

I think it's safe to assume zcat processes one file at a time, but pigz is a parallelized implementation of gzip, so first I am going to investigate the pigz source code to see what it's doing. First I downloaded the source tarball from http://zlib.net/pigz/. NOTE: All code snippets from pigz that I include in this blog are under the pigz license. Upon inspection of the Makefile, it looks like pigz.c is included in the final pigz binary, so that's probably a good place to start looking. Inside main() there appears this block:

/* process command-line arguments, no options after "--" */
done = noop = 0;
for (n = 1; n < argc; n++)
    if (noop == 0 && strcmp(argv[n], "--") == 0) {
	noop = 1;
	option(NULL);
    }
    else if (noop || option(argv[n])) { /* true if file name, process it */
	if (done == 1 && g.pipeout && !g.decode && !g.list && g.form > 1)
	    complain("warning: output will be concatenated zip files -- "
		     "will not be able to extract");
	process(strcmp(argv[n], "-") ? argv[n] : NULL);
	done++;
    }

This loops through the arguments in order, passing each one to process(), which would support that theory. Digging further into process() we see a lot of setup and then these lines:

/* process ind to outd */
if (g.verbosity > 1)
    fprintf(stderr, "%s to %s ", g.inf, g.outf);
if (g.decode) {
    if (method == 8)
	infchk();
    else if (method == 257)
	unlzw();
    else
	cat();
}

So pigz does indeed process each file in-order.

Since zcat uses read instead of mmap, I will keep the read-based decompressor from the above test and remove the threading so that it processes one file at a time.

Type Time
Single 64.25435829162598
Threaded 27.672048330307007

The code for this portion of the test is available at https://github.com/tomalexander/fzcat/tree/single_instead_of_threaded.

miniz vs zlib

For this final test, we will use the code from the above test which is using read and only a single thread. This should be enough to compare the raw performance of miniz vs zlib by comparing our binary vs zcat.

Type Time
fzcat (modified for test) 64.25435829162598
zcat 109.0133900642395

Conclusions

So it seems that the benefit of mmap over read isn't as significant as I expected. The benefit could theoretically be larger on a machine with multiple processes reading the same file, but I'll leave that as an exercise for the reader.

miniz turned out to be significantly faster than zlib even when both are used in the same fashion (single-threaded and read). Additionally, using the copious amounts of RAM available to machines today allowed us to speed everything up even more with threading.

OpenSSL Sockets in C++ (part 4)

We've come to it at last: the introduction of OpenSSL to our sockets. The general structure we're going for is adding a make_secure() function to the ssl_socket class that initiates the SSL handshake. After that has occurred, we will want all of our read and write functions to call their OpenSSL counterparts instead.

First in ssl_socket.h we will need to add a new import, some new fields, and two new functions.

#include <openssl/ssl.h>
SSL* ssl_handle;
SSL_CTX* ssl_context;
/**
 * Check to see if this socket is an encrypted socket
 */
bool is_secure() const { return ssl_handle != nullptr; }

/**
 * Perform the SSL handshake to switch all communications over
 * this socket from unencrypted to encrypted
 */
ssl_socket& make_secure();

We also should make sure to initialize the two new members to nullptr in the constructor:

ssl_socket::ssl_socket(const std::string & _host, const std::string & _port):
    address_info(nullptr),
    connection(-1),
    host(_host),
    port(_port),
    ssl_handle(nullptr),
    ssl_context(nullptr)
{}

Now we need to establish the function that does the handshake. First we need to create an OpenSSL context that configures what version of SSL/TLS we wish to use. Next we need to create a handle, which is much like the file descriptor opened from the socket call earlier.

#include <openssl/err.h>
///...
namespace
{
    std::string get_ssl_error()
    {
        // ERR_get_error() pops the earliest error off OpenSSL's error queue
        return std::string(ERR_error_string(ERR_get_error(), nullptr));
    }
}
///...
ssl_socket& ssl_socket::make_secure()
{
    ssl_context = SSL_CTX_new(TLSv1_client_method());
    if (ssl_context == nullptr)
    {
	throw ssl_socket_exception("Unable to create SSL context " + get_ssl_error());
    }

    // Create an SSL handle that we will use for reading and writing
    ssl_handle = SSL_new(ssl_context);
    if (ssl_handle == nullptr)
    {
	SSL_CTX_free(ssl_context);
	ssl_context = nullptr;
	throw ssl_socket_exception("Unable to create SSL handle " + get_ssl_error());
    }

    ///... more code goes here ...///

    return *this;
}

Now that we have a handle, we need to tie it together with the plain open socket we already have and initiate the handshake.

// Pair the SSL handle with the plain socket
if (!SSL_set_fd(ssl_handle, connection))
{
    SSL_free(ssl_handle);
    SSL_CTX_free(ssl_context);
    ssl_handle = nullptr;
    ssl_context = nullptr;
    throw ssl_socket_exception("Unable to associate SSL and plain socket " + get_ssl_error());
}

// Finally do the SSL handshake
for (int error = SSL_connect(ssl_handle); error != 1; error = SSL_connect(ssl_handle))
{
    switch(SSL_get_error(ssl_handle, error))
    {
      case SSL_ERROR_WANT_READ:
      case SSL_ERROR_WANT_WRITE:
	std::this_thread::sleep_for(std::chrono::milliseconds(200));
	break;
      default:
	SSL_free(ssl_handle);
	SSL_CTX_free(ssl_context);
	ssl_handle = nullptr;
	ssl_context = nullptr;
	throw ssl_socket_exception("Error in SSL handshake: " + get_ssl_error());
	break;
    }
}

We also need to add more to the disconnect() function to free OpenSSL resources.

void ssl_socket::disconnect()
{
    if (ssl_handle != nullptr)
    {
	SSL_shutdown(ssl_handle);
	SSL_free(ssl_handle);
	ssl_handle = nullptr;
    }

    if (ssl_context != nullptr)
    {
	SSL_CTX_free(ssl_context);
	ssl_context = nullptr;
    }

    if (connection >= 0)
    {
	close(connection);
	connection = -1;
    }

    if (address_info != nullptr)
    {
	freeaddrinfo(address_info);
	address_info = nullptr;
    }
}

Now we have the initiation and the cleanup handled, so we need to write the SSL calls for read / write.

ssl_socket& ssl_socket::write(const uint8_t* data, size_t length)
{
    for (const uint8_t* current_position = data, * end = data + length; current_position < end; )
    {
	if (!is_secure())
	{
	    ///... unencrypted handler here ...///
	} else {
	    ssize_t sent = SSL_write(ssl_handle, current_position, end - current_position);
	    if (sent > 0)
	    {
		current_position += sent;
	    } else {
		switch(SSL_get_error(ssl_handle, sent))
		{
		  case SSL_ERROR_ZERO_RETURN: // The socket has been closed on the other end
		    disconnect();
		    throw ssl_socket_exception("The socket disconnected");
		    break;
		  case SSL_ERROR_WANT_READ:
		  case SSL_ERROR_WANT_WRITE:
		    std::this_thread::sleep_for(std::chrono::milliseconds(200));
		    break;
		  default:
		    throw ssl_socket_exception("Error sending socket: " + get_ssl_error());
		    break;
		}
	    }
	}
    }
    return *this;
}
size_t ssl_socket::read(void* buffer, size_t length)
{
    if (!is_secure())
    {
	///... unencrypted handler here ...///
    } else {
	ssize_t read_size = SSL_read(ssl_handle, buffer, length);
	if (read_size > 0)
	{
	    return read_size;
	} else {
	    switch(SSL_get_error(ssl_handle, read_size))
	    {
	      case SSL_ERROR_ZERO_RETURN: // The socket has been closed on the other end
		disconnect();
		return 0;
		break;
	      case SSL_ERROR_WANT_READ:
	      case SSL_ERROR_WANT_WRITE:
		return 0; // Read nothing
		break;
	      default:
		throw ssl_socket_exception("Error reading socket: " + get_ssl_error());
		break;
	    }
	}
    }
}

Now our sockets themselves should be ready for use, though we're missing one key component: initializing and de-initializing the OpenSSL library. For this we're going to introduce a new class that initializes the OpenSSL library in its constructor and de-initializes it in its destructor.

/**
 * Initializes and de-initializes the OpenSSL library. This should
 * only be instantiated and destroyed once
 */
class openssl_init_handler
{
  public:
    openssl_init_handler()
    {
	SSL_load_error_strings();
	SSL_library_init();
    }
    ~openssl_init_handler()
    {
	ERR_remove_state(0);
	ERR_free_strings();
	EVP_cleanup();
	CRYPTO_cleanup_all_ex_data();
	sk_SSL_COMP_free(SSL_COMP_get_compression_methods());
    }
};

Now we can make it a static variable inside the connect() function so that it will be constructed the first time any socket connects and destroyed at the end of the program.

ssl_socket& ssl_socket::connect()
{
    static openssl_init_handler _ssl_init_life;
    ///... the rest of the connect code here ...///
}

Great! Now everything should be in place. Let's make two minor changes to main.cpp to point it at the HTTPS site: in the ssl_socket constructor change "http" to "https", and add a call to make_secure() before writing to the socket.

int main(int argc, char** argv)
{
    try
    {
	ssl_socket s(HOST, "https");
	char buffer[BUFFER_SIZE];
	std::string http_query = "GET / HTTP/1.1\r\n"    \
	    "Host: " + std::string(HOST) + "\r\n\r\n";

	s.connect().make_secure().write(http_query);

Now all that is left is compiling and testing:

$ premake4 gmake
Building configurations...
Running action 'gmake'...
Generating Makefile...
Generating sockets_part_4.make...
Done.
$ scan-build make
...
scan-build: No bugs found.
$ valgrind --leak-check=full ./sockets_part_4
...
definitely lost: 24 bytes in 1 blocks

Uh-oh! Seems there's some small part of SSL_library_init I'm not freeing. If anyone out there knows what I'm missing, please drop me a line.

See how easy it is to use OpenSSL now that we've established a common interface between plain sockets and encrypted sockets? Now there's plenty of room to build on top of this to create interesting things (for example, auto-reconnecting sockets with their own auto-firing handshakes for protocols like IRC or XMPP). Go forth and create. The source for this post can be found here under the ISC license.

OpenSSL Sockets in C++ (part 3)

For this post we're going to move all of our code from part 2 into its own class to facilitate the SSL transition. This will be the last post before we start using OpenSSL to encrypt the stream. We're going to create two files ssl_socket.h and ssl_socket.cpp. Since all the socket code was covered in part 1 and part 2 we're not going to go into too much detail with the code. Instead, we'll cover the structure of the class.

For our socket class's error handling, we're going to deviate from the style of the networking functions we've been using. The BSD/POSIX networking functions traditionally either return an error number or set errno to indicate when there's a problem. Unfortunately, with this style of error handling, every call must be followed by a series of if statements to handle errors. This is because they're implemented in C, which lacks exception support. C++ supports zero-cost exceptions, which incur no run-time cost when no exception is thrown. Since errors should be the exception (hahaha) to the rule, in terms of performance it makes sense to use exceptions rather than rely on branch prediction to reduce the cost of if-statement error handling.

We're going to move most of the code up to the send / recv block into a connect() function. We don't want this in the constructor, in order to allow for reconnecting sockets on failure (useful in chat clients). We'll also introduce a disconnect() function to allow a socket to be disconnected without requiring the destruction of the object. The copy constructor and assignment operator will be deleted to prevent accidental copying.

/**
 * Unified interface for non-blocking read and blocking write, plain
 * and SSL sockets
 */
class ssl_socket
{
  public:
    /**
     * Construct a socket that will eventually connect to the given
     * host and port.
     * 
     * @param _host The hostname or ip address to connect to (ex: "fizz.buzz" or "208.113.196.82")
     * @param  _port The port or service name to connect to (ex: "80" or "http")
     */
    ssl_socket(const std::string & _host, const std::string & _port);
    virtual ~ssl_socket();
    ssl_socket(ssl_socket const&) = delete;
    ssl_socket& operator=(ssl_socket const&) = delete;

    /**
     * Perform a DNS request and establish an unencrypted TCP socket
     * to the host.
     * 
     * @return A reference to itself
     * @throw ssl_socket_exception if any part of the connection fails
     */
    ssl_socket& connect();

    /**
     * Disconnect from the host and destroy the socket
     */
    void disconnect();

    ///... more code here...///
};

We're going to introduce read and write functions. The read function will behave in a non-blocking fashion, returning the data and the number of bytes read. The write function, however, will block for simplicity, so we don't need a thread or another callback constantly pushing queued data across the socket.

/**
 * Blocking write of data to the socket
 * 
 * @param data pointer to raw bytes to write to socket
 * @param length number of bytes we wish to write to the socket
 * 
 * @return a reference to itself
 * @throw ssl_socket_exception if an error occurs other than EAGAIN/EWOULDBLOCK
 */
ssl_socket& write(const uint8_t* data, size_t length);

/**
 * Blocking write of a string to the socket (*does not write the
 * null terminator*)
 * 
 * @param data a string to write to the socket
 * 
 * @return a reference to itself
 * @throw ssl_socket_exception if an error occurs other than EAGAIN/EWOULDBLOCK
 */
ssl_socket& write(const std::string & data);

/**
 * Non-blocking attempt to read from the socket
 * 
 * @param buffer a block of memory in which the read data will be placed
 * @param length the maximum number of bytes we can read into buffer
 * 
 * @return The number of bytes read from the socket. Please note that 0 can be returned if there's no data available OR if the socket has closed. Use is_connected to determine if the socket is still open.
 * @throw ssl_socket_exception if an error occurs other than EAGAIN/EWOULDBLOCK
 */
size_t read(void* buffer, size_t length);

One thing you may have noticed: since we're returning the number of bytes read from the read function, and we're using the socket in a non-blocking fashion, read may return zero when no bytes are available. Traditionally, however, a return of zero was the signal that the socket had been closed on the other end. To allow the user to check whether the socket is still open, we're going to introduce an is_connected function that indicates the connected status of the socket.

/**
 * Check to see if the socket is still connected. If the socket
 * has been disconnected on the server side and no read or write
 * has occurred then it is possible for this to return true
 * because the disconnect has not yet been detected
 */
bool is_connected() const { return connection >= 0; }
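The read behavior described above could be implemented roughly like this. This is a free-function sketch for illustration (the post's actual member function isn't shown here); it maps EAGAIN/EWOULDBLOCK to a zero return and raises on any other error:

```cpp
#include <sys/socket.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Sketch of a non-blocking read. Returns 0 both when no data is ready
// (EAGAIN/EWOULDBLOCK) and when the peer has closed the connection,
// which is exactly the ambiguity is_connected() exists to resolve.
size_t read_some(int fd, void* buffer, size_t length)
{
    ssize_t n = recv(fd, buffer, length, 0);
    if (n < 0)
    {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;  // nothing available right now
        throw std::runtime_error(std::string("recv: ") + std::strerror(errno));
    }
    // n == 0 means the peer closed; the real class would also mark
    // the socket disconnected (connection = -1) at this point.
    return static_cast<size_t>(n);
}
```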

Finally, we're going to need a main.cpp to use the socket. In this code we create the socket on the stack, which means it will automatically disconnect and clean itself up when it goes out of scope, since our destructor is set up to call disconnect.

int main(int argc, char** argv)
{
    try
    {
	ssl_socket s(HOST, "http");

	///... more code here ...///

    } catch (const ssl_socket_exception & e) {
	std::cerr << e.to_string() << '\n';
	return 1;
    }
    return 0;
}

Now we make our http query just like before, connect our socket, and write the query to it. Since write is a blocking call we don't have to worry about calling it multiple times.

char buffer[BUFFER_SIZE];
std::string http_query = "GET / HTTP/1.1\r\n"    \
    "Host: " + std::string(HOST) + "\r\n\r\n";

s.connect().write(http_query);

Finally, we create a loop contingent on the socket being connected that will poll for data available to read and echo it out to the shell.

while (s.is_connected())
{
    size_t length = s.read(buffer, BUFFER_SIZE);
    if (length == 0 && s.is_connected())
    {
	std::this_thread::sleep_for(std::chrono::milliseconds(200));
    } else {
	std::cout << std::string(buffer, length);
    }
}

Now let's build and test just like before (I've added a premake script to the folder now)

$ premake4 gmake
Building configurations...
Running action 'gmake'...
Generating Makefile...
Generating sockets_part_3.make...
Done.
$ scan-build make
...
scan-build: No bugs found.
$ ./sockets_part_3
<html of page here>
$ valgrind --leak-check=full ./sockets_part_3
...
All heap blocks were freed -- no leaks are possible
...

Mission accomplished. All the code for this post is available here under the ISC license. We are finally ready to venture into the world of OpenSSL, which we will do in part 4.

OpenSSL Sockets in C++ (part 2)

For part two of the series we will be switching the code from part 1 over to non-blocking sockets.

First we need to add to our includes:

#include <chrono>
#include <fcntl.h>
#include <thread>

Then we need to go to where we connect the socket and use fcntl(2) to set the O_NONBLOCK flag on the socket. It's important, for our uses, that setting the socket to non-blocking occurs after the connect(2) call so we don't have to deal with the complexity of half-open sockets.

if (fcntl(connection, F_SETFL, O_NONBLOCK) < 0)
{
    error_string = "Unable to set nonblocking: " + std::string(strerror(errno));
    close(connection); // Cleanup
    connection = -1;
    continue;
}

If you compile and run the program now, you may notice some odd behavior: the program doesn't print any output! This is because when we call recv we're actually getting back -1, with errno set to either EAGAIN or EWOULDBLOCK. In those cases we need to keep polling for the data. Since our program does nothing else, we're going to throw it into an endless loop, producing the same behavior as a blocking socket; but it's important to know that a more complicated program would want to do other work while data isn't available.

We're going to replace our read loop with the following that will continuously attempt to read until either an error that isn't EAGAIN or EWOULDBLOCK occurs or the socket closes.

for (bool stop = false; !stop;)
{
    ssize_t read_size = recv(connection, buffer, BUFFER_SIZE, 0);
    switch (read_size)
    {
      case -1: // We got an error, check errno
	if (errno == EAGAIN || errno == EWOULDBLOCK)
	{
	    std::this_thread::sleep_for(std::chrono::milliseconds(200));
	} else {
	    std::cerr << "Error reading socket: " << strerror(errno) << '\n';
	    stop = true;
	}
	break;
      case 0: // The socket has been closed on the other end
	stop = true;
	break;
      default: // We actually read some data
	std::cout << std::string(buffer, read_size);
	break;
    }
}

It's important to note that, technically, the send call in both the blocking and non-blocking examples could fail to send the entire request. Considering the minuscule size of our http request, it's pretty safe to assume that send will always transmit the full amount of data.

Once again, lets compile, run, and test the program with valgrind and clang-analyzer.

$ clang++ -std=c++11 -o sockets_part2 files/post_files/sockets_part_2/sockets_part2.cpp
$ ./sockets_part2
<html of page should show up here>
$ valgrind --leak-check=full ./sockets_part2
...
All heap blocks were freed -- no leaks are possible
...
$ scan-build clang++ -std=c++11 -o sockets_part2 files/post_files/sockets_part_2/sockets_part2.cpp
...
scan-build: No bugs found.

Looks good! In part 3 we will move all of our socket code into its own class so we can easily re-use it and also use a common interface for SSL sockets and normal sockets. The full source code for this post is available here under the ISC license.

OpenSSL Sockets in C++ (part 1)

The goal of this tutorial series is to walk through using posix sockets, from the ground up. Projects merely wishing to add networking would probably be best advised to look at well-established abstraction layers like Boost Asio.

To start off, we're going to make a basic http request. To keep things simple, for the first iteration, we're going to use a plain blocking TCP socket. First create a cpp file (mine is named sockets_part1.cpp) with some constants and includes:

#include <iostream>
#include <string>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

namespace
{
    const char HOST[] = "fizz.buzz";
    const size_t BUFFER_SIZE = 1024;
}

int main(int argc, char** argv)
{
    return 0;
}

The first step to most network connections is doing a DNS request to convert a hostname like "fizz.buzz" to an ip address like "208.113.196.82". For illustration purposes you could manually do a DNS request from the shell with the following command:

$ dig +short fizz.buzz
208.113.196.82

To make a DNS request we will be using getaddrinfo(3), which stores the address of the resulting addrinfo list in the pointer passed into it.

struct addrinfo* address_info;
int error = getaddrinfo(HOST, "http", nullptr, &address_info);
if (error != 0)
{
    throw std::string("Error getting address info: ") + std::string(gai_strerror(error));
}

The second parameter to getaddrinfo defines the port. This can be a service name like "http" or "https", or a numeric string like "80" or "443".

The third parameter to getaddrinfo is an optional set of "hints" indicating what type of connection we're looking to open; as the code above shows, it can simply be a nullptr without issue. If we did supply hints, we would set ai_family to PF_UNSPEC to indicate that we are fine with any protocol family, and ai_socktype to SOCK_STREAM to indicate that we wish to open a TCP byte stream.
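If we did want to supply hints, it could look like the following sketch. Here AI_NUMERICHOST and a literal address are used so that no real DNS lookup happens (the function name is illustrative):

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <cstring>

// Sketch: zero the hints struct, then request any protocol family
// (PF_UNSPEC) and a TCP byte stream (SOCK_STREAM). AI_NUMERICHOST
// tells getaddrinfo the host is already a numeric address, so the
// call resolves locally without touching the network.
struct addrinfo* resolve_example()
{
    struct addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family   = PF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_NUMERICHOST;

    struct addrinfo* result = nullptr;
    if (getaddrinfo("208.113.196.82", "80", &hints, &result) != 0)
        return nullptr;
    return result;  // caller must freeaddrinfo() this
}
```

Restricting ai_socktype this way also trims the result list: without it, getaddrinfo may return extra entries for UDP and raw sockets.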

The addrinfo struct that address_info now points to looks like this:

addrinfo:
  ai_flags      0
  ai_family     2               # AF_INET
  ai_socktype   1               # SOCK_STREAM
  ai_protocol   6               # IPPROTO_TCP
  ai_addrlen    16              # Length in bytes for the next field (ai_addr)
  ai_addr       sockaddr_in
    sin_family  2               # AF_INET (ipv4)
    sin_port    80              # default http port
    sin_addr    208.113.196.82  # ipv4 address to fizz.buzz
  ai_canonname  <blank>
  ai_next       nullptr         # Forms a linked list

As you can see, we have all the details set for a TCP socket on port 80 to the ip address of fizz.buzz. Next we need to open a connection to the server. In the addrinfo struct we just generated there is an ai_next field that forms a singly-linked list, allowing getaddrinfo to return multiple results (for instance, when both ipv4 and ipv6 are supported). To handle that, we loop over the list, trying to connect to each entry until we have a successful connection.

int connection = -1;
std::string error_string = "";
for (struct addrinfo* current_address_info = address_info; current_address_info != nullptr; current_address_info = current_address_info->ai_next)
{
    connection = socket(current_address_info->ai_family, current_address_info->ai_socktype, current_address_info->ai_protocol);
    if (connection < 0)
    {
	error_string = "Unable to open socket";
	continue;
    }

    if (connect(connection, current_address_info->ai_addr, current_address_info->ai_addrlen) < 0)
    {
	error_string = "Unable to connect";
	close(connection); // Cleanup
	connection = -1;
	continue;
    }

    break; // Success
}
if (connection < 0) // If we failed to connect
{
    throw error_string;
}

This loop walks down the singly-linked list, trying each entry for a connection. First it attempts to open a socket and checks that this was successful; opening the socket is a local operation that doesn't involve any calls out to fizz.buzz. Next it tries to actually open the connection, which is where the fizz.buzz server comes into play for the first time.

Now we're ready to make an HTTP request. We're going to make a very basic request for the home page with no special fields like cookies and user agents. The request string will look like this:

GET / HTTP/1.1
Host: fizz.buzz
<blank line>

In code:

std::string http_query = "GET / HTTP/1.1\r\n"       \
    "Host: " + std::string(HOST) + "\r\n\r\n";
send(connection, http_query.c_str(), http_query.size(), 0);

Finally we will need to read the result from the socket. Since we are using blocking sockets, the recv(2) call will wait until either there is data available or the connection has been closed before returning, which keeps this block of code simple.

char buffer[BUFFER_SIZE];

for (ssize_t read_size = recv(connection, buffer, BUFFER_SIZE, 0);
     read_size > 0;
     read_size = recv(connection, buffer, BUFFER_SIZE, 0))
{
    std::cout << std::string(buffer, read_size);
}

Now all we have left to do is clean up after ourselves

close(connection);
freeaddrinfo(address_info);
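As an aside, the cleanup could also be tied to scope with a unique_ptr custom deleter. This sketch (not the post's approach) guarantees freeaddrinfo runs even if an exception unwinds the stack before the end of main:

```cpp
#include <netdb.h>
#include <memory>

// Sketch: bind freeaddrinfo as the deleter so the addrinfo list is
// released automatically when the pointer goes out of scope. Error
// handling is elided; resolve() returns an empty pointer on failure.
using addrinfo_ptr = std::unique_ptr<struct addrinfo, decltype(&freeaddrinfo)>;

addrinfo_ptr resolve(const char* host, const char* service)
{
    struct addrinfo* raw = nullptr;
    getaddrinfo(host, service, nullptr, &raw);
    return addrinfo_ptr(raw, &freeaddrinfo);
}
```

This is the same RAII idea part 3 applies to the socket itself, just pointed at the DNS result instead.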

Awesome! Let's compile and run the program

$ clang++ -std=c++11 -o sockets_part1 files/sockets_part1.cpp
$ ./sockets_part1
<html source of page should print here>

Let's also check for memory leaks and run some static analysis

$ valgrind --leak-check=full ./sockets_part1
$ scan-build clang++ -std=c++11 -o sockets_part1 files/sockets_part1.cpp

Looks good! In part 2 we will port this code over to non-blocking sockets. The source code for this post is available here under the ISC license.