Just Software Solutions

Blog Archive

Cryptography and Society

Tuesday, 05 May 2015

Politicians in both the UK and USA have been making moves towards banning secure encryption over the last few months. With the UK general election coming on Thursday I wanted to express why I think this is a seriously bad idea.

Context

Back in January there were some terrorist attacks in Paris. These attacks were and are a serious matter, and stopping such attacks in future should be something that governments concern themselves with.

However, one aspect of the response by politicians has been to call for securely encrypted communication to be outlawed. In particular, the British Prime Minister, David Cameron, asked:

"In our country, do we want to allow a means of communication between people which, even in extremis with a signed warrant from the Home Secretary personally, that we cannot read?"

It was clear from the context that he thinks the answer is a resounding "NO", and from the further actions of politicians both here and in the USA it appears that others in government agree with that point of view. They clearly believe that the government should be able to read all communications.

I think in a fair, open, democratic society, the answer must be "YES": private individuals must be able to communicate without risk of eavesdropping by government officials.

Secure Encryption is Not New

Firstly, there have ALWAYS been means of communication between people that the government cannot read. You might be able to intercept a letter, and read the words written on the piece of paper, but if the message is not in the words as they appear, then you cannot read it.

Ciphers which could not be cracked by contemporary eavesdroppers have been used since at least the time of the Roman Empire. New technology merely provides a new set of such ciphers.

For example, the proper use of a one-time pad (a truly random key at least as long as the message, kept secret and never reused) provides completely secure encryption. This technique has been in use since 1882, if not earlier.
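To show how little machinery is involved, here is a minimal sketch of a one-time pad: each byte of the message is XORed with the corresponding byte of a random key, and applying the same XOR again recovers the message. It assumes Node.js, and uses `crypto.randomBytes` (a cryptographically secure pseudo-random generator) as a practical stand-in for the truly random key that perfect secrecy strictly requires; `xorBytes`, `otpEncrypt` and `otpDecrypt` are hypothetical names for this illustration.

```javascript
const crypto = require("crypto");

// XOR each message byte with the corresponding key byte.
// The same operation both encrypts and decrypts.
function xorBytes(data, key) {
  if (key.length < data.length) {
    throw new Error("one-time pad key must be at least as long as the message");
  }
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; ++i) {
    out[i] = data[i] ^ key[i];
  }
  return out;
}

// Hypothetical helpers: generate a fresh random key as long as the message.
// The key must be kept secret and never reused.
function otpEncrypt(plaintext) {
  const message = Buffer.from(plaintext, "utf8");
  const key = crypto.randomBytes(message.length);
  return { key, ciphertext: xorBytes(message, key) };
}

function otpDecrypt(ciphertext, key) {
  return xorBytes(ciphertext, key).toString("utf8");
}

const { key, ciphertext } = otpEncrypt("Meet at noon");
console.log(ciphertext.toString("hex")); // different on every run
console.log(otpDecrypt(ciphertext, key)); // Meet at noon
```

Without the key, every plaintext of the same length is equally consistent with a given ciphertext, which is why no amount of computing power helps an eavesdropper against a correctly used one-time pad.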

Other technology in widespread use today "merely" makes it exceedingly difficult to break the cipher, requiring hundreds, thousands or even millions of years to crack with a brute-force method. These time periods are enough that these ciphers can be considered uncrackable for all intents and purposes.

Consequently, governments are powerless to actually prevent communication that cannot be read by the security services. All that can be done is to make it hard for the average citizen to use such communication.

Terrorists are Criminals

By their very nature, terrorists are criminals: terrorist acts themselves are illegal, and even the possession of the weapons used for the terrorist acts is often also illegal.

Therefore, terrorists will not be put off from using secure communication just because that too is illegal.

In particular, criminal organisations will not think twice about using whatever means is available to ensure that their communications are private: if something gives them an edge over the police or anyone else who would seek to stop them, they will use it.

Society relies on secure encryption

Secure encrypted communication doesn't just prevent government agencies reading the communications of criminals, it also prevents criminals reading the communications of ordinary citizens.

This website, in common with an increasingly large number of websites, uses HTTPS for all traffic. When properly configured this means that someone intercepting the website traffic cannot identify which pages on the website you visited, or extract any of the data sent by you as a visitor to the website, or by the website back to you.

This is crucial for facilities such as online banking: it prevents computer criminals from obtaining passwords and account data by intercepting the communications between you and your bank. If such communications could not be relied upon to be secure then online banking would not be viable, as the potential for fraud due to stolen passwords would be too great.

Likewise, many businesses use secure Virtual Private Networks (VPNs) which rely on secure encryption to transfer data between computers that are only connected to each other via the internet. This allows them to securely transfer data between sites, or between remote workers, without worrying about the communications being intercepted by criminals. Without secure encryption, many large multi-national businesses would be hugely impacted, as they wouldn't be able to rely on transferring data across the internet safely, and would instead have to rely on physical transfer via courier.

A "back door" or "government secret key" destroys the security of encryption

Some of the proposals from politicians have been to require that companies that provide encryption services must also provide a means whereby government security services can decrypt the communications if required.

This requires that either (a) the company in question keeps a database of all the encryption/decryption keys used for all communications, or (b) the encryption algorithm used allows for decryption via a "back door" or "secret key" in addition to the standard decryption key, so that the government security services can gain access if required, without needing to know the customer's decryption key.

Keeping a database of the decryption keys just provides a direct target for attack by computer criminals. Once such a database is breached, none of the communications provided by that company can be considered secure. This is clearly not a good state of affairs, given the number of times that password databases get compromised.

That leaves option (b): providing a "back door" or "secret key", or other means whereby an otherwise-encrypted communication can be read by the security services. However, this fundamentally compromises that encryption.

Knowing that the back door exists, criminal computer crackers will work to ensure that they too can gain access to the communication, and they won't wait for a warrant from the Home Secretary or whatever government department is responsible for issuing such warrants! Any such group that did manage to obtain access would probably not make it public knowledge; they would merely use it to access communications that were relevant to them, whether because they had a direct use for the information, or because it could be sold to other criminal organisations.

If there is a single key that can decrypt all communication using a given system, then that dramatically reduces the computational effort required to break the key: the larger the number of messages that are transmitted with a given key, the easier it is to identify the key, especially if you have access to the corresponding unencrypted messages. The huge volume of electronic communications in use today would mean that the secret back door key would be much more readily compromised than any individual encryption key.
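A deliberately simplified illustration of the single-point-of-failure problem: if an XOR-based one-time-pad style cipher is misused so that one shared key protects many messages, an attacker who knows a single plaintext (for example, a predictable message header) can recover the key and then read every other message. Modern ciphers are designed to resist this kind of known-plaintext attack far better, so treat this only as a sketch of the argument, assuming Node.js; `sharedKey`, `xorBytes` and the sample messages are all hypothetical.

```javascript
const crypto = require("crypto");

// XOR two equal-length byte buffers.
function xorBytes(a, b) {
  const out = Buffer.alloc(a.length);
  for (let i = 0; i < a.length; ++i) {
    out[i] = a[i] ^ b[i];
  }
  return out;
}

// One shared "master" key reused for every message (the mistake being illustrated).
const sharedKey = crypto.randomBytes(16);

// Each sample message is exactly 16 bytes, matching the key length.
const messages = ["transfer 100 GBP", "password: hunter", "meet at 6pm dock"];
const ciphertexts = messages.map(m => xorBytes(Buffer.from(m, "utf8"), sharedKey));

// The attacker knows (or guesses) the first plaintext and XORs it with its ciphertext.
const knownPlain = Buffer.from(messages[0], "utf8");
const recoveredKey = xorBytes(knownPlain, ciphertexts[0]);

// With the recovered key, every other intercepted message can be read.
for (const c of ciphertexts.slice(1)) {
  console.log(xorBytes(c, recoveredKey).toString("utf8"));
}
// prints "password: hunter" then "meet at 6pm dock"
```

The more traffic a single key protects, the more likely it is that some of that traffic is known or guessable, and one success exposes everything.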

Privacy is a human right

The Universal Declaration of Human Rights was adopted by the UN in 1948. Article 12 states:

No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.

Secure encrypted communication protects our correspondence from interference, including interference by the government.

Restricting the use of encryption is also a violation of the right to freedom of expression, guaranteed to us by Article 19 of the Universal Declaration of Human Rights:

Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers.

The restriction on our freedom of expression is easy to see: if I have true freedom of expression then I can impart any series of letters or numbers to anyone without interference. If that series of letters or numbers happens to be an encrypted message then that is of no consequence. Any attempt to limit the use of particular encryption algorithms therefore limits my ability to send whatever message I like, since particular sequences of letters and numbers are outlawed purely because of their meaning.

Human rights organisations such as Amnesty International use secure encrypted communications to communicate with their workers. If those communications could not be secured against interference then this would have a detrimental impact on their ability to do their humanitarian work, and could endanger their workers.

Encryption is mathematics

Computer encryption is just a mathematical algorithm applied to a series of numbers. It is ridiculous to consider that performing mathematical operations on a sequence of numbers could be outlawed merely because that sequence of numbers has meaning to someone.

End note

I strongly object to any move to restrict the use of encryption technology. It is technologically and morally unsound, with little or no upside and considerable downsides.

I urge politicians to likewise oppose any moves to restrict the use of encryption technology, and I urge those standing in the elections in the UK this week to make it known to their potential constituents that they will oppose such measures.

Finally, I think we should be encouraging the use of strong encryption rather than discouraging it, to protect us from those who would intercept our digital communication and use that for their gain and our detriment.

Posted by Anthony Williams
[/ general /] permanent link

If you liked this post, why not subscribe to the RSS feed or follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Numbers in Javascript

Wednesday, 25 March 2015

I've been playing around with Javascript (strictly, ECMAScript) in my spare time recently, and one thing that I've noticed is that numbers are handled slightly strangely. I'm sure that many experienced Javascript programmers will just nod sagely and say "everyone knows that", but I've been using Javascript for a while and hadn't encountered this strangeness before, as I'd not done extensive numerical processing, so I figured it was worth writing down.

Numbers are floating point

For the most part, Javascript numbers are floating point numbers. In particular, they are standard IEEE 754 64-bit double-precision numbers. Even though the IEEE spec allows for multiple NaN (Not-a-number) values, Javascript has exactly one NaN value, which can be referenced in code as NaN.

This has immediate consequences: there are upper and lower limits to the stored value, and numbers can only have a certain precision.

For example, 10000000000000001 cannot be represented in Javascript. It is the same value as 10000000000000000.

var x=10000000000000000;
if(x==(x+1))
    alert("Oops");

This itself isn't particularly strange: one of the first things you learn about Javascript is that it has floating-point numbers. However, it's something that you need to bear in mind when trying to do any calculations involving very big numbers (larger than 9007199254740992, i.e. 2^53, in magnitude) or where more than 53 bits of precision are needed (since IEEE 754 double-precision numbers store a binary exponent and a binary mantissa with 53 bits of precision).
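A small sketch of where the 53-bit limit bites. It assumes an engine with ECMAScript 2015 support, since `Number.MAX_SAFE_INTEGER` and `Number.isSafeInteger` were added in that edition; older browsers may lack them.

```javascript
// 2^53 is the point beyond which adjacent integers can no longer all be represented.
var limit = Math.pow(2, 53); // 9007199254740992

console.log(limit === limit + 1);             // true: limit + 1 rounds back to limit
console.log(limit - 1 === limit);             // false: below 2^53 every integer is exact
console.log(Number.MAX_SAFE_INTEGER);         // 9007199254740991, i.e. 2^53 - 1
console.log(Number.isSafeInteger(limit - 1)); // true
console.log(Number.isSafeInteger(limit));     // false
```

Checking `Number.isSafeInteger` on inputs is one way to detect when integer arithmetic may have silently lost precision.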

You might think that you don't need the precision, but you quickly hit problems when using decimal fractions:

var x=0.2*0.3-0.01;
if(x!=0.05)
    alert("Oops");

The rounding errors in the representations of the decimal fractions here mean that the value of x in this example is 0.049999999999999996, not 0.05 as you would hope.
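One common way to cope with this (a sketch, not the only option; working in integer units such as pence is another) is to compare with a relative tolerance instead of using `==`. The `nearlyEqual` helper and its default tolerance of 1e-12 are hypothetical, illustrative choices rather than a standard API; the right tolerance depends on the calculation.

```javascript
// Hypothetical helper: treat a and b as equal if they differ by at most
// relTol times the larger of their magnitudes.
function nearlyEqual(a, b, relTol) {
  if (relTol === undefined) relTol = 1e-12;
  if (a === b) return true; // exact matches, including 0 and 0
  return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
}

var x = 0.2 * 0.3 - 0.01;
console.log(x === 0.05);           // false
console.log(nearlyEqual(x, 0.05)); // true
```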

Again, this isn't particularly strange, it's just an inherent property of the numbers being represented as floating point. However, what I found strange is that sometimes the numbers aren't treated as floating point.

Numbers aren't always floating point

Yes, that's right: Javascript numbers are sometimes not floating point numbers. Sometimes they are 32-bit signed integers, and very occasionally 32-bit unsigned integers.

The first place this happens is with the bitwise operators (&, |, ^ and ~): if you use one of these then the operands are first converted to 32-bit signed integers. This can have surprising consequences.

Look at the following snippet of code:

var x=0x100000000; // 2^32
console.log(x);
console.log(x|0);

What do you expect it to do? Surely x|0 is x? You might be forgiven for thinking so, but no. Here, x is too large for a 32-bit integer, so x|0 forces it to be taken modulo 2^32 before converting to a signed integer. The low 32 bits are all zero, so x|0 is just 0.

OK, what about this case:

var x=0x80000000; // 2^31
console.log(x);
console.log(x|0);

What do you expect now? We're under 2^32, so no higher-order bits are dropped, so surely x|0 is x now? Again, no. x|0 in this case is -x, because x is first converted to a signed 32-bit integer with two's complement representation, in which the most significant bit is the sign bit, so the number is negative.
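The conversion these operators perform corresponds to what the ECMAScript specification calls ToInt32 (and ToUint32 for >>>). As a rough sketch, the same steps can be written out by hand, which makes both examples above easy to predict. `toInt32` and `toUint32` here are hypothetical helper names, not built-in functions, and the sketch assumes an engine with ES2015's `Math.trunc` and `Number.isFinite`.

```javascript
var TWO_32 = Math.pow(2, 32);
var TWO_31 = Math.pow(2, 31);

// Mirrors ToUint32: truncate towards zero, then reduce modulo 2^32.
function toUint32(x) {
  if (!Number.isFinite(x)) return 0;       // NaN and +/-Infinity become 0
  var n = Math.trunc(x);
  return ((n % TWO_32) + TWO_32) % TWO_32; // true modulo: always in [0, 2^32)
}

// Mirrors ToInt32: as above, then reinterpret bit 31 as the sign bit.
function toInt32(x) {
  var u = toUint32(x);
  return u >= TWO_31 ? u - TWO_32 : u;
}

console.log(toInt32(0x100000000)); // 0, same as 0x100000000|0
console.log(toInt32(0x80000000));  // -2147483648, same as 0x80000000|0
```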

I have to confess that, even with the truncation to 32 bits, the use of signed integers for bitwise operations just seems odd. Doing bitwise operations on a signed number is a very unusual case, and is just asking for trouble, especially when the result is just a "number", so you can't rely on further operations giving you the result you would expect on a 32-bit integer value.

For example, you might want to mask off some bits from a value. With normal two's complement integers, x-(x&mask) is the same as x&~mask: in both cases, you're left with the bits set in x that were not set in mask. With Javascript, this doesn't work if x has bit 31 set.

var x=0xabcdef12;
var mask=0xff;
console.log(x-(x&mask));
console.log(x&~mask);

If you truncate the result of x-(x&mask) back to 32 bits with |0 then the values are indeed the same, but it's easy to forget.
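If what you actually want is the unsigned 32-bit value, one common idiom is to finish with >>> 0, which reinterprets the 32-bit result as unsigned. A sketch using the values from the example above, with comments showing what each line prints:

```javascript
var x = 0xabcdef12;
var mask = 0xff;

console.log(x - (x & mask));       // 2882400000 (0xabcdef00)
console.log(x & ~mask);            // -1412567296: the same bits, read as signed
console.log((x & ~mask) >>> 0);    // 2882400000: the same bits, read as unsigned
console.log((x - (x & mask)) | 0); // -1412567296: truncated back to signed 32 bits
```

Which form to normalise to (signed via |0, or unsigned via >>> 0) is a design choice; the important thing is to pick one and apply it consistently.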

Shifting bits

In languages such as C and C++, if x is an unsigned integer, x<<y gives the same result as multiplying x by 2^y (for shift amounts smaller than the width of x). Not so in Javascript. If you do a bitshift operation (<<, >>, or >>>) then Javascript converts your value to a 32-bit integer before the operation, and (except for >>>) the result is a signed 32-bit integer. This can have surprising results.

var x=0xaa;
console.log(x);
console.log(x<<24);
console.log(x*(1<<24));

x<<24 converts x to a signed 32-bit integer, bit-shifts the value as a signed 32-bit integer, and then converts that result back to a Number. In this case, x<<24 has the bit pattern 0xaa000000, which has the highest bit set when treated as 32-bit, so is now a negative number with value -1442840576. On the other hand, 1<<24 does not have the high bit set, so is still positive, so x*(1<<24) is a positive number, with the same value as 0xaa000000.

Of course, if the result of shifting would have more than 32 bits then the top bits are lost: 0xaa<<25 would be truncated to 0x54000000, so has the value 1409286144, rather than the 5704253440 that you get from 0xaa*(1<<25).
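Two possible workarounds, sketched under the assumption that you know which result you want: if the result should fit in 32 bits but be read as unsigned, apply >>> 0 after the shift; if you need the bits above bit 31, multiply by a power of two instead, which stays exact as long as the result does not exceed 2^53. `shiftLeft` is a hypothetical helper name.

```javascript
var x = 0xaa;

// Unsigned 32-bit reading of the shifted bit pattern.
console.log((x << 24) >>> 0); // 2852126720 (0xaa000000)

// Hypothetical helper: shift left by multiplying, keeping bits above bit 31.
// Exact only while the result stays within 2^53.
function shiftLeft(value, bits) {
  return value * Math.pow(2, bits);
}

console.log(shiftLeft(x, 25)); // 5704253440
console.log(x << 25);          // 1409286144: bits above bit 31 are lost
```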

Going right

For right-shifts, there are two operators: >> and >>>. Why two? Because the operands are converted to signed numbers, and the two operators have different semantics for negative operands.

What is 0x80000000 shifted right one bit? That depends. As an unsigned number, right shift is just a divide-by-two operation, so the answer is 0x40000000, and that's what you get with the >>> operator, which shifts in zeroes. On the other hand, if you think of this as a negative number (since it has bit 31 set), then you might want the answer to stay negative. This is what the >> operator does: it copies the sign bit into the new bit 31, so negative numbers remain negative.
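A quick sketch of the two operators applied to the value discussed above, plus a small negative number, with the printed results in the comments:

```javascript
console.log(0x80000000 >>> 1); // 1073741824 (0x40000000): zero shifted into bit 31
console.log(0x80000000 >> 1);  // -1073741824 (0xc0000000 read as signed): sign bit copied
console.log(-8 >> 1);          // -4: the arithmetic shift keeps the sign
console.log(-8 >>> 1);         // 2147483644 (0x7ffffffc): -8 is first read as 0xfffffff8
```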

As ever, this can have odd consequences if the initial number is larger than 32 bits.

var x=0x280000000;
console.log(x);
console.log(x>>1);
console.log(x>>>1);

0x280000000 is a large positive number, but it's more than 32 bits long, so it is first truncated to 32 bits and converted to a signed number. 0x280000000>>1 is thus not 0x140000000 as you might naively expect, but -1073741824: the high bits are dropped, giving 0x80000000, which is a negative number, and >> preserves the sign bit, so we have 0xc0000000, which is -1073741824.

Using >>> just does the truncation, so it essentially treats the operand as an unsigned 32-bit number. 0x280000000>>>1 is thus 0x40000000.

If right shifts are so odd, why not just use division?

Divide and conquer?

If you need to preserve all the bits, then you might think that doing a division instead of a shift is the answer: after all, right shifting is simply dividing by 2^n. The problem here is that Javascript doesn't have integer division. 3/2 is 1.5, not 1. You're therefore looking at two floating-point operations instead of one integer operation, as you have to discard the fractional part either by removing the remainder beforehand, or by truncating it afterwards.

var x=3;
console.log(x);
console.log(x/2);
console.log((x-(x%2))/2);
console.log(Math.floor(x/2));
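Wrapping the division approach in a helper gives a right shift that keeps the bits above bit 31. This is a sketch assuming integer inputs no larger than 2^53 in magnitude; `shiftRight` is a hypothetical name. Math.floor rounds towards minus infinity, which matches >> for negative values in the 32-bit range, whereas subtracting the remainder first, as in (x-(x%2))/2, rounds towards zero.

```javascript
// Hypothetical helper: arithmetic right shift by division, without 32-bit truncation.
function shiftRight(value, bits) {
  return Math.floor(value / Math.pow(2, bits));
}

console.log(shiftRight(0x280000000, 1)); // 5368709120 (0x140000000)
console.log(0x280000000 >> 1);           // -1073741824: truncated to 32 bits first
console.log(shiftRight(-3, 1), -3 >> 1); // -2 -2: Math.floor matches >> here
console.log((-3 - (-3 % 2)) / 2);        // -1: removing the remainder rounds towards zero
```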

Summary

For the most part, Javascript numbers are double-precision floating point, so they need to be treated the same way as floating point numbers in any other language.

However, Javascript also provides bitwise and shift operations, which first convert the operands to 32-bit two's complement values (signed, except for >>>). This can have surprising consequences when either the input or the result falls outside the range of a signed 32-bit integer.

This strikes me as a really strange choice for the language designers to make: doing bitwise operations on signed values is a really niche feature, whereas many people will want to do bitwise operations on unsigned values.

As browser Javascript engines get faster, and with the rise of things like Node.js for running Javascript outside a browser, Javascript is being used for far more than just simple web-page effects. If you're planning on using it for anything involving numerical work or bitwise operations, then you need to be aware of this behaviour.

Posted by Anthony Williams
[/ javascript /] permanent link

Firefox is losing market share to Chrome

Monday, 09 March 2015

I read with interest an article on Computerworld about Firefox losing market share, wondering what people were using instead. Unsurprisingly, the answer seems to be Chrome: apparently Chrome now has a 27.6% share compared to Firefox's 11.8%. That's quite a big difference.

I checked out the stats for this site for February 2015, and the figures bear it out: 30.7% of visitors use Chrome vs 14.9% Firefox and 12.8% Safari. Amusingly, 3.1% of visitors still use IE6!

What I did find interesting is the version numbers people are using: there were visitors using every version of Chrome from version 2 to version 43, and the same for Firefox — someone was even using Firefox 0.10! I'm a bit surprised by this, as I'd have thought that users of these browsers were probably amongst the most likely to upgrade.

Why the drop?

The big question, of course, is why the shift? I switched to Firefox because Internet Explorer was poor, and I've stuck with it, mainly through inertia; I've used other browsers over the years, but still prefer Firefox. I've got Chrome installed on my desktop, but I don't particularly like it, and only really use it for cross-browser testing. The only other place I use Chrome is on my tablets, where it is the only browser I have installed: I tried Firefox for Android and was really disappointed.

Maybe that's the cause of the shift: everyone is using mobile devices for browsing, and Chrome/Safari are better than the others for mobile.

Which browser(s) do you use, and why?

Posted by Anthony Williams
[/ general /] permanent link

just::thread C++11 and C++14 Thread Library V2.1 released

Tuesday, 03 March 2015

I am pleased to announce that version 2.1 of just::thread, our C++11 and C++14 Thread Library has just been released with support for new compilers.

This release adds the long-awaited support for gcc 4.8 on MacOSX, as well as bringing Linux support right up to date with support for gcc 4.9 on Ubuntu and Fedora.

Just::Thread is now supported for the following compilers:

  • Microsoft Windows XP and later:
    • Microsoft Visual Studio 2005, 2008, 2010, 2012 and 2013
    • TDM gcc 4.5.2, 4.6.1 and 4.8.1
  • Debian and Ubuntu Linux (Ubuntu Jaunty and later)
    • g++ 4.3, 4.4, 4.5, 4.6, 4.7, 4.8 and 4.9
  • Fedora Linux
    • Fedora 13: g++ 4.4
    • Fedora 14: g++ 4.5
    • Fedora 15: g++ 4.6
    • Fedora 16: g++ 4.6
    • Fedora 17: g++ 4.7.2 or later
    • Fedora 18: g++ 4.7.2 or later
    • Fedora 19: g++ 4.8
    • Fedora 20: g++ 4.8
    • Fedora 21: g++ 4.9
  • Intel x86 MacOSX Snow Leopard or later
    • MacPorts g++ 4.3, 4.4, 4.5, 4.6, 4.7 and 4.8

Get your copy of Just::Thread

Purchase your copy and get started with the C++11 and C++14 thread library now.

Posted by Anthony Williams
[/ news /] permanent link

Using Enum Classes as Bitfields

Thursday, 29 January 2015

C++11 introduced a new feature in the form of scoped enumerations, also referred to as enum classes, since they are introduced with the double keyword enum class (though enum struct is also permissible, to identical effect). To a large extent, these are like standard enumerated types: you can declare a list of enumerators, which you may assign explicit values to, or which you may let the compiler assign values to. You can then assign these values to variables of that type. However, they have additional properties which make them ideal for use as bitfields. I recently answered a question on the accu-general mailing list about such a use, so I thought it might be worth writing a blog post about it.

Key features of scoped enumerations

The key features provided by scoped enumerations are:

  • The enumerators must always be prefixed with the type name when referred to outside the scope of the enumeration definition. e.g. for a scoped enumeration colour which has an enumerator green, this must be referred to as colour::green in the rest of the code. This avoids the problem of name clashes which can be common with plain enumerations.
  • The underlying type of the enumeration can be specified, to allow forward declaration, and avoid surprising consequences of the compiler's choice. This is also allowed for plain enum in C++11. If no underlying type is specified for a scoped enumeration, the underlying type is fixed as int. The underlying type of a given enumeration can be found using the std::underlying_type template from the <type_traits> header.
  • There is no implicit conversion to and from the underlying type, though such a conversion can be done explicitly with a cast.

This means that they are ideal for cases where there is a limited set of values, and there are several such cases in the C++ Standard itself: std::errc, std::pointer_safety, and std::launch for example. The lack of implicit conversions is particularly useful here, as it means that you cannot pass raw integers such as 3 to a function expecting a scoped enumeration: you have to pass a value of the enumeration, though this is of course true for unscoped enumerations as well. The lack of implicit conversions to integers does mean that you can overload a function taking a numeric type without having to worry about any potential ambiguity due to numeric conversion orderings.

Bitmask types

Whereas the implicit conversions of plain enumerations mean that expressions such as red | green and red & green are valid if red and green are enumerators, the downside is that red * green or red / green are equally valid, if nonsensical. With scoped enumerations, none of these expressions are valid unless the relevant operators are defined, which means you can explicitly define what you want to permit.

std::launch is a scoped enumeration that is also a bitmask type. This means that expressions such as std::launch::async | std::launch::deferred and (std::launch::async | std::launch::deferred) & std::launch::async are valid, but you cannot multiply or divide launch policies. The requirements on such a type are defined in section 17.5.2.1.3 [bitmask.types] of the C++ Standard, but they amount to providing definitions for the operators |, &, ^, ~, |=, &= and ^= with the expected semantics.

The implementation of these operators is trivial, so it is easy to create your own bitmask types, but having to actually define the operators for each bitmask type is undesirable.

Bitmask operator templates

These operators can be templates, so you could define a template for each operator, e.g.

    template<typename E>
    E operator|(E lhs,E rhs){
        typedef typename std::underlying_type<E>::type underlying;
        return static_cast<E>(
            static_cast<underlying>(lhs) | static_cast<underlying>(rhs));
    }

Then you could write mask::x | mask::y for some enumeration mask with enumerators x and y. The downside here is that it is too greedy: every type will match this template. Not only would you be able to write std::errc::bad_message | std::errc::broken_pipe, which is clearly nonsensical, but you would also be able to write "some string" | "some other string", though this would give a compile error on the use of std::underlying_type, since it is only defined for enumerations. There would also be potential clashes with other overloads of operator|, such as the one for std::launch.

What is needed is a constrained template, so only those types which you want to support the operator will match.

SFINAE to the rescue

SFINAE is a term coined by David Vandevoorde and Nicolai Josuttis in their book C++ Templates: The Complete Guide. It stands for "Substitution Failure is Not an Error", and highlights a feature of expanding function templates during overload resolution: if substituting the template parameters into the function declaration fails to produce a valid declaration then the template is removed from the overload set without causing a compilation error.

This is a key feature used to constrain templates, both within the C++ Standard Library, and in many other libraries and application code. It is such a key feature that the C++ Standard Library even provides a library facility to assist with its use: std::enable_if.

We can therefore use it to constrain our template to just those scoped enumerations that we want to act as bitmasks.

    #include <type_traits>

    template<typename E>
    struct enable_bitmask_operators{
        static constexpr bool enable=false;
    };

    template<typename E>
    typename std::enable_if<enable_bitmask_operators<E>::enable,E>::type
    operator|(E lhs,E rhs){
        typedef typename std::underlying_type<E>::type underlying;
        return static_cast<E>(
            static_cast<underlying>(lhs) | static_cast<underlying>(rhs));
    }

If enable_bitmask_operators<E>::enable is false (which it is unless specialized) then std::enable_if<enable_bitmask_operators<E>::enable,E>::type will not exist, and so this operator| will be discarded without error. It will thus not compete with other overloads of operator|, and the compilation will fail if and only if there are no other matching overloads. std::errc::bad_message | std::errc::broken_pipe will thus fail to compile, whilst std::launch::async | std::launch::deferred will continue to work.

For those types that we do want to work as bitmasks, we can then just specialize enable_bitmask_operators:

    enum class my_bitmask{
        first=1,second=2,third=4
    };
    template<>
    struct enable_bitmask_operators<my_bitmask>{
        static constexpr bool enable=true;
    };

Now, std::enable_if<enable_bitmask_operators<E>::enable,E>::type will exist when E is my_bitmask, so this operator| will be considered by overload resolution, and my_bitmask::first | my_bitmask::second will now compile.

Final code

The final code is available as a header file along with a simple example demonstrating its use. It has been tested with g++ 4.7, 4.8 and 4.9 in C++11 mode, and with MSVC 2012 and 2013, and is released under the Boost Software License.

Posted by Anthony Williams
[/ cplusplus /] permanent link

Gotchas Upgrading Apache from 2.2 to 2.4

Wednesday, 10 December 2014

I finally got round to upgrading one of my servers from Ubuntu 12.04 (the previous LTS release) to Ubuntu 14.04 (the latest LTS release). One of the consequences of this is that Apache gets upgraded from 2.2 to 2.4. Sadly, the upgrade wasn't as smooth as I'd hoped, so I'm posting this here in case anyone else has the same problem that I did.

Oh no! It's all broken!

Upgrading is worthwhile for various reasons, not least the support benefits of being on the most recent LTS release. However, after the upgrade several of the websites hosted on the server stopped working. Rather than serving the usual web page, they just returned an "access denied" error page, and the https pages just returned an SSL error. This was not good, and led to much frantic checking of config files.

After verifying that the config files for all the sites were indeed correct, I finally resorted to googling the problem. It turned out that I had been using the default apache2.conf file, as all the "important" configuration was in the module config files or the site config files, so the upgrade had simply replaced it with the new version.

Whereas the old v2.2 default config file has the line

Include sites-enabled/

The new v2.4 default config file has the line

IncludeOptional sites-enabled/*.conf

A Simple Fix

This caused problems on my server because many of the config files were named after the website (e.g. example.com) and did not have a .conf suffix. Renaming the files to e.g. example.com.conf fixed the problem, as would changing the line in apache2.conf so that it didn't require the suffix.

Access Control

The other major change is to the access control directives: the old Allow and Deny directives are replaced by the new Require directive. The access_compat module is intended to allow the old directives to keep working as before, but it's worth checking whether your website configurations use any of them.

Exit Stage Left

Thankfully, all this was happening on the staging server, so the websites weren't down while I investigated. Testing is important — what was supposed to be just a simple upgrade turned out not to be, and without a staging server the websites would have been down for the duration.

Posted by Anthony Williams
[/ general /] permanent link

Migrating to https

Thursday, 30 October 2014

Having intended to do so for a while, I've finally migrated our main website to https. All existing links should still work, but will redirect to the https version of the page.

Since this whole process was not as simple as I'd have liked it to be, I thought I'd document the process for anyone else who wants to do this. Here's the quick summary:

  1. Buy an SSL certificate
  2. Install it on your web server
  3. Set up your web server to serve https as well as http
  4. Ensure that any external scripts or embedded images used by the website are retrieved with https
  5. Redirect the http version to https

Now let's dig into the details.

1. Buy an SSL certificate

For testing purposes, a self-signed certificate works fine. However, if you want to have people actually use the https version of your website then you will need to get a certificate signed by a recognized certificate authority. If you don't then your visitors will be presented with a certificate warning when they view the site, and will probably go somewhere else instead.

[Image: Untrusted Certificate]

The SSL certificate used on https://www.justsoftwaresolutions.co.uk was purchased from GarrisonHost, but there are plenty of other certificate providers available.

Purchasing a certificate is not merely a matter of entering payment details on a web form. You may well need to provide proof of who you are and/or company registration certificates in order to get the purchase approved. Once that has happened, you will need to get your certificate signed and install it on your web server.

2. Install the SSL certificate on your web server

In order to install the certificate on your web server, it first has to be signed by the certification authority so that it is tied to your web server. This requires a Certificate Signing Request (CSR) generated on your web server. With luck, your certification provider will give you nice instructions. In most cases, you're probably looking at the openssl req command, something like:

openssl req -new -newkey rsa:2048 -nodes -out common.csr \
-keyout common.key \
-subj "/C=GB/ST=Your State or County/L=Your City/O=Your Company/OU=Your Department/CN=www.yourcompany.com"

This will give you a private key (common.key) and a CSR file (common.csr). Keep the private key private, since this is what identifies the web server as your web server, and give the CSR file to your certificate provider.

Your certificate provider will then give you a certificate file, which is your web server certificate, and possibly a certificate chain file, which provides the signing chain from your certificate back to one of the widely-known root certificate providers. The certificate chain file will be identical for anyone who purchased a certificate signed by the same provider.

You now need to put three files on your web server:

  • your private key file,
  • your certificate file, and
  • the certificate chain file.

Ensure that the permissions on these only allow the user running the web server to access them, especially the private key file.
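One way to check the result is to look at the permission bits. This Node.js sketch (the function names and paths are hypothetical) reports whether each file grants no access at all to "group" or "other"; which user should own the files depends on how your web server is set up.

```javascript
const fs = require("fs");

// True if no permission bits are set for group or other (e.g. 0600 or 0400).
function isOwnerOnly(mode) {
  return (mode & 0o077) === 0;
}

// Print the permission bits of each file and whether they look private enough.
function checkKeyFiles(paths) {
  for (const p of paths) {
    const { mode } = fs.statSync(p);
    const perms = (mode & 0o777).toString(8).padStart(3, "0");
    console.log(`${p}: ${perms} ${isOwnerOnly(mode) ? "ok" : "accessible to group/other"}`);
  }
}

// Hypothetical paths; substitute the locations of your own files:
// checkKeyFiles(["/path/to/private.key", "/path/to/certificate.crt"]);
```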

You now need to set up your web server to use them.

3. Set up your web server to serve https as well as http

I'm only going to cover apache here, since that's what is used for https://www.justsoftwaresolutions.co.uk; if you're using something else then you'll have to check the documentation.

Firstly, you'll need to ensure that mod_ssl is installed and enabled. Run

sudo a2enmod ssl

on your web server. If it complains that "module ssl does not exist" then follow your platform's documentation to get it installed and try again. On Ubuntu it is part of the basic apache installation.

Now you need a new virtual host file for https. Create one in the sites-available directory with the following contents:

    <IfModule mod_ssl.c>
    <VirtualHost *:443>
    ServerAdmin webmaster@yourdomain.com
    ServerName yourdomain.com
    ServerAlias www.yourdomain.com

    SSLEngine On
    SSLCertificateFile /path/to/certificate.crt
    SSLCertificateKeyFile /path/to/private.key
    SSLCertificateChainFile /path/to/certificate/chain.txt

    # Handle shutdown in broken browsers
    BrowserMatch "MSIE [2-6]" \
            nokeepalive ssl-unclean-shutdown \
            downgrade-1.0 force-response-1.0
    BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

    DocumentRoot /path/to/ssl/host/files
    <Directory "/path/to/ssl/host/files">
    # directory-specific apache directives
    </Directory>

    # Pass SSL_* environment variables to scripts
    <FilesMatch "\.(cgi|shtml|phtml|php)$">
            SSLOptions +StdEnvVars
    </FilesMatch>

    </VirtualHost>
    </IfModule>

This is a basic configuration: you'll also want to ensure that any configuration directives you need for your website are present.

You'll also want to edit the config for mod_ssl. Open up mods-available/ssl.conf from your apache config directory, and find the SSLCipherSuite, SSLHonorCipherOrder, SSLProtocol and SSLCompression directives (adding any that are missing). Update them to the following:

    SSLHonorCipherOrder on
    SSLCipherSuite ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    SSLProtocol all -SSLv2 -SSLv3
    SSLCompression off

This disables the protocols and ciphers that are known to be insecure at the time of writing. If you're reading this much after publication, or wish to be certain, please do a web search for the latest secure cipher lists, or check a resource such as https://wiki.mozilla.org/Security/Server_Side_TLS.

After all these changes, restart the web server:

sudo apache2ctl restart

You should now be able to visit your website with https:// instead of http:// as the protocol. Obviously, the content will only be the same if you've set it up to be.

Now that the web server is running, you can check its security using an SSL checker such as https://sslcheck.globalsign.com, which will test which cipher suites are available and check that your web server is not vulnerable to known attacks.

Now you need to ensure that everything works correctly when accessed through https. One of the big issues is embedded images and external scripts.

4. Ensure that https is used for everything on https pages

If you load a web page with https, and that page loads images or scripts using http, your browser won't be happy. At the very least, the nice "padlock" icon that indicates a secure site will not show, but you may get a popup, and the insecure images or scripts may not load at all. None of this leads to a nice visitor experience.

It is therefore imperative that on a web page viewed with https all images and scripts are loaded with https.

The good news is that relative URLs inherit the protocol, so an image URL of "/images/foo.png" will use https on an https web page. The bad news is that on a reasonably sized web site there are probably quite a few images and scripts with full URLs that specify plain http, not least because blog entries that may be read in a feed reader often need full URLs for embedded images to ensure that they show up correctly in the reader.
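The browser resolves each reference against the URL of the page it appears on, which is why a relative URL picks up https while a full http URL does not. A small sketch using the standard URL class (available globally in Node.js and browsers); isMixedContent is a made-up name and the page URL is a placeholder:

```javascript
// Sketch: would this resource reference be fetched over plain http from an
// https page ("mixed content")? Resolving the reference against the page URL
// is what makes relative URLs inherit the page's scheme.
function isMixedContent(pageUrl, resourceRef) {
  const base = new URL(pageUrl);
  const resource = new URL(resourceRef, base);
  return base.protocol === "https:" && resource.protocol === "http:";
}

const page = "https://www.example.com/blog/some-post"; // placeholder URL
console.log(new URL("/images/foo.png", page).href);   // https://www.example.com/images/foo.png
console.log(isMixedContent(page, "/images/foo.png")); // false
console.log(isMixedContent(page, "http://www.example.com/images/foo.png")); // true
```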

If all the images and scripts are on servers you control, then you can ensure that those servers support https (using this guide), and then switch the URLs for those resources to https. For servers outside your control, you need to check that https is supported, which can be an issue.

Aside: you could make the URLs use https on https pages and http on http pages by omitting the protocol, so "http://example.com/images/foo.png" would become "//example.com/images/foo.png". However, using https on plain http pages is fine, and it is generally better to use https where possible. It's also more straightforward.
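Resolving a protocol-relative reference against an https page and an http page (both placeholder URLs) shows the scheme being inherited from the page:

```javascript
// A protocol-relative reference ("//host/path") takes the scheme of the page it is on.
const ref = "//example.com/images/foo.png";
const onHttps = new URL(ref, "https://www.example.com/").href;
const onHttp = new URL(ref, "http://www.example.com/").href;
console.log(onHttps); // https://example.com/images/foo.png
console.log(onHttp);  // http://example.com/images/foo.png
```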

If the images or scripts are on external servers which you do not control, and which do not support https, then you can use a proxy wrapper like camo to avoid the "insecure content" warnings. However, this still requires changing the URLs.

For static pages, you can do a simple search and replace, e.g.

sed -i -e 's/src="http:/src="https:/g' *.html
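For comparison, here is a minimal JavaScript sketch of the same substitution applied to a string of HTML (upgradeSrcAttributes is a made-up name). It also accepts single-quoted attributes and spaces around "=", but like the sed command it only touches src attributes; stylesheet links would need a similar rule.

```javascript
// Rewrite src="http:..." (or src='http:...') to https, keeping everything else as-is.
function upgradeSrcAttributes(html) {
  return html.replace(/(\bsrc\s*=\s*["'])http:/gi, "$1https:");
}

const before = `<img src="http://example.com/a.png"> <script src='http://example.com/b.js'></script>`;
console.log(upgradeSrcAttributes(before));
// <img src="https://example.com/a.png"> <script src='https://example.com/b.js'></script>
```

In Node.js you could apply it to a file by reading it with fs.readFileSync and writing the result back with fs.writeFileSync.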

However, if your pages are processed through a tool like Markdown, or stored in a CMS, then you might not have that option. Instead, you'll have to trawl through the links manually, which could well be rather tedious. There are websites that will tell you which items on a given page are insecure, and you can read the warnings in your browser, but you've still got to check each page and edit the URLs manually.
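If you can save the rendered pages, a rough scan can at least produce a checklist of the plain-http references to fix by hand. This sketch uses regular expressions rather than a real HTML parser, so treat it as a first pass; findInsecureReferences is a hypothetical name.

```javascript
// List plain-http URLs referenced by src attributes and by <link href=...> tags.
// Ordinary <a href> links are not included, since they are not loaded by the page.
function findInsecureReferences(html) {
  const found = [];
  const patterns = [
    /\bsrc\s*=\s*["'](http:[^"']*)["']/gi,
    /<link\b[^>]*\bhref\s*=\s*["'](http:[^"']*)["']/gi,
  ];
  for (const pattern of patterns) {
    for (const match of html.matchAll(pattern)) {
      found.push(match[1]);
    }
  }
  return found;
}

const sample =
  '<link rel="stylesheet" href="http://example.com/s.css">' +
  '<img src="http://example.com/a.png"><img src="/b.png">';
console.log(findInsecureReferences(sample));
```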

While you're doing this, it's as well to check that everything else works correctly. I found that a couple of aspects of the blog engine needed adjusting to work correctly with https due to minor changes in the VirtualHost settings.

When you've finally done that, you're ready to permanently switch to https.

5. Redirect the http version to https

This is by far the easiest part of the whole process. Open the apache config for the plain http virtual host and add one line:

Redirect permanent / https://www.yourdomain.com/

This will redirect http://www.yourdomain.com/some/path to https://www.yourdomain.com/some/path with a permanent (301) redirect. All existing links to your web site will now redirect to the equivalent page with https.
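The mapping the redirect is meant to produce is simply "same path, https scheme". As an illustration only (this is not how Apache implements it), it can be written with the standard URL class; httpsEquivalent is a made-up name, www.yourdomain.com is the placeholder from the config, and this sketch keeps the path and query string:

```javascript
// Map an http URL to the https URL on the given host with the same path and query.
function httpsEquivalent(httpUrl, httpsHost) {
  const source = new URL(httpUrl);
  const target = new URL(`https://${httpsHost}/`);
  target.pathname = source.pathname; // keep the path as-is
  target.search = source.search;     // keep any query string
  return target.href;
}

console.log(httpsEquivalent("http://www.yourdomain.com/some/path", "www.yourdomain.com"));
// https://www.yourdomain.com/some/path
```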

When you've done that then you can also enable Strict Transport Security. This ensures that when someone connects to your website then they get a header that says "always use https for this site". This prevents anyone intercepting plain http connections (e.g. on public wifi) and attacking your visitors that way.

You do this by enabling mod_headers, and then updating the https virtual host. Run the following on your web server:

sudo a2enmod headers

and then add the following line to the virtual host file you created above for https:

Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains"

Then you will need to restart apache:

sudo apache2ctl restart

This will ensure that any visitor that now visits your website will always use https if they visit again within 2 years from the same computer and browser. Every time they visit, the clock is reset for another 2 years.
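The max-age value is in seconds, and 63072000 is two 365-day years. A minimal sketch, assuming a hand-rolled parser rather than any browser's real implementation (hstsPolicy is a hypothetical name), shows the arithmetic and how each visit pushes the expiry date forward:

```javascript
// Two 365-day years in seconds, as used in the header above.
const TWO_YEARS_IN_SECONDS = 2 * 365 * 24 * 60 * 60; // 63072000

// Parse a Strict-Transport-Security value seen at `seenAt` and work out until
// when https should be enforced. Returns null if max-age is missing or invalid.
function hstsPolicy(headerValue, seenAt) {
  const directives = headerValue
    .split(";")
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0);
  const maxAge = directives.find((d) => d.startsWith("max-age="));
  if (maxAge === undefined) return null;
  // The value may be quoted, e.g. max-age="63072000".
  const seconds = Number(maxAge.slice("max-age=".length).replace(/"/g, ""));
  if (!Number.isFinite(seconds)) return null;
  return {
    expires: new Date(seenAt.getTime() + seconds * 1000),
    includeSubDomains: directives.includes("includesubdomains"),
  };
}

const header = "max-age=63072000; includeSubDomains";
const first = hstsPolicy(header, new Date("2014-10-30T00:00:00Z"));
const later = hstsPolicy(header, new Date("2015-06-01T00:00:00Z")); // a repeat visit
console.log(first.expires.toISOString(), later.expires.toISOString());
```

Because the later visit sees the same header again, its expiry date is correspondingly later, which is the "clock is reset" behaviour described above.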

You can probably delete most of the remaining settings from the plain http virtual host config, since everything is being redirected, but there is little harm in leaving them there for now.

All done

That's all there is to it. Relatively straightforward, but some parts are more involved than one might like.

Posted by Anthony Williams

Test Driving User Interfaces

Friday, 24 October 2014

User interfaces are generally considered one of the harder things to develop with Test-Driven Development (TDD). So much so that a common recommendation is to make your user interface code as "thin" as possible, and test the layer behind the UI rather than the UI itself.

This is perfectly sound advice if you truly can get your thin UI layer so simple that it couldn't possibly break. It's also great if it means that code is being tested that previously wouldn't have been: any testing is better than no testing. However, if your UI layer is more than just a few simple controls with minimal behaviour, then doing this properly means that the "thin" UI layer actually ends up quite complex, as it has to pass through all the events generated by the UI, as well as provide facilities for any UI changes made through the UI API. At this point, testing behind the UI leaves quite a lot of complex code untested, and thus a prime breeding ground for bugs.

UI Testing with External Tools

One way to test the UI is to drive it with external testing tools. There are plenty of these around: Wikipedia has a whole page of GUI testing tools such as Rational Robot and Selenium.

In my experience, these tools are great for acceptance tests and for testing the whole application through the UI to ensure that everything ties together correctly. More teams should use tools like this rather than testing manually, reserving the skill of their human testers for finding bugs in the edge cases rather than mindlessly clicking through a test script to check that the code works in the precise way defined by the test. That's what these tools are good at, so use them.

However, they don't really work for TDD precisely because they are external tools that drive the UI. For test-driving the UI code we need to be able to isolate just the UI layer, and ensure that it sends the right commands to the rest of the code, and is updated correctly when the rest of the code calls the provided API functions.

Test-driving the UI Layer

The best way to test drive the UI layer is to drive it from a test function written in the same language. If you're test-driving a JavaScript UI, your tests should be in JavaScript; if you're test-driving a C++ UI, your tests should be in C++.

You can use a test framework, but you don't have to. I generally find that the tests are easier to read and write if you use a framework, but for getting started it can be easier just to write some tests without one. The hard part is actually designing your code to support testing in this way: you need to introduce a seam between the UI code and the rest of the application, so that you can intercept calls to the backend in the tests. This is good software engineering anyway, since it cleanly separates concerns so the UI does UI stuff and only UI stuff, but it's not always the easiest direction to go at first.
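Stripped of any UI, the seam idea looks something like this sketch: the search logic is handed the function that talks to the backend rather than calling an AJAX library directly. makeSearcher and sendRequest are illustrative names, and "/ajax.php" is simply the URL used later in this post.

```javascript
// The code under test receives its backend-calling function as a parameter.
function makeSearcher(sendRequest) {
  return function search(term) {
    sendRequest("/ajax.php", { request: "search", term: term });
  };
}

// In a test, the "backend" is just a function that records what it was asked to send.
const sent = [];
const search = makeSearcher((url, data) => sent.push({ url, data }));
search("green widgets");
console.log(sent);
```

In production you would instead pass a thin wrapper around the real AJAX call (for example one built on jQuery's $.post).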

A JavaScript Example

Suppose I'm testing a web app written in JavaScript using jQuery. Part of the app does some form of AJAX-based search: the user enters a search term in an edit box and clicks "search", and the app then does a search in the background and displays the results.

Here's our minimal HTML fragment:

    <form class="search-form">
    <p><label for="search-term">Search Term:</label>
        <input name="search-term" id="search-term" type="text"></p>
    <button class="search-submit">Search</button>
    <h3>Results</h3>
    <div class="results">
    </div>
    </form>

For testing this, we need two things:

  • firstly, we need to be able to create the HTML fragment in our test, so we don't need to have a whole page dedicated for each test, and
  • secondly, we need to be able to trap the AJAX call: we're testing the JavaScript, not the full stack.

The first requirement means that we need a way of attaching our handlers to the HTML at runtime; we can't just attach them manually in the $(document).ready() handler.

The second requirement means that the code under test needs to call a function we supply to do the AJAX call, rather than calling $.ajax() or one of the helpers like $.post() or $.get().

Explicitly separating things out like this can be hard at first, but it does make things easier in the long term.

A first test: If we don't click, there's no AJAX request

So, let's write our first test: if we don't click, there's no AJAX request. Firstly, let's define the HTML snippet for our form:

    var search_form='<form class="search-form">'+
      '<p><label for="search-term">Search Term:</label>'+
      '<input name="search-term" id="search-term" type="text"></p>'+
      '<button class="search-submit">Search</button>'+
      '<h3>Results</h3>'+
      '<div class="results">'+
      '</div>'+
      '</form>';

Ideally, we'd like to load this from the same place it is defined on our website so it is kept up-to-date when we modify the form. I use a "fragments" directory for this sort of thing, and the main page is then assembled from the fragments in PHP. For now, we can define it directly in the test script.

Now, we define a simple test function. First, we clear out the body of the web page, and add the form. Then we create a dummy AJAX post function that just records the supplied data, and pass it to our form handler creation function.

This is all just setup. The test itself comes next: we check that our dummy function did not record an entry (no request made), and show an alert if it did.

    function test_when_search_button_is_not_clicked_ajax_request_not_sent(){
        var body=$('body');
        body.empty();
        body.append(search_form);
        var form=body.find('form');
        var posted_ajax=[];
        var post_ajax=function(url,data,handler){
            posted_ajax.push({url:url,data:data,handler:handler});
        }
        setup_search_form(form,post_ajax);
        if(posted_ajax.length !=0){
            alert("Bogus AJAX Posted");
            return false;
        }
        return true;
    }

We then need a minimal page that loads jQuery, loads our tests, and then runs our function when the page is loaded, displaying "Success" if the test succeeded.

    <html>
    <script type="text/javascript" src="http://code.jquery.com/jquery-1.11.1.min.js"></script>
    <script type="text/javascript" src="tddui.js"></script>

    <script type="text/javascript">
        $(document).ready(function(){
            if(test_when_search_button_is_not_clicked_ajax_request_not_sent()){
                alert("Success");
            }
        });
    </script>
    </html>

If you load this page then all you'll see is the search form; you won't see either alert. The code will fail because setup_search_form is not defined, which will show up as an error in your browser's error log. In Firefox with Firebug I get:

ReferenceError: setup_search_form is not defined
    setup_search_form(form,post_ajax);

Let's define a minimal setup_search_form function that does nothing, just so it all runs:

    function setup_search_form(form,ajax){}

Now if you refresh the page then you get a nice "Success" alert.

This test didn't do much, but now that we've got our code set up, it's easy to add a second test. Let's test some actual behaviour.

A second test: Clicking sends an AJAX request

OK, so no AJAX request is sent when you don't click. Not exactly rocket science, but it gets us a framework for our tests. Let's add some behaviour: when the button is clicked, then the code should send an AJAX request.

Our second test is almost identical to the first. The key part here is what we do after setting up the form. We're going to click on the button, and the default action for a form button is to submit the form, so we first set the form's action to # so we don't navigate off the page. Then we use jQuery to click the button, and check that the AJAX request was actually posted:

    function test_when_search_button_is_clicked_ajax_request_sent(){
        var body=$('body');
        body.empty();
        body.append(search_form);
        var form=body.find('form');
        var posted_ajax=[];
        var post_ajax=function(url,data,handler){
            posted_ajax.push({url:url,data:data,handler:handler});
        }
        setup_search_form(form,post_ajax);
        form.attr('action','#');
        form.find('button').click();
        if(posted_ajax.length !=1){
            alert("No AJAX Posted");
            return false;
        }
        return true;
    }

We then need to run our new test too, so update the driver page:

        $(document).ready(function(){
            if(test_when_search_button_is_not_clicked_ajax_request_not_sent() &&
               test_when_search_button_is_clicked_ajax_request_sent())
                alert("Success");
            });

If you now load the test page then you'll see the "No AJAX Posted" alert. Let's make the test pass with the simplest possible click handler:

    function setup_search_form(form,ajax){
        form.find('button').click(function(){
            ajax('',{},function(){});
            return false;
        });
    }

Refreshing the page will now get us back to the "Success" message.

It'd be nice to clear up the duplication between our tests, but first, let's get this test finished.

Finishing the second test: checking the AJAX request is correct

All we've checked so far is that an AJAX request is sent. We actually need to check that the right AJAX request is sent, so let's do that. Add some more checks after the first one:

        if(posted_ajax[0].url!='/ajax.php'){
            alert("Wrong AJAX URL");
            return false;
        }
        if(posted_ajax[0].data.request!='search'){
            alert("AJAX request is wrong");
            return false;
        }

If you refresh the test page, then you'll find that you now get an alert saying that the URL is wrong. If you fix that, then you'll get an alert complaining about the request. Let's fix both:

    function setup_search_form(form,ajax){
        form.find('button').click(function(){
            ajax('/ajax.php',{request:'search'},function(){});
            return false;
        });
    }

This gets us back to our nice "Success" alert. Let's now clean up that duplication.

Removing duplication between tests

These tests share a common setup, so let's refactor to extract that and simplify the code:

    function setup_search_test(){
        var test_data={};
        test_data.body=$('body');
        test_data.body.empty();
        test_data.body.append(search_form);
        test_data.form=test_data.body.find('form');
        test_data.posted_ajax=[];
        test_data.post_ajax=function(url,data,handler){
            test_data.posted_ajax.push({url:url,data:data,handler:handler});
        }
        setup_search_form(test_data.form,test_data.post_ajax);
        return test_data;
    }

    function test_when_search_button_is_not_clicked_ajax_request_not_sent(){
        var test_data=setup_search_test();
        if(test_data.posted_ajax.length !=0){
            alert("Bogus AJAX Posted");
            return false;
        }
        return true;
    }

    function test_when_search_button_is_clicked_ajax_request_sent(){
        var test_data=setup_search_test();
        test_data.form.attr('action','#');
        test_data.form.find('button').click();
        if(test_data.posted_ajax.length !=1){
            alert("No AJAX Posted");
            return false;
        }
        if(test_data.posted_ajax[0].url!='/ajax.php'){
            alert("Wrong AJAX URL");
            return false;
        }
        if(test_data.posted_ajax[0].data.request!='search'){
            alert("AJAX request is wrong");
            return false;
        }
        return true;
    }

We can verify that everything is still working by refreshing our test page: we still get the "Success" alert, so no problems.
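Chaining the tests with && in the driver means that the first failure stops the remaining tests from running at all. A possible alternative is sketched here with a hypothetical runTests function and stand-in tests so that it runs outside a browser; it runs every test and collects the names of those that fail:

```javascript
// Run every test (a function returning true on success, like the tests above)
// and return the names of the ones that failed. Exceptions count as failures.
function runTests(tests) {
  const failures = [];
  for (const test of tests) {
    let passed;
    try {
      passed = test() === true;
    } catch {
      passed = false;
    }
    if (!passed) failures.push(test.name || "(anonymous test)");
  }
  return failures;
}

// Stand-in tests; in the browser driver you would pass the real test functions
// and alert() the summary instead of logging it.
function test_passes() { return true; }
function test_fails() { return false; }
const failures = runTests([test_passes, test_fails]);
console.log(failures.length === 0 ? "Success" : "Failed: " + failures.join(", "));
```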

Now let's add some more behaviour.

A third test: Extracting data from the UI

For our next test, let's do a bit more work with the UI. It's all very well having the search button send an AJAX request, but we want to actually search for the supplied term, so let's do that. Here's our new test:

    function test_when_search_button_is_clicked_search_term_in_ajax(){
        var test_data=setup_search_test();
        test_data.form.attr('action','#');
        var search_term="green widgets";
        test_data.form.find('#search-term').val(search_term);
        test_data.form.find('button').click();
        if(test_data.posted_ajax.length !=1){
            alert("No AJAX Posted");
            return false;
        }
        if(test_data.posted_ajax[0].data.term!=search_term){
            alert("AJAX search term is wrong");
            return false;
        }
        return true;
    }

And here's our updated driver code:

        $(document).ready(function(){
            if(test_when_search_button_is_not_clicked_ajax_request_not_sent() &&
               test_when_search_button_is_clicked_ajax_request_sent() &&
               test_when_search_button_is_clicked_search_term_in_ajax())
                alert("Success");
            });

If you refresh the page now you'll see the "search term is wrong" error message. You should also see our search term ("green widgets") in the search box. Let's fix the error:

    function setup_search_form(form,ajax){
        form.find('button').click(function(){
            ajax('/ajax.php',
                 {request:'search',
                  term:form.find('#search-term').val()},
                 function(){});
            return false;
        });
    }

Which brings us back to our "Success" alert.

OK, so that's a lot of test code for a simple function, but if we change the function in a way that breaks this behaviour then the tests will tell us, and we're still completely separate from the backend code.

Let's add some UI updates to show the user that we're waiting for the result.

Test four: Updating the UI

Our first few tests have been focused on getting the AJAX request right. However, we want the user to know that something is happening when they make their request, so let's handle that. If the user clicks the search button, both it and the search term box should be disabled, and the results block should show a "searching for ..." message.

    function test_when_search_button_is_clicked_UI_updated_to_show_searching(){
        var test_data=setup_search_test();
        test_data.form.attr('action','#');
        var search_term="red widgets";
        test_data.form.find('#search-term').val(search_term);
        test_data.form.find('button').click();
        if(!test_data.form.find('button').prop("disabled") ||
           !test_data.form.find('#search-term').prop("disabled")){
            alert("UI not disabled");
            return false;
        }
        if(test_data.form.find('.results').text()!="Searching for "+search_term){
            alert("Results field has wrong content");
            return false;
        }
        return true;
    }

If we add that test to our driver code, then we'll get an alert complaining about the UI not being disabled. Easily fixed:

    function setup_search_form(form,ajax){
        form.find('button').click(function(){
            $(this).prop("disabled",true);
            var term_field=form.find('#search-term');
            var term=term_field.val();
            term_field.prop("disabled",true);
            form.find('.results').text("Searching for "+term);
            ajax('/ajax.php',
                 {request:'search',
                  term:term},
                 function(){});
            return false;
        });
    }

In a real web app, you might add some form of animation, but for now this will do. A more important feature is actually displaying the results when they come back. But first: more duplication.

Eliminating more duplication

Almost all the tests set the form action to # because they click on the button. Let's move that into the setup function:

    function setup_search_test(){
        var test_data={};
        test_data.body=$('body');
        test_data.body.empty();
        test_data.body.append(search_form);
        test_data.form=test_data.body.find('form');
        test_data.form.attr('action','#');
        test_data.posted_ajax=[];
        test_data.post_ajax=function(url,data,handler){
            test_data.posted_ajax.push({url:url,data:data,handler:handler});
        }
        setup_search_form(test_data.form,test_data.post_ajax);
        return test_data;
    }

Now on to test five.

Test five: The results are in!

Way back at test one we allowed the caller to supply a handler to the ajax call, which we duly recorded, but haven't used for anything. Now it's time to use it: we can call it from the test to indicate that the results of the AJAX call are back.

The setup is similar to what we've done before: enter a search term and click search:

    function test_results_of_search_go_in_results_div(){
        var test_data=setup_search_test();
        var search_term="red widgets";
        test_data.form.find('#search-term').val(search_term);
        test_data.form.find('button').click();
        if(test_data.posted_ajax.length !=1){
            alert("No AJAX Posted");
            return false;
        }

Now we need some results to pass to the handler:

        var result_data={
            results:[
                "red spinning widgets",
                "fast red widgets",
                "big red widgets"
            ]
        };

        test_data.posted_ajax[0].handler(result_data);

And then we check the results. In this case, we're verifying that the results are stored in a <UL> tag that is the sole element in the results block. The final check for "spurious text" ensures that we've removed the "searching for" text we added previously.

        var result_div=test_data.form.find('.results');
        if(result_div.children().length!=1){
            alert("Should be exactly one child in result div");
            return false;
        }
        if(result_div.find('ul').length!=1){
            alert("Results are an unordered list");
            return false;
        }
        var list_entries=result_div.find('ul li');
        if(list_entries.length!=result_data.results.length){
            alert("One list element per result entry");
            return false;
        }
        for(var i=0;i<list_entries.length;++i){
            var entry=$(list_entries[i]);
            if(entry.text()!=result_data.results[i]){
                alert("Result entry " + i + " is wrong");
                return false;
            }
        }
        if(result_div.text() != result_div.find('ul').text()){
            alert("Spurious text");
            return false;
        }

        return true;
    }

If you add the test to the driver page then it will fail: the results block has no children until we add some.

Let's make it pass by implementing the handler function:

    function setup_search_form(form,ajax){
        var results_field=form.find('.results');
        var handle_results=function(data){
            var result_list=$('<ul></ul>');
            for(var i=0;i<data.results.length;++i){
                var entry=$('<li>');
                entry.text(data.results[i]);
                result_list.append(entry);
            }
            results_field.empty();
            results_field.append(result_list);
        };

The rest is pretty much as before, except we pass in our new handler function to the ajax call:

        form.find('button').click(function(){
            $(this).prop("disabled",true);
            var term_field=form.find('#search-term');
            var term=term_field.val();
            term_field.prop("disabled",true);
            results_field.text("Searching for "+term);
            ajax('/ajax.php',
                 {request:'search',
                  term:term},
                 handle_results);
            return false;
        });
    }

And we're back at "Success".

The search button and search term box are still disabled though, so let's fix that.

Test six: Re-enabling form fields

When the results come back, we want our form fields to be re-enabled. That's easy to test for:

    function test_when_results_back_enable_fields(){
        var test_data=setup_search_test();
        var search_term="red widgets";
        test_data.form.find('#search-term').val(search_term);
        test_data.form.find('button').click();
        if(test_data.posted_ajax.length !=1){
            alert("No AJAX Posted");
            return false;
        }
        var result_data={
            results:[
                "red spinning widgets",
                "fast red widgets",
                "big red widgets"
            ]
        };

        test_data.posted_ajax[0].handler(result_data);
        if(test_data.form.find('button').prop("disabled") ||
           test_data.form.find('#search-term').prop("disabled")){
            alert("UI not re-enabled");
            return false;
        }

        return true;
    }

Add the test to the driver, and refresh to check that you get the "UI not re-enabled" message, and then we can fix it: update the ajax result handler to enable the controls.

    function setup_search_form(form,ajax){
        var term_field=form.find('#search-term');
        var results_field=form.find('.results');
        var submit_button=form.find('button');
        var handle_results=function(data){
            var result_list=$('<ul></ul>');
            for(var i=0;i<data.results.length;++i){
                var entry=$('<li>');
                entry.text(data.results[i]);
                result_list.append(entry);
            }
            results_field.empty();
            results_field.append(result_list);
            submit_button.prop("disabled",false);
            term_field.prop("disabled",false);
        };

        submit_button.click(function(){
            $(this).prop("disabled",true);
            var term=term_field.val();
            term_field.prop("disabled",true);
            results_field.text("Searching for "+term);
            ajax('/ajax.php',
                 {request:'search',
                  term:term},
                 handle_results);
            return false;
        });
    }

Which brings us back to "Success".

What we haven't yet handled is what to do when the AJAX call fails. This is where separating the UI from the back-end code really helps us out — it's exceedingly hard to engineer failure conditions when you're doing whole-system testing through the UI, but we can just trigger failure because we feel like it. So, let's do it.

Test seven: Failing AJAX calls

In order to simulate failure, we need a failure handler for our AJAX calls. So let's add a parameter for handling failures to our dummy AJAX function:

        test_data.post_ajax=function(url,data,handler,failure_handler){
            test_data.posted_ajax.push(
                {url:url,data:data,handler:handler,failure:failure_handler});
        }

In the test, rather than supplying results, we can invoke the failure handler. That's easily done, but what do we want the result to be?

The easiest thing for now is to put some form of error status in the results block, and re-enable the search controls, so let's do that:

    function test_ajax_failure(){
        var test_data=setup_search_test();
        var search_term="red widgets";
        test_data.form.find('#search-term').val(search_term);
        test_data.form.find('button').click();
        if(test_data.posted_ajax.length !=1){
            alert("No AJAX Posted");
            return false;
        }
        if(!test_data.posted_ajax[0].failure){
            alert("No failure handler specified");
            return false;
        }
        test_data.posted_ajax[0].failure(404,"Not Found","");
        var result_div=test_data.form.find('.results');
        if(result_div.text()!="Unable to retrieve search results: error 404 (Not Found)"){
            alert("Result text is wrong");
            return false;
        }
        if(test_data.form.find('button').prop("disabled") ||
           test_data.form.find('#search-term').prop("disabled")){
            alert("UI not re-enabled");
            return false;
        }
        return true;
    }

Adding this to our test driver should give us a "No failure handler specified" error. This is easily fixed by adding a handler to our form setup:

        var handle_failure=function(status,error_text,response_data){
            results_field.empty();
            results_field.text(
                "Unable to retrieve search results: error " + status + " ("+error_text+")");
            submit_button.prop("disabled",false);
            term_field.prop("disabled",false);
        };

        submit_button.click(function(){
            $(this).prop("disabled",true);
            var term=term_field.val();
            term_field.prop("disabled",true);
            results_field.text("Searching for "+term);
            ajax('/ajax.php',
                 {request:'search',
                  term:term},
                 handle_results,
                 handle_failure);
            return false;
        });

Which brings us back to the familiar "Success" message.

I'll leave the example there. If this were part of a real web app then there would be a lot more to do, along with corresponding tests, but for our simple example this will suffice. I hope you can see how this could be extended to test other scenarios.

Check out the final driver page and JavaScript code for this example.

A real ajax function for this code

This code relies on an ajax function to request data from the server, which we have mocked out in the tests. Here is a simple implementation that uses jQuery's post() function:

    function jquery_ajax(url,data,handler,failure_handler){
        $.post(url,data,handler).fail(function(xhr){
            if(failure_handler){
                failure_handler(xhr.status,xhr.statusText,xhr.responseText);
            }
        });
    }

This could then be passed to our setup_search_form function in live code to make real AJAX requests.

Test frameworks

This example doesn't use any external code except jQuery, just to show how easy it is to get started, but there are plenty of test frameworks available that make it easier to write tests, or view the results. Personally, I like QUnit for JavaScript, but use whatever takes your fancy. A test framework will generally record how many of your tests passed or failed, rather than using alert() as we have here. They also tend to offer various checks like assertEquals() or assertLessThan(), which record the supplied parameters as well as marking the test as failed. This can make it easier to work out what went wrong if a test fails unexpectedly.

Other languages

This example was JavaScript, but the overall idea is the same in whatever language you use. Most GUI frameworks provide an API for querying the state of the UI, and can also be made to trigger events as if a user had performed an action. For example, when testing Windows applications in this way you can call SendMessage and PostMessage from within the tests to simulate the messages sent by the system when the user interacts with the application via the mouse or keyboard.

End note

As you've seen from this example, test-driving UIs is possible. It's still a good idea to make the UI layer as thin as possible, but that's just general good software engineering. Indeed, test-driving the UI can actually reduce coupling by forcing you to introduce an interface where previously you might have used another subsystem directly.

Posted by Anthony Williams
[/ tdd /] permanent link

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Locks, Mutexes, and Semaphores: Types of Synchronization Objects

Tuesday, 21 October 2014

I recently got an email asking about locks and different types of synchronization objects, so I'm posting this entry in case it is of use to others.

Locks

A lock is an abstract concept. The basic premise is that a lock protects access to some kind of shared resource. If you own a lock then you can access the protected shared resource. If you do not own the lock then you cannot access the shared resource.

To own a lock, you first need some kind of lockable object. You then acquire the lock from that object. The precise terminology may vary. For example, if you have a lockable object XYZ you may:

  • acquire the lock on XYZ,
  • take the lock on XYZ,
  • lock XYZ,
  • take ownership of XYZ,
  • or some similar term specific to the type of XYZ

The concept of a lock also implies some kind of exclusion: sometimes you might be unable to take ownership of a lock, and the operation to do so will either fail, or block. In the former case, the operation will return some error code or exception to indicate that the attempt to take ownership failed. In the latter case, the operation will not return until it has taken ownership, which typically requires that another thread in the system does something to permit that to happen.
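In C++ the two flavours map onto std::mutex::try_lock(), which fails immediately, and std::mutex::lock(), which blocks. The following is a minimal sketch; the helper names other_thread_try_lock and blocked_until_released are hypothetical, and the second attempt is always made from another thread, because calling lock() or try_lock() on a std::mutex that the calling thread already owns is undefined behaviour.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper: the "fail" flavour. Another thread calls try_lock(),
// which returns false straight away if m is owned (the standard also allows
// try_lock() to fail spuriously, though common implementations don't).
bool other_thread_try_lock(std::mutex& m) {
    bool acquired = false;
    std::thread t([&] {
        acquired = m.try_lock();
        if (acquired) {
            m.unlock();  // give the lock straight back
        }
    });
    t.join();            // join() makes 'acquired' safe to read here
    return acquired;
}

// Hypothetical helper: the "block" flavour. While the caller owns m, the
// worker's lock() cannot return; it only proceeds once the caller unlocks.
bool blocked_until_released(std::mutex& m) {
    std::atomic<bool> got_lock{false};
    m.lock();                                   // caller takes ownership
    std::thread t([&] {
        m.lock();                               // blocks until caller unlocks
        got_lock = true;
        m.unlock();
    });
    // Give the worker time to reach lock(); it cannot get past it yet.
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    bool was_blocked = !got_lock;
    m.unlock();                                 // another thread's action lets it proceed
    t.join();
    return was_blocked && got_lock;
}
```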

The most common form of exclusion is a simple numerical count: the lockable object has a maximum number of owners. If that number has been reached, then any further attempt to acquire a lock on it cannot succeed. This therefore requires that we have some mechanism for relinquishing ownership when we are done. This is commonly called unlocking, but again the terminology may vary. For example, you may:

  • release the lock on XYZ,
  • drop the lock on XYZ,
  • unlock XYZ,
  • relinquish ownership of XYZ,
  • or some similar term specific to the type of XYZ

When you relinquish ownership in the appropriate fashion, a blocked operation that is trying to acquire the lock may now proceed, if the required conditions have been met.

For example if a lockable object only allows 3 owners then a 4th attempt to acquire the lock will block. When one of the first 3 owners releases the lock then that 4th attempt to acquire the lock will succeed.
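A lockable object with a maximum owner count like this can be sketched with a std::mutex and a std::condition_variable. The class name counted_lockable and its member functions are hypothetical: try_acquire() shows the "fail" flavour, acquire() the "block" flavour, and release() wakes one blocked acquirer. Note that release() does not check which thread calls it, so ownership here is of the more conceptual kind.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical lockable object allowing at most max_owners owners at once.
class counted_lockable {
public:
    explicit counted_lockable(std::size_t max_owners) : max_owners_(max_owners) {}

    // Blocking acquire: waits until fewer than max_owners hold the lock.
    void acquire() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return owners_ < max_owners_; });
        ++owners_;
    }

    // Non-blocking acquire: fails immediately if the limit has been reached.
    bool try_acquire() {
        std::lock_guard<std::mutex> lk(m_);
        if (owners_ == max_owners_) {
            return false;
        }
        ++owners_;
        return true;
    }

    // Relinquish one ownership slot and wake one blocked acquire(), if any.
    // Precondition (not checked): the caller holds one of the slots.
    void release() {
        {
            std::lock_guard<std::mutex> lk(m_);
            --owners_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;                // protects owners_
    std::condition_variable cv_;  // signalled when a slot becomes free
    std::size_t owners_ = 0;
    const std::size_t max_owners_;
};
```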

Ownership

What it means to "own" a lock depends on the precise type of the lockable object. For some lockable objects there is a very tight definition of ownership: this specific thread owns the lock, through the use of that specific object, within this particular scope.

In other cases, the definition is more fluid, and the ownership of the lock is more conceptual. In these cases, ownership can be relinquished by a different thread or object than the thread or object that acquired the lock.

Mutexes

Mutex is short for MUTual EXclusion. Unless the word is qualified with additional terms such as shared mutex, recursive mutex or read/write mutex, it refers to a type of lockable object that can be owned by exactly one thread at a time. Only the thread that acquired the lock can release the lock on a mutex. When the mutex is locked, any attempt to acquire the lock will fail or block, even if that attempt is made by the same thread.
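As a minimal sketch of a plain mutex in use, the hypothetical shared_counter class below lets several threads increment one value; std::lock_guard (described later in this post) holds the std::mutex for the duration of each call. One C++-specific caveat: for std::mutex, an attempt by the owning thread to lock it again is undefined behaviour rather than a guaranteed failure or block, so code shouldn't rely on either.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counter shared between threads. Only one thread at a time
// can own m_, so the increments never interleave.
class shared_counter {
public:
    void increment() {
        std::lock_guard<std::mutex> guard(m_);  // locked until end of scope
        ++value_;
    }
    long value() {
        std::lock_guard<std::mutex> guard(m_);
        return value_;
    }
private:
    std::mutex m_;
    long value_ = 0;  // protected by m_
};

// Hypothetical driver: n_threads threads each increment per_thread times.
long run_increments(int n_threads, int per_thread) {
    shared_counter counter;
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([&] {
            for (int j = 0; j < per_thread; ++j) {
                counter.increment();
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return counter.value();
}
```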

Recursive Mutexes

A recursive mutex is similar to a plain mutex, but one thread may own multiple locks on it at the same time. If a lock on a recursive mutex has been acquired by thread A, then thread A can acquire further locks on the recursive mutex without releasing the locks already held. However, thread B cannot acquire any locks on the recursive mutex until all the locks held by thread A have been released.
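The thread A / thread B scenario can be played out with std::recursive_mutex. In this sketch the calling thread acts as thread A and locks the mutex twice, while a short-lived second thread acts as thread B and checks whether it can get in. The function and struct names are hypothetical.

```cpp
#include <mutex>
#include <thread>

// Hypothetical helper: a separate thread ("thread B") tries to lock rm.
bool thread_b_can_lock(std::recursive_mutex& rm) {
    bool ok = false;
    std::thread b([&] {
        ok = rm.try_lock();
        if (ok) {
            rm.unlock();
        }
    });
    b.join();
    return ok;
}

struct recursive_demo_result {
    bool b_excluded_with_two_locks;
    bool b_excluded_with_one_lock;
    bool b_succeeds_after_all_released;
};

// Hypothetical demo: the caller is "thread A".
recursive_demo_result run_recursive_demo() {
    std::recursive_mutex rm;
    rm.lock();
    rm.lock();  // A takes a second lock without releasing the first
    bool two = !thread_b_can_lock(rm);
    rm.unlock();  // A still holds one lock
    bool one = !thread_b_can_lock(rm);
    rm.unlock();  // all of A's locks released
    bool none = thread_b_can_lock(rm);
    return {two, one, none};
}
```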

In most cases, a recursive mutex is undesirable, since it makes it harder to reason correctly about the code. With a plain mutex, if you ensure that the invariants on the protected resource are valid before you release ownership, then you know that when you acquire ownership those invariants will be valid.

With a recursive mutex this is not the case: being able to acquire the lock does not mean that the lock was not already held by the current thread, and therefore does not imply that the invariants are valid.

Reader/Writer Mutexes

Sometimes called shared mutexes, multiple-reader/single-writer mutexes or just read/write mutexes, these offer two distinct types of ownership:

  • shared ownership, also called read ownership, or a read lock, and
  • exclusive ownership, also called write ownership, or a write lock.

Exclusive ownership works just like ownership of a plain mutex: only one thread may hold an exclusive lock on the mutex, and only that thread can release the lock. No other thread may hold any type of lock on the mutex whilst that thread holds its lock.

Shared ownership is more lax. Any number of threads may take shared ownership of a mutex at the same time. No thread may take an exclusive lock on the mutex while any thread holds a shared lock.
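These rules can be observed with C++14's std::shared_timed_mutex. In this sketch the hypothetical helpers ask another thread to try each kind of lock, since a thread that already owns a std::shared_timed_mutex in any mode must not try to lock it again. (The standard permits the try_lock functions to fail spuriously; common implementations don't.)

```cpp
#include <shared_mutex>
#include <thread>

// Hypothetical helper: can another thread take a shared (read) lock now?
bool other_thread_can_lock_shared(std::shared_timed_mutex& m) {
    bool ok = false;
    std::thread t([&] {
        ok = m.try_lock_shared();
        if (ok) {
            m.unlock_shared();
        }
    });
    t.join();
    return ok;
}

// Hypothetical helper: can another thread take an exclusive (write) lock now?
bool other_thread_can_lock_exclusive(std::shared_timed_mutex& m) {
    bool ok = false;
    std::thread t([&] {
        ok = m.try_lock();
        if (ok) {
            m.unlock();
        }
    });
    t.join();
    return ok;
}
```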

These mutexes are typically used for protecting shared data that is seldom updated, but cannot be safely updated if any thread is reading it. The reading threads thus take shared ownership while they are reading the data. When the data needs to be modified, the modifying thread first takes exclusive ownership of the mutex, thus ensuring that no other thread is reading it, then releases the exclusive lock after the modification has been done.

Spinlocks

A spinlock is a special type of mutex that does not use OS synchronization functions when a lock operation has to wait. Instead, it just keeps trying to update the mutex data structure to take the lock in a loop.

If the lock is not held very often, and/or is only held for very short periods, then this can be more efficient than calling heavyweight thread synchronization functions. However, if the processor has to loop too many times then it is just wasting time doing nothing, and the system would do better if the OS scheduled another thread with active work to do instead of the thread failing to acquire the spinlock.
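A spinlock can be sketched with std::atomic_flag. This is an illustrative minimal version, not a tuned implementation: it busy-waits with no back-off or yield, and the class name spinlock and the helper count_with_spinlock are hypothetical. Because it provides lock(), try_lock() and unlock(), it can be used with std::lock_guard like any other mutex.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

class spinlock {
public:
    void lock() {
        // test_and_set() returns the previous value, so the loop only exits
        // when this thread changed the flag from clear to set. No OS call is
        // made while waiting: the thread just keeps retrying.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }
    void unlock() {
        flag_.clear(std::memory_order_release);
    }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // starts clear (unlocked)
};

// Hypothetical driver: several threads increment one value under the spinlock.
long count_with_spinlock(int n_threads, int per_thread) {
    spinlock sl;
    long value = 0;  // protected by sl
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([&] {
            for (int j = 0; j < per_thread; ++j) {
                std::lock_guard<spinlock> guard(sl);  // held only briefly
                ++value;
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return value;
}
```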

Semaphores

A semaphore is a very relaxed type of lockable object. A given semaphore has a predefined maximum count, and a current count. You take ownership of a semaphore with a wait operation, also referred to as decrementing the semaphore, or even just abstractly called P. You release ownership with a signal operation, also referred to as incrementing the semaphore, a post operation, or abstractly called V. The single-letter operation names are from Dijkstra's original paper on semaphores.

Every time you wait on a semaphore, you decrease the current count. If the count was greater than zero then the decrement just happens, and the wait call returns. If the count was already zero then it cannot be decremented, so the wait call will block until another thread increases the count by signalling the semaphore.

Every time you signal a semaphore, you increase the current count. If the count was zero before you called signal, and there was a thread blocked in wait then that thread will be woken. If multiple threads were waiting, only one will be woken. If the count was already at its maximum value then the signal is typically ignored, although some semaphores may report an error.

Whereas mutex ownership is tied very tightly to a thread, and only the thread that acquired the lock on a mutex can release it, semaphore ownership is far more relaxed and ephemeral. Any thread can signal a semaphore, at any time, whether or not that thread has previously waited for the semaphore.

An analogy

A semaphore is like a public lending library with no late fees. They might have 5 copies of C++ Concurrency in Action available to borrow. The first five people that come to the library looking for a copy will get one, but the sixth person will either have to wait, or go away and come back later.

The library doesn't care who returns the books, since there are no late fees, but when they do get a copy returned, then it will be given to one of the people waiting for it. If no-one is waiting, the book will go on the shelf until someone does want a copy.

Binary semaphores and Mutexes

A binary semaphore is a semaphore with a maximum count of 1. You can use a binary semaphore as a mutex by requiring that a thread only signals the semaphore (to unlock the mutex) if it was the thread that last successfully waited on it (when it locked the mutex). However, this is only a convention; the semaphore itself doesn't care, and won't complain if the "wrong" thread signals the semaphore.

Critical Sections

In synchronization terms, a critical section is that block of code during which a lock is owned. It starts at the point that the lock is acquired, and ends at the point that the lock is released.
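To make the boundaries concrete, in this sketch the critical section is just the braced block: it starts where the std::lock_guard acquires the lock and ends where the guard is destroyed at the closing brace. Work that doesn't touch the shared data is kept outside it. The global names log_mutex and log_lines and the function append_log are hypothetical.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

std::mutex log_mutex;
std::vector<std::string> log_lines;  // shared resource, protected by log_mutex

// Hypothetical function: appends a line and returns the new number of lines.
std::size_t append_log(const std::string& who, const std::string& what) {
    std::string line = who + ": " + what;  // outside the critical section
    std::size_t count;
    {
        std::lock_guard<std::mutex> guard(log_mutex);  // critical section starts
        log_lines.push_back(line);
        count = log_lines.size();
    }                                                  // critical section ends
    return count;                                      // lock no longer held
}
```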

Windows CRITICAL_SECTIONs

Windows programmers may well be familiar with CRITICAL_SECTION objects. A CRITICAL_SECTION is a specific type of mutex, not a use of the general term critical section.

Mutexes in C++

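The C++14 standard has five mutex types:

  • std::mutex,
  • std::timed_mutex,
  • std::recursive_mutex,
  • std::recursive_timed_mutex, and
  • std::shared_timed_mutex.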

The variants with "timed" in the name are the same as those without, except that the lock operations can have time-outs specified, to limit the maximum wait time. If no time-out is specified (or possible) then the lock operations will block until the lock can be acquired — potentially forever if the thread that holds the lock never releases it.

std::mutex and std::timed_mutex are just plain single-owner mutexes.

std::recursive_mutex and std::recursive_timed_mutex are recursive mutexes, so multiple locks may be held by a single thread.

std::shared_timed_mutex is a read/write mutex.
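As an illustration of the time-out variants, this sketch uses std::timed_mutex::try_lock_for() from a second thread: it returns false if the lock could not be acquired within (roughly) the given duration, instead of blocking forever. The helper name other_thread_lock_within is hypothetical.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper: another thread tries to lock m, giving up after timeout.
bool other_thread_lock_within(std::timed_mutex& m,
                              std::chrono::milliseconds timeout) {
    bool ok = false;
    std::thread t([&] {
        ok = m.try_lock_for(timeout);  // waits at most about 'timeout'
        if (ok) {
            m.unlock();
        }
    });
    t.join();
    return ok;
}
```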

C++ lock objects

To go with the various mutex types, the C++ Standard defines a triplet of class templates for objects that hold a lock. These are:

  • std::lock_guard<>,
  • std::unique_lock<>, and
  • std::shared_lock<>.

For basic operations, they all acquire the lock in the constructor, and release it in the destructor, though they can be used in more complex ways if desired.
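The constructor/destructor pattern can be illustrated with a hand-written guard. simple_guard and checked_divide are hypothetical names for illustration only; in real code you would use std::lock_guard<> itself. The point is that the destructor runs however the scope is left, including by an exception, so the lock is always released.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical minimal guard: acquire in the constructor, release in the
// destructor. Copying is disabled so the lock is released exactly once.
template <typename Lockable>
class simple_guard {
public:
    explicit simple_guard(Lockable& l) : l_(l) { l_.lock(); }
    ~simple_guard() { l_.unlock(); }
    simple_guard(const simple_guard&) = delete;
    simple_guard& operator=(const simple_guard&) = delete;
private:
    Lockable& l_;
};

// Hypothetical usage: total is protected by m. If divisor is zero the
// function throws, and the guard's destructor still unlocks m.
int checked_divide(std::mutex& m, int& total, int divisor) {
    simple_guard<std::mutex> guard(m);
    if (divisor == 0) {
        throw std::invalid_argument("divisor is zero");
    }
    total /= divisor;
    return total;
}
```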

std::lock_guard<> is the simplest type, and just holds a lock across a critical section in a single block:

    #include <mutex>

    std::mutex m;
    void f(){
        std::lock_guard<std::mutex> guard(m);  // lock acquired here
        // do stuff
    }                                          // lock released here

std::unique_lock<> is similar, except it can be returned from a function without releasing the lock, and can have the lock released before the destructor:

    #include <mutex>

    std::mutex m;
    std::unique_lock<std::mutex> f(){
        std::unique_lock<std::mutex> guard(m);
        // do stuff
        return guard;  // ownership of the lock moves to the caller
    }

    void g(){
        std::unique_lock<std::mutex> guard(f());
        // do more stuff
        guard.unlock();  // release before the end of the scope
    }

See my previous blog post for more about std::unique_lock<> and std::lock_guard<>.

std::shared_lock<> is almost identical to std::unique_lock<> except that it acquires a shared lock on the mutex. If you are using a std::shared_timed_mutex then you can use std::lock_guard<std::shared_timed_mutex> or std::unique_lock<std::shared_timed_mutex> for the exclusive lock, and std::shared_lock<std::shared_timed_mutex> for the shared lock.

    #include <mutex>
    #include <shared_mutex>

    std::shared_timed_mutex m;
    void reader(){
        std::shared_lock<std::shared_timed_mutex> guard(m);  // shared (read) lock
        // do read-only stuff
    }
    void writer(){
        std::lock_guard<std::shared_timed_mutex> guard(m);   // exclusive (write) lock
        // update shared data
    }

Semaphores in C++

The C++ standard does not define a semaphore type. You can write your own with an atomic counter, a mutex and a condition variable if you need, but most uses of semaphores are better replaced with mutexes and/or condition variables anyway.

Unfortunately, for those cases where semaphores really are what you want, using a mutex and a condition variable adds overhead, and there is nothing in the C++ standard to help. Olivier Giroux and Carter Edwards' proposal for a std::synchronic class template (N4195) might allow for an efficient implementation of a semaphore, but this is still just a proposal.

Posted by Anthony Williams
[/ threading /] permanent link

Comments on the C++ Concurrency TS

Wednesday, 28 May 2014

It's been a while since I wrote any papers for the C++ committee, but I've written two for the committee mailing prior to the upcoming committee meeting in Rapperswil:

The first provides comments and suggestions for improvements to the concurrency TS, based on implementing continuations for Just::Thread V2, and executors for an unreleased internal build of Just::Thread.

The second proposes to standardize the synchronized_value class template from Just::Thread Pro, with a couple of modifications.

Let me know if you have any comments.

Posted by Anthony Williams
[/ news /] permanent link

Design and Content Copyright © 2005-2015 Just Software Solutions Ltd. All rights reserved.