Thursday, September 22, 2011

More than we expected at the IEEE

If you have been reading this blog you will know that we sent Romana over to Okinawa to present our desire to see ad-hoc mode wifi improved, and also to consider how to use the baseband radio in mobile phones to support wifi on very low cost phones, and also to provide much longer range meshing using the ISM915/868 bands that offer very useful link budget advantages over the 2.4GHz band used by regular WiFi.

Well, we have been absolutely gobsmacked by the interest and positive response that Romana experienced at the meeting, and have the opportunity of taking up the baton to champion proposing these enhancements through the formal IEEE 802 standards process.

This is a great privilege, and also an opportunity that cannot be accepted without serious and solemn consideration, because it will basically require someone's effort, full time, for perhaps three years or more.

We are convinced that we want to, and that we should engage with this standards process, but just need to work out whether we have the resources to do it, without drawing all of our available resources away from progressing the development of the Serval mesh telephony software.

More news as we have it.

Tuesday, September 13, 2011

Why Serval is getting involved in the 802.11ah standards process

Serval is getting involved in the IEEE 802.11ah (WiFi on ISM bands below 1GHz) to try to make sure that it is well suited to infrastructure-independent ad-hoc and mesh networking at the IEEE meeting in Okinawa next week.

Here is our submission:

https://mentor.ieee.org/802.11/dcn/11/11-11-1138-00-00ah-packet-radio-mode-for-802-11ah-a-b-g-n.ppt

Basically we are asking for two things:

1. Improvements to ad-hoc mode (or provision of a new "packet radio" mode) that remove some of the problems currently faced when creating wifi-based ad-hoc mesh networks.

and

2. That the 802.11ah standard consider speeds below 1mbit and using cell phone baseband radios as a supported transport so that even cheap cell phones can form relatively long range mesh networks without any supporting hardware.

This second point is really important, because compared with 2.4GHz WiFi a mesh running in the ISM 915MHz band gains about +9db just from the change in frequency, which alone improves range by almost 3x.  If it supports lower bit rates, then further significant gains are possible, e.g., allowing 100kbit communications gives another +10db, for a total of 8x range versus WiFi.

There are some significant protocol challenges to be addressed, but if the standard doesn't support the use-case, then there will be no hardware and no chance.

These measures have the potential to push the indoor range up from WiFi's "about one house wide" to "about a block wide" and clear line-of-sight range up to a few km, which suddenly makes the formation of suburban ad-hoc mesh networks possible, which has profound impact for creating resilient infrastructure-independent communications solutions, for example for sustaining communications during disaster or enabling communications for rural and remote or developing populations.

We admit that we are very green to this process which we frankly find to be daunting.

But we feel compelled to try, regardless of what we perceive to be the odds of success.

Monday, September 12, 2011

More progress on the overlay mesh network

It is 5:58am and dawn is just beginning.

I have been up all night making the best of the need of an overnight sleep study for our son, which requires an adult to perform a specific function every 15 minutes throughout the night.

Programming in these little pieces feels a bit like the software development version of watching commercial television; just as you think you are about to witness something really interesting there is an interruption.

Nonetheless, I have made significant progress getting the overlay mesh to send and receive advertisements of a different kind; notices by one node that they can reach another node.

There are plenty of bugs that I know about, without even thinking about the ones that I don't know about, and also some code that is still plain old missing.  But even so, I have got the code forming a multi-hop mesh and keeping track of the routes to them.

I ran two nodes connected to two separate dummy networks, so that they couldn't see each other.  Then I connected an extra node in the middle that bridged the two networks, and after a short delay, viola, the nodes could all see each other.  The image shows the routing table from one of the nodes with only one network interface.


The 0000000* addresses in the neighbour table are clear evidence of some of those bugs I mentioned, as is the crazily large age field for one of those.

The FDB3BA1*, D64CEB4* and B167503* addresses are nodes that I had temporarily added to the mesh, and then removed again fairly recently, so the routes were (correctly) still hanging around with their link scores steadily dropping.

The important part is the route to D6850E3*, the node on the other network, for which the current route is via 30464E5*, which is the node bridging the two networks.  30464E5* was a node that I had previously bridging the networks just a short while earlier, and we can see the lingering route as it decays.

B167503* is the node currently bridging the link, and it's score is still climbing, and is about to overtake the recently removed node and become the route of choice.

So while I have a lot more work to do, the basic function of the mesh to discover multi-hop paths and switch routes based on appropriately calculated scores.

But right now, I am going to bed.

Why User-Land Mesh Networks Are Necessary For Heterogeneous WiFi Mesh Networks

The normal approach to creating a mesh network is to write a program that does some tricky stuff to work out how to get from node A to node Z via whatever nodes might be in the middle, and then tell this to the operating system so that packets get routed according to these instructions.

We are doing something different for Serval, in that we are having the program that does all that tricky stuff carry the messages between nodes, instead of asking the operating system to help out.

This has some disadvantages and some advantages.

First, the disadvantages, and why we think that they aren't such a problem for our application.

1. Such "user-land" overlay networks suffer from an inefficiency known as "double-copy", where every packet sent or received ends up getting copied not once by the kernel, but twice, once by the kernel, and once by the overlay program.  This is bad if you are processing lots of data.  But on a mesh network the actual throughput of data tends to be fairly low (it is one of the well known problems with mesh networks).  So this problem is kept under control.

2. Programs that want to know about the mesh network need to know about the overlay and have to use special calls to talk to the mesh.  This is bad if your primary goal is allowing people to run BitTorrent, SSH, for example, because you need to modify each of those programs to support the new network, and without operating system cooperation that can get a little messier than you might like.

However, for a mesh telephony platform, and where we know that bandwidth starvation is one of the greatest potential problems, there is actually merit in requiring people to go to some effort before they enable their application to consume the limited resources of the mesh.  We will be adding the support to the initial core applications we intend to run over the mesh, so this isn't really a big deal.

Now for some advantages:

1. We don't need operating system cooperation.  This makes it easier to port the mesh to all operating systems, both desktop and mobile.  It is especially important because the need to fiddle with routing tables requires root/administrative access on most platforms.

For mobile platforms, this is certainly true on Android, and I believe iOS, with together constitute the majority of smart phones.  This alone means that we have to take this path.

But it also gives us some additional upside here, in that it means we could make a USB key loaded with Serval mesh software that could be run on any desktop computer, and without administrative access or being "installed" could enable that computer to participate in the mesh.  In fact, it could probably even be an HTML5 application running in a web browser.

2. It separates us from the underlying network addressing. Many devices and networks only support IPv4, but there aren't enough IPv4 addresses to go around. But by pushing our mesh network addressing into a dedicated addressing layer we can solve this problem.

We use 256bit Elliptic Curve cryptography keys as the basis for our address

In fact, using this approach it is possible for every single Serval enabled device on a mesh to all have the same IPv4 address, provided we use only broadcast traffic, which we do for a very important reason:

Broadcast traffic is the only traffic which WiFi ad-hoc can reliably carry between devices with incompatible or buggy ad-hoc implementations.  For example, we cannot get unicast traffic to flow reliably between Mesh Potatoes and Huawei IDEOS U8150/U8180 phones.  However, we can get unicast traffic to flow between Mesh Potatoes and the original Android G1 developer phones, and between the G1 and the U8150/U8180 phones.

Unfortunately mesh routing protocols all tend to use broadcast traffic to discover available routes, and thus the routes choose the impossible Mesh Potato-U8150/U8180 links, which is very bad indeed.  But by making the mesh traffic itself broadcast we ensure that what the mesh routing protocol and what we are trying to do is the same, instead of merely similar.

Taking this a little further, by transporting the mesh topology management information in the same packets as the data we can directly measure the performance of the mesh delivering the traffic we really care about, rather than using different packets as an indirect measure -- especially since the distribution of packet sizes can otherwise be very different, and WiFi packet losses are not insensitive to packet length or differentiation between broadcast and unicast traffic.

Implementing this is of course a significant undertaking, and I have already spent a few weeks programming away.  But the effort is beginning to yield results.  Today for the first time I have the new overlay mesh detecting nodes and building the in-memory route table. You can see a screen grab here:


So here you can see some debug output from a Serval DNA node running in overlay mode using a file called dummy0 as a dummy network interface.

I realised early in the development of the overlay network that I would need some convenient simulation tools to make it easier to debug the behaviour of the mesh.  I decided the best way to do this was to make the software itself also a network simulator.  So I added support to pretend an ordinary file is a network interface.  Any instances of the overlay pointed to the same file would all append their "packets" to the end of the file, and remember where they had got up to reading from the file, and keep an eye out for any packets deposited by other nodes.

Each packet also has a header that has space for location information, transmission power, time, node velocity and heading and a whole pile of other information that can be used to drop packets that are deemed to have been sent from too far away, or suffering collision during reception and so on.

The file then also represent a log of all activity on the mesh which has already proven very helpful for debugging.  I will also create a tool that allows for the creation of visualisations of the mesh in action to help understand why things go wrong, and to explain the operation of the mesh.  I will post such animations here once I get that far.

Back to the trace, the error messages are mostly information for me for development (although there is on link-list bug that I can see I will need to fix to avoid a memory leak).

There are a few 256bit addresses that can be seen.  The all-f's is the mesh broadcast address, while the one that starts with 9d5f is the address of a node on the mesh.

These addresses are really big, and so I have already implemented a variety of mechanisms for abbreviating the addresses.

For example, the broadcast address is used so often, that it has a one-byte abbreviation (0x0f).  For other addresses, we only need to transmit as much of the address is required for the receiver to discriminate it from all other nodes.  The birthday paradox comes into play here, and means that we typically need about twice the number of bits as the base-2 log of the number of nodes on the network.  For a small network of a few hundred nodes 24 bits (3 bytes) of address is a pleasant over-kill.  Even at a global scale, we need to accomodate around 10 billion addresses = ~2^34 addresses, and so about 70 bits (7 bytes) should be enough.

We also have abbreviation strategies where neighbouring nodes that talk with each other a lot can allocate a one-byte short code for each other, however, as you might be able to guess from the trace that abbreviation method is temporarily disabled in the code to keep my life a little simpler at present -- which is a good move because of the density of new features in the code.

At the end of the trace we see a summary of the overlay routing table which shows 7-digit address prefixes for all know nodes on this network, and the link-scores to each via the various routes known to each.

The scores have two components. The first being a BATMAN-like score derived from beacon counting, although we take into account that the remote node might have different interfaces running at different beacon rates.  The second part is the number of Serval gateways that are on the path.

This gateway count is there to prevent routing loops and other nasties in meshes where the overlay has managed to infer some hierarchy, or where inter-mesh links have been purposely provisioned.

I'll talk about inferred hierarchy in another post, but it is one of several measures we are taking to ensure that a Serval mesh can scale beyond the normal bandwidth-starvation limit imposed by all nodes trying to know about all other nodes all of the time.

Our premise is that you need frequent information for nearby nodes because their "bearing" can change wildly over short time frames, but that for more distant nodes their "bearing" on the mesh should change much more slowly, and so we should be able to cope with less frequent updates about the position of such distant nodes.

But meanwhile, as you can see we are making good progress, and the Serval DNA overlay mesh is beginning to come together.

Thursday, September 8, 2011

A Thermo-Electric Saucepan?

(Please take a look at our crowd-funding campaign at igg.me/at/speakfreely.)

Today we received our thermo-electric saucepan from TES New Energy. You can see some pictures of it below.

This amazing device provides a full-powered USB port when placed on a fire by using thermo-electric generation.

The interesting innovation is that by incorporating the thermo-couples into a saucepan the water in the pot keeps one side of the junction cool to maximise the power output, while ensuring that the about 98% of heat that the thermo-couple cannot turn into electricity heats your water.

This also means that they can use different types of thermo-couples that work better at different temperatures.  The hot-side of the pan uses lower-efficiency thermo-couples that can tolerate the higher temperatures, and the "cold" side of that is joined to the hot side of the lower-temperature thermo-couple which is also a little more efficient, and the water in the pan helps to make sure that this lower-temperature thermo-couple does not over heat.  Together, they  avoid the complications that would normally be present if using only one of the thermo-couples, either lower yield, or the potential to be damaged by higher-temperature fires.

The saucepan approach is also an elegant way to circumvent the atrocious inefficiency that typically plagues thermo-electric generators, since in a diaster or remote location you probably need hot water, anyway, a realisation that TES NewEnergy had following the mega earthquake that struck Japan recently.

So this saucepan is about enough to charge a phone if you are in the middle of nowhere, and solar isn't an option for whatever reason -- perhaps because the power is out due to storm, volcano, or just during bad weather.

Anyway, we are looking forward to some fun testing Serval mesh phones powered by a high-tech saucepan on a campfire.