Configuring a home data center

I had this old school thought.  I need a 48u rack in the garage.  I’ll put a gigabit switch at the top, and load it up with all these gigabit fast machines, and just party like a data center fool.

Then I went to Fry’s Electronics to replace a failed hard disk in a very old Atom based machine.  One terabyte, for $59…  Because these are tiny little 5400 RPM laptop drives.

This made me rethink my rack madness.  First thought, what is storage all about these days?  First of all, there’s the cloud, with all its infinite amounts of storage at somewhat reasonable prices (if you’re a business).  But what about the average home user?  What do you really store?  Well, there’s the gobs and gobs of images that will likely never leave your phone.  Then there’s your aging DVD and CD collection, if you haven’t already gone fully over to the streaming side of media consumption.  Scanned documents? (All 1GB of them.)  What else is there?  Not much that I can think of really.  1 or 2 terabytes is plenty, and a NAS box that you never think about is probably the best way to go for most of that.

But, I want to do more with the bits and pieces of compute that I have lying around.  Alright, so a long time back, I purchased two ASUS EeeBox EB1006-B machines.  Probably got them off Woot at a decent price.  Back then I wasn’t sure what I’d do with them, but I knew cheap was good.  I took one and eventually put it in my workshop, just to browse the internet on occasion.  The other sat in the box, until just recently.

These little boxen come with 1GB of RAM, and a 160GB hard disk.  The processor is an Atom N270, with who knows what kinds of graphics capabilities.  I upgraded the RAM to 2GB, because $35.  I added the 1TB drive, because $59.  Now what?

The OS.  Well, they originally came with Windows XP, and it didn’t make sense to stick with that particular choice.  Nor did it make sense to upgrade to Windows 8.1, because that’s just not a match made in heaven.  So, I turned to… Linux.  I don’t really know what I’m going to use each box for, but I know that I can pretty much dedicate a single box to each feature I might want.  So, the first box I decided will be a proxy server for my home (outbound).  I have been playing with proxy servers at work for the past couple of years, so I thought it was high time that I actually use one at home for kicks.

Box1 – After some gnashing of teeth and wringing of hands, I settled on installing Arch Linux on the first box.  Why Arch?  Because I wanted a fairly minimal install.  I’ve installed Ubuntu on various machines in the past, and that’s a good enough environment.  Works great with just about any hardware I have.  But, for this proxy server, all I need is network and disk drive, and CPU cycles, and that’s about it.  I figure an Atom is a good enough processor for the types of proxying that are typical of home usage, so I don’t need some honking beefy server CPU here.  The 2GB of RAM is plenty to hold the OS and most stuff that’s likely to be cached.  But, in case I want to cache large chunks of the internet, there’s the 1TB drive sitting there doing mostly nothing most of the time.

I installed Arch, then I installed:

alsa-utils – for audio, which I’m not using

git – just in case I want to pull down and compile other interesting stuff

openssh – so I can manage the box without having an attached monitor

sudo – so I can sudo

nodejs – just in case I want to run some simple web server

squid – because that’s the actual proxy server that I need on the box


Realistically, I don’t need anything more than SSH and Squid, and if I reimage the machine, which is just a USB stick away, I’ll configure it with just those two packages.

After installing all the stuff, and configuring Squid (primarily cache location, and a couple of ACLs), I booted up.  I started by pointing Firefox from my desktop machine at the proxy.  That seemed to work.  Then I pointed the MacBook, and that worked.  Then it was the iPad, which also seemed to work.  To check and see if things were actually working as expected, I took a look at the Squid access log files, and sure enough, there were the expected entries for the web traffic.  Well, big woot!  Now I can go through the rest of the devices in the house and start pointing them at the proxy.

Now that the proxy machine is up and running, I can think about doing enforced, automatic proxy settings and the like, just like with big secure companies.  Then I want to play with fun ways to visualize the web accesses.  It would be really cool if I could integrate with Microsoft’s cloud app discovery service.  That would make it extra useful in terms of ready made visualizations.

The machine is nice and silent, just sitting there under a desk with its blue power indicator light, silently proxying the internet.  I just put the other machine right next to it.  For this one I think I’ll go with TinyCore Linux.  It’s even more stripped down than Arch Linux.  Almost nothing more than the kernel, shell and package manager.  But, when you’re going single purpose per device, that’s often enough.  For this machine, I’m thinking of making it a git server.  It’s a toss up though because my Synology box has git services as well, and for storage related stuff, the NAS is better equipped for dealing with redundancy, failures, and the like.  So, if not git server, then perhaps it will become the DHCP server for my network, relieving the router box of that particular duty.  Something like a Pogoplug might be even more reasonable.  Very small compute required to serve this particular purpose.  If not, then it might just become a generalized compute node, perhaps serve as a Docker thing, or as a TINN experimental server.

Besides these couple older boxes, I have a couple of Odroid XUs, some even more ancient x86 machines, and a beefy server from a bygone era (just put a new modern graphics card in it).  Each one of these devices can serve a single purpose.  This begs the question for me.  Do I need beefy multi-purpose machines in my home data center?  I think the answer is, I need a few beefy special purpose machines for certain purposes (storage, compute, graphics), and I need some more general purpose machines to do much lighter weight stuff (browsing, emailing, editing documents).

So, thus far, the home data center has gained a proxy server, recovered from a long decommissioned device.  I’m sure more specialized servers will come online over time, and I probably won’t be purchasing that 48u rack.

Microsoft Service Achievement Award

So, if you’ve been at Microsoft long enough, and you’ve done favorable work, and you’re of a certain level, you might be granted this MSAA. It’s basically time off, where you can think, rejuvenate, and come back swinging.  Some might call it a sabbatical, but you’re not headed off to another company to teach computing.

I was given one of these awards way back in the day, but never took the time… until now!

I’ve got 8 weeks, off the hook to play around, play with my kid, do some traveling, and of course some tinkering around with code, 3D printing, landscaping, and the inevitable home improvement projects.

I gave my coworkers the link to this blog so that they could follow along my exploits if they so choose.  The clock starts ticking on Sept. 29th, but I’ve already got a list of 20 things, which I know will not all get done in any way shape or form.  We’ll see.

For now, my short list is:

Write a simple graphics system in C (for what, the 3rd or 4th time?)

Play around with FPGAs

Construct some cabinetry in the garage

Teach my son to walk, and the true meaning of ‘inside voice’


I’ve been at MS since Oct/Nov 1998, so coming on 16 years now.  I was recently doing some phone screens for college hires, and they invariably asked me the same question: “What motivates you to stay at Microsoft?”

There were two core answers that seemed to come to me easily.

1) Whenever we do anything at Microsoft, it has the potential to impact a great many people around the world.  One key example I gave was, ‘we all Google, but Microsoft runs the ATMs and cash registers’.

2) I have been able to grow and learn a great many things within the company.  I’ve been a large scale manager, an individual contributor, worked internationally, worked on core frameworks, and whole cloud systems.  I’ve been able to switch teams and divisions, and the whole time, I’ve managed to keep a paycheck, and gather stock which is actually worth something.  Of course, I’m not a multi-billionaire, but I’m perfectly happy with the lifestyle my MS generated income affords me.

And so, instead of taking the payout for my sabbatical, I took the time off.  I’m looking forward to rejuvenating, ideating, and ultimately going back to work renewed and ready to kick some more serious computing butt!


Goodbye to colleagues

July 17 2014, some have called it Black Thursday at Microsoft.

I’ve been with the company for more than 15 years now, and I was NOT given the pink slip this time around.

Over those years, I have worked with tons of people, helped develop some careers, shipped lots of software, and generally had a good time.  Some of my colleagues were let go.  I actually feel fairly sad about it.  This is actually the second time I’ve known of colleagues being let go.  These are not people who are low performers.  In fact, last time around, the colleague found another job instantly within the company.

I remember back in the day Apple Computer would go through these fire/hire binges.  They’d let go a bunch of people, due to some change in direction or market, and then within 6 months end up hiring back just as many because they’d figured out something new which required those skilled workers.

In this case, it feels a bit different.  New head guy, new directions, new leadership, etc.

I’ve done some soul searching over this latest cull.  It’s getting lonely in my old Microsoft.  When you’ve been there as long as I have, the number of people you started with becomes very thin.  So, what’s my motivation?

It’s always the same I think.  I joined the company originally to work on the birth of XML.  I’ve done various other interesting things since then, and they all have the same pattern.  Some impossible task, some new business, some new technical challenge.

This is just the beginning of the layoffs, and I don’t know if I’ll make the next cull, but until then, I’ll be cranking code, doing the impossible, lamenting the departure of some very good engineering friends.  Mega corp is gonna do what mega corp’s gonna do.  I’m an engineer, and I’m gonna do some more engineering.


Fast Apps, Microsoft Style


That’s what I exclaimed at least a couple of times this morning as I sat at a table in a makeshift “team room” in building 43 at Microsoft’s Redmond campus. What was the exclamation for? Well, over the past 3 months, I’ve been working on a quick strike project with a new team, and today we finally announced our “Public Preview”.  Or, if you want to get right to the product: Cloud App Discovery

I’m not a PM or marketing type, so it’s best to go and read the announcement for yourself if you want to get the official spiel on the project.  Here, I want to write a bit about the experience of coming up with a project, in short order, in the new Microsoft.

It all started back in January for me.  I was just coming off another project, and casting about for the next hardest ‘mission impossible’ to jump on.  I had a brief conversation with a dev manager who posed the question; “Is it possible to reestablish the ‘perimeter’ for IT guys in this world of cloud computing”?  An intriguing question.  The basic problem was, if you go to a lot of IT guys, they can barely tell you how many of the people within their corporation are using, let alone DropBox from a cafe in Singapore.  Forget the notion of even trying to control such access.  The corporate ‘firewall’ is almost nothing more than a quartz space heater at this point, preventing very little, and knowing about even less.

So, with that question in mind, we laid out 3 phases of development.  Actually, they were already laid out before I joined the party (by a couple of weeks), so I just heard the pitch.  It was simple: the first phase of development was to see if we could capture network traffic, using various means, and project it up to the cloud where we could use some machine learning to give an admin a view of what’s going on.

Conveniently sidestepping any objections actual employees might have with this notion, I got to thinking on how it could be done.

For my part, we wanted to have something sitting on the client machine (a windows machine that the user is using), which will inspect all network traffic coming and going, and generate some reports to be sent up to the cloud.  Keep in mind, this is all consented activity, the employee gets to opt in to being monitored in this way.  All in the open and up front.

At the lowest level, my first inclination was to use a raw socket to create a packet sniffer, but Windows has a much better solution these days, built for exactly this purpose.  The Windows Filtering Platform allows you to create a ‘filter’ which you can configure to call out to a function whenever there is traffic.  My close teammate implemented that piece, and suddenly we had a handle on network packets.

We fairly quickly decided on an interface between that low level packet sniffing, and the higher level processor.  It’s as easy as this:


int WriteBytes(char *buff, int bufflen);
int ReadBytes(char *buff, int bufflen, int &bytesRead);

I’m paraphrasing a bit, but it really is that simple. What’s it do? Well, the fairly raw network packets are sent into ‘WriteBytes’, some processing is done, and a ‘report’ becomes available through ‘ReadBytes’. The reports are a JSON formatted string which then gets turned into the appropriate thing to be sent up to the cloud.
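To make the shape of that interface concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the PacketProcessor class, the byte-counting stand-in for real processing, and the "bytes" JSON field are hypothetical, and WriteBytes takes a const pointer here for convenience. It is not the product code.

```cpp
#include <cassert>
#include <cstring>
#include <deque>
#include <string>

// Hypothetical processor: raw bytes go in through WriteBytes,
// JSON report strings come out through ReadBytes.
class PacketProcessor {
    std::deque<std::string> m_Reports;   // reports waiting to be read

public:
    // Accept a chunk of raw bytes; returns the number of bytes consumed.
    // The "processing" is a stand-in that only records the chunk size.
    int WriteBytes(const char *buff, int bufflen) {
        if (buff == nullptr || bufflen <= 0) return 0;
        m_Reports.push_back("{\"bytes\":" + std::to_string(bufflen) + "}");
        return bufflen;
    }

    // Copy the next report into buff. Returns 0 on success, or -1 when no
    // report is queued or the buffer is too small (the report stays queued).
    int ReadBytes(char *buff, int bufflen, int &bytesRead) {
        bytesRead = 0;
        if (m_Reports.empty()) return -1;
        const std::string &report = m_Reports.front();
        if (buff == nullptr || bufflen < static_cast<int>(report.size())) return -1;
        std::memcpy(buff, report.data(), report.size());
        bytesRead = static_cast<int>(report.size());
        m_Reports.pop_front();
        return 0;
    }
};
```

The appeal of an interface this narrow is that the packet capture side and the parsing side only share a byte buffer, so either one can be swapped or tested in isolation.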

The time it took from hearing about the basic product idea, to a prototype of this thing was about 3 weeks.

What do I do once I get network packets? Well, the network packets represent a multiplexed stream of packets, as if I were a NIC. All incoming, outgoing, all TCP ports. Once I receive some bytes, I have to turn it back into individual streams, then start doing some ‘parsing’. Right now we handle http and TLS. For http, I do full http parsing, separating out headers, reading bodies, and the like. I did that by leveraging the http parsing work I had done for TINN already. I used C++ in this case, but it’s all relatively the same.
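Turning the multiplexed packets back into individual streams can be sketched roughly like this. The FlowKey and FlowTable names are hypothetical, and the sketch assumes payloads arrive in order; a real reassembler also has to deal with TCP sequence numbers, retransmits, and connection teardown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical connection key: the classic TCP 4-tuple.
struct FlowKey {
    uint32_t srcAddr;
    uint32_t dstAddr;
    uint16_t srcPort;
    uint16_t dstPort;

    bool operator<(const FlowKey &rhs) const {
        return std::tie(srcAddr, dstAddr, srcPort, dstPort) <
               std::tie(rhs.srcAddr, rhs.dstAddr, rhs.srcPort, rhs.dstPort);
    }
};

// Collects payload bytes per flow so each stream can be parsed on its own.
class FlowTable {
    std::map<FlowKey, std::vector<uint8_t>> m_Streams;

public:
    // Append one packet's payload to the stream it belongs to.
    void addPayload(const FlowKey &key, const uint8_t *data, size_t length) {
        std::vector<uint8_t> &stream = m_Streams[key];
        stream.insert(stream.end(), data, data + length);
    }

    // Bytes gathered so far for a flow (empty if the flow is unknown).
    std::vector<uint8_t> bytesFor(const FlowKey &key) const {
        auto it = m_Streams.find(key);
        return it == m_Streams.end() ? std::vector<uint8_t>() : it->second;
    }

    size_t flowCount() const { return m_Streams.size(); }
};
```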

TLS is a different story. At this ‘discovery’ phase, it was more about simple parsing. So, reading the record layer, decoding client_hello and server_hello, certificate, and the like. This gave me a chance to implement TLS processing using C++ instead of Lua. One of the core components that I leveraged was the byte order aware streams that I had developed for TINN. That really is the crux of most network protocol handling. If you can make heads or tails of what the various RFCs are saying, it usually comes down to doing some simple serialization, but getting the byte ordering right is the hardest part. 24-bit big endian integers?
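As a small illustration of the byte ordering point, here is a sketch of reading big endian integers by hand. The function names are made up for this example and are not the TINN stream API. The 24-bit case shows up in the TLS handshake header, which is a 1-byte message type followed by a 24-bit length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a 16-bit big endian (network order) integer from p[0..1].
inline uint16_t readU16BE(const uint8_t *p) {
    return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// Read a 24-bit big endian integer from p[0..2].
inline uint32_t readU24BE(const uint8_t *p) {
    return (static_cast<uint32_t>(p[0]) << 16) |
           (static_cast<uint32_t>(p[1]) << 8) |
            static_cast<uint32_t>(p[2]);
}

// Parse the 4-byte TLS handshake header: message type, then 24-bit length.
// Returns false if fewer than 4 bytes are available.
inline bool readHandshakeHeader(const uint8_t *data, size_t length,
                                uint8_t &msgType, uint32_t &bodyLength) {
    if (data == nullptr || length < 4) return false;
    msgType = data[0];
    bodyLength = readU24BE(data + 1);
    return true;
}
```

Assembling the value byte by byte like this works the same on any host, which sidesteps the question of the machine’s own byte order.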

At any rate, http parsing, fairly quick. TLS client_hello, fast enough, although properly handling the extensions took a bit of time. At this point, we’d be a couple months in, and our first partners get to start kicking the tires.

For such a project, it’s very critical that real world customers are involved really early, almost sitting in our design meetings. They course corrected us, and told us what was truly important and annoying about what we were doing, right from day one.

From the feedback, it becomes clear that getting more information, like the amount of traffic flowing through the pipes is as interesting as the meta information, so getting the full support for flows becomes a higher priority. For the regular http traffic, no problem. The TLS becomes a bit more interesting. In order to deal with that correctly, it becomes necessary to suck in more of the TLS implementation. Read the server_hello, and the certificate information. Well, if you’re going to read in the cert, you might as well get the subject common name out so you can use that bit of meta information. Now comes ASN.1 (DER) parsing, and x509 parsing. That code took about 2 weeks, working “nights and weekends” while the other stuff was going on. It took a good couple of weeks not to integrate, but to write enough test cases, with real live data, to ensure that it was actually working correctly.
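The first step of any DER parser is decoding the tag and length that start every element. A minimal sketch follows, with the caveats that DerHeader and readDerHeader are hypothetical names, only single-byte tags are handled, and this is not the code from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One decoded DER header: tag byte, content length, and header size.
struct DerHeader {
    uint8_t tag;
    size_t  contentLength;
    size_t  headerLength;
};

// Decode the tag and length at the start of a DER element.
// Rejects the indefinite length form (0x80), which DER does not allow.
inline bool readDerHeader(const uint8_t *data, size_t length, DerHeader &out) {
    if (data == nullptr || length < 2) return false;
    out.tag = data[0];
    uint8_t first = data[1];
    if ((first & 0x80) == 0) {
        // short form: the length fits in the low 7 bits
        out.contentLength = first;
        out.headerLength = 2;
    } else {
        // long form: the low 7 bits say how many length bytes follow
        size_t count = first & 0x7F;
        if (count == 0 || count > sizeof(size_t) || length < 2 + count) return false;
        size_t value = 0;
        for (size_t i = 0; i < count; i++) value = (value << 8) | data[2 + i];
        out.contentLength = value;
        out.headerLength = 2 + count;
    }
    // the content must fit inside the buffer we were given
    return out.contentLength <= length - out.headerLength;
}
```

An X.509 certificate is a nest of these elements, so pulling out the subject common name amounts to walking tag/length headers until the right field turns up.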

The last month was largely a lot of testing, making sure corner cases were dealt with and the like. As the client code is actually deployed to a bunch of machines, it really needed to be rock solid, no memory leaks, no excessive resource utilization, no CPU spiking, just unobtrusive, quietly getting the job done.

So, that’s what it does.

Now, I’ve shipped at Microsoft for numerous years. The fastest cycles I’ve usually dealt with are on the order of 3 months. That’s usually for a product that’s fairly mature, has plenty of engineering system support, and a well laid out roadmap. Really you’re just turning the crank on an already laid out plan.

This AppDiscovery project has been a bit different. It did not start out with a plan that had a 6 month planning cycle in front of it. It was a hunch that we could deliver customer value by implementing something that was challenging enough, but achievable, in a short amount of time.

So, how is this different than Microsoft of yore? Well, yes, we’ve always been ‘customer focused’, but this is to the extreme. I’ve never had customers this involved in what I was doing this early in the development cycle. I mean literally, before the first prototypical bits are even dry, the PM team is pounding on the door asking “when can I give it to the customers?”. That’s a great feeling actually.

The second thing is how much process we allowed ourselves to use. Recognizing that it’s a first run, and recognizing that customers might actually say “mehh, not interested”, it doesn’t make sense to spin up the classic development cycle which is meant to maintain a product for 10-14 years. A much more streamlined lifecycle, which favors delivering quality code and getting customer feedback, is what we employed. If it turns out that customers really like the product, then there’s room to move the process to a cycle that is more appropriate for longer term support.

The last thing that’s special is the amount of leveraging Open Source we are allowing ourselves these days. Microsoft has gone full tilt on OpenSource support. I didn’t personally end up using much myself, but we are free to use it elsewhere (with some legal guidelines). This is encouraging, because for crypto, I’m looking forward to using things like SipHash, and ChaCha20, which don’t come natively with the Microsoft platform.

Overall, as Microsoft continues to evolve and deliver ‘customer centric’ stuff, I’m pretty excited and encouraged that we’ll be able to use this same model time and again to great effect. Microsoft has a lot of smart engineers. Combined with some new directives about meeting customer expectations at the market, we will surely be cranking out some more interesting stuff.

I’ve implemented some interesting stuff while working on this project, some of it I’ll share here.

Microsoft Part II

I joined Microsoft in 1998 to work on MSXML. One of the reasons I joined way back then is because MS was in trouble with the DOJ, and competitors were getting more interesting. I thought “They’re either going down, or they’re going to resurge, either way, it will be a fun ride”.

Here it is, more than 15 years later, and I find my sentiment about the same. Microsoft has been in trouble the past few years. Missing a few trends, losing our way, catching our breath as our competitors run farther and faster ahead of us…

In the past 4 years, I’ve been associated with the rise of Azure, and most recently associated with our various identity services. In the past couple of months, I’ve been heads down working in an internal startup, which is about to deliver bits to the web. That’s 2 months from conception to delivery of a public preview of a product. That’s fairly unheard of for our giant company.

But, today, I saw a blizzard of news that made me think ye olde company has some life yet left in it.

The strictly Microsoft related news…
Windows Azure Active Directory Premium
C# Goes Open Source
TypeScript goes 1.0
Windows 8.1 is FREE for devices less than 9″!!

Of all of these, I think the Windows 8.1 going for free is probably the most impactful from a ‘game changer’ perspective. Android is everywhere, probably largely because it is ‘free’. I can’t spit in the wind without hitting a new micro device that runs Android, and doesn’t run Windows. Perhaps this will begin to change somewhat.

Then there’s peripheral news like…
Intel Galileo board ($99) is fully programmable from Visual Studio
Novena laptop goes for crowd funding

The Novena laptop is very interesting because it’s a substantial offering created by a couple of hardcore engineers. It is clearly a MUST HAVE machine for any self respecting hard/software hacker. It’s not the most powerful laptop in the world, and that’s beside the point. What it does represent is that some good engineers, hooked up with a solid supply chain, can produce goods that are almost price competitive with commodity goods. That and the fact that this is just an extraordinary hack machine.

I find the Galileo interesting because other than some third party support for Arduino programming from MSVC, this is a serious support drive for small things, from Microsoft. Given the previous news about the ‘free’, this Galileo support bodes well. You could conceivably get a $99 ‘computer’ with some form of Windows OS, and use it at the heart of your robot, quadcopter, art display, home automation thing…

Of course, the rest of the tinker market is heading even lower priced with things like the Teensy 3.1 at around $20. No “OS” per se, but surely capable hardware that could benefit from a nicely integrated programming environment and support from Microsoft. But, you don’t want Windows on such a device. You want to leverage some core technologies that Microsoft has in-house, and just apply it in various places. Wouldn’t it be great if all of Microsoft’s internal software was made available as installable packages…

Then there’s the whole ‘internet of things’ angle. Microsoft actually has a bunch of people focused in this space, but there’s no public offerings as yet. We’re Microsoft though, so you can imagine what the outcomes might look like. Just imagine lots of tiny little devices all tied to Microsoft services in some way, including good identities and all that.

Out on the fringe, non-Microsoft, there is, with their latest board back from manufacturing. A microcontroller that runs node.js (and TypeScript for that matter), which is WiFi connected. That is bound to have a profound impact for those who are doing quick and dirty web connected physical computing.

Having spent the past few weeks coding in C++, I have been feeling the weight of years of piled on language cruft. I’ve been longing for the simplicity of the Lua language, and my beloved TINN, but that will just have to wait a few more weeks. In the meanwhile, I did purchase a Mojo FPGA board, in the hopes that I will once again get into FPGA programming, because “hardware is the new software”.

At the end of the day, I am as excited about the prospects of working at Microsoft as I was in 1998. My enthusiasm isn’t constrained by the possibilities of what Microsoft itself might do, rather I am overjoyed at the pace of development and innovation across the industry. There are new frontiers opening up all the time. New markets to explore, new waves to catch. It’s not all about desktops, browsers, office suites, search engines, phones, and tablets. Every day, there’s a new possibility, and the potential for a new application. Throw in 3D printing, instant manufacturing, and a smattering of BitCoin, and we’re living in a braver new world every day!!

Revisiting C++

I was a C++ expert twice in the past. The first time around was because I was doing some work for Taligent, and their whole operating system was written in C++. With that system I got knee deep into the finer details of templates, and exceptions, to a degree that will likely never be seen on the planet earth.

The second time around, was because I was programming on the BeOS. Not quite as crazy as the Taligent experience, but C/C++ were all the rage.

Then I drifted into Microsoft, and C# was born. For the past 15 years, it’s been a slow rise to dominance with C# in certain quarters of Microsoft. It just so happens that this corresponds to the rise of the virus attacks on Windows, as well as the shift in programming skills of college graduates. In the early days of spectacular virus attacks, you could attribute most of them to buffer overruns, which allowed code to run on the stack. This was fairly easily plugged by C#, and security coding standards.

Today, I am working on a project where once again I am learning C++. This time around it’s C++11, which is decidedly more mature than the C++ I learned while working on Taligent. It’s not so dramatically different as say the difference between Lisp and Cobol, but it gained a lot of stuff over the years.

I thought I would jot down some of the surface differences I have noticed since I’ve been away.

First, to compare C++ to Lua, there are some surface differences. Most of the languages I program in today have their roots in Algol, so they largely look the same. But, there are some simple dialect differences. C++ is full of curly braces ‘{}’, semi-colons ‘;’, and parentheses ‘()’. Oh my god with the parens and semis!! With Lua, parens are optional, semis are optional, and instead of curlies, there are ‘do’, ‘end’, or simply ‘end’. For loops are different, array indices are different (unless you’re doing interop with the FFI), and do/while is repeat/until.

These are all minor differences, like say the differences between Portuguese and Spanish. You can still understand the other if you speak one. Perhaps not perfectly, but there is a relatively easy translation path.

Often times in language wars, these are the superficial differences that people talk about. Meh, not interesting enough to drive me one way or another.

But then, there’s this other stuff, which is truly the essence of the differences. Strong typing/duck typing, managed memory, dynamic code execution. I say ‘Lua’ here, but really that could be a standin for C#, node.js, Python, or Ruby. Basically, there are a set of modern languages which exhibit a similar set of features which are different enough from C/C++ that there is a difference in the programming models.

To illustrate, here’s a bit of C++ code that I have written recently. The setup is this: I receive a packet of data, typically the beginning of an HTTP conversation. From that packet of data, I must be able to ‘parse’ the thing, determine whether it is http/https, pull out headers, etc. I need to build a series of in-place parsers, which keep the amount of memory allocated to a minimum, and work fairly quickly. So, the first piece is this thing called an AShard_t:

#pragma once

#include <cstddef>
#include <cstdint>

#include "anl_conf.h"

class DllExport AShard_t {
public:
	// A view into a buffer owned by someone else
	uint8_t *	m_Data;
	size_t	m_Length;
	size_t	m_Offset;

	// Constructors
	AShard_t();	// public default constructor, required for arrays of shards
	AShard_t(const char *);
	AShard_t(uint8_t *data, size_t length, size_t offset);

	// Virtual Destructor
	virtual ~AShard_t() {}

	// type cast
	operator uint8_t *() {return getData();}

	// Operator Overloads
	AShard_t & operator= (const AShard_t & rhs);

	// Properties
	uint8_t *	getData() {return &m_Data[m_Offset];}
	size_t		getLength() {return m_Length;}

	// Member functions
	AShard_t &	clear();
	AShard_t &	first(AShard_t &front, AShard_t &rest, uint8_t delim) const;
	bool		indexOfChar(const uint8_t achar, size_t &idx) const;
	bool		indexOfShard(const AShard_t &target, size_t &idx);
	bool 		isEmpty() const;
	void		print() const;
	bool		rebase();
	char *		tostringz() const;
	AShard_t &	trimfrontspace();
};

OK, so it’s actually a fairly simple data structure. Assuming you have a buffer of data, a shard is just a pointer into that buffer. It contains the pointer, an offset, and a length. You might say that the pointer/offset combo is redundant, you probably don’t need both. The offset could be eliminated, assuming the pointer is always at the base of the structure. But, there might be a design choice that makes this useful later.

At any rate, there’s a lot going on here for such a simple class. First of all, there’s that ‘#pragma once’ at the top. Ah yes, good ol’ C preprocessor, needs to be told not to load stuff it’s already loaded before. Then there’s class vs struct, not to be confused with ‘typedef struct’. Public/Protected/Private, copy constructor or ‘operator=’. And heaven forbid you forget to make a public default constructor. You will not be able to create an array of these things without it!
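The default constructor point can be shown with two tiny stand-in types (NoDefault and WithDefault are hypothetical, made up for this illustration). Declaring any constructor suppresses the implicit default one, and a plain array declaration needs a default constructor for each element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in with only a parameterized constructor.
struct NoDefault {
    const uint8_t *data;
    size_t length;
    NoDefault(const uint8_t *d, size_t n) : data(d), length(n) {}
};

// Same fields, plus a public default constructor.
struct WithDefault {
    const uint8_t *data;
    size_t length;
    WithDefault() : data(nullptr), length(0) {}
    WithDefault(const uint8_t *d, size_t n) : data(d), length(n) {}
};

// NoDefault tokens[8];   // would not compile: no default constructor
static_assert(!std::is_default_constructible<NoDefault>::value,
              "declaring a constructor suppresses the implicit default one");
static_assert(std::is_default_constructible<WithDefault>::value,
              "an explicit default constructor makes arrays possible");

// Count how many elements of a freshly declared array start out empty.
inline size_t countEmpty() {
    WithDefault tokens[8];            // each element is default-constructed
    size_t empty = 0;
    for (const WithDefault &t : tokens) {
        if (t.data == nullptr && t.length == 0) empty++;
    }
    return empty;
}
```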

These are not mere dialectal differences, these are the differences between Spanish and Hungarian. You MUST know about the default constructor thing, or things just won’t work.

As far as implementation is concerned, I did a mix of things here, primarily because the class is so small. I’ve inserted some simple “string” processing right into the class, because I found them to be constantly useful. ‘first’, ‘indexOfChar’, and ‘indexOfShard’ turn out to be fairly handy when you’re trying to parse through something in place. ‘first’ is like in Lisp, get the first element off the list of elements. In this case you can specify a single character delimiter. ‘indexOfChar’ is like the strchr() function from C, except in this case it’s aware of the length, and it doesn’t assume a ‘null’ terminated string. ‘indexOfShard’ is like ‘strstr’, or ‘strpbrk’. With these in hand, you can do a lot of ‘tokenizing’.
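Since the class above only shows declarations, here is a rough sketch of how a length-aware ‘indexOfChar’ and ‘first’ might behave. It uses a simplified, hypothetical Span struct and free functions rather than AShard_t itself, so treat the details (including the bool return from first) as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified shard: a pointer plus a length, no ownership.
struct Span {
    const uint8_t *data = nullptr;
    size_t length = 0;

    // Copy the viewed bytes into a std::string (handy for printing/testing).
    std::string str() const {
        return length ? std::string(reinterpret_cast<const char *>(data), length)
                      : std::string();
    }
};

inline Span spanOf(const char *text, size_t length) {
    Span s;
    s.data = reinterpret_cast<const uint8_t *>(text);
    s.length = length;
    return s;
}

// Like strchr(), but bounded by length rather than a terminating NUL.
inline bool indexOfChar(const Span &s, uint8_t ch, size_t &idx) {
    for (size_t i = 0; i < s.length; i++) {
        if (s.data[i] == ch) { idx = i; return true; }
    }
    return false;
}

// Split s at the first delim: 'front' gets what precedes it, 'rest' what
// follows. With no delimiter present, front is all of s and rest is empty.
// The output spans must not alias s.
inline bool first(const Span &s, Span &front, Span &rest, uint8_t delim) {
    size_t idx = 0;
    if (!indexOfChar(s, delim, idx)) {
        front = s;
        rest = Span();
        return false;
    }
    front.data = s.data;
    front.length = idx;
    rest.data = s.data + idx + 1;
    rest.length = s.length - idx - 1;
    return true;
}
```

Nothing is copied or allocated while splitting; front and rest just point into the original buffer, which is the whole point of parsing in place.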

Here’s an example of parsing a URL:

bool parseUrl(const AShard_t &uriShard)
{
  AShard_t shard = uriShard;
  AShard_t rest;
  AShard_t scheme;
  AShard_t url;
  AShard_t authority;
  AShard_t hostname;
  AShard_t port;
  AShard_t resquery;
  AShard_t resource;
  AShard_t query;

  // http:
  shard.first(scheme, rest, ':');

  // the 'rest' represents the resource, which 
  // includes the authority + query
  // so try and separate authority from query if the 
  // query part exists
  shard = rest;
  // skip past the '//'
  shard.m_Offset += 2;
  shard.m_Length -= 2;

  // Now we have the url separated from the scheme
  url = shard;

  // separate the authority from the resource based on '/'
  url.first(authority, rest, '/');
  resquery = rest;

  // Break the authority into host and port
  authority.first(hostname, rest, ':');
  port = rest;

  // Back to the resource.  Split it into resource/query
  parseResourceQuery(resquery, resource, query);

  // Print the shards
  printf("URI: "); uriShard.print();
  printf("  Scheme: "); scheme.print();
  printf("  URL: "); url.print();
  printf("    Authority: "); authority.print();
  printf("      Hostname: "); hostname.print();
  printf("      Port: "); port.print();
  printf("    Resquery: "); resquery.print();
  printf("      Resource: "); resource.print();
  printf("      Query: "); query.print();

  return true;
}

AShard_t url0("");

Of course, I’m leaving out error checking, but even for this simple tokenization, it’s fairly robust because in most cases, if a ‘first’ fails, you’ll just get an empty ‘rest’, but definitely not a crash.

So, how does this fare against my beloved LuaJIT? Well, at this level things are about the same. In Lua, I could create exactly the same structure, using a table, and perform exactly the same operations. Only, if I wanted to do it without using the ffi, I’d have to stuff the data into a Lua string object (which causes a copy), then use the Lua string.char, count from 1, etc. Totally doable, and probably fairly optimized. There is a bit of a waste though because in Lua, everything interesting is represented by a table, so that’s a much bigger data structure than this simple AShard_t. It’s bigger in terms of memory footprint, and it’s probably slower in execution because it’s a generalized data structure that can serve many wonderful purposes.

For memory management, at this level of structure, things are relatively easy. Since the shard does not copy the data, it doesn’t actually do any allocations, so there’s relatively little to clean up. The most common use case for shards is that they’ll either be stack based, or they’ll be stuffed into a data structure. In either case, their lifetime is fairly short and well managed, so memory management isn’t a big issue. If they are dynamically allocated, then of course there’s something to be concerned with.
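Here is a small sketch of what "does not copy the data" looks like in practice. Again, this Shard is a hypothetical stand-in for AShard_t, and makeShard, subShard and sharesBuffer are names I made up for illustration. The point is only that a sub-shard is three small fields pointing into the same buffer, so there is nothing to free. The flip side is that the buffer has to outlive every shard that views it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for AShard_t: just a pointer, an offset and a length.
struct Shard {
    const char* m_Data;
    size_t m_Offset;
    size_t m_Length;
};

// View a whole NUL-terminated string. No bytes are copied or allocated.
inline Shard makeShard(const char* s) {
    assert(s != nullptr);
    return Shard{s, 0, std::strlen(s)};
}

// Narrow a shard to [start, start + len), clamped to the parent's bounds.
// The result still points into the parent's buffer.
inline Shard subShard(const Shard& s, size_t start, size_t len) {
    if (start > s.m_Length) start = s.m_Length;
    if (len > s.m_Length - start) len = s.m_Length - start;
    return Shard{s.m_Data, s.m_Offset + start, len};
}

// True when both shards view the same underlying buffer.
inline bool sharesBuffer(const Shard& a, const Shard& b) {
    return a.m_Data == b.m_Data;
}
```

Shards like these can live on the stack and be passed by value. The one thing to watch is handing a shard to code that outlives the original buffer.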

Well, that’s just the tip of the iceberg. I’ve re-attached to C++, and so far the gag reflex hasn’t driven me insane, so I guess it’s ok to continue.

Next, I’ll explore how insanely great the world becomes when shards roam the earth.

A Dictionary with a count

The most interesting type in Lua is the table object. The table serves dual purposes. It can act as an array, as well as a hash table. As a dictionary, you can do simple things like:

local tbl = {}
tbl["alpha"] = "alpha-value"

The problem with this construction is that you cannot easily find out the number of items that are within the dictionary. The easiest way is to enumerate the whole thing, and keep a count:

local count = 0;
for k,v in pairs(tbl) do
  count = count + 1;
end

And so, what I really want is something that acts as a simple dictionary, but gives me the ability to easily find the count. I’ve created the “Bag”, which is simply a wrapper on a table, but it gives you a count.

local Bag = {}
setmetatable(Bag, {
	__call = function(self, ...)
		return self:_new(...);
	end,
})

local Bag_mt = {
	__index = function(self, key)
		--print("__index: ", key)
		return self.tbl[key]
	end,

	__newindex = function(self, key, value)
		--print("__newindex: ", key, value)
		-- only adjust the count when an entry actually appears or disappears
		local existing = self.tbl[key]
		if value == nil then
			if existing ~= nil then
				self.__Count = self.__Count - 1;
			end
		elseif existing == nil then
			self.__Count = self.__Count + 1;
		end

		--rawset(self, key, value)
		self.tbl[key] = value;
	end,

	__len = function(self)
--		print("__len: ", self.__Count)
		return self.__Count;
	end,
}

function Bag._new(self, obj)
	local obj = {
		tbl = {},
		__Count = 0,
	}

	setmetatable(obj, Bag_mt);

	return obj;
end

Each Bag instance has a ‘tbl’ and a ‘__Count’. Within the metatable, both the ‘__newindex’ and the ‘__index’ are implemented. The ‘__newindex’ is used when you make an assignment. So, when I do:

tbl["alpha"] = "alpha-value"

The ‘__newindex’ is called. This will in turn simply put the value into the self.tbl table. While it does that, it also increments the count of values that are stored in the Bag. When you assign “nil” to an entry, this will remove it from the underlying table, and thus decrement the count.

Then there’s the ‘__len’ metamethod. This will normally return the length of an item. In the case of regular tables, as long as they are being used as arrays (contiguous indices), then it will return the number of items. In the case of a regular table being used as a dictionary, it will return 0. So, implementing it here gives the Bag the ability to use the convenient ‘#’ operator.

local Collections = require("Collections")

local names = {
	alpha = "alpha-value";
	beta = "beta-value";
	gamma = "gamma-value";
}

local bg = Collections.Bag();

for k,v in pairs(names) do
	print("adding: ", k, v)
	bg[k] = v;
end

print("Count after add: ", #bg)

bg["gamma"] = nil;

print("Count, after 1 remove: ", #bg)

print("beta: ", bg["beta"])

Lua 5.1 will not utilize the ‘__len’ metamethod for tables, only 5.2 will do that. So, with LuaJIT, which is halfway between the two, you need to make sure you compile with the -DLUAJIT_ENABLE_LUA52COMPAT flag, or you won’t get the expected behavior.

This is great, and achieves what I wanted. But, I still want it to act as a dictionary in terms of being iterable, so I should implement a __pairs metamethod, but that can wait.

So, there you have it. A quick and dirty improvement upon tables which saves me the headache and wasted CPU cycles of counting my dictionary entries.


