SVG From the Ground Up – Can you imagine that?

It’s time to put the pieces together and get to rendering some SVG!

In the last couple of installments, we looked at some very low-level scanning and parsing. We got through the basics of XML, and looked at some geometry by parsing the SVG <path> element’s ‘d’ attribute. The next step is to decide how we’re going to represent an entire document in memory so that we can render the whole image. This is a pretty big subject, so we’ll start with some design constraints and goals.

At the end of the day, I want to turn the .svg file into bits on the screen. The blend2d graphics library has a BLContext object, which does all the drawing that I need. It can draw everything from individual lines, to gradient-shaded polygons, to bitmaps. SVG has some particular drawing requirements, in terms of which elements are drawn first, how they are styled, how they are grouped, and so on. One example of this is its use of Cascading Style Sheets (CSS). What this means is that if I set one attribute, such as a fill color for polygons, that attribute will be applied to all subsequent elements drawn in that subtree, until something changes it.

Example:

<svg
 viewBox='10 10 190 10'
 xmlns="http://www.w3.org/2000/svg">
<g stroke="red" stroke-width='4'>
  <line x1='10' y1='10' x2='200' y2='200'/>
  <line stroke='green' x1='10' y1='100' x2='200' y2='200'/>
  <line stroke-width='8' stroke='blue' x1='10' y1='200' x2='200' y2='200'/>
  <rect x='100' y='10' width='50' height='50' />
</g>
</svg>

The ‘<g…’ serves as a grouping mechanism. It allows you to set attributes that will apply to all the elements within that group. In this case, I set the stroke (the color used to draw lines) to ‘red’. Until something later changes it, lines will be drawn in red. I also set the stroke-width (the width, in pixels, of drawn lines). So, again, unless it is explicitly changed, all subsequent lines in the group will have this width.

The first line changes neither the color nor the width, so it is drawn in red, at width 4.

The second line changes the color to ‘green’, but not the width.

The third line changes the color to blue, and the width to 8.

The rectangle does not explicitly change the stroke color, so its outline is drawn in red, with a width of 4, and a default fill color of black.

Of note, changing an attribute on a single element, such as the green line, does not change that attribute for its sibling elements; it applies only to that one element. Only attributes applied at the group level, from above, affect all the elements within that group.

This imposes some of our first requirements. We need an object that can contain drawing attributes. In addition, there’s a difference between objects that carry attributes, such as stroke-width, stroke, and fill, and actual geometry, such as line, polygon, and path. I’ll drop SVGObject in here, as it is the baseline. If you want to follow along, the code is in the svgtypes.h file.


struct IMapSVGNodes;    // forward declaration



struct SVGObject : public IDrawable
{
    XmlElement fSourceElement;

    IMapSVGNodes* fRoot{ nullptr };
    std::string fName{};    // The tag name of the element
    BLVar fVar{};
    bool fIsVisible{ false };
    BLBox fExtent{};



    SVGObject() = delete;
    SVGObject(const SVGObject& other) : fRoot(other.fRoot), fName(other.fName), fVar(other.fVar), fIsVisible(other.fIsVisible) {}
    SVGObject(IMapSVGNodes* root) :fRoot(root) {}
    virtual ~SVGObject() = default;

    SVGObject& operator=(const SVGObject& other) {
        fRoot = other.fRoot;
        fName = other.fName;
        fVar = other.fVar;  // assign the member variable

        return *this;
    }

    IMapSVGNodes* root() const { return fRoot; }
    virtual void setRoot(IMapSVGNodes* root) { fRoot = root; }

    const std::string& name() const { return fName; }
    void setName(const std::string& name) { fName = name; }

    bool visible() const { return fIsVisible; }
    void setVisible(bool visible) { fIsVisible = visible; }

    const XmlElement& sourceElement() const { return fSourceElement; }

    // sub-classes should return something interesting as BLVar
    // This can be used for styling, so images, colors, patterns, gradients, etc
    virtual const BLVar& getVariant()
    {
        return fVar;
    }

    void draw(IRender& ctx) override
    {
        ;// draw the object
    }

    virtual void loadSelfFromXml(const XmlElement& elem)
    {
        ;
    }

    virtual void loadFromXmlElement(const svg2b2d::XmlElement& elem)
    {
        fSourceElement = elem;

        // load the common attributes
        setName(elem.name());

        // call to loadselffromxml
        // so sub-class can do its own loading
        loadSelfFromXml(elem);
    }
};

As a base object, it contains the bare minimum that is common across all subsequent objects. It also has a couple of extras which have proven to be convenient, if not strictly necessary.

The strictly necessary part is ‘void draw(IRender &ctx)’. Almost all objects, whether they be attributes or elements, will need to affect the drawing context, so they all need to be given a chance to do that. The ‘draw()’ routine is what gives them that chance.

All objects need to be able to construct themselves from the XML element stream, so the convenient ‘load..’ functions sit here. Whether it’s an element or an attribute, it has a name, so we set the name as well. Attributes can set their name independently of being loaded from the XmlElement, so this is a bit of specialization, but it’s OK.

There is a bit of an oddity in the forward declaration ‘struct IMapSVGNodes;’. As we’ll see much later, we need the ability to look up nodes based on an ID, so we need an interface somewhere that allows us to do that. As every node constructed might need to do this, we need a way to pass this interface down the tree, without copying it, and without causing circular references; hence the forward declaration, and the use of the ‘root()’ method.

That’s got us started. We now have something of a base object.

Next up, we have SVGVisualProperty

// SVGVisualProperty
    // This is meant to be the base class for things that are optionally
    // used to alter the graphics context.
    // If isSet() is true, then the drawSelf() is called.
    // sub-classes should override drawSelf() to do the actual drawing
    //
    // This is used for things like; Paint, Transform, Miter, etc.
    //
    struct SVGVisualProperty :  public SVGObject
    {
        bool fIsSet{ false };

        //SVGVisualProperty() :SVGObject(),fIsSet(false){}
        SVGVisualProperty(IMapSVGNodes *root):SVGObject(root),fIsSet(false){}
        SVGVisualProperty(const SVGVisualProperty& other)
            :SVGObject(other)
            ,fIsSet(other.fIsSet)
        {}

        SVGVisualProperty& operator=(const SVGVisualProperty& rhs)
        {
            SVGObject::operator=(rhs);
            fIsSet = rhs.fIsSet;
            
            return *this;
        }

        void set(const bool value) { fIsSet = value; }
        bool isSet() const { return fIsSet; }

        virtual void loadSelfFromChunk(const ByteSpan& chunk)
        {
            ;
        }

        virtual void loadFromChunk(const ByteSpan& chunk)
        {
            loadSelfFromChunk(chunk);
        }
        
        // Apply property to the context conditionally
        virtual void drawSelf(IRender& ctx)
        {
            ;
        }

        void draw(IRender& ctx) override
        {
            if (isSet())
                drawSelf(ctx);
        }

    };

It’s not much, and you might question whether it even needs to exist. Maybe its couple of routines could just be merged into SVGObject itself. That is a simple design change to contemplate, as the only real attribute introduced here is ‘isSet()’. This is essentially a way to say ‘the value is null’. If I had nullable types, I’d just use that mechanism. But it also allows you to turn an attribute on and off programmatically, which might turn out to be useful.

Now we can look at a single attribute, stroke-width, and see how it goes from an XmlElement attribute to a property in our tree.

    //=========================================================
    // SVGStrokeWidth
    //=========================================================
    
    struct SVGStrokeWidth : public SVGVisualProperty
    {
		double fWidth{ 1.0};

		//SVGStrokeWidth() : SVGVisualProperty() {}
		SVGStrokeWidth(IMapSVGNodes* iMap) : SVGVisualProperty(iMap) {}
		SVGStrokeWidth(const SVGStrokeWidth& other) :SVGVisualProperty(other) { fWidth = other.fWidth; }
        
		SVGStrokeWidth& operator=(const SVGStrokeWidth& rhs)
		{
			SVGVisualProperty::operator=(rhs);
			fWidth = rhs.fWidth;
			return *this;
		}

		void drawSelf(IRender& ctx) override
		{
			ctx.setStrokeWidth(fWidth);
		}

		void loadSelfFromChunk(const ByteSpan& inChunk) override
		{
			fWidth = toNumber(inChunk);
			set(true);
		}

		static std::shared_ptr<SVGStrokeWidth> createFromChunk(IMapSVGNodes* root, const std::string& name, const ByteSpan& inChunk)
		{
			std::shared_ptr<SVGStrokeWidth> sw = std::make_shared<SVGStrokeWidth>(root);

			// Only load from the chunk if it is non-empty
			if (inChunk)
				sw->loadFromChunk(inChunk);

			return sw;
		}

		static std::shared_ptr<SVGStrokeWidth> createFromXml(IMapSVGNodes* root, const std::string& name, const XmlElement& elem)
		{
			return createFromChunk(root, name, elem.getAttribute(name));
		}
    };

It starts from ‘createFromXml…’. We’ll look at the main parsing loop later, but there is a point where we’re walking the attributes of an element, we run across ‘stroke-width’, and we call this function. ‘createFromChunk’ is then called, which instantiates an object and then calls loadFromChunk on it.

There are a couple more design choices being made here. The first is the use of ‘std::shared_ptr’. This implies heap allocation, and this is the right place to finally make such a decision. We did not want the XML parser itself to do any allocations, but we’re finally at the point where we need to. It would be possible to avoid allocations even here, and just have the attributes embedded in the objects that use them. But, since attributes can be shared, it’s easier to bite the bullet now and use shared_ptr.

In the case of stroke-width, we want to save the width specified (the call to toNumber()), and when it comes time to apply that width, in ‘drawSelf()’, we make the right call on the drawing context: ‘setStrokeWidth()’. Since the same drawing context is used throughout the rendering process, setting an attribute at one point makes that attribute sticky until something else changes it, which is the behavior that we want.

I would like to describe the ‘stroke’ and ‘fill’ attributes next, but they are actually the largest portions of the library. Setting these attributes can occur in so many different ways that they deserve a closer look of their own. Here I will just show a few of the forms they can take, so you get a feel for how involved they are:

<line stroke="blue" x1='0' y1='0'  x2='100'  y2='100'/>
<line stroke="rgb(0,0,255)" x1='0' y1='0'  x2='100'  y2='100'/> 
<line stroke="rgba(0,0,255,1.0)" x1='0' y1='0'  x2='100'  y2='100'/> 
<line stroke="rgba(0,0,100%,1.0)" x1='0' y1='0'  x2='100'  y2='100'/> 
<line stroke="rgba(0%,0%,100%,100%)" x1='0' y1='0'  x2='100'  y2='100'/> 
<line style = "stroke:blue" x1='0' y1='0'  x2='100'  y2='100'/> 
<line stroke= "url(#SVGID_1)" x1='0' y1='0'  x2='100'  y2='100'/> 

And more…

There is a bewildering assortment of ways in which you can set a stroke or fill, and they don’t all resolve to a single color value. They can be patterns, gradients, even other graphics. So, it can get pretty intense. The SVGPaint structure does a good job of representing all the possibilities, so take a look at that if you want to see the intimate details.

We round out our basic object structures by looking at how shapes are represented.

	//
	// SVGVisualNode
	// This is any object that will change the state of the rendering context.
	// That's everything from paint that needs to be applied, to geometries
	// that need to be drawn, to line widths, text alignment, and the like.
	// Most things, other than basic attribute types, will be a sub-class of this.
	struct SVGVisualNode : public SVGObject
	{

		std::string fId{};      // The id of the element
		std::map<std::string, std::shared_ptr<SVGVisualProperty>> fVisualProperties{};

		SVGVisualNode() = default;
		SVGVisualNode(IMapSVGNodes* root)
			: SVGObject(root)
		{
			setRoot(root);
		}
		SVGVisualNode(const SVGVisualNode& other) :SVGObject(other)
		{
			fId = other.fId;
			fVisualProperties = other.fVisualProperties;
		}


		SVGVisualNode& operator=(const SVGVisualNode& rhs)
		{
			SVGObject::operator=(rhs);	// copy the base class members as well
			fId = rhs.fId;
			fVisualProperties = rhs.fVisualProperties;

			return *this;
		}
		
		const std::string& id() const { return fId; }
		void setId(const std::string& id) { fId = id; }
		
		void loadVisualProperties(const XmlElement& elem)
		{
			// Run Through the property creation routines, generating
			// properties for the ones we find in the XmlElement
			for (auto& propconv : gSVGPropertyCreation)
			{
				// get the named attribute
				auto attrName = propconv.first;

				// We have a property and value, convert to SVGVisibleProperty
				// and add it to our map of visual properties
				auto prop = propconv.second(root(), attrName, elem);
				if (prop->isSet())
					fVisualProperties[attrName] = prop;

			}
		}

		void setCommonVisualProperties(const XmlElement &elem)
		{
			// load the common stuff that doesn't require
			// any additional processing
			loadVisualProperties(elem);

			// Handle the style attribute separately by turning
			// it into a standalone XmlElement, and then loading
			// that like a normal element, by running through the properties again
			// It's ok if there were already styles in separate attributes of the
			// original elem, because anything in the 'style' attribute is supposed
			// to override whatever was there.
			auto styleChunk = elem.getAttribute("style");

			if (styleChunk) {
				// Create an XML Element to hang the style properties on as attributes
				XmlElement styleElement{};

				// use CSSInlineIterator to iterate through the key value pairs
				// creating a visual attribute, using the gSVGPropertyCreation map
				CSSInlineStyleIterator iter(styleChunk);

				while (iter.next())
				{
					std::string name = std::string((*iter).first.fStart, (*iter).first.fEnd);
					if (!name.empty() && (*iter).second)
					{
						styleElement.addAttribute(name, (*iter).second);
					}
				}

				loadVisualProperties(styleElement);
			}

			// Deal with any more attributes that need special handling
		}

		void loadSelfFromXml(const XmlElement& elem) override
		{
			SVGObject::loadSelfFromXml(elem);
			
			auto id = elem.getAttribute("id");
			if (id)
				setId(std::string(id.fStart, id.fEnd));

			
			setCommonVisualProperties(elem);
		}
		
		// Contains styling attributes
		void applyAttributes(IRender& ctx)
		{
			for (auto& prop : fVisualProperties) {
				prop.second->draw(ctx);
			}
		}
		
		virtual void drawSelf(IRender& ctx)
		{
			;

		}
		
		void draw(IRender& ctx) override
		{
			ctx.save();
			
			applyAttributes(ctx);

			drawSelf(ctx);

			ctx.restore();
		}
	};

We are building up nodes in a tree structure, and the SVGVisualNode is the primary node of that construction. At the end of all the tree construction, we want a root node where we can just call ‘draw(context)’, and have it render itself into the context. That node needs to handle the cascading styles, draw children in the proper order (painter’s algorithm), and deal with all the attributes and context state.

Of particular note, right there at the end, is the ‘draw()’ method. It starts with ‘ctx.save()’ and finishes with ‘ctx.restore()’. This is critical to maintaining the design constraint that attributes are applied locally in the tree. So, we save the state of the context coming in, make whatever changes we, or our children, will make, then restore the state upon exit. This is the essential operation required to maintain proper application of drawing attributes. Luckily, or rather by design, the blend2d library makes saving and restoring state very fast and efficient. If the base library did not have this facility, it would be up to our code to maintain this state.

Another note here is ‘applyAttributes’. This is what allows things such as the ‘<g>’ element to apply attributes at a high level in the tree, and subsequent elements don’t have to worry about it. They can just apply the attributes that they alter. And where do those common attributes come from?

	static std::map<std::string, std::function<std::shared_ptr<SVGVisualProperty> (IMapSVGNodes *root, const std::string& , const XmlElement& )>> gSVGPropertyCreation = {
		{"fill", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem) {return SVGPaint::createFromXml(root, "fill", elem ); } }
		,{"fill-rule", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem) {return SVGFillRule::createFromXml(root, "fill-rule", elem); } }
		,{"font-size", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem) {return SVGFontSize::createFromXml(root, "font-size", elem); } }
		,{"opacity", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem) {return SVGOpacity::createFromXml(root, "opacity", elem); } }
		,{"stroke", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem ) {return SVGPaint::createFromXml(root, "stroke", elem); } }
		,{"stroke-linejoin", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem) {return SVGStrokeLineJoin::createFromXml(root, "stroke-linejoin", elem); } }
		,{"stroke-linecap", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem ) {return SVGStrokeLineCap::createFromXml(root, "stroke-linecap", elem); } }
		,{"stroke-miterlimit", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem ) {return SVGStrokeMiterLimit::createFromXml(root, "stroke-miterlimit", elem); } }
		,{"stroke-width", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem ) {return SVGStrokeWidth::createFromXml(root, "stroke-width", elem); } }
		,{"text-anchor", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem) {return SVGTextAnchor::createFromXml(root, "text-anchor", elem); } }
		,{"transform", [](IMapSVGNodes* root, const std::string& name, const XmlElement& elem) {return SVGTransform::createFromXml(root, "transform", elem); } }
};

A nice dispatch table of the most common attributes. The ‘loadVisualProperties()’ method uses this table to load the common display properties. Individual geometry objects can load their own specific properties after this, but these are the common ones, so this is very convenient. The table can and should be expanded as more properties are supported.

Finally, let’s get to the meat of the geometry representation. This can be found in the svgshapes.h file.

	struct SVGPathBasedShape : public SVGShape
	{
		BLPath fPath{};
		
		SVGPathBasedShape() :SVGShape() {}
		SVGPathBasedShape(IMapSVGNodes* iMap) :SVGShape(iMap) {}
		
		
		void drawSelf(IRender &ctx) override
		{
			ctx.fillPath(fPath);
			ctx.strokePath(fPath);
		}
	};

Ignoring the SVGShape object (a small shim atop SVGObject), we have a BLPath and a drawSelf(). What could be simpler? The general premise is that, at the end of the day, all geometry can be represented as a BLPath. Everything from single lines, to polygons, to complex paths: they all boil down to a BLPath. This object hugely simplifies the drawing task. All subsequent geometry classes just need to convert themselves into a BLPath, which, as we’ll see, is very easy.

Here is the SVGLine, as it’s fairly simple, and representative of the rest of the geometries.

struct SVGLine : public SVGPathBasedShape
	{
		BLLine fGeometry{};
		
		SVGLine() :SVGPathBasedShape(){ reset(0, 0, 0, 0); }
		SVGLine(IMapSVGNodes* iMap) :SVGPathBasedShape(iMap) {}
		
		
		void loadSelfFromXml(const XmlElement& elem) override 
		{
			SVGPathBasedShape::loadSelfFromXml(elem);
			
			fGeometry.x0 = parseDimension(elem.getAttribute("x1")).calculatePixels();
			fGeometry.y0 = parseDimension(elem.getAttribute("y1")).calculatePixels();
			fGeometry.x1 = parseDimension(elem.getAttribute("x2")).calculatePixels();
			fGeometry.y1 = parseDimension(elem.getAttribute("y2")).calculatePixels();

			fPath.addLine(fGeometry);
		}

		static std::shared_ptr<SVGLine> createFromXml(IMapSVGNodes *iMap, const XmlElement& elem)
		{
			auto shape = std::make_shared<SVGLine>(iMap);
			shape->loadFromXmlElement(elem);

			return shape;
		}

		
	};

It’s fairly boilerplate. We just have to turn the right attributes into values for the BLLine geometry, and add that to our path. That’s it. The rect, circle, ellipse, polyline, polygon, and path objects all do pretty much the same thing, in about as little space. These are much simpler than dealing with the ‘stroke’ or ‘fill’ attributes. There is some trickery in parsing the actual coordinate values, because they can be represented in different kinds of units, but the SVGDimension object deals with all those details.

That’s about enough code for this time around. We’ve looked at attributes, and VisualNodes, and we know that we need cascading styles, painter’s algorithm drawing order, and an ability to draw into a context. Now we have all the pieces we need to complete the final rendering task.

Next time around, I’ll wrap it up by bringing in the SVG ‘parser’, which will combine the XML scanning with our document tree, and render final images.


SVG From the Ground Up – Along the right path

<svg xmlns="http://www.w3.org/2000/svg" width="22" height="22" viewBox="0 0 22 22">
	<path d="M20.658,9.26l0.006-0.007l-9-8L11.658,1.26C11.481,1.103,11.255,1,11,1 
	c-0.255,0-0.481,0.103-0.658,0.26l-0.006-0.007l-9,8L1.342,9.26
	C1.136,9.443,1,9.703,1,10c0,0.552,0.448,1,1,1 c0.255,0,0.481-0.103,0.658-0.26l0.006,0.007
	L3,10.449V20c0,0.552,0.448,1,1,1h5v-8h4v8h5c0.552,0,1-0.448,1-1v-9.551l0.336,0.298 
	l0.006-0.007C19.519,10.897,19.745,11,20,11c0.552,0,1-0.448,1-1C21,9.703,20.864,9.443,20.658,9.26z 
	M7,16H5v-3h2V16z M17,16h-2 v-3h2V16z"/>
</svg>

In the last installment (It’s XML, How hard could it be?), we got as far as being able to scan XML, and generate a bunch of XmlElement objects. That’s a great first couple of steps, and now the really interesting parts begin. But, first, before we get knee deep into the seriousness of the rest of SVG, we need to deal with the graphics subsystem. It’s one thing to ‘parse’ SVG, and even build up a Document Object Model (DOM). It’s quite another to actually do the rendering of the same. To do both, in a compact form, with speed and agility, that’s what we’re after.

This time around I’m going to introduce blend2d, which is the graphics library that I’m using to do all my drawing. I stumbled across blend2d a few years ago, and I don’t even remember how I found it. There are a couple of key aspects to it that are of note. One is that the API is really simple to use, and the library is easy to build. The other part, is more esoteric, but perfect for our needs here. The library was built around support for SVG. So, it has all the functions we need to build the typical kinds of graphics that we’re concerned with. I won’t go into excruciating detail about the blend2d API here, as you can visit the project on github, but I will take a look at the BLPath object, because this is the true workhorse of most SVG graphics.

The little house graphic above is typical of the kinds of little vector based icons you find all over the place. In your apps as icons, on Etsy as laser cuttable images, etc. Besides the opening ‘<svg…’, you see the ‘<path…’. SVG images are comprised of various geometry elements such as rect, circle, polyline, polygon, and path. If you really want to get into the nitty gritty details, you can check out the full SVG Specification.

The path geometry describes a series of movements a pen might make on a plotter: MoveTo, LineTo, CurveTo, etc. There are a total of 20 commands you can use to build up a path, and they can be used in almost any combination to create as complex a figure as you want.

    // Shaper contour Commands
    // Origin from SVG path commands
    // M - move       (M, m)
    // L - line       (L, l, H, h, V, v)
    // C - cubic      (C, c, S, s)
    // Q - quad       (Q, q, T, t)
    // A - ellipticArc  (A, a)
    // Z - close        (Z, z)
    enum class SegmentCommand : uint8_t
    {
        INVALID = 0
        , MoveTo = 'M'
        , MoveBy = 'm'
        , LineTo = 'L'
        , LineBy = 'l'
        , HLineTo = 'H'
        , HLineBy = 'h'
        , VLineTo = 'V'
        , VLineBy = 'v'
        , CubicTo = 'C'
        , CubicBy = 'c'
        , SCubicTo = 'S'
        , SCubicBy = 's'
        , QuadTo = 'Q'
        , QuadBy = 'q'
        , SQuadTo = 'T'
        , SQuadBy = 't'
        , ArcTo = 'A'
        , ArcBy = 'a'
        , CloseTo = 'Z'
        , CloseBy = 'z'
    };

A single path has a ‘d’ attribute, which contains a series of these commands strung together. It’s a very compact description for geometry. A single path can be used to generate something quite complex.

With the exception of the blue text, that entire image is generated with a single path element. Quite complex indeed.

Being able to parse the ‘d’ attribute of the path element is super critical to our success in ultimately rendering SVG. There are a few design goals we have in doing this.

  • Be as fast as possible
  • Be as memory efficient as possible
  • Do not introduce intermediate forms if possible

No big deal, right? Well, as luck would have it, or rather by design, the blend2d library has an object, BLPath, which was designed for exactly this task. You can check out the API documentation if you want the details, but it essentially has all those ‘moveTo’, ‘lineTo’, etc., and a whole lot more. It only implements the ‘to’ forms, and not the ‘by’ forms, but it’s easy to get the last vertex and implement the ‘by’ forms ourselves, which we’ll do.

So, our implementation strategy will be to read a command, and read enough numbers to make a call to a BLPath object to actually create the geometry. The entirety of the code is roughly 500 lines, and most of it is boilerplate, so I won’t bother listing it all here, but you can check it out online in the parseblpath.h file.

Let’s look at a little snippet of our house path, and see what it’s doing.

M20.658,9.26l0.006-0.007l-9-8

It is hard to read in that form, so let me write it another way.

M 20.658, 9.26
l 0.006, -0.007
l -9, -8

Said as a series of instructions (and it’s hard to tell between ‘one’ and ‘el’), it would be:

Move to 20.658, 9.26
Line by 0.006, -0.007
Line by -9, -8

If I were to do it as code in blend2d, it would be

BLPath path{};
BLPoint lastPt{};

path.moveTo(20.658, 9.26);
path.getLastVertex(&lastPt);

path.lineTo(lastPt.x + 0.006, lastPt.y+ -0.007);
path.getLastVertex(&lastPt);

path.lineTo(lastPt.x + -9, lastPt.y + -8);

So, our coding task is to get from that cryptic ‘d’ attribute to the code connecting to the BLPath object. Let’s get started.

The first thing we’re going to need is a main routine that drives the process.

		static bool parsePath(const ByteSpan& inSpan, BLPath& apath)
		{
			// Use a ByteSpan as a cursor on the input
			ByteSpan s = inSpan;
			SegmentCommand currentCommand = SegmentCommand::INVALID;
			int iteration = 0;

			while (s)
			{
				// ignore leading whitespace
				s = chunk_ltrim(s, whitespaceChars);

				// If we've gotten to the end, we're done
				// so just return
				if (!s)
					break;

				if (commandChars[*s])
				{
					// we have a command
					currentCommand = SegmentCommand(*s);
					
					iteration = 0;
					s++;
				}

				// Use parseMap to dispatch to the appropriate
				// parse function
				if (!parseMap[currentCommand](s, apath, iteration))
					return false;


			}

			return true;
		}

Takes a ByteSpan and a reference to a BLPath object, and returns ‘true’ if successful, ‘false’ otherwise. There are design choices to be made at every step of course. Why did I pass in a reference to a BLPath, instead of just constructing one inside the routine, and handing it back? Well, because this way, I allow something else to decide where the memory is allocated. This way also allows you to build upon an existing path if you want.

The second choice is: why a const ByteSpan? That’s a harder one. It allows a greater number of choices in terms of where the ByteSpan comes from; you might have been handed a const span to begin with. But mainly it’s a contract that says “this routine will not alter the span.”

OK, so following along: we make a copy of the input span, which copies nothing but a couple of pointers. Then we use this ‘s’ span as our cursor. The main ‘while’ starts with a trim. XML, and thus SVG, is full of optional whitespace; for almost every routine, the first thing you want to do is eliminate it. The ‘chunk_ltrim()’ function is very short and efficient, so liberal use of it is a good thing.

Now we’re sitting at the ‘M’, so we first check whether it is one of our command characters. If it is, we make it our current command and advance our pointer. The ‘iteration = 0’ is only really used by the Move commands, but we need it, as we’ll soon see.

Last, we have that cryptic function call thing

				if (!parseMap[currentCommand](s, apath, iteration))
					return false;

All set! Easy peasy, our task is done here…

That last little bit of function call trickery is using a dispatch table to make a call to a function. So let’s look at the dispatch table.

		// A dispatch std::map that matches the command character to the
		// appropriate parse function
		static std::map<SegmentCommand, std::function<bool(ByteSpan&, BLPath&, int&)>> parseMap = {
			{SegmentCommand::MoveTo, parseMoveTo},
			{SegmentCommand::MoveBy, parseMoveBy},
			{SegmentCommand::LineTo, parseLineTo},
			{SegmentCommand::LineBy, parseLineBy},
			{SegmentCommand::HLineTo, parseHLineTo},
			{SegmentCommand::HLineBy, parseHLineBy},
			{SegmentCommand::VLineTo, parseVLineTo},
			{SegmentCommand::VLineBy, parseVLineBy},
			{SegmentCommand::CubicTo, parseCubicTo},
			{SegmentCommand::CubicBy, parseCubicBy},
			{SegmentCommand::SCubicTo, parseSmoothCubicTo},
			{SegmentCommand::SCubicBy, parseSmoothCubyBy},
			{SegmentCommand::QuadTo, parseQuadTo},
			{SegmentCommand::QuadBy, parseQuadBy},
			{SegmentCommand::SQuadTo, parseSmoothQuadTo},
			{SegmentCommand::SQuadBy, parseSmoothQuadBy},
			{SegmentCommand::ArcTo, parseArcTo},
			{SegmentCommand::ArcBy, parseArcBy},
			{SegmentCommand::CloseTo, parseClose},
			{SegmentCommand::CloseBy, parseClose}
		};

Dispatch tables are the modern C++ equivalent of the giant switch statement typically found in such programs. I actually started with the giant switch statement, then said to myself, “why don’t I just use a dispatch table?” They are functionally equivalent. In this case, we have a std::map which uses a SegmentCommand as the key. Each command is tied to a function that takes the same set of parameters: a ByteSpan, a BLPath, and an int. As you can see, there is an entry for each of our 20 commands.

I won’t go into every single one of those 20 commands, but looking at a couple will be instructive. Let’s start with the MoveTo

		static bool parseMoveTo(ByteSpan& s, BLPath& apath, int& iteration)
		{
			double x{ 0 };
			double y{ 0 };

			if (!parseNextNumber(s, x))
				return false;
			if (!parseNextNumber(s, y))
				return false;

			if (iteration == 0)
				apath.moveTo(x, y);
			else
				apath.lineTo(x, y);

			iteration++;

			return true;
		}

This has a few objectives.

  • Parse a couple of numbers
  • Call the appropriate function on the BLPath object
  • Increment the ‘iteration’ parameter
  • Advance the pointer, indicating how much we’ve consumed
  • Return false on failure, true on success

This pattern is repeated across the other 19 functions. One thing to know about all the commands, and the reason the main loop is structured the way it is: each command can be followed by multiple sets of numbers after the initial set. In the case of MoveTo, the following is a valid input stream.

M 0,0 20,20 30,30 40,40

The way you treat it, in the case of MoveTo, is that the initial pair sets an origin (0,0), and all subsequent number pairs are implied LineTo commands. That’s why we need to know the iteration. If the iteration is ‘0’, then we call moveTo on the BLPath object. If the iteration is greater than 0, then we call lineTo. All the commands repeat in a similar fashion, except the others don’t change behavior based on the iteration number.
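The “first pair is a move, the rest are lines” rule can be sketched on its own. This toy function (hypothetical, not part of the library) replays a MoveTo’s coordinate pairs and records which verb each iteration would emit:

```cpp
#include <string>
#include <utility>
#include <vector>

// Sketch: for a MoveTo command followed by N coordinate pairs, only
// iteration 0 emits a moveTo; every later pair is an implied lineTo.
std::string replayMoveTo(const std::vector<std::pair<double, double>>& pts)
{
    std::string verbs;
    int iteration = 0;
    for (const auto& p : pts) {
        (void)p;  // coordinates unused in this sketch; we only track verbs
        verbs += (iteration == 0) ? 'M' : 'L';
        iteration++;
    }
    return verbs;
}
```

So the stream “M 0,0 20,20 30,30 40,40” produces one moveTo followed by three lineTo calls.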

Well gee whiz, that seems pretty simple and straightforward. Don’t know what all the fuss is about. Hidden within parseMoveTo() is parseNextNumber(), so let’s take a look at that, as this is where all the bugs can be found.

		// Consume the next number off the front of the chunk,
		// modifying the input chunk to advance past the number
		// we removed.
		// Return true if we found a number, false otherwise
		static inline bool parseNextNumber(ByteSpan& s, double& outNumber)
		{
			static charset whitespaceChars(",\t\n\f\r ");          // whitespace found in paths

			// clear up leading whitespace, including ','
			s = chunk_ltrim(s, whitespaceChars);

			ByteSpan numChunk{};
			s = scanNumber(s, numChunk);

			if (!numChunk)
				return false;

			outNumber = chunk_to_double(numChunk);

			return true;
		}

The comment gives you the flavor of it. Again, we start with trimming ‘whitespace’ before doing anything. This is very important. In the case of these numbers, ‘whitespace’ not only includes the typical space (0x20), TAB, etc., but also the COMMA (‘,’) character. “M20,20” and “M20 20” and “M 20 20” and “M 20, 20” and even “M,20,20” are all equivalent. So, if you’re going to be parsing numbers in a sequence, you’re going to have to deal with all those cases. The easiest thing to do is trim whitespace before you start. I will point out the convenience of the charset construction. Super easy.
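A bare-bones illustration of the comma-as-whitespace rule, using plain C pointers and `strtod` rather than the library’s ByteSpan and chunk_to_double (so this is a sketch of the idea, not the actual implementation):

```cpp
#include <cctype>
#include <cstdlib>

// Sketch: pull the next number off the front of a C string, treating
// ',' exactly like whitespace, and advance the caller's pointer.
bool nextNumber(const char*& s, double& out)
{
    // Commas separate numbers in SVG paths just like spaces do
    while (*s && (std::isspace((unsigned char)*s) || *s == ','))
        s++;
    if (!*s)
        return false;

    char* end = nullptr;
    out = std::strtod(s, &end);
    if (end == s)
        return false;  // no number found

    s = end;  // advance past what we consumed
    return true;
}
```

With this, “20,20”, “20 20”, and “ 20 , 20” all yield the same two numbers, which is exactly the equivalence the path grammar requires.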

We trim the whitespace off the front, then call ‘scanNumber()’. That’s another workhorse routine, which is worth looking into, but I won’t put the code here. You can find it in the bspanutil.h file. I will put the comment associated with the code here though, as it’s informative.

// Parse a number which may have units after it
//   1.2em
// -1.0E2em
// 2.34ex
// -2.34e3M10,20
// 
// By the end of this routine, the numchunk represents the range of the 
// captured number.
// 
// The returned chunk represents what comes next, and can be used
// to continue scanning the original inChunk
//
// Note:  We assume here that the inChunk is already positioned at the start
// of a number (including +/- sign), with no leading whitespace

This is probably the single most important routine in the whole library. It has the big task of picking numbers out of a stream of characters. Those numbers, as you can see from the examples, come in many different forms, and things can get confusing. Here’s another example of a sequence of characters it needs to be able to figure out: “M-1.7-82L.92 27”. You save yourself a ton of time, headache, and heartburn by getting this right.
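The tricky part is the boundaries: a ‘-’ or ‘.’ can begin the *next* number with no separator at all. This little sketch (again using `strtod` for brevity, not the library’s scanNumber) shows how a run of digits like “-1.7-82” must split into two numbers; in the real parser the command letters have already been peeled off by the main loop:

```cpp
#include <cstdlib>
#include <vector>

// Sketch: split a run of packed numbers, relying on strtod's rule that
// it stops at the first character that can't extend the current number.
std::vector<double> splitNumbers(const char* s)
{
    std::vector<double> out;
    char* end = nullptr;
    for (;;) {
        double v = std::strtod(s, &end);
        if (end == s)
            break;  // nothing more that looks like a number
        out.push_back(v);
        s = end;  // the '-' of "-82" starts the next number
    }
    return out;
}
```

So “-1.7-82” yields {-1.7, -82} and “.92 27” yields {0.92, 27}, which is the behavior scanNumber has to reproduce (with units handling on top).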

The next choice you make is how to convert the number that we scanned (it’s still just a stream of ASCII characters) into an actual ‘double’. This is the point where most programmers might throw up their hands and reach for their trusty ‘strtod’, or ye olde ‘atof’, or even ‘sscanf’. There’s a whole science to this. Just know that strtod() is not your friend, and for something you’ll be doing millions of times, it’s worth investigating some alternatives. I highly recommend reading the code for fast_double_parser. If you want to examine what I do, check out the chunk_to_double() routine within the bspanutil.h file.

We’re getting pretty far into the weeds down here, so let’s look at one more function, the LineTo

		static bool parseLineTo(ByteSpan& s, BLPath& apath, int& iteration)
		{
			double x{ 0 };
			double y{ 0 };

			if (!parseNextNumber(s, x))
				return false;
			if (!parseNextNumber(s, y))
				return false;

			apath.lineTo(x, y);

			iteration++;

			return true;
		}

Same as MoveTo, parse a couple of numbers, apply them to the right function on the path object, return true or false. Just do the same thing 18 more times for the other functions, and you’ve got your path ‘parser’.

To recap, parsing the ‘d’ parameter is one of the most important parts of any SVG parser. In this case, we want to get from the text to an actual object we can render, as quickly as possible. A BLPath alone is not enough to create great images, we still have a long way to go until we start seeing pretty pictures on the screen. Parsing the path is critical to getting there though. This is where you could waste tons of time and memory, so it’s worth considering the options carefully. In this case, we’ve chosen to represent the path in memory using a data structure that can be a part of a graphic elements tree, as well as being handed to the drawing engine directly, without having to transform it once again before actually drawing.

There you have it. One step closer to our beautiful images.

Next time around, we need to look at what kind of Document Object Model (DOM) we want to construct, and how our SVG parser will construct it.


SVG From the Ground Up – It’s XML, How hard could it be?

Let’s take a look at the SVG (XML) code that generates that image.

<svg height="200" width="680" xmlns="http://www.w3.org/2000/svg">
    <circle cx="70" cy="70" r="50" />
    <circle cx="200" cy="70" r="50" fill="#79C99E" />
    <circle cx="330" cy="70" r="50" fill="#79C99E" stroke-width="10" stroke="#508484" />
    <circle cx="460" cy="70" r="50" fill="#79C99E" stroke-width="10" />
    <circle cx="590" cy="70" r="50" fill="none" stroke-width="10" stroke="#508484" />
</svg>

By the end of this post, we should be able to scan through the components of that, and generate the tokens necessary to begin rendering it as SVG. So, where to start?

Last time around (SVG From the Ground Up – Parsing Fundamentals), I introduced the ByteSpan and charset data structures, as a way to say “these are the only tools you’ll need…”. Well, at least they are certainly the core building blocks. Now we’re going to actually use those components to begin the process of breaking down the XML. XML can be a daunting, sprawling beast. Its origins are in an even older document technology known as SGML. The first specification for the language can be found here: Extensible Markup Language (XML) 1.0 (Fifth Edition).

When I joined the team at Microsoft in 1998 to work on this under Jean Paoli, one of the original authors, there were probably 30 people across dev, test, and pm. We had people working on the standards body, I was working on XSLT, a couple worked on the parser, and someone on DTD schema. It was quite a production. At that time, we had to deal with myriad encodings (utf-8 did not rule the world yet), conformance and compliance test suites, and that XSLT beast (CSS did not rule the world yet). It was a daunting endeavor, and at some point we tried to color everything with XML, much to the chagrin of most other people. But some things did come out of that era, and SVG is one of them.

Today, our task is not to implement a fully compliant validating parser. That again would take a team of a few, and a ton of testing. What we’re after is something more modest. Something a hobby hacker could throw together in a weekend, but have a fair chance at being able to consume most of the SVG you’re ever really interested in. To that end, there’s a much smaller, simpler XML spec out there: MicroXML. This describes a subset of XML that leaves out all the really hard parts. While that spec is far more readable, we’ll go even one step simpler. With our parser here, we won’t even be supporting utf-8. That might seem like a tragic simplification, but the reality is, not even that’s needed for most of what we’ll be doing with SVG. So, here’s the list of what we will be doing.

  • Decoding elements
  • Decoding attributes
  • Decoding element content (supporting text nodes)
  • Skipping Doctype
  • Skipping Comments
  • Skipping Processing Instructions
  • Not expanding character entities (although user can)

As you will soon see, “skipping” doesn’t mean you lose access to the data; it just means our SVG parser won’t do anything with it. This is a nice extensibility point. We start simple, and you can add as much complexity as you want over time, without changing the fundamental structure of what we’re about to build.

Now for some types and enums. I won’t put the entirety of the code in here, so if you want to follow along, you can look at the xmlscan.h file. We’ll start with the XML element types.

    enum XML_ELEMENT_TYPE {
        XML_ELEMENT_TYPE_INVALID = 0
		, XML_ELEMENT_TYPE_XMLDECL                  // An XML declaration, like <?xml version="1.0" encoding="UTF-8"?>
        , XML_ELEMENT_TYPE_CONTENT                  // Content, like <foo>bar</foo>, the 'bar' is content
        , XML_ELEMENT_TYPE_SELF_CLOSING             // A self-closing tag, like <foo/>
        , XML_ELEMENT_TYPE_START_TAG                // A start tag, like <foo>
        , XML_ELEMENT_TYPE_END_TAG                  // An end tag, like </foo>
        , XML_ELEMENT_TYPE_COMMENT                  // A comment, like <!-- foo -->
        , XML_ELEMENT_TYPE_PROCESSING_INSTRUCTION   // A processing instruction, like <?foo bar?>
        , XML_ELEMENT_TYPE_CDATA                    // A CDATA section, like <![CDATA[ foo ]]>
        , XML_ELEMENT_TYPE_DOCTYPE                  // A DOCTYPE section, like <!DOCTYPE foo>
    };

This is where we indicate what kinds of pieces of the XML file we will recognize. If something is not in this list, it will either be reported as invalid, or it will simply cause the scanner to stop processing. From the little bit of XML that opened this article, we see “START_TAG”, “SELF_CLOSING”, and “END_TAG”. And that’s it!! Simple, right?
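Those three kinds cover the whole sample document: the opening `<svg …>` is a START_TAG, each `<circle …/>` is SELF_CLOSING, and `</svg>` is an END_TAG. The classification rule itself is tiny; here is a standalone sketch (with its own hypothetical enum, not the library’s) of how the text between ‘<’ and ‘>’ maps to a kind, mirroring what the iterator does later:

```cpp
#include <string>

// Hypothetical mini-enum for the sketch; the library has a richer one
enum TagKind { START_TAG, END_TAG, SELF_CLOSING };

// Classify the text that sits between '<' and '>'
TagKind classify(const std::string& tag)
{
    if (!tag.empty() && tag.front() == '/')
        return END_TAG;        // "</foo>" arrives here as "/foo"
    if (!tag.empty() && tag.back() == '/')
        return SELF_CLOSING;   // "<foo/>" arrives here as "foo/"
    return START_TAG;
}
```

The remaining enum values (comments, CDATA, DOCTYPE, processing instructions) are distinguished by similar prefix checks, as we’ll see in the iterator’s next() method.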

OK. Next up are a couple of data structures which are the guts of the XML itself. First is the XmlName. Although we’re not building a super conformant parser, there are some simple realities we need to be able to handle to make our future life easier. XML namespaces are one of those things. In XML, you can have a name with a ‘:’ in it, which puts the name into a namespace. Without too much detail, just know that “circle”, could have been “svg:circle”, or something, and possibly mean the same thing. We need a data structure that will capture this.

struct XmlName {
        ByteSpan fNamespace{};
        ByteSpan fName{};

        XmlName() = default;
        
        XmlName(const ByteSpan& inChunk)
        {
            reset(inChunk);
        }

        XmlName(const XmlName &other):fNamespace(other.fNamespace), fName(other.fName){}
        
        XmlName& operator =(const XmlName& rhs)
        {
            fNamespace = rhs.fNamespace;
            fName = rhs.fName;
            return *this;
        }
        
        XmlName & operator=(const ByteSpan &inChunk)
        {
            reset(inChunk);
            return *this;
        }
        
		// Implement for std::map, and ordering in general.
		// This must be a strict weak ordering: compare the namespace
		// first, then the name, falling back to length when one span
		// is a prefix of the other.
		bool operator < (const XmlName& rhs) const
		{
			size_t maxnsbytes = std::min(fNamespace.size(), rhs.fNamespace.size());
			int nscmp = memcmp(fNamespace.begin(), rhs.fNamespace.begin(), maxnsbytes);
			if (nscmp != 0)
				return nscmp < 0;
			if (fNamespace.size() != rhs.fNamespace.size())
				return fNamespace.size() < rhs.fNamespace.size();

			size_t maxnamebytes = std::min(fName.size(), rhs.fName.size());
			int namecmp = memcmp(fName.begin(), rhs.fName.begin(), maxnamebytes);
			if (namecmp != 0)
				return namecmp < 0;
			return fName.size() < rhs.fName.size();
		}
        
        // Allows setting the name after it's been created
        XmlName& reset(const ByteSpan& inChunk)
        {
            fName = inChunk;
            fNamespace = chunk_token(fName, charset(':'));
            if (chunk_size(fName)<1)
            {
                fName = fNamespace;
                fNamespace = {};
            }
            return *this;
        }
        
		ByteSpan name() const { return fName; }
		ByteSpan ns() const { return fNamespace; }
	};

Given a ByteSpan, our universal data representation, we split it out into the ‘namespace’ and ‘name’ parts, if they exist. Then we can get the name part by calling ‘name()’, and if there was a namespace part, we can get that from ‘ns()’. Why ‘ns’ instead of ‘namespace’? Because ‘namespace’ is a keyword in C++, and we don’t want any confusion or compiler errors.
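The split rule itself is simple enough to show in miniature. This sketch uses std::string instead of ByteSpan (so it is an illustration of the rule, not the library’s reset() code): everything before the first ‘:’ is the namespace prefix, and if there is no ‘:’ the whole thing is the name.

```cpp
#include <string>
#include <utility>

// Sketch: split "svg:circle" into {"svg", "circle"},
// and plain "circle" into {"", "circle"}.
std::pair<std::string, std::string> splitName(const std::string& s)
{
    auto pos = s.find(':');
    if (pos == std::string::npos)
        return { "", s };  // no namespace prefix present
    return { s.substr(0, pos), s.substr(pos + 1) };
}
```

XmlName does the same thing with chunk_token, which has the nice property of not allocating anything: both parts are just sub-spans of the original bytes.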

One thing to note here is the implementation of ‘operator <‘. Why is it there? Because if you want to use this as a key in an associative container, such as std::map, you need a comparison operator, and implementing ‘<‘ is the quickest way to get one. This is a future enhancement we’ll use later.

Next up is the representation of an XML node itself, where we have XmlElement.

    // Representation of an xml element
    // The xml iterator will generate these
    struct XmlElement
    {
    private:
        int fElementKind{ XML_ELEMENT_TYPE_INVALID };
        ByteSpan fData{};

        XmlName fXmlName{};
        std::string fName{};
        std::map<std::string, ByteSpan> fAttributes{};

    public:
        XmlElement() {}
        XmlElement(int kind, const ByteSpan& data, bool autoScanAttr = false)
            :fElementKind(kind)
            , fData(data)
        {
            reset(kind, data, autoScanAttr);
        }

		void reset(int kind, const ByteSpan& data, bool autoScanAttr = false)
		{
            clear();

            fElementKind = kind;
            fData = data;

            if ((fElementKind == XML_ELEMENT_TYPE_START_TAG) ||
                (fElementKind == XML_ELEMENT_TYPE_SELF_CLOSING) ||
                (fElementKind == XML_ELEMENT_TYPE_END_TAG))
            {
                scanTagName();

                if (autoScanAttr) {
                    if (fElementKind != XML_ELEMENT_TYPE_END_TAG)
                        scanAttributes();
                }
            }
		}
        
		// Clear this element to a default state
        void clear() {
			fElementKind = XML_ELEMENT_TYPE_INVALID;
			fData = {};
			fName.clear();
			fAttributes.clear();
		}
        
        // determines whether the element is currently empty
        bool empty() const { return fElementKind == XML_ELEMENT_TYPE_INVALID; }

        explicit operator bool() const { return !empty(); }

        // Returning information about the element
        const std::map<std::string, ByteSpan>& attributes() const { return fAttributes; }
        
        const std::string& name() const { return fName; }
		void setName(const std::string& name) { fName = name; }
        
        int kind() const { return fElementKind; }
		void kind(int kind) { fElementKind = kind; }
        
        const ByteSpan& data() const { return fData; }

		// Convenience for what kind of tag it is
        bool isStart() const { return (fElementKind == XML_ELEMENT_TYPE_START_TAG); }
		bool isSelfClosing() const { return fElementKind == XML_ELEMENT_TYPE_SELF_CLOSING; }
		bool isEnd() const { return fElementKind == XML_ELEMENT_TYPE_END_TAG; }
		bool isComment() const { return fElementKind == XML_ELEMENT_TYPE_COMMENT; }
		bool isProcessingInstruction() const { return fElementKind == XML_ELEMENT_TYPE_PROCESSING_INSTRUCTION; }
        bool isContent() const { return fElementKind == XML_ELEMENT_TYPE_CONTENT; }
		bool isCData() const { return fElementKind == XML_ELEMENT_TYPE_CDATA; }
		bool isDoctype() const { return fElementKind == XML_ELEMENT_TYPE_DOCTYPE; }

        
        void addAttribute(std::string& name, const ByteSpan& valueChunk)
        {
            fAttributes[name] = valueChunk;
        }

        ByteSpan getAttribute(const std::string &name) const
		{
			auto it = fAttributes.find(name);
			if (it != fAttributes.end())
				return it->second;
			else
                return ByteSpan{};
		}
        
    private:
        //
        // Parse an XML element
        // We should be sitting on the first character of the element tag after the '<'
        // There are several things that need to happen here
        // 1) Scan the element name
        // 2) Scan the attributes, creating key/value pairs
        // 3) Figure out if this is a self closing element

        // 
        // We do NOT scan the content of the element here, that happens
        // outside this routine.  We only deal with what comes up to the closing '>'
        //
        void setTagName(const ByteSpan& inChunk)
        {
            fXmlName.reset(inChunk);
            fName = toString(fXmlName.name());
        }
        
        void scanTagName()
        {
            ByteSpan s = fData;
            bool start = false;
            bool end = false;

            // If the chunk is empty, just return
            if (!s)
                return;

            // Check if the tag is end tag
            if (*s == '/')
            {
                s++;
                end = true;
            }
            else {
                start = true;
            }

            // Get tag name
            ByteSpan tagName = s;
            tagName.fEnd = s.fStart;

            while (s && !wspChars[*s])
                s++;

            tagName.fEnd = s.fStart;
            setTagName(tagName);


            fData = s;
        }

        public:
        //
        // scanAttributes
        // Scans the fData member looking for attribute key/value pairs
        // It will add to the member fAttributes these pairs, without further processing.
        // This should be called after scanTagName(), because we want to be positioned
        // on the first key/value pair. 
        //
        int scanAttributes()
        {

            int nattr = 0;
            bool start = false;
            bool end = false;
            uint8_t quote{};
            ByteSpan s = fData;


            // Get the attribute key/value pairs for the element
            while (s && !end)
            {
                uint8_t* beginattrValue = nullptr;
                uint8_t* endattrValue = nullptr;


                // Skip white space before the attrib name
                s = chunk_ltrim(s, wspChars);

                if (!s)
                    break;

                if (*s == '/') {
                    end = true;
                    break;
                }

                // Find end of the attrib name.
                //static charset equalChars("=");
                auto attrNameChunk = chunk_token(s, "=");
                attrNameChunk = chunk_trim(attrNameChunk, wspChars);    // trim whitespace on both ends

                std::string attrName = std::string(attrNameChunk.fStart, attrNameChunk.fEnd);

                // Skip stuff past '=' until the beginning of the value.
                while (s && (*s != '\"') && (*s != '\''))
                    s++;

                // If we've reached end of span, bail out
                if (!s)
                    break;

                // capture the quote character
                // Store value and find the end of it.
                quote = *s;

				s++;    // move past the quote character
                beginattrValue = (uint8_t*)s.fStart;    // Mark the beginning of the attribute content

                // Skip until we find the matching closing quote
                while (s && *s != quote)
                    s++;

                if (s)
                {
                    endattrValue = (uint8_t*)s.fStart;  // Mark the ending of the attribute content
                    s++;
                }

                // Store only well formed attributes
                ByteSpan attrValue = { beginattrValue, endattrValue };

                addAttribute(attrName, attrValue);

                nattr++;
            }

            return nattr;
        }
    };

That’s a bit of a brute, but actually pretty straightforward. We need a data structure that tells us what kind of XML element type we’re dealing with. We need the name, as well as the content of the element, held onto for future processing. We hold onto the content as a ByteSpan, but have provision for making more convenient representations. For example, we turn the name into a std::string. In the future, we can eliminate even this, and just use the XmlName with its chunks directly.

Besides the element name, we also have the ability to split out the attribute key/value pairs, as seen in ‘scanAttributes()’. Let’s take a deeper look at the constructor.

        XmlElement(int kind, const ByteSpan& data, bool autoScanAttr = false)
            :fElementKind(kind)
            , fData(data)
        {
            reset(kind, data, autoScanAttr);
        }

		void reset(int kind, const ByteSpan& data, bool autoScanAttr = false)
		{
            clear();

            fElementKind = kind;
            fData = data;

            if ((fElementKind == XML_ELEMENT_TYPE_START_TAG) ||
                (fElementKind == XML_ELEMENT_TYPE_SELF_CLOSING) ||
                (fElementKind == XML_ELEMENT_TYPE_END_TAG))
            {
                scanTagName();

                if (autoScanAttr) {
                    if (fElementKind != XML_ELEMENT_TYPE_END_TAG)
                        scanAttributes();
                }
            }
		}

The constructor takes a ‘kind’, a ByteSpan, and a flag indicating whether we want to parse out the attributes or not. In ‘reset()’, we see that we hold onto the kind of element, and the ByteSpan. That ByteSpan contains everything between the ‘<‘ of the tag to the closing ‘>’, non-inclusive. The first thing we do is scan the tag name, so we can at least hold onto that, leaving the fData representing the rest. This is relatively low impact so far.

Why not just do this in the constructor itself, why have a “reset()”? As we’ll see later, we actually reuse XmlElement in some situations while parsing, so we want to be able to set, and reset, the same object multiple times. At least that’s one way of doing things.

Another item of note is whether you scan the attributes or not. If you do scan the attributes, you end up with a map of those elements, and a way to get the value of individual attributes.

        std::map<std::string, ByteSpan> fAttributes{};

        ByteSpan getAttribute(const std::string &name) const
		{
			auto it = fAttributes.find(name);
			if (it != fAttributes.end())
				return it->second;
			else
                return ByteSpan{};
		}

The ‘getAttribute()’ method is a most critical piece when we later start building our SVG model, so it needs to be fast and efficient. Of course, this does not have to be embedded in the core of the XmlElement; you could just as easily construct an attribute list outside of the element, but then you’d have to associate it back to the element anyway, and you’d end up in the same place. getAttribute() takes a name as a string, and returns the ByteSpan which is the raw, uninterpreted content of that attribute, without the enclosing quote marks. In the future, it would be nice to replace that std::string based name with an XmlName, which will save on some allocations, but we’ll stick with this convenience for now.
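The contract worth noticing is that a missing attribute comes back as an empty (falsy) span rather than an exception. A sketch of the same contract, with std::string standing in for ByteSpan so it is self-contained:

```cpp
#include <map>
#include <string>

// Sketch of the getAttribute contract: look the name up in the map,
// return an empty value when the attribute is absent.
std::string getAttr(const std::map<std::string, std::string>& attrs,
                    const std::string& name)
{
    auto it = attrs.find(name);
    return (it != attrs.end()) ? it->second : std::string{};
}
```

That empty-on-missing behavior matters for SVG, where most attributes are optional: callers can write `if (auto v = elem.getAttribute("fill"))` style checks instead of wrapping every lookup in error handling.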

The stage is now set. We have our core components and data structures, and we’re ready for the main event of actually parsing some content. For that, we have to make some design decisions. The first one we already made in the very beginning: we will be consuming a chunk of memory as represented in a ByteSpan. The next decision is how we want to consume it. Do we want to build a Document Object Model (DOM), or some other structure? Do we just want to print out nodes as we see them? Do we want a ‘pull model’ parser, where we are in control of getting each node one by one, or a ‘push model’, where we have a callback function which is called every time a node is seen, but the primary driver is elsewhere?

My choice is to have a pull model parser, where I ask for each node, one by one, and do whatever I’m going to do with it. In terms of programming patterns, this is the ‘iterator’. So, I’m going to create an XML iterator. The fundamental structure of an iterator is this.

Iterator iter(content);
while (iter)
{
    doSomethingWithCurrentItem(*iter);
    iter++;
}
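Before looking at the real thing, here is a toy iterator (purely illustrative, not from the library) with the same three-operator shape: `operator bool` answers “is there a current item?”, `operator*` yields it, and `operator++` advances.

```cpp
// Toy pull-model iterator: counts down from a starting value.
// It demonstrates the operator trio the XML iterator will also provide.
struct CountdownIterator {
    int fCurrent;

    explicit CountdownIterator(int start) : fCurrent(start) {}

    // True while there is still a current item to consume
    explicit operator bool() const { return fCurrent > 0; }

    // The current item
    int operator*() const { return fCurrent; }

    // Advance to the next item
    CountdownIterator& operator++() { --fCurrent; return *this; }
};
```

A consumer loop over `CountdownIterator it(3)` visits 3, 2, 1 and then the iterator goes false, which is exactly the shape of the XML consumption loop above.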

So, that’s what we need to construct for our XML. Something that can scan its input, delivering XmlElement as the individual items that we can then do something with. So, here is XmlElementIterator.

   struct XmlElementIterator {
    private:
        // XML Iterator States
        enum XML_ITERATOR_STATE {
            XML_ITERATOR_STATE_CONTENT = 0
            , XML_ITERATOR_STATE_START_TAG

        };
        
        // What state the iterator is in
        int fState{ XML_ITERATOR_STATE_CONTENT };
        svg2b2d::ByteSpan fSource{};
        svg2b2d::ByteSpan mark{};

        XmlElement fCurrentElement{};
        
    public:
        XmlElementIterator(const svg2b2d::ByteSpan& inChunk)
        {
            fSource = inChunk;
            mark = inChunk;

            fState = XML_ITERATOR_STATE_CONTENT;
            
            next();
        }

		explicit operator bool() { return !fCurrentElement.empty(); }
        
        // These operators make it operate like an iterator
        const XmlElement& operator*() const { return fCurrentElement; }
        const XmlElement* operator->() const { return &fCurrentElement; }

        XmlElementIterator& operator++() { next(); return *this; }
        // Note: postfix ++ advances just like prefix here; we don't bother
        // returning a copy of the previous state
        XmlElementIterator& operator++(int) { next(); return *this; }
        
        // Reset the iterator to a known state with data
        void reset(const svg2b2d::ByteSpan& inChunk, int st)
        {
            fSource = inChunk;
            mark = inChunk;

            fState = st;
        }

        ByteSpan readTag()
        {
            ByteSpan elementChunk = fSource;
            elementChunk.fEnd = fSource.fStart;
            
            while (fSource && *fSource != '>')
                fSource++;

            elementChunk.fEnd = fSource.fStart;
            elementChunk = chunk_rtrim(elementChunk, wspChars);
            
            // Get past the '>' if it was there
            fSource++;
            
            return elementChunk;
        }
        
        // readDoctype
		// Reads the doctype chunk, and returns it as a ByteSpan
        // fSource is currently sitting at the beginning of !DOCTYPE
        
        ByteSpan readDoctype()
        {

            // skip past the !DOCTYPE to the first whitespace character
			while (fSource && !wspChars[*fSource])
				fSource++;
            
			// Skip past the whitespace
            // to get to the beginning of things
			fSource = chunk_ltrim(fSource, wspChars);

            
            // Mark the beginning of the "content" we might return
            ByteSpan elementChunk = fSource;
            elementChunk.fEnd = fSource.fStart;

            // To get to the end, we're looking for '[]' or just '>'
            auto foundChar = chunk_find_char(fSource, '[');
            if (foundChar)
            {
                fSource = foundChar;
                foundChar = chunk_find_char(foundChar, ']');
                if (foundChar)
                {
                    fSource = foundChar;
                    fSource++;
                }
                elementChunk.fEnd = fSource.fStart;
            }
            
            // skip whitespace?
            // search for closing '>'
            foundChar = chunk_find_char(fSource, '>');
            if (foundChar)
            {
                fSource = foundChar;
                elementChunk.fEnd = fSource.fStart;
                fSource++;
            }
            
            return elementChunk;
        }
        
        
        // Simple routine to scan XML content
        // the input 's' is a chunk representing the xml to 
        // be scanned.
        // The input chunk will be altered in the process so it
        // can be used in a subsequent call to continue scanning where
        // it left off.
        bool next()
        {
            while (fSource)
            {
                switch (fState)
                {
                case XML_ITERATOR_STATE_CONTENT: {

                    if (*fSource == '<')
                    {
                        // Change state to beginning of start tag
                        // for next turn through iteration
                        fState = XML_ITERATOR_STATE_START_TAG;

                        if (fSource != mark)
                        {
                            // Encapsulate the content in a chunk
                            svg2b2d::ByteSpan content = { mark.fStart, fSource.fStart };

                            // collapse whitespace
							// if the content is all whitespace
                            // don't return anything
							content = chunk_trim(content, wspChars);
                            if (content)
                            {
                                // Set the state for next iteration
                                fSource++;
                                mark = fSource;
                                fCurrentElement.reset(XML_ELEMENT_TYPE_CONTENT, content);
                                
                                return true;
                            }
                        }

                        fSource++;
                        mark = fSource;
                    }
                    else {
                        fSource++;
                    }

                }
                break;

                case XML_ITERATOR_STATE_START_TAG: {
                    // Create a chunk that encapsulates the element tag 
                    // up to, but not including, the '>' character
                    ByteSpan elementChunk = fSource;
                    elementChunk.fEnd = fSource.fStart;
                    int kind = XML_ELEMENT_TYPE_START_TAG;
                    
                    if (chunk_starts_with_cstr(fSource, "?xml"))
                    {
                        kind = XML_ELEMENT_TYPE_XMLDECL;
                        elementChunk = readTag();
                    }
                    else if (chunk_starts_with_cstr(fSource, "?"))
                    {
                        kind = XML_ELEMENT_TYPE_PROCESSING_INSTRUCTION;
                        elementChunk = readTag();
                    }
                    else if (chunk_starts_with_cstr(fSource, "!DOCTYPE"))
                    {
                        kind = XML_ELEMENT_TYPE_DOCTYPE;
                        elementChunk = readDoctype();
                    }
                    else if (chunk_starts_with_cstr(fSource, "!--"))
                    {
                        kind = XML_ELEMENT_TYPE_COMMENT;
                        elementChunk = readTag();
                    }
                    else if (chunk_starts_with_cstr(fSource, "![CDATA["))
                    {
                        kind = XML_ELEMENT_TYPE_CDATA;
                        elementChunk = readTag();
                    }
                    else if (chunk_starts_with_cstr(fSource, "/"))
                    {
                        kind = XML_ELEMENT_TYPE_END_TAG;
                        elementChunk = readTag();
                    }
                    else {
                        elementChunk = readTag();
                        if (chunk_ends_with_char(elementChunk, '/'))
                            kind = XML_ELEMENT_TYPE_SELF_CLOSING;
                    }

                    fState = XML_ITERATOR_STATE_CONTENT;

                    mark = fSource;

                    fCurrentElement.reset(kind, elementChunk, true);

                    return true;
                }
                break;

                default:
                    fSource++;
                    break;

                }
            }

            fCurrentElement.clear();
            return false;
        } // end of next()
    };

That code might have a face only a programmer could love, but it’s relatively simple to break down. The constructor takes a ByteSpan and holds onto it as fSource. This ByteSpan is ‘consumed’, meaning once you’ve iterated, you can’t go back. But since ‘iteration’ is nothing more than moving a pointer in a ByteSpan, you can always take a ‘snapshot’ of where you are and continue. We won’t go into that right here, but it’s going to be useful for tracking down where an error occurred.

The crux of the iterator is the ‘next()’ method. This is where we look for the ‘<‘ character that indicates the start of some tag. The iterator runs between two states: you’re either in ‘XML_ITERATOR_STATE_CONTENT’ or ‘XML_ITERATOR_STATE_START_TAG’. Initially we start in the ‘CONTENT’ state, and flip to ‘START_TAG’ as soon as we see the ‘<‘ character. Once in ‘START_TAG’, we try to further refine what kind of tag we’re dealing with. In most cases, we just capture the content, and that becomes the current element.

The iteration terminates when the current XmlElement (fCurrentElement) is empty, which happens if we run out of input, or there’s some kind of error.

So, next() returns true or false. And our iterator does what it’s supposed to do, which is hold onto the current XmlElement that we have scanned. You can get to the contents of the element by using the dereference operator *, like this: *iter, or the arrow operator. In either case, they simply return the current element.

        const XmlElement& operator*() const { return fCurrentElement; }
        const XmlElement* operator->() const { return &fCurrentElement; }

Alright, in practice, it looks like this:

#include "mmap.h"
#include "xmlscan.h"
#include "xmlutil.h"

using namespace filemapper;
using namespace svg2b2d;

int main(int argc, char** argv)
{
    if (argc < 2)
    {
        printf("Usage: pullxml <xml file>\n");
        return 1;
    }

    // create an mmap for the specified file
    const char* filename = argv[1];
    auto mapped = mmap::createShared(filename);

    if (mapped == nullptr)
    {
        printf("Could not map file: %s\n", filename);
        return 1;
    }


    //
    // Parse the mapped file as XML
    // printing out the elements along the way
    ByteSpan s(mapped->data(), mapped->size());

    XmlElementIterator iter(s);

    while (iter)
    {
        ndt_debug::printXmlElement(*iter);

        iter++;
    }

    // close the mapped file
    mapped->close();

    return 0;
}

That will generate the following output, where the printXmlElement() function can be found in the file xmlutil.h. The individual attributes are indicated with their name followed by ‘:’, such as ‘height:’, followed by the value of the attribute, surrounded by ‘||’ markers. Each tag kind is indicated as well.

START_TAG: [svg]
    height: ||200||
    width: ||680||
    xmlns: ||http://www.w3.org/2000/svg||
SELF_CLOSING: [circle]
    cx: ||70||
    cy: ||70||
    r: ||50||
SELF_CLOSING: [circle]
    cx: ||200||
    cy: ||70||
    fill: ||#79C99E||
    r: ||50||
SELF_CLOSING: [circle]
    cx: ||330||
    cy: ||70||
    fill: ||#79C99E||
    r: ||50||
    stroke: ||#508484||
    stroke-width: ||10||
SELF_CLOSING: [circle]
    cx: ||460||
    cy: ||70||
    fill: ||#79C99E||
    r: ||50||
    stroke-width: ||10||
SELF_CLOSING: [circle]
    cx: ||590||
    cy: ||70||
    fill: ||none||
    r: ||50||
    stroke: ||#508484||
    stroke-width: ||10||
END_TAG: [svg]

At this point, we have our XML “parser”. It can scan/parse enough for us to continue on our journey to parse and display SVG. It’s not the most robust XML parser on the planet, but it’s a good performer: very small, and hopefully understandable. Usage could not be easier, and it does not impose a lot of frameworks or pull in a lot of dependencies. We’re at a good starting point, and if all you wanted was to parse some XML to do something, you could stop here and call it a day.

Next time around, we’re going to look into the SVG side of things, and sink deep into that rabbit hole.


SVG From the Ground Up – Parsing Fundamentals

Scalable Vector Graphics (SVG) is an XML-based format. So, the first thing we want to do is create an XML ‘parser’. I put that in quotes because we don’t really need to create a full-fledged, conformant, validating XML parser. This is the first design principle I’m going to be following: create just enough to make things work, with an eye towards future proofing and extensibility, but not go so far as to make it absolutely bulletproof. So, I’ll be writing just enough code to scan some typical .svg files, while leaving room to swap out our quick and dirty parser for something more substantial in the future.

If you want to follow along the code I am constructing, you can find it in this GitHub repository: svg2b2d

Scanning text begins with deciding how text is represented in the first place. This is a core fundamental design decision. Will we be reading from files on the local machine? Will we be reading from a stream of bytes coming from the network? Will we be reading from a chunk of memory handed to us through some API within the program? I’ve decided on this last choice. These days, it’s very common to read a file into memory and operate on it from there. Similarly with networks, speeds are fast enough that you can read the entirety of the content into memory before processing. SVG is not a format that you can easily render progressively, like raster-based formats. You really do need the whole document before you can render it.

So, we’re going to be reading from memory, assuming something else has already taken care of getting the image into memory. I am writing in C++, so I’m ok with struct, and class, but I don’t necessarily want to use the iostream facilities, nor get too far down the track with templates and the like. The latest C++ (20) has a std::span object, which is very useful, and does exactly what I want, but I want the code to be a bit more portable than C++ 20, so I’m not going to use that facility, instead I’m going to somewhat replicate it.

How do we represent a chunk of memory? There are two choices: you can use a starting pointer and a length, or you can use a starting and an ending pointer. After much deliberation, I chose the latter, using two pointers.

struct ByteSpan
{
    const unsigned char* fStart;
    const unsigned char* fEnd;
};

Throughout the code, I will use the ‘struct’ instead of ‘class’ because I’m ok with the data structure defaulting to everything being publicly accessible. There’s not a lot of sub-classing that’s going to occur here either, so I’m not as concerned about data hiding and encapsulation. This also makes the code easier to understand, without a lot of extraneous decorations.

There we have it. You have a chunk of memory, now what? Well, the most common things you do when scanning are advancing the pointer and checking the character you’re currently looking at. So, let’s add these conveniences, as well as a couple of constructors; then we can do a sample.


#include <cstring>   // strlen

struct ByteSpan
{
    const unsigned char * fStart{};
    const unsigned char * fEnd{};

    ByteSpan():fStart(nullptr), fEnd(nullptr){}
    ByteSpan(const char *cstr):fStart((const unsigned char *)cstr), fEnd((const unsigned char *)cstr+strlen(cstr)){}
    explicit ByteSpan(const void* data, size_t sz) 
        :fStart((const unsigned char*)data)
        , fEnd((const unsigned char*)data + sz) 
        {}

    // Return false when start and end are the same
    explicit operator bool() const { return (fEnd - fStart) > 0; };

    // get current value from fStart, like a 'peek' operation
    unsigned char& operator*() { 
        static unsigned char zero = 0;  
        if (fStart < fEnd) 
            return *(unsigned char*)fStart; 
        return  zero; 
    }
    
    const unsigned char& operator*() const { 
        static unsigned char zero = 0;  
        if (fStart < fEnd) 
            return *fStart; 
        return  zero; 
    }

    // Advance the start pointer, without running past the end
    ByteSpan& operator+=(size_t n) {
        if (n > (size_t)(fEnd - fStart))
            n = (size_t)(fEnd - fStart);
        fStart += n;
        return *this;
    }

    ByteSpan& operator++() { return operator+=(1); }    // prefix notation ++y
    ByteSpan& operator++(int) { return operator+=(1); } // postfix notation y++
};


With all that boilerplate code added to the structure, you can now do the following operations.

ByteSpan b("Here is some text");
while (b)
{
    printf("%c",*b);
    b++;
}

And that little loop will essentially print a copy of the string you used to create the ByteSpan ‘b’. At this point, it might hardly seem worth the effort. I mean, you could just as easily use a starting pointer and an ending pointer, without the intervening ByteSpan structure. Well, yes, and a lot of code out there in the world does exactly that, and it’s just fine. But we have some future design goals which will make this little encapsulation of the two pointers very convenient. One of those design goals is worth introducing now: the concept of zero, or minimal, allocation. We want the scanner to be lightweight and fast, with minimal impact on the memory of the system. We want it to be able to parse data that is megabytes in size without any problems. To this end, the scanner itself does no allocations, and does not alter the original memory it’s operating on, even though the ByteSpan would allow you to.

Alright. With this little tool in hand, what else can we do? Well, soon enough we’re going to need to compare characters and make decisions. Is that a ‘<‘ opening an XmlElement? Is this whitespace? Does this string end with “/>”? We need something that can represent a set of characters. Here is charset.

#include <bitset>
#include <cstring>   // strlen
#include <cstdint>

struct charset {
    std::bitset<256> bits;

    explicit charset(const char achar) { addChar(achar); }
    charset(const char* chars) { addChars(chars); }

    charset& addChar(const char achar)
    {
        // cast to unsigned so characters >= 0x80 don't index negatively
        bits.set((uint8_t)achar);
        return *this;
    }

    charset& addChars(const char* chars)
    {
        size_t len = strlen(chars);
        for (size_t i = 0; i < len; i++)
            bits.set((uint8_t)chars[i]);

        return *this;
    }

    charset& operator+=(const char achar) { return addChar(achar); }
    charset& operator+=(const char* chars) { return addChars(chars); }

    charset operator+(const char achar) const
    {
        charset result(*this);
        return result.addChar(achar);
    }

    charset operator+(const char* chars) const
    {
        charset result(*this);
        return result.addChars(chars);
    }

    // This one makes it look like an array
    bool operator [](const size_t idx) const { return bits[idx]; }

    // This way makes it look like a function
    bool operator ()(const size_t idx) const { return bits[idx]; }

    bool contains(const uint8_t idx) const { return bits[idx]; }
};

All of that, so that we can write the following

charset wspChars(" \t\r\n\v\f");

ByteSpan b("  Now is the time for all humans to come to the aid of animals  ");

while (b)
{
    // skip whitespace
    while (b && wspChars.contains(*b))
        b++;

    // If we've run out of input, we're done
    if (!b)
        break;

    // Create a span that will represent a word
    // start it being empty
    ByteSpan aWord = b;
    aWord.fEnd = aWord.fStart;

    // Advance while there are still characters and they are not a whitespace character
    while (b && !wspChars.contains(*b))
        b++;

    // Now we're sitting at the end of the whole span, or at the beginning of the next
    // whitespace character.  In either case, it's the end of our word
    aWord.fEnd = b.fStart;

    // Now we can do something with the word that we found
    // (printWord() is a stand-in for whatever processing you want)
    printWord(aWord);

    // And continue around the loop until we've exhausted the byte span
}


And that’s how we start. If you want to get ahead, you can look at the code in the repository, in particular bspan.h and bspanutil.h. With these two structures alone, we will build up the XML scanning capability, and ultimately the SVG building capability on top of that. So, these are very core pieces, and important to get right, because they maintain the promise of “no allocations” and “be super fast”.

One question that came up in my mind was “why not just use regex and be done with it?”. Well, yes, C/C++ have regular expression capabilities, either built in or as a side library. There were a couple of reasons I chose not to go that route. One is about speed, the other is about allocations. It’s super easy to just store your text in a std::string object, then use regex on that. But when you do, you’ll find that std::string objects are allocated all over the place, and you don’t have tight control of your memory consumption, which breaks one of the design tenets I’m after. The other is just the size of such code. A good regex library can easily be as big, if not bigger, than the entirety of the SVG parser we’re trying to build. I am somewhat concerned with code size, so I’d rather not have the extra bloat. Besides all that, trying to construct regex patterns that I, or anyone, can maintain in the future can be quite challenging. We’ll essentially be building bits and pieces of what would typically go into regex libraries, but we’ll only be building as much as we need, so it will stay small and tight.

And there you have it. We have begun our journey with these two first small steps, the ByteSpan, and the charset.

Next time, we’ll see how easy it is to ‘parse’ some xml, as we introduce the XmlElement and XmlElementIterator.


Creating A SVG Viewer from the ground up

In the series I did last summer (Hello Scene), I walked through the fundamentals of creating a simple graphics system, from the ground up. Putting a window on the screen, setting pixels, drawing, text, visual effects, screen captures, and all that. Along the way, I discussed various design choices and tradeoffs that I made while creating the code.

While capturing screenshots and flipping some bits might make for a cool demo, at the end of the day, I need to create actual applications that are robust, performant, functional, and a delight for the user. A lot of what we see today are “web apps”, that is, things that are created to run in a web browser. Web apps have a lot of HTML, CSS, and JavaScript, and are programmed with myriad frameworks, in multiple languages on the front end and back end. It’s a whole industry out there!

One question arises for me though, and perhaps a bit of envy. Why do those web apps get to look so great, with their fancy gradients, shadows, and animations, whereas my typical applications look like they’re stuck in a computer geek movie from the late 2000s? I’m talking about desktop apps, and why they haven’t changed much in the past 20 years. Maybe we get a splash here and there with some changes in icon styles (shadows, transparency, flat, ‘dark’), but really, the rest of the app looks and feels the same. No animations, no fancy pictures, everything is square, just no fun.

Well, to this point, I’ve been on a mission to create more engaging desktop app experiences, and it starts with the graphics. To that end, I looked out into the world and saw that SVG (Scalable Vector Graphics) would be a great place to start. Vector graphics are great. The other kind of graphics is ‘bitmap’. Bitmap graphics are the realm of file formats such as ‘png’, ‘jpeg’, ‘gif’, ‘webp’, and the like. As the name implies, a ‘bitmap’ is just a rectangular grid of colored dots. There are a couple of challenges with bitmap graphics. One is that when you scale them, the image starts to look “pixelated”. You know, it gets the ‘jaggies’, and it just doesn’t look that great.

The second challenge you have is that the image is static. You don’t know where the keys on that keyboard are located, so being able to push them, or have them reflect music that’s playing, is quite a hard task.

In steps vector graphics. Vector graphics contain the original drawing commands used to create a bitmap at any size. With a vector graphics file, you retain the information about colors, locations, geometry, everything that went into creating the image. This means you can locate individual elements, name them, change them while the application runs, and so on.

Why don’t we just use vector graphics all the time then? Honestly, I really don’t know. I do know that one impediment to using them is being able to parse the format and do something meaningful with it. To date, you mostly find support for SVG in web browsers, where they’re already parsing this kind of data. In that environment, you have full access to all those annotations, and furthermore, you can attach JavaScript code to the various actions, like mouse hovering, clicking, dragging, and the like. But, for the most part, desktop applications don’t participate in that world. Instead, we’re typically stuck with bitmap graphics and clunky UI builders.

To change that, the first step is parsing the .svg file format. Lucky for me, SVG is based on XML, which is the first thing I worked on at Microsoft back in 1998. I never actually wrote the parser (worked on XSLT originally), but I’m super familiar with it. So, that’s where to start.

In this series, I’m going to write a functional SVG parser, which will be capable of generating SVG based bitmap images, as well as operate in an application development environment for desktop apps. I will be using the blend2d graphics library to do all the super heavy lifting of rendering the actual images, but I will focus on what goes into writing the parser, and seamlessly integrating the results into useful desktop applications.

So, follow along over the next few installments to see how it’s done.


Looking Back to the Road Ahead

As 2022 comes to a close, I am somewhat reflective, but, mostly looking ahead. This past year was certainly tumultuous on several fronts. Coming more solidly out of Covid protocols, kids firmly back in school, life contemplated, and perhaps most impactful personally, leaving Microsoft after 24 years of service.

What did I do in my first days after leaving the company? Start coding of course! I started coding when I was 12 years old on a Commodore PET. To say ‘coding is in my blood’, would be an understatement. I have been coding longer than I’ve been able to hold coherent conversations with adults, that’s how long I’ve been coding, and I don’t see stopping any time soon.

I’ve always thought that coding is storytelling. You’re telling a story, converting some sort of desire into language the computer can understand and execute. The computer, for its part, is super simplistic, with a limited vocabulary. Just think about it. How much work would you have to put into telling someone the directions to your house if you could only communicate in numbers, arithmetic, and simple logic: ‘if’, ‘compare’, ‘then’? You don’t have the higher order stuff like “get on the highway, head south”. You have to go all the way back to first principles, and somehow encode ‘highway’ and ‘head south’. And that’s why we’ve had programming languages as long as we’ve had computers, and it’s also why we’ll continue to develop more programming languages, because this stuff is just too hard.

My recent weeks have been filled with various realizations related to the state of computing. When you have the leisure to take a step back, and just observe the state of the art in computing these days, you can gain both an appreciation, and a feeling of being overwhelmed at the same time.

It wasn’t too long ago that the company Adapteva: http://www.adapteva.com was pioneering, and pushing on a CPU architecture that had 64 64-bit RISC processors in a single package. That was the parallela computer. The experimental board was roughly raspberry pi sized, and packed quite a punch. The company did not survive, but now 64 cores is not outrageous, at least for data center class machines and workstations.

Meanwhile, nVidia, AMD, and Intel have been pushing the compute cores in graphics processors into the hundreds and thousands. At this point, the GPU is the new CPU, with the CPU relegated to mundane operating system tasks such as managing memory and interacting with peripherals. Most of the computation of consequence is happening on the GPU now. And, accordingly, the GPU now commands the lion’s share of the PC price. This makes sense, as the CPU has become a commodity part, with the AMD/Intel wars at a point of equilibrium. No longer can they win by juicing clock rates; now it’s all about cores, and they just keep leapfrogging. nVidia is not standing still, and will be dipping their toe into the general computing (as it relates to data centers at least) market in due time.

nVidia, long a critical piece of the High Performance Computing (HPC) scene, is pushing further down the stack. They’re driving a new three-letter acronym: the Data Processing Unit (DPU). With a nod to modernity, and decades of experience in the HPC realm, the DPU promises to be a modern replacement for a lot of the disparate discrete pieces of computing found in and around data centers.

nVidia isn’t slouching on graphics though. Aside from their hardware, they continue to make strides in the realm of graphics algorithms. NeuralVDB is one of those areas of innovation. Improving the ability to render things like water, fire, smoke and clouds, it’s about the algorithm, and not about the hardware. Bottom line, better looking simulations, in less time, while requiring less energy. This is a great direction to go.

But this is just the graphics area related to nVidia. There has been an explosion of algorithms in the “AI” space as well. While the headliner might be OpenAI and their various efforts, such as Dall*E, which can generate any image you can imagine, there are other efforts as well. The OpenAI Whisper project is all about achieving even better voice to text translation (English primarily).

Not to be left in the dark, Google, Microsoft, Meta, even IBM and myriad researchers in companies, universities, and private labs, are all driving hard on several fronts to evolve myriad technologies. This is the ‘overwhelm’ part. One thing is sure, the pace of change is accelerating. We don’t even have to wait for the advent of ‘quantum computing’, the future is now.

The opportunities in all this are tremendous, but it takes a different perspective than we’ve had in the past to ride the waves of innovation. There will be no single winner here, at least not yet. The various algorithms and frameworks that are emerging are real game changers. Dall*E, and the like, are making it possible for everyday individuals to come up with reasonable artwork, for example. This could be a threat to those who make their living in the creative arts, or it could be a tremendous new tool to add to their arsenal. More imagination and tweaking are required to make truly brilliant art, compared to the standard fare individuals such as myself might come up with.

One thing that has emerged from all this, and the thing that really gets me thinking, is that conversational computing might start to emerge now. What I mean by that: Dall*E, and others, work off of prompts you type in: “A teddy bear washing dishes”. You don’t write C or JavaScript, or Renderman; you just type plain English, and the computer turns that into the image you seek. Well, what if we take that further: “Show this picture to my brother”. An always-listening system, that has observed things like “brother”, knows the context of the picture I’m talking about, and has learned myriad ways to send something to my brother, will figure out what to do, without much prompting from me. In the case of ambiguity, it will ask me questions, and I can provide further guidance.

This is going far beyond “hey Siri”, which is limited to specific tasks. This is the confluence of AI, digital assistants, digital me, visualization, and conversational computing.

When I look back over the roughly 40 years of computing that I’ve been engaged in, I see the evolution of computers from the first PC and hobbyist machines, to the super computers we all carry around in our pockets in the form of cell phones. Computing is becoming ubiquitous, as it is woven into the very fabric of our existence. Our programming is evolving, and is reaching a breaking point where we’ll stop using specialized languages to ‘program’ the machine, and we’ll begin to have conversations with them instead, voicing our desires and intents rather than giving explicit instructions.

It’s been a tremendous year, with many changes. I am glad I’ve had the opportunity to lift my head up, have a look around, and dive in a new direction leveraging all the magic that is being created.


Hello Scene – Conclusion


I have been writing code since about 1978, when I first had access to a Commodore PET computer. For me, it’s always been about having fun, doing things with the machine that might not be obvious, and certainly not achievable on my own. Over the years, I’ve picked up some tips and tricks to help me get to the interesting parts sooner rather than later. During the 1980s-1990s, there was a ‘demo scene’ wherein coders such as myself were engaged in trying to push our ‘personal computers’ to the limit in terms of what they could do visually, and with audio. This demo scene was often centered around computers such as the Commodore 64, the Apple II, or the venerable Commodore Amiga. The demo scene days are largely gone, and computers are a few orders of magnitude more powerful than those early personal computers.

And yet…

I still get excited to create quick and dirty programs that really push the limits of what you can do with the modern personal computer. Modern day programming is super heavy with frameworks, operating systems, SDKs and libraries. We are several levels removed from the core of the machine, which the demo scene of yore leveraged to great effect. But, the machine is still down there, waiting to be unlocked. With some esoteric knowledge, and some good habits and insights, we can begin to unlock all that the computer has to offer, and make creations quickly and easily, for fun and profit.

At the end of June 2022, I decided I wanted to share some of this low-level esoteric knowledge, because why should I have all the fun? So, I began a series of tutorials to show how the average programmer can start to conquer some typically low-level stuff. In brief, the series is about how to create quick and dirty programs using the C/C++ programming language on the Windows platform. Included in the series is everything from how to put a window up on the screen, to how to display text along an animated bezier curve. I avoid the typical frameworks and libraries, and use a fairly minimal amount of OS features. Without much work, the tutorials can apply to just about any platform where you have access to the graphics screen, mouse, and keyboard.

Along the way, I share various design decisions that I’ve made, as well as the reasoning behind doing things simple and cheap, rather than relying on giant frameworks. In the end, you could pick up where this series left off and create your own demos, or simply use it as inspiration for generating your own things that are small and fun to play with.

Here are the links to the various tutorials. They rely on my minwe github repository, so it’s pretty easy to follow along if you want to look at the code in full.

Have You Scene My Demo?

Hello Scene – Win32 Wrangling

Hello Scene – What’s in a Window?

Hello Scene – Events, Organization, more drawing

Hello Scene – All the pretty little things

Hello Scene – Screen Captures for Fun and Profit

Hello Scene – It’s all about the text

I can only hope these tutorials give someone a fresh new perspective on one aspect or another of the coding process, and if they’re like my younger self, gives them some tools so they can create their own wild creation.


Hello Scene – It’s all about the text

That’s a lot of fonts. But, it’s a relatively simple task to achieve once we’ve gained some understanding of how to deal with text. We’ll park this bit of code here (fontlist.cpp) while we gain some understanding.

#include "gui.h"
#include "fontmonger.h"

std::list<std::string> fontList;

void drawFonts()
{
	constexpr int rowHeight = 24;
	constexpr int colWidth = 213;

	int maxRows = canvasHeight / rowHeight;
	int maxCols = canvasWidth / colWidth;

	int col = 0;
	int row = 0;

	std::list<std::string>::iterator it;
	for (it = fontList.begin(); it != fontList.end(); ++it) 
	{
		int x = col * colWidth;
		int y = row * rowHeight;

		textFont(it->c_str(), 18);
		text(it->c_str(), x, y);

		col++;
		if (col >= maxCols)
		{
			col = 0;
			row++;
		}
	}
}

void setup()
{
	setCanvasSize(1280, 1024);

	FontMonger::collectFontFamilies(fontList);

	background(PixelRGBA(0xffdcdcdc));

	drawFonts();
}

I must say, dealing with fonts, and text rendering is one of the most challenging of the graphics disciplines. We could spend years and gigabytes of text explaining the intricacies of how fonts and text work. For our demo scene, we’re not going to get into all that though. We just want a little bit of text to be able to splash around here and there. So, I’m going to go the easy route, and explain how to use the system text rendering and incorporate it into the rest of our little demo framework.

First of all, some terminology. These words; Font, Font Face, OpenType, Points, etc, are all related to fonts, and all can cause confusion. So, let’s ignore all that for now, and just do something simple.

And the code to make it happen?

#include "gui.h"

void setup()
{
	setCanvasSize(320, 240);
	background(PixelRGBA (0xffffffff));		// A white background

	text("Hello Scene!", 24, 48);
}

Pretty simple, right? By default, the demo scene chooses the “Segoe UI” font at 18 pixels high to do text rendering. The single call to “text(…)” puts whatever text you want at the x,y coordinates specified afterward. So, what is “Segoe UI”? A font describes the shape of a character. So, the letter ‘A’ looks one way in, say, “Times New Roman”, and probably slightly different in “Tahoma”. These are stylistic differences. We humans will just recognize it as ‘A’. Each font contains a bunch of descriptions of how to draw individual characters. These descriptions are essentially just polygons, made of curves and straight lines.

I’m grossly simplifying.

The basic description can be scaled, rotated, printed in ‘bold’, ‘italics’, or ‘underline’, depending on what you want to do when you’re displaying text. So, besides just saying where we want text to be located, we can specify the size (in pixels), and choose a specific font name other than the default.

Which was produced with a slight change in the code

#include "gui.h"

void setup()
{
	setCanvasSize(640, 280);
	background(PixelRGBA (0xffffffff));		// A white background

	textFont("Sitka Text", 100);
	text("Hello My Scene!", 24, 48);
}

And last, you can change the color of the text

How exciting is that?! For the simplest of demos, and maybe even some UI framework, this might be enough. But, let’s go a little bit further, and get some more functions that might be valuable.

First thing, we need to understand a little bit more about the font, like how tall and wide characters are, where’s the baseline, the ascent, and descent. Character width and height are easily understood. Ascent and descent might not be as well understood. Let’s start with a little display.

Some code to go with it

#include "gui.h"

constexpr int leftMargin = 24;
constexpr int topMargin = 24;


void drawTextDetail()
{
    // Showing font metrics
	const char* str2 = "My Scene!";
	PixelCoord sz;
	textMeasure(sz, str2);

	constexpr int myTop = 120;

	int baseline = myTop + fontHeight - fontDescent;
	int topline = myTop + fontLeading;

	strokeRectangle(*gAppSurface, leftMargin, myTop, sz.x(), sz.y(), PixelRGBA(0xffff0000));

	// Draw internalLeading - green
	copySpan(*gAppSurface, 
        leftMargin, topline, sz.x(), 
        PixelRGBA(0xff00ff00));

	// draw baseline
	copySpan(*gAppSurface, 
        leftMargin, baseline, sz.x(), 
        PixelRGBA(0xff0000ff));

	// Draw text in the box
    // Turquoise Text
	textColor(PixelRGBA(0xff00ffff));	
	text("My Scene!", leftMargin, myTop);
}

void setup()
{
	setCanvasSize(640, 280);
	background(PixelRGBA (0xffffffff));

	textFont("Sitka Text", 100);

	drawTextDetail();
}

In the setup, we do the usual to create a canvas of a desirable size. Then we select the font with a particular pixel height. Then we wave our hands and call ‘drawTextDetail()’.

In ‘drawTextDetail()’, one of the first calls is to ‘textMeasure()’. We want the answer to: “How many pixels wide and high is this string?” The ‘textMeasure()’ function does this. It’s pretty straightforward, as the GDI API that we’re using for text rendering has a function call for this purpose.

void textMeasure(PixelCoord& pt, const char* txt)
{
    SIZE sz;
    ::GetTextExtentPoint32A(gAppSurface->getDC(), txt, (int)strlen(txt), &sz);

    pt[0] = sz.cx;
    pt[1] = sz.cy;
}

It’s that simple. Just pass in a structure to receive the size, and make the call to ‘GetTextExtentPoint32A()’. I chose to return the value in a PixelCoord object, because I don’t want the Windows specific data structures bleeding into my own demo API. This allows me to change the underlying text API without having to worry about changing dependent data structures.

The size that is returned incorporates a few pieces of information. It’s not a tight fit to the string. The size is derived from a combination of global font information (tallest character, lowest part of a character), as well as the cumulative widths of the actual characters specified. In the case of our little demo, the red rectangle represents the size that was returned.

There are a couple more bits of information that are set when you select a font of a particular size. The three most important bits are, the ascent, descent, and internal leading.

Let’s start with the descent. Represented by the blue line, this is the maximum amount any given character of the font might fall below the ‘baseline’. The baseline is implicitly defined by this descent; it is essentially fontHeight - fontDescent. This is the line that all the other characters will use as their ‘bottom’. The ‘ascent’ is the amount of space above this baseline, so the total fontHeight is fontDescent + fontAscent. The ascent isn’t explicitly shown, because it is essentially the top line of the rectangle. The last bit is the internal leading. This is the room reserved at the top of the cell for accent marks and the like. The fontLeading is this number, represented by the green line, measured down from the top of the rectangle.
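These relationships are easy to pin down in code. Here’s a minimal sketch, using a made-up FontMetrics struct (the real numbers come from the font system when a font is selected, e.g. GDI’s TEXTMETRIC), showing the same arithmetic the demo uses for its baseline and topline:

```cpp
#include <cassert>

// Hypothetical metrics holder; real values come from the font system
// (e.g. GDI's TEXTMETRIC) when a font is selected.
struct FontMetrics {
    int height;           // total cell height = ascent + descent
    int ascent;           // space above the baseline
    int descent;          // space below the baseline
    int internalLeading;  // room at the top of the cell for accents
};

// Given the top of the text box, derive the interesting lines,
// mirroring the arithmetic in drawTextDetail() above.
int baselineY(const FontMetrics& fm, int top) { return top + fm.height - fm.descent; }
int toplineY(const FontMetrics& fm, int top)  { return top + fm.internalLeading; }
```

With a hypothetical 100-pixel font (ascent 80, descent 20, leading 15) and a box top of 120, the baseline lands at y = 200 and the topline at y = 135, just like the blue and green lines in the demo.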

And there you have it. All the little bits and pieces of a font. When you specify a location for drawing the font in the ‘text()’ function, you’re essentially specifying the top left corner of this red rectangle. Of course, that leaves you a bit high and dry when it comes to precisely placing your text. More than likely, what you really want to do is place your text according to the baseline, so that you can be more assured of where your text is actually going to show up. Maybe you want that, maybe you don’t. What you really need is the flexibility to specify the ‘alignment’ of your text rendering.

This is actually a re-creation of something I did about 10 years ago, for another project. It’s a pretty simple matter once you have adequate font and character sizing information.

#include "gui.h"
#include "textlayout.h"

TextLayout tLayout;

void drawAlignedText()
{
	int midx = canvasWidth / 2;
	int midy = canvasHeight / 2;

	// draw vertical line down center of canvas
	line(*gAppSurface, midx, 0, midx, canvasHeight - 1, PixelRGBA(0xff000000));

	// draw horizontal line across canvas
	line(*gAppSurface, 0, midy, canvasWidth - 1, midy, PixelRGBA(0xff000000));

	tLayout.textFont("Consolas", 24);
	tLayout.textColor(PixelRGBA(0xff000000));

	tLayout.textAlign(ALIGNMENT::LEFT, ALIGNMENT::BASELINE);
	tLayout.text("LEFT", midx, 24);

	tLayout.textAlign(ALIGNMENT::CENTER, ALIGNMENT::BASELINE);
	tLayout.text("CENTER", midx, 48);

	tLayout.textAlign(ALIGNMENT::RIGHT, ALIGNMENT::BASELINE);
	tLayout.text("RIGHT", midx, 72);

	tLayout.textAlign(ALIGNMENT::RIGHT, ALIGNMENT::BASELINE);
	tLayout.text("SOUTH EAST", midx, midy);

	tLayout.textAlign(ALIGNMENT::LEFT, ALIGNMENT::BASELINE);
	tLayout.text("SOUTH WEST", midx, midy);

	tLayout.textAlign(ALIGNMENT::RIGHT, ALIGNMENT::TOP);
	tLayout.text("NORTH EAST", midx, midy);

	tLayout.textAlign(ALIGNMENT::LEFT, ALIGNMENT::TOP);
	tLayout.text("NORTH WEST", midx, midy);
}

void setup()
{
	setCanvasSize(320, 320);

	tLayout.init(gAppSurface);

	background(PixelRGBA(0xffDDDDDD));

	drawAlignedText();
}

Design-wise, I chose to stuff the various text measurement and rendering routines into a separate object. The other choice would have been to put them into the gui.h/cpp files, and I did that initially. But that would force a particularly strong opinion on how text should be dealt with, and since I didn’t make that kind of choice for drawing in general, I thought better of it and encapsulated the text routines in this layout structure (textlayout.h).
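To give a feel for what such a layout object does internally, here’s a sketch of the alignment arithmetic. The names are hypothetical, not the actual textlayout.h interface; the idea is just to turn an anchor point plus an alignment into the top-left corner the low-level text() call wants.

```cpp
#include <cassert>

// Hypothetical mirror of the alignment choices; not the actual
// textlayout.h declarations, just enough to show the arithmetic.
enum class HAlign { LEFT, CENTER, RIGHT };
enum class VAlign { TOP, CENTER, BASELINE, BOTTOM };

struct Origin { int x; int y; };

// Map an anchor point (x, y) plus an alignment into the top-left
// corner.  'w' and 'h' come from textMeasure(); 'ascent' is needed
// for BASELINE, since the baseline sits 'ascent' pixels below the top.
Origin alignedOrigin(int x, int y, int w, int h, int ascent,
                     HAlign ha, VAlign va)
{
    Origin o{ x, y };
    switch (ha) {
    case HAlign::CENTER: o.x = x - w / 2; break;
    case HAlign::RIGHT:  o.x = x - w;     break;
    default: break; // LEFT: the anchor is already the left edge
    }
    switch (va) {
    case VAlign::CENTER:   o.y = y - h / 2;  break;
    case VAlign::BASELINE: o.y = y - ascent; break;
    case VAlign::BOTTOM:   o.y = y - h;      break;
    default: break; // TOP: the anchor is already the top edge
    }
    return o;
}
```

Alignment just shifts the origin before drawing; everything else about text rendering stays the same.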

Now that we have the ability to precisely place a string, we can get a little creative in playing with the displacement of all the characters in a string. With that ability, we can have text placed based on the evaluation of a function, with animation of course.

#include "gui.h"
#include "geotypes.hpp"
#include "textlayout.h"

using namespace alib;

constexpr int margin = 50;
constexpr int FRAMERATE = 20;

int dir = 1;				// direction
int currentIteration = 1;	// Changes during running
int iterations = 30;		// increase past frame rate to slow down
bool showCurve = true;
TextLayout tLayout;

void textOnBezier(const char* txt, GeoBezier<ptrdiff_t>& bez)
{
	double u = 0.0;
	int offset = 0;

	while (txt[offset])
	{
		// Isolate the current character
		char buf[2];
		buf[0] = txt[offset];
		buf[1] = 0;

		// Figure out the x and y offset
		auto pt = bez.eval(u);

		// Display current character
		tLayout.text(buf, pt.x(), pt.y());

		// Calculate size of current character
		// so we can figure out where next one goes
		PixelCoord charSize;
		tLayout.textMeasure(charSize, buf);

		// Now get the next value of 'u' so we 
		// can evaluate where the next character will go
		u = bez.findUForX(pt.x() + charSize.x());

		offset++;
	}

}

void strokeCurve(PixelMap& pmap, GeoBezier<ptrdiff_t> &bez, int segments, const PixelRGBA c)
{
	// Get starting point
	auto lp = bez.eval(0.0);

	int i = 1;
	while (i <= segments) {
		double u = (double)i / segments;

		auto p = bez.eval(u);

		// draw line segment from last point to current point
		line(pmap, lp.x(), lp.y(), p.x(), p.y(), c);

		// Assign current to last
		lp = p;

		i = i + 1;
	}
}

void onFrame()
{
	background(PixelRGBA(0xffffffff));

	int y1 = maths::Map(currentIteration, 1, iterations, 0, canvasHeight);

	GeoCubicBezier<ptrdiff_t> bez(margin, canvasHeight / 2, 
        canvasWidth * 0.25, y1, 
        canvasWidth - (canvasWidth * 0.25), canvasHeight -y1, 
        canvasWidth - margin, canvasHeight / 2.0);
	
	if (showCurve)
		strokeCurve(*gAppSurface, bez, 50, PixelRGBA(0xffff0000));

	// honor the character spacing
	tLayout.textColor(PixelRGBA(0xff0000ff));
	textOnBezier("When Will The Quick Brown Fox Jump Over the Lazy Dogs Back", bez);


	currentIteration += dir;

	// reverse direction if needs be
	if ((currentIteration >= iterations) || (currentIteration <= 1))
		dir = dir < 1 ? 1 : -1;
}

void setup()
{
	setCanvasSize(800, 600);
	setFrameRate(FRAMERATE);

	tLayout.init(gAppSurface);
	tLayout.textFont("Consolas", 24);
	tLayout.textAlign(ALIGNMENT::CENTER, ALIGNMENT::CENTER);
}


void keyReleased(const KeyboardEvent& e) 
{
	switch (e.keyCode) {
	case VK_ESCAPE:
		halt();
		break;

	case VK_SPACE:
		showCurve = !showCurve;
		break;

	case 'R':
		recordingToggle();
		break;
	}
}

For once, I won’t go line by line. The key trick here is the ‘findUForX()’ function of the bezier object. Since textMeasure() tells us how wide a string is (in pixels), we know how much to advance in the x direction as we display characters. Our bezier curve has an eval() function which, given a ‘u’ value between 0.0 and 1.0, returns the point on the curve at that parameter. So, we want to find the ‘u’ value corresponding to the x offset of the next character; then we can evaluate the curve at that parameter and get the appropriate ‘y’ value.
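If you’re curious how a ‘findUForX()’ might work: when the curve’s x is monotonically increasing in u (true for the left-to-right curves used here), a plain bisection does the job. This is a sketch of the approach, not the actual GeoBezier implementation:

```cpp
#include <cassert>
#include <cmath>

// Cubic bezier x-coordinate for parameter u in [0,1],
// from control point x values x0..x3.
double bezierX(double x0, double x1, double x2, double x3, double u)
{
    double t = 1.0 - u;
    return t*t*t*x0 + 3*t*t*u*x1 + 3*t*u*u*x2 + u*u*u*x3;
}

// Find the 'u' whose x is closest to targetX, assuming x(u) is
// monotonically increasing.  Plain bisection; 40 halvings is plenty.
double findUForX(double x0, double x1, double x2, double x3, double targetX)
{
    double lo = 0.0, hi = 1.0;
    for (int i = 0; i < 40; i++) {
        double mid = (lo + hi) / 2.0;
        if (bezierX(x0, x1, x2, x3, mid) < targetX)
            lo = mid;   // target lies to the right of mid
        else
            hi = mid;   // target lies to the left of mid
    }
    return (lo + hi) / 2.0;
}
```

For control points whose x values are 0, 100, 200, 300, x(u) works out to exactly 300·u, so asking for x = 150 hands back u = 0.5.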

Notice in the setup, the text alignment is set to CENTER, CENTER. This means that the coordinate positions being calculated should represent the center of the characters being printed. That roughly leaves the center of the character aligned with the evaluated values of the curve, which will match your expectations most closely. Another way to do it might be to do LEFT, BASELINE, to get the characters left aligned, and use the curve as the baseline. There are a few possibilities, and you can simply choose what suits your needs.

This is a very crude way to display text on a curve, but showing text along a path is a fairly common parlor trick in demo applications, and this is one way of doing it quick and dirty. Your curve doesn’t have to be a bezier; it could be anything you like. Just take it one character at a time, use the text alignment, and see what you can accomplish.

There is a design choice here. I am using simple GDI based interfaces to display the text. I can do this because at the core, the PixelArray that we’re drawing into does in fact have a “DeviceContext”, so GDI knows how to draw into it. This is a great convenience, because it means that we can do all the independent drawing that we’ve been doing, from random pixels to bezier curves, and when we get to something we can’t quite handle, we can fall back to what the system provides, in this case text rendering.

With that, we’re at the end of this series. We’ve gone from a basic window on the screen, to drawing text along an animating bezier curve, all while recording to a .mpg file. This is just the beginning. We’ve covered some design choices along the way, including the desire to keep the code small and composable. The only thing left to do is go out and create something of your own, by using this kind of toolkit, or better yet, have the confidence to create your own.

The Demo Scene is out there. Go create something.


Hello Scene – Screen Captures for Fun and Profit

Being able to capture the display screen opens up some interesting possibilities for our demo scenes.

In this particular case, my demo app is capturing a part of my screen, and using it as a ‘texture map’ on a trapezoid, and compositing that onto a perlin noise background. The capture is live, as we’ll see shortly, but first, the code that does this little demo (sampmania.cpp).


#include "gui.h"
#include "sampledraw2d.h"
#include "screensnapshot.h"
#include "perlintexture.h"

ScreenSnapshot screenSamp;

void onFrame()
{
	// Take current snapshot of screen
	screenSamp.next();

	// Trapezoid
	PixelCoord verts[] = { PixelCoord({600,100}),PixelCoord({1000,100}),PixelCoord({1700,800}),PixelCoord({510,800}) };
	int nverts = 4;
	sampleConvexPolygon(*gAppSurface, 
		verts, nverts, 0, 
		screenSamp, 
		{ 0,0,canvasWidth, canvasHeight });

}

void setup()
{
	setCanvasSize(1920, 1080);
	setFrameRate(15);

	// Draw noisy background only once
	NoiseSampler perlinSamp(4);
	sampleRectangle(*gAppSurface, gAppSurface->frame(), perlinSamp);

	// Setup the screen sampler
	// Capture left half of screen
	screenSamp.init(0, 0, displayWidth / 2, displayHeight);
}

Pretty standard fare for our demos. There are a couple of new concepts here though. One is a sampler, the other is the ScreenSnapshot object. Let’s first take a look at the ScreenSnapshot object. The idea here is we want to take a picture of what’s on the screen, and make it available to the program in a PixelArray, which is how we represent pixel images in general. If we can do that, we can further use the screen snapshot just like the canvas. We can draw on it, save it, whatever.

On the Windows platform, there are 2 or 3 ways to take a snapshot of the display screen. Each method comes from a different era of the evolution of the Windows APIs, and has various benefits or limitations. In this case, we use the most ancient method for taking a snapshot, relying on the good old GDI API to do the work, since it’s been reliable all the way back to Windows 3.0.

#pragma once
// ScreenSnapshot
//
// Take a snapshot of a portion of the screen and hold
// it in a PixelArray (User32PixelMap)
//
// When constructed, a single snapshot is taken.
// every time you want a new snapshot, just call 'next()'
// This is great for doing a live screen capture
//
//    ScreenSnapshot ss(x,y, width, height);
//
//    References:
//    https://www.codeproject.com/articles/5051/various-methods-for-capturing-the-screen
//    https://stackoverflow.com/questions/5069104/fastest-method-of-screen-capturing-on-windows
//  https://github.com/bmharper/WindowsDesktopDuplicationSample
//

#include "User32PixelMap.h"

class ScreenSnapshot : public User32PixelMap
{
    HDC fSourceDC;  // Device Context for the screen

    // which location on the screen are we capturing
    int fOriginX;   
    int fOriginY;


public:
    ScreenSnapshot()
        : fSourceDC(nullptr)
        , fOriginX(0)
        , fOriginY(0)
    {}

    ScreenSnapshot(int x, int y, int awidth, int aheight, HDC srcDC = NULL)
        : User32PixelMap(awidth, aheight),
        fOriginX(x),
        fOriginY(y)
    {
        init(x, y, awidth, aheight, srcDC);

        // take at least one snapshot
        next();
    }

    bool init(int x, int y, int awidth, int aheight, HDC srcDC=NULL)
    {
        User32PixelMap::init(awidth, aheight);

        if (NULL == srcDC)
            fSourceDC = GetDC(nullptr);
        else
            fSourceDC = srcDC;

        fOriginX = x;
        fOriginY = y;

        return true;
    }

    // take a snapshot of current screen
    bool next()
    {
        // copy the screendc into our backing buffer
        // getDC retrieves the device context of the backing buffer
        // which in this case is the 'destination'
        // the fSourceDC is the source
        // the width and height are dictated by the width() and height() 
        // and the source origin is given by fOriginX, fOriginY
        // We use the parameters (SRCCOPY, CAPTUREBLT) because that seems 
        // to be best practice in this case
        BitBlt(getDC(), 0, 0, width(), height(), fSourceDC, fOriginX, fOriginY, SRCCOPY | CAPTUREBLT);

        return true;
    }
};

There’s really not much to it. The real working end of it is the ‘next()’ function. That call to ‘BitBlt()’ is where all the magic happens. That’s a Graphics Device Interface (GDI) system call, which will copy from one “DeviceContext” to another. A DeviceContext is a Windows construct that represents the interface for drawing into something. This interface exists for screens, printers, or bitmaps in memory. Very old, very basic, very functional.

So, the basics are: get a ‘DeviceContext’ for the screen, and another ‘DeviceContext’ for a bitmap in memory, and call BitBlt to copy pixels from one to the other.

Also, notice the ScreenSnapshot inherits from User32PixelMap. We first saw this early on in this series (What’s In a Window), when we were first exploring how to put pixels up on the screen. We’re just leveraging what was built there, which was essentially a Windows Bitmap.

OK, so bottom line: we can take a picture of the screen and put it into a bitmap that we can then use in various ways.

Here’s the movie

Well, isn’t that nifty? If you query the internet for “screen capture”, you’ll find links to tons of products that do screen capture and recording. Finding a library that does this for you programmatically is a bit more difficult. One method that pops up a lot is to capture the screen to a file, or to the clipboard, but that’s not what you want here; you just want it in a bitmap, ready to go, which is what we do.

On Windows, a more modern method is to use DirectX, because that’s the preferred interface of modern day Windows. The GDI calls under the covers probably call into DirectX. The benefit of using this simple BitBlt() method is that you don’t have to increase your dependencies, and you don’t need to learn a fairly complicated interface layer, just to capture the screen.

I’ve used a complex image here, mainly to draw attention to this subject, but really, the screen capturing and viewing can be much simpler.

Just a straight up view, without any geometric transformation, other than to fit the rectangle.

Code that looks very similar, but just using a simple mapping to a rectangle, rather than a trapezoid. This is from screenview.cpp

//
// screenview
// Simplest application to do continuous screen capture
// and display in another window.
//
#include "gui.h"

#include "screensnapshot.h"

ScreenSnapshot screenSamp;

void onFrame()
{
    // Get current screen snapshot
    screenSamp.next();

    // Draw a rectangle with snapshot as texture
    sampleRectangle(*gAppSurface,gAppSurface->frame(),screenSamp);
}

// Do application setup before things get
// going
void setup()
{
    // Setup application window
	setCanvasSize(displayWidth/2, displayHeight);

    // setup the snapshot
    screenSamp.init(0, 0, displayWidth / 2, displayHeight);
}

void keyReleased(const KeyboardEvent& e) {
    switch (e.keyCode)
    {
    case VK_ESCAPE:
        halt();
        break;

    case 'R':
    {
        recordingToggle();
    }
    break;
    }
}

Capturing the screen has an additional benefit for our demo scenes. One little-used feature of Windows is that you can use translucency and transparency, so you can display rather interesting things on the desktop. Using the recording technique where we just capture what’s on our canvas won’t really capture what the user sees; you’ll only capture what you’re drawing into your own buffer. In order to capture the fullness of the demo, you need to capture what’s on the screen.

And just to kick it up a notch, and show off some other things you can do with transparency…

In both these cases of the chasing balls, as well as the transparent keyboard, there is a function call within the demo scene ‘layered()’. If you call this in your setup, then your window won’t have any sort of border, and if you use transparency in your colors, they’ll be composited with whatever is on the desktop.

You can go one step further (in the case of the chasing balls), and call ‘fullscreen()’, which will essentially do a: setCanvasSize(displayWidth, displayHeight); layered();

There is one additional call, which allows you to retain your window title bar (for moving around and closing), but sets a global transparency level for your window ‘windowOpacity(double)’, which takes a value between 0.0 (fully transparent), and 1.0 (fully opaque).

And of course the demo code for the disappearing rectangles trick.

#include "apphost.h"
#include "draw.h"
#include "maths.hpp"

using namespace maths;

bool outlineOnly = false;
double opacity = 1.0;

INLINE PixelRGBA randomColor(uint32_t alpha=255)
{
	uint32_t r = random_int(255);
	uint32_t g = random_int(255);
	uint32_t b = random_int(255);

	return { r,g,b,alpha };
}

void handleKeyboardEvent(const KeyboardEventTopic& p, const KeyboardEvent& e)
{
	if (e.keyCode == VK_ESCAPE)
		halt();

	if (e.keyCode == VK_SPACE)
		outlineOnly = !outlineOnly;

	if (e.keyCode == VK_UP)
		opacity = maths::Clamp(opacity + 0.05, 0.0, 1.0);

	if (e.keyCode == VK_DOWN)
		opacity = maths::Clamp(opacity - 0.05, 0.0, 1.0);

	windowOpacity(opacity);
}

void onLoop()
{
	PixelRGBA stroke;
	PixelRGBA fill;
	PixelRGBA c;

	gAppSurface->setAllPixels(PixelRGBA(0x0));

	for (int i = 1; i <= 2000; i++)
	{
		int x1 = random_int(canvasWidth - 1);
		int y1 = random_int(canvasHeight - 1);
		int lwidth = random_int(4, 60);
		int lheight = random_int(4, 60);

		c = randomColor(192);

		if (outlineOnly)
		{
			stroke = c;
			draw::rectangle_copy(*gAppSurface, x1, y1, lwidth, lheight, c);

		}
		else
		{
			fill = c;
			//draw::rectangle_copy(*gAppSurface, x1, y1, lwidth, lheight, c);
			draw::rectangle_blend(*gAppSurface, x1, y1, lwidth, lheight, c);
		}
	}

	refreshScreen();
}

void onLoad()
{
	subscribe(handleKeyboardEvent);

	setCanvasSize(800, 800);
}

Well, that’s a lot of stuff, but mostly we covered various forms of screen capture, what you can do with it, and why recording just your own drawing buffer doesn’t show the full fidelity of your work.

We also covered a little bit of Windows wizardry with transparent windows, a very little known or used feature, but we can use it to great advantage for certain kinds of apps.

From a design perspective, I chose to use an ancient, but still supported API call, because it has the least number of dependencies, is the easiest of all the screen capture methods to understand and implement, and it uses the smallest amount of code.

Another thing of note for this demo framework is the maximum usage of ‘.h’ files. In each demo sample, there’s typically only 2 or 3 ‘.cpp’ files, and NO .dll files. This is again for simplicity and portability. You could easily put things in a library, and having appmain.cpp in a .exe file would even work, but that leads down a different path. Here, we just make every demo self contained, compiling all the code needed right then and there. This works out when your file count is relatively small (fewer than 10), and you’re working on a small team (fewer than 5). This probably does not scale as well beyond that.

But, there you have it. We’ve gone all the way from putting a single pixel on the screen, to displaying complex geometries with animation in transparent windows. The only thing left in this series is to draw some text, and call it a wrap.

So, next time.


Hello Scene – All the pretty little things

Wait, what? Well, yah, why not?

One of the joys I’ve had as a programmer over the years has been to read some paper, or some article, and try out the code for myself. Well, ray tracing has been a love of mine since the early 90s, when I first played with POV Ray.

Back in the day, Peter Shirley introduced ray tracing to an audience of eager programmers through the book: Ray Tracing in One Weekend. There were two subsequent editions that followed, exploring various optimizations and improvements. For the purposes of my demo scene here, I wanted to see how hard it was to integrate, and how big the program would be. So, here’s what the integration looks like:


#include "scene_final.h"

#include "gui.h"

scene_final mainScene;


void onFrame()
{
    if (!mainScene.renderContinue())
    {
        recordingStop();
        halt();
    }
}

void setup() 
{
    setCanvasSize(mainScene.image_width, mainScene.image_height);

    mainScene.renderBegin();
    recordingStart();
}

Just your typical demo scene, using the gui.h approach. I implement setup() to set the screen size, initialize the raytrace renderer, and begin the screen recording.

In ‘onFrame()’, I tell the renderer to continue; it renders only one scanline at a time onto the canvas. It returns false when there are no more lines to be rendered, and that’s when I stop the program. How did I get the screen capture? Just comment out that ‘halt()’ for one run, then take a screen snapshot.

I did have to make two alterations to the original Raytrace Weekend code, both to the scene renderer. I had to split out the initialization code (for convenience), and I had to break the ‘render()’ into two parts, ‘renderBegin()’ and ‘renderContinue()’.

class scene {
public:
    hittable_list world;
    hittable_list lights;
    camera        cam;

    double aspect_ratio = 1.0;
    int    image_width = 100;
    int    image_height = 100;
    int    samples_per_pixel = 10;
    int    max_depth = 20;
    color  background = color(0, 0, 0);

    int fCurrentRow = 0;

  public:
      void init(int iwidth, double aspect, int spp, int maxd, const color& bkgd)
      {
          image_width = iwidth;
          aspect_ratio = aspect;
          image_height = static_cast<int>(image_width / aspect_ratio);
          samples_per_pixel = spp;
          max_depth = maxd;
          background = bkgd;
      }

      bool renderContinue()
      {
          if (fCurrentRow >= image_height)
              return false;


          int j = image_height - 1 - fCurrentRow;

          color out_color;

          for (int i = 0; i < image_width; ++i) {
              color pixel_color(0, 0, 0);
              for (int s = 0; s < samples_per_pixel; ++s) {
                  auto u = (i + random_double()) / (image_width - 1);
                  auto v = (j + random_double()) / (image_height - 1);
                  ray r = cam.get_ray(u, v);
                  pixel_color += ray_color(r, max_depth);
              }

              fit_color(out_color, pixel_color, samples_per_pixel);
              //write_color(std::cout, pixel_color, samples_per_pixel);
              gAppSurface->copyPixel(i, fCurrentRow, PixelRGBA(out_color[0], out_color[1], out_color[2]));
          }

          fCurrentRow = fCurrentRow + 1;

          return true;
      }

      void renderBegin()
      {
        cam.initialize(aspect_ratio);

        std::cout << "P3\n" << image_width << ' ' << image_height << "\n255\n";
    }



  private:
    color ray_color(const ray& r, int depth) {
        hit_record rec;

        // If we've exceeded the ray bounce limit, no more light is gathered.
        if (depth <= 0)
            return color(0,0,0);

        // If the ray hits nothing, return the background color.
        if (!world.hit(r, interval(0.001, infinity), rec))
            return background;

        scatter_record srec;
        color color_from_emission = rec.mat->emitted(r, rec, rec.u, rec.v, rec.p);

        if (!rec.mat->scatter(r, rec, srec))
            return color_from_emission;

        if (srec.skip_pdf) {
            return srec.attenuation * ray_color(srec.skip_pdf_ray, depth-1);
        }

        auto light_ptr = make_shared<hittable_pdf>(lights, rec.p);
        mixture_pdf p(light_ptr, srec.pdf_ptr);

        ray scattered = ray(rec.p, p.generate(), r.time());
        auto pdf_val = p.value(scattered.direction());

        double scattering_pdf = rec.mat->scattering_pdf(r, rec, scattered);

        color color_from_scatter =
            (srec.attenuation * scattering_pdf * ray_color(scattered, depth-1)) / pdf_val;

        return color_from_emission + color_from_scatter;
    }
};

These are the guts of the ray tracer, with the private ‘ray_color()’ function doing the brunt of it. But, I’m not really dissecting how the ray tracer works, just what was required to incorporate it into my demo scene.

Right there in ‘renderContinue()’, you can see how we go from whatever the raytracer was doing before (writing out a .ppm file) to converting the color into something we can throw onto our canvas.

fit_color(out_color, pixel_color, samples_per_pixel);
//write_color(std::cout, pixel_color, samples_per_pixel);
gAppSurface->copyPixel(i, fCurrentRow, PixelRGBA(out_color[0], out_color[1], out_color[2]));

The ‘fit_color’ routine takes the oversaturated color value the ray tracer created and turns it into an RGB value in the range 0..255. We then simply copy that to the canvas with copyPixel(). The effect is that the application window refreshes very slowly, once every time a single line is ray traced. With this particular image, it is slower than watching grass grow. This image took eight hours to render on my 5-year-old i7-based desktop machine. Even if you imagined it took half that time, it’s still slow. There are ways to speed it up, but that’s another story.
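For the curious, the essential steps inside a fit_color-style routine are: average the accumulated samples, apply a gamma correction, clamp, and scale to 0..255. Here’s a sketch of one channel, consistent with the book’s write_color; the demo’s actual fit_color may differ in detail:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Collapse an accumulated (summed) color sample into one 0..255 channel:
// average over the samples, gamma-correct (gamma = 2), clamp, scale.
int fitChannel(double accumulated, int samples_per_pixel)
{
    double scale = 1.0 / samples_per_pixel;
    double c = std::sqrt(accumulated * scale);   // gamma 2 correction
    c = std::clamp(c, 0.0, 0.999);               // keep 256*c below 256
    return static_cast<int>(256 * c);
}
```

For instance, 100 samples summing to 25.0 average to 0.25; the square root (gamma 2) gives 0.5, which scales to 128.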

What I’m interested in are a couple of things, how big is that program, and where’s the movie?

This little demo program is 173 kilobytes in size. Just think about that. Go to a typical web page, and the banner image might be bigger than that. Given that our machines, even cell phones, come with gigabytes of RAM, who cares how big a program is? Well, small size still means more efficient, if you’ve chosen proper algorithms. I like the challenge of small, because it means I’m parsimonious, using as few external dependencies as possible. This also means that when I want to port to another platform beyond Windows, I have less baggage to carry around.

This points to another design point.

I’m using C/C++ here. That’s not the only language I ever use, but it’s fine for these demos. I’m a big fan of C#, as well as my favorite, LuaJIT. Of course you could also just use JavaScript and browsers, but here we are. You’ll also notice that in my usage of the language you don’t see a lot of memory management. You don’t typically see new/delete. That’s not because I’m using some garbage collection system. It’s because of a careful choice of data structures, calling conventions, and object lifetime management. Most things are held on the stack, because they’re temporary. Things like the canvas object are initialized internally, so the programmer doesn’t have to worry about how that’s occurring, and doesn’t need to manage any associated memory.

I like this. It gives me a relatively easy programming API without forcing me to deal with memory management, which is easily the biggest bug generator when using this particular language. This is great for short demos. It’s a lot harder to maintain with more serious involved applications, although I’d argue it can be done with proper composition and super tight adherence to a coding methodology. Not realistic with large teams of programmers though.

OK, so small size, simple to write code, simple to integrate stuff you see on the internet. What about the movie?

scene_full movie

There you go. 8 hours of rendering, condensed down to a few seconds for your viewing pleasure. In this particular case, since the renderer is updating the canvas every frame, each frame is a single scan line. As there are 800 vertical lines, there are 800 frames. You can pick whatever frame rate you like to display at whatever speed.

If you were really clever, and had a few machines laying around, you’d create a render farm, and make an animated short with motion, each machine rendering a single frame, and then ultimately stitching it all together. But, on my meager dev box, I just get this little movie, and that’s my demo.

This just goes to show you. If you see something interesting out there in the graphics world, maybe a new line drawing algorithm, or a real-time renderer, it’s not hard to try those things out when you’ve got the proper setup. Of course, there are other frameworks out there, like SDL, or Qt, which do this kind of thing. If you look at them, just see how big they are, how complex their build environment is, and how much framework you must learn to do basic things. If they’re ok for you, then go that route. If they’re a bit much, then you might pursue this method, which is fairly minimal.

Next time around, screen capture, for fun and profit.