Screen Capture for Fun and Profit

In Screen Sharing from a Browser I wrote about how relatively easy it is to display a continuous snapshot of a remote screen, and even send mouse and keyboard events back to it.  That is the essence of modern-day, browser-based screen sharing.  Everything else is about compression for bandwidth management.

In this article, I’ll present the “server” side of the equation.  Since I’ve discovered the ‘sourcecode’ bracket in WordPress, I can even present the code with line numbers.  So, here in its entirety is the server side:

local ffi = require "ffi"

local WebApp = require("WebApp")

local HttpRequest = require "HttpRequest"
local HttpResponse = require "HTTPResponse"
local URL = require("url")
local StaticService = require("StaticService")

local GDI32 = require ("GDI32")
local User32 = require ("User32")
local BinaryStream = require("core.BinaryStream")
local MemoryStream = require("core.MemoryStream")
local WebSocketStream = require("WebSocketStream")
local Network = require("Network")

local utils = require("utils")
local zlib = require ("zlib")

local UIOSimulator = require("UIOSimulator")

--[[
	Application Variables
--]]
local ScreenWidth = User32.GetSystemMetrics(User32.FFI.CXSCREEN);
local ScreenHeight = User32.GetSystemMetrics(User32.FFI.CYSCREEN);

local captureWidth = ScreenWidth;
local captureHeight = ScreenHeight;

local ImageWidth = captureWidth;
local ImageHeight = captureHeight;
local ImageBitCount = 16;

local hbmScreen = GDIDIBSection(ImageWidth, ImageHeight, ImageBitCount);
local hdcScreen = GDI32.CreateDCForDefaultDisplay();

local net = Network();

--[[
	Application Functions
--]]
function captureScreen(nWidthSrc, nHeightSrc, nXOriginSrc, nYOriginSrc)
  nXOriginSrc = nXOriginSrc or 0;
  nYOriginSrc = nYOriginSrc or 0;

  -- Copy some of the screen into a
  -- bitmap that is selected into a compatible DC.
  local ROP = GDI32.FFI.SRCCOPY;

  local nXOriginDest = 0;
  local nYOriginDest = 0;
  local nWidthDest = ImageWidth;
  local nHeightDest = ImageHeight;
  local nWidthSrc = nWidthSrc;
  local nHeightSrc = nHeightSrc;

  GDI32.Lib.StretchBlt(hbmScreen.hDC.Handle,
    nXOriginDest,nYOriginDest,nWidthDest,nHeightDest,
    hdcScreen.Handle,
    nXOriginSrc,nYOriginSrc,nWidthSrc,nHeightSrc,
    ROP);

  hbmScreen.hDC:Flush();
end

-- Serve the screen up as a bitmap image (.bmp)
local getContentSize = function(width, height, bitcount, alignment)
  alignment = alignment or 4

  local rowsize = GDI32.GetAlignedByteCount(width, bitcount, alignment);
  local pixelarraysize = rowsize * math.abs(height);
  local filesize = 54+pixelarraysize;
  local pixeloffset = 54;

  return filesize;
end

local filesize = getContentSize(ImageWidth, ImageHeight, ImageBitCount);
local memstream = MemoryStream.new(filesize);
local zstream = MemoryStream.new(filesize);

local writeImage = function(dibsec, memstream)
  --print("printImage")
  local width = dibsec.Info.bmiHeader.biWidth;
  local height = dibsec.Info.bmiHeader.biHeight;
  local bitcount = dibsec.Info.bmiHeader.biBitCount;
  local rowsize = GDI32.GetAlignedByteCount(width, bitcount, 4);
  local pixelarraysize = rowsize * math.abs(height);
  local filesize = 54+pixelarraysize;
  local pixeloffset = 54;

  -- allocate a MemoryStream to fit the file size
  local streamsize = GDI32.GetAlignedByteCount(filesize, 8, 4);

  memstream:Seek(0);

  local bs = BinaryStream.new(memstream);

  -- Write File Header
  bs:WriteByte(string.byte('B'))
  bs:WriteByte(string.byte('M'))
  bs:WriteInt32(filesize);
  bs:WriteInt16(0);
  bs:WriteInt16(0);
  bs:WriteInt32(pixeloffset);

  -- Bitmap information header
  bs:WriteInt32(40);
  bs:WriteInt32(dibsec.Info.bmiHeader.biWidth);
  bs:WriteInt32(dibsec.Info.bmiHeader.biHeight);
  bs:WriteInt16(dibsec.Info.bmiHeader.biPlanes);
  bs:WriteInt16(dibsec.Info.bmiHeader.biBitCount);
  bs:WriteInt32(dibsec.Info.bmiHeader.biCompression);
  bs:WriteInt32(dibsec.Info.bmiHeader.biSizeImage);
  bs:WriteInt32(dibsec.Info.bmiHeader.biXPelsPerMeter);
  bs:WriteInt32(dibsec.Info.bmiHeader.biYPelsPerMeter);
  bs:WriteInt32(dibsec.Info.bmiHeader.biClrUsed);
  bs:WriteInt32(dibsec.Info.bmiHeader.biClrImportant);

  -- Write the actual pixel data
  memstream:WriteBytes(dibsec.Pixels, pixelarraysize, 0);
end

local getSingleShot = function(response, compressed)
  captureScreen(captureWidth, captureHeight);

  writeImage(hbmScreen, memstream);

  zstream:Seek(0);
  local compressedLen = ffi.new("int[1]", zstream.Length);
  local err = zlib.compress(zstream.Buffer,   compressedLen, memstream.Buffer, memstream:GetPosition() );

  zstream.BytesWritten = compressedLen[0];

  local contentlength = zstream.BytesWritten;
  local headers = {
    ["Content-Length"] = tostring(contentlength);
    ["Content-Type"] = "image/bmp";
    ["Content-Encoding"] = "deflate";
  }

  response:writeHead("200", headers);
  response:WritePreamble();
  return response.DataStream:WriteBytes(zstream.Buffer, zstream.BytesWritten);
end

local handleUIOCommand = function(command)

  local values = utils.parseparams(command)

  if values["action"] == "mousemove" then
    UIOSimulator.MouseMove(tonumber(values["x"]), tonumber(values["y"]))
  elseif values["action"] == "mousedown" then
    UIOSimulator.MouseDown(tonumber(values["x"]), tonumber(values["y"]))
  elseif values["action"] == "mouseup" then
    UIOSimulator.MouseUp(tonumber(values["x"]), tonumber(values["y"]))
  elseif values["action"] == "keydown" then
    UIOSimulator.KeyDown(tonumber(values["which"]))
  elseif values["action"] == "keyup" then
    UIOSimulator.KeyUp(tonumber(values["which"]))
  end
end

local startupContent = nil

local handleStartupRequest = function(request, response)
  -- read the entire contents
  if not startupContent then
    -- load the file into memory
    local fs, err = io.open("viewscreen2.htm")

    if not fs then
      response:writeHead("500")
      response:writeEnd();

      return true
    end

    local content = fs:read("*all")
    fs:close();

    -- perform the substitution of values
    -- assume content looks like this:
    -- <?hostip?>:<?serviceport?>
    local subs = {
      ["frameinterval"]	= 300,
      ["hostip"] 			= net:GetLocalAddress(),
      ["capturewidth"]	= captureWidth,
      ["captureheight"]	= captureHeight,
      ["imagewidth"]		= ImageWidth,
      ["imageheight"]		= ImageHeight,
      ["screenwidth"]		= ScreenWidth,
      ["screenheight"]	= ScreenHeight,
      ["serviceport"] 	= Runtime.config.port,
    }
    startupContent = string.gsub(content, "%<%?(%a+)%?%>", subs)
  end

  -- send the content back to the requester
  response:writeHead("200",{["Content-Type"]="text/html"})
  response:writeEnd(startupContent);

  return true
end

--[[
  Responding to remote user input
]]--
local handleUIOSocketData = function(ws)
  while true do
    local bytes, bytesread = ws:ReadFrame()

    if not bytes then
      print("handleUIOSocketData() - END: ", err);
      break
    end

    local command = ffi.string(bytes, bytesread);
    handleUIOCommand(command);
  end
end

local handleUIOSocket = function(request, response)
  local ws = WebSocketStream();
  ws:RespondWithServerHandshake(request, response);

  Runtime.Scheduler:Spawn(handleUIOSocketData, ws);

  return false;
end

--[[
  Primary Service Response routine
]]--
local HandleSingleRequest = function(stream, pendingqueue)
  local request, err  = HttpRequest.Parse(stream);

  if not request then
    -- dump the stream
    --print("HandleSingleRequest, Dump stream: ", err)
    return
  end

  local urlparts = URL.parse(request.Resource)
  local response = HttpResponse.Open(stream)
  local success = nil;

  if urlparts.path == "/uiosocket" then
    success, err = handleUIOSocket(request, response)
  elseif urlparts.path == "/screen.bmp" then
    success, err = getSingleShot(response, true);
  elseif urlparts.path == "/screen" then
    success, err = handleStartupRequest(request, response)
  elseif urlparts.path == "/favicon.ico" then
    success, err = StaticService.SendFile("favicon.ico", response)
  elseif urlparts.path == "/jquery.js" then
    success, err = StaticService.SendFile("jquery.js", response)
  else
    response:writeHead("404");
    success, err = response:writeEnd();
  end

  if success then
    return pendingqueue:Enqueue(stream)
  end
end

--[[
  Start running the service
--]]
local serviceport = tonumber(arg[1]) or 8080

Runtime = WebApp({port = serviceport, backlog=100})

Runtime:Run(HandleSingleRequest);

As a ‘server’, this code is responsible for handling a couple of things. First, it needs to act as a basic HTTP server, serving up relatively static content to get things started. When the user specifies the url http://localhost/screen, the server responds by sending back the browser code that I showed in the previous article. The function “handleStartupRequest()” performs this operation. The file ‘viewscreen2.htm’ is HTML, but it’s a bit of a template as well. You can delimit a piece to be replaced by enclosing it in a tag such as: <?hostip?>. Each such tag is replaced by whatever value you choose. In this case, I’m doing replacements for the size of the image, the size of the screen, the frameinterval, and the hostip and serviceport. These last two are the most important, because without them you won’t be able to set up the WebSocket.
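To make the substitution step concrete, here is a minimal sketch of what a fragment of such a template, and the gsub() call that fills it in, might look like. The HTML fragment here is hypothetical; only the <?name?> pattern and the subs table mirror the actual code above:

local content = [[var wsurl = "ws://<?hostip?>:<?serviceport?>/uiosocket";]]

local subs = {
  ["hostip"] = "192.168.1.10",
  ["serviceport"] = 8080,
}

-- same pattern as in handleStartupRequest(); the capture (%a+) becomes
-- the key used to look up the replacement in the subs table
print((string.gsub(content, "%<%?(%a+)%?%>", subs)))
-- prints: var wsurl = "ws://192.168.1.10:8080/uiosocket"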

The other parts are fairly straightforward. Of particular note is the ‘captureScreen()’ function. In Windows, since the dawn of man, there has been GDI for graphics. Good ol’ GDI still has the ability to capture the screen, or a single window, or a portion of the screen. This still works in Windows 8 as well. So, capturing the screen is nothing more than drawing into a DIBSection, and that’s that. Just one line of code.
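Because the source origin and size are parameters, the same function can grab just a portion of the screen; StretchBlt then stretches whatever you grabbed to fill the full-size DIBSection. For example:

-- capture only a 400x300 region starting at (100,100); it gets stretched
-- up to ImageWidth x ImageHeight by the StretchBlt call
captureScreen(400, 300, 100, 100);

-- capture the whole screen, as getSingleShot() does
captureScreen(captureWidth, captureHeight);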

The magic happens after that. Rather than handing the raw image back to the client, I want to send it out as a compressed BMP image. I could choose PNG, or JPG, or any other format browsers are capable of handling, but BMP is the absolute easiest to deal with, even if it is the bulkiest. I figure that deflating it with zlib before sending it out will help somewhat, and it turns out this works just fine.
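To put a number on “bulky”: using the same math as getContentSize(), a full 1920×1080 frame at 16 bits per pixel works out to roughly 4 MB of raw BMP, which is why the deflate step earns its keep:

-- rowsize        = 1920 * 16 / 8    = 3,840 bytes (already 4-byte aligned)
-- pixelarraysize = 3,840 * 1080     = 4,147,200 bytes
-- filesize       = 54 + 4,147,200   = 4,147,254 bytes, about 4 MB per frame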

The rest of the machinery there is just to deal with being an http server. A lot is hidden behind the ‘WebApp’ and the ‘WebSocket’ classes. Those are good for another discussion.

So, all in, this is about 300 lines of code. Not too bad for a rudimentary screen sharing service. Of course, there’s a supporting cast that runs into the thousands of lines of code, but I take that as a given, since frameworks such as Node and various others already exist.

I could explain each and every line of code here, but I think it’s small enough and easy enough to read that it won’t be necessary. I will point out that there’s not much difference between sending single snapshots one at a time and keeping an open stream that presents the screen as H.264 or WebM. For that scenario, you just need a library that can capture snapshots of the screen and turn them into a properly encoded video stream. Since you already have the WebSocket, it could easily be put to use for that purpose, rather than just receiving mouse and keyboard events.
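As a sketch of what that continuous version might look like, here is a frame-pushing loop reusing the capture and compression pieces above. Note that ws:WriteFrame() is my assumption of what the WebSocketStream write-side call would be; only ReadFrame() appears in the code above:

local pushFrames = function(ws)
  while true do
    -- grab the current screen into the DIBSection
    captureScreen(captureWidth, captureHeight);

    -- serialize it as a .bmp into memstream
    writeImage(hbmScreen, memstream);

    -- deflate it into zstream
    zstream:Seek(0);
    local compressedLen = ffi.new("int[1]", zstream.Length);
    zlib.compress(zstream.Buffer, compressedLen, memstream.Buffer, memstream:GetPosition());

    -- push the compressed frame out over the websocket
    -- (WriteFrame is assumed here; the real class may differ)
    ws:WriteFrame(zstream.Buffer, compressedLen[0]);
  end
end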

Food for thought.


A Picture’s worth 5Mb

What’s this then?

Another screen capture. This time I just went with the full size, 1920×1080. What’s happening on this screen? Well, that tiger in the upper left is well known from PostScript days, and is the gold standard for testing graphics rendering systems. In this case, I’m using OpenVG on the Pi, from a Lua driven framework. No external support libraries, other than what’s on the box. In fact, I just took the hello_tiger sample, did some munging to get it into the proper shape, and here it is.

One thing of note: this image is actually rotating. It’s not blazingly fast, but then it’s not blazingly fast on any platform. Still, it’s decent, and it’s way faster than it would be using only the CPU on my “high powered” quad-core desktop machine. This speed comes from the fact that the GPU on the Pi is doing all the work. You can tell because, if you get a magnifying glass and examine the lower right-hand corner of the image, you’ll see that the CPU meter is not pegged. What little activity is occurring there is actually coming from other parts of the system, not the display of the tiger. I guess that VideoCore GPU thing really does make a difference in terms of accelerating graphics. Go figure.

In the middle of the image, you see a window “snapper.lua”. This is the code that is doing the snapshot. Basically, I run the tiger app from one terminal, the one on the lower left. Then in the lower right, I run the ‘snapper.lua’ script. As can be seen in the OnKeyUp function, every time the user presses the “SysRQ” key (also ‘Print Screen’ on many keyboards), a snapshot is taken of the screen.

Below that, there’s a little bit of code that stitches an event loop together with a keyboard object. Yes, I now have a basic event loop, and an “Application” object as well. This makes it brain-dead simple to throw together apps like this without much effort.
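I haven’t reproduced snapper.lua here, but a rough sketch of the idea looks something like the following. The handler name, the keycode argument, and how it gets hooked to the event loop are my guesses; the DMX calls and the WritePPM helper are the same ones that show up in the screencast code further down:

local ffi = require "ffi"
local DMX = require "DisplayManX"

local Display = DMXDisplay();

local width, height = 640, 480;    -- size the saved snapshots down to this
local resource = DMX.DMXResource(width, height);
local p_rect = VC_RECT_T(0, 0, width, height);
local pixdata = resource:CreateCompatiblePixmap(width, height);

local shotcount = 0;

local function OnKeyUp(keycode)
  -- 99 is KEY_SYSRQ ('Print Screen' / SysRq) in the Linux input event codes
  if keycode == 99 then
    shotcount = shotcount + 1;
    Display:Snapshot(resource);
    local pixeldata = resource:ReadPixelData(pixdata, p_rect);
    if pixeldata then
      WritePPM(string.format("snapshot_%06d.ppm", shotcount), pixeldata);
    end
  end
end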

[sidetrack]
One very interesting thing about being able to completely control your eventing model and messaging loops is that you can do whatever you want. Eventually, I’ll want to put together a quick and dirty “remote desktop” sort of deal, and I’ll need to be able to quickly grab the keyboard, mouse, and other interesting events, and throw them to some other process. That process will need to be able to handle them as if they happened locally. Well, when you construct your environment from scratch, you can easily bake that sort of thing in.
[sidetrack]

It’s nice to have such a system readily at hand.  I can fiddle about with lots of different things, build apps, experiment with eventing models, throw up some graphics, and never once have to hit “compile” and wait.  This makes for a very productive playground where lots of different ideas can be tried out quickly before being baked into more ‘serious’ coding environments.

Screencast of the Raspberry Pi

It’s one of those inevitabilities.  Start fiddling about with low-level graphics system calls, do some screen capture, then some single-file saving, and soon enough you’ve got screen capture movies!  Assuming WordPress does this right.

If you’ve been following along, the relevant code looks like this:

-- (Assumes the setup from the snapshot article further down: ffi, the
-- DisplayManX module, and a DMXDisplay object named Display; displayWidth
-- and displayHeight hold the desired capture size.)

-- Create the resource that will be used
-- to copy the screen into.  Do this so that
-- we can reuse the same chunk of memory
local resource = DMXResource(displayWidth, displayHeight, ffi.C.VC_IMAGE_RGB888);

local p_rect = VC_RECT_T(0,0,displayWidth, displayHeight);
local pixdata = resource:CreateCompatiblePixmap(displayWidth, displayHeight);

local framecount = 120

for i=1,framecount do
	-- Do the snapshot
	Display:Snapshot(resource);

	local pixeldata, err = resource:ReadPixelData(pixdata, p_rect);
	if pixeldata then
		-- Write the data out
		local filename = string.format("screencast/desktop_%06d.ppm", i);
		print("Writing: ", filename);

		WritePPM(filename, pixeldata);
	end
end

In this case, I’m capturing into a bitmap that is 640×320, which roughly matches the aspect ratio of my wide monitor.

This isn’t the fastest method of capturing on the planet. It actually takes a fair amount of time to save each image to the SD card in my Pi. Also, I might be able to eliminate the copy (ReadPixelData), if I can get the pointer to the memory that the resource uses.

This little routine will generate a ton of .ppm image files stored in the local ‘screencast’ directory.
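The WritePPM() helper isn’t shown above. A minimal sketch of one, assuming the pixel data object carries its dimensions and a pointer to tightly packed RGB888 bytes (the field names here are my assumptions), could look like this:

local ffi = require "ffi"

local function WritePPM(filename, pixbuff)
  local f = assert(io.open(filename, "wb"));

  -- P6 is the binary RGB flavor of the PPM format
  f:write(string.format("P6\n%d %d\n255\n", pixbuff.Width, pixbuff.Height));

  -- dump the pixel bytes; this ignores any row padding/stride
  f:write(ffi.string(pixbuff.Data, pixbuff.Width * pixbuff.Height * 3));

  f:close();
end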

From there, I use ffmpeg to turn the sequence of images into a movie:

ffmpeg -i screencast/desktop_%06d.ppm desktop.mp4

If you’re an ffmpeg guru, you can set all sorts of flags to change the framerate, encoder, and the like. I just stuck with defaults, and the result is what you see here.
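For instance, something along these lines (untested on the Pi) would pin the input framerate and select the H.264 encoder explicitly:

ffmpeg -framerate 15 -i screencast/desktop_%06d.ppm -c:v libx264 -pix_fmt yuv420p desktop.mp4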

So, the Pi is capable. It’s not the MOST capable, but it can get the job done. If I were trying to do this in a production environment, I’d probably attach a nice SSD drive to the USB port, and stream out to that. I might also choose a smaller image format such as YUV, which is easier to compress. As it is, the compression was getting about 9fps, which ain’t too bad for short clips like this.

One nice thing about this screen capture method is that it doesn’t matter whether you’re running X Windows, or not. So, you’re not limited to things that run in X. You can capture simple terminal sessions as well.

I’m rambling…

This works, and it can only get better from here.

It is part of the LJIT2RPi project.


Taking Screen Snapshots on the Raspberry Pi

Last time around, I was doing some display wrangling, trying to put some amount of ‘framework’ goodness around this part of the Raspberry Pi operating system. With the addition of a couple more classes, I can finally do something useful.

Here is how to take a screen snapshot:

local ffi = require "ffi"
local DMX = require "DisplayManX"

local Display = DMXDisplay();

local width = 640;
local height = 480;
local layer = 0;	-- keep the snapshot view on top

-- Create a resource to copy image into
local pixmap = DMX.DMXResource(width,height);

-- create a view with the snapshot as
-- the backing store.  (Note: pformat, the view's pixel format, is not
-- defined in this snippet; it is assumed to come from earlier setup.)
local mainView = DMX.DMXView.new(Display, 200, 200, width, height, layer, pformat, pixmap);


-- Hide the view so it's not in the picture
mainView:Hide();	

-- Do the snapshot
Display:Snapshot(pixmap);

-- Show it on the screen
mainView:Show();

ffi.C.sleep(5);

This piece of code is so short, it’s almost self-explanatory. But I’ll explain it anyway.

The first few lines are just setup.

local ffi = require "ffi"
local DMX = require "DisplayManX"

local Display = DMXDisplay();

local width = 640;
local height = 480;
local layer = 0;	-- keep the snapshot view on top

The only two that are strictly needed are these:

local DMX = require "DisplayManX"
local Display = DMXDisplay();

The first line pulls in the Lua DisplayManager module that I wrote. This is the entry point into the Raspberry Pi’s low level VideoCore routines. Besides containing the simple wrappers around the core routines, it also contains some convenience classes which make doing certain tasks very simple.

Creating a DMXDisplay object is most certainly the first thing you want to do in any application involving the display. This gives you a handle on the entire display. From here, you can ask what size things are, and it’s necessary for things like creating a view on the display.

The DMXDisplay interface has a function to take a snapshot. That function looks like this:

Snapshot = function(self, resource, transform)
  transform = transform or ffi.C.VC_IMAGE_ROT0;

  return DisplayManX.snapshot(self.Handle, resource.Handle, transform);
end,

The important part here is to note that a ‘resource’ is required to take a snapshot. It might look like 3 parameters are required, but through the magic of Lua it actually turns into only 1: the ‘:’ call syntax passes ‘self’ implicitly, and the transform parameter has a default, so the resource is the only argument you need to supply. We’ll come back to this.
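In other words, these two calls end up doing the same thing; the ‘:’ syntax passes the display as ‘self’, and the transform falls back to VC_IMAGE_ROT0 inside the function:

Display:Snapshot(pixmap);
Display.Snapshot(Display, pixmap, ffi.C.VC_IMAGE_ROT0);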

So, a resource is needed. What’s a resource? Basically a bitmap that the VideoCore system controls. You can create one easily like this:

local pixmap = DMX.DMXResource(width,height);

There are a few other parameters that you could use while creating your bitmap, but width and height are the essentials.
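The screencast code earlier in this document, for example, passes an explicit image type as a third argument; recast in the style used here, that would look like:

local pixmap = DMX.DMXResource(width, height, ffi.C.VC_IMAGE_RGB888);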

One thing of note: when you eventually call Display:Snapshot(pixmap), you cannot control which part of the screen is captured; it always takes a snapshot of the entire screen. But your bitmap does not have to be the same size! It can be any size you like. The VideoCore library will automatically squeeze your snapshot down to the size you specified when you created your resource.
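For example, handing Snapshot a smaller resource gets you a scaled-down copy of the whole screen, which is handy for thumbnails:

-- a half-width, half-height copy of a 1920x1080 screen
local thumb = DMX.DMXResource(960, 540);
Display:Snapshot(thumb);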

So, we have a bitmap within which our snapshot will be stored. The last thing to do is to actually take the snapshot:

Display:Snapshot(pixmap);

In this particular example, I also want to display the snapshot on the screen. So, I created a ‘view’. This view is simply a way to display something on the screen.

local mainView = DMX.DMXView.new(Display, 200, 200, width, height, layer, pformat, pixmap);

In this case, I do a couple of special things. I create the view to be the same size as the pixel buffer, and in fact, I use the pixel buffer as the backing store of the view. That means that whenever the pixel buffer changes, for example when a snapshot is taken, the change will automatically show up in the view, because the system draws the view from that pixel buffer. I know it’s a mouthful, but that’s how the system works.

So the following sequence:

-- Hide the view so it's not in the picture
mainView:Hide();	

-- Do the snapshot
Display:Snapshot(pixmap);

-- Show it on the screen
mainView:Show();

ffi.C.sleep(5);

…will hide the view, take a snapshot, and then show the view again.

That’s so the view itself is not a part of the snapshot. You could achieve the same by moving the view ‘offscreen’ and then back again, but I haven’t implemented that part yet.

Well, there you have it. A whole bunch of words to describe a fairly simple process. I think this is an interesting thing though. Thus far, when I’ve seen Raspberry Pi ‘demo’ videos, it’s typically someone with a camera in one hand, bad lighting, trying to type on their keyboard and use their mouse while taking video. With the ability to take screen snapshots in code, making screencasts can’t be that far off.

Now, if only I could mate this capability with that x264 video compression library, I’d be all set!