Spelunking Linux – procfs or is that sysctl?

Last time around, I introduced some simple things with lj2procfs.  Being able to simply access the contents of the various files within procfs is a bit of a convenience.  Really, what lj2procfs is doing is just giving you a common interface to the data in those files.  Everything shows up as simple lua values, typically tables containing strings and numbers.  That’s great for most of what you’d be doing with procfs, which is just taking a look at things.

But, on Linux, procfs has another capability.  The /proc/sys directory contains a few interesting directories of its own:

abi/
debug/
dev/
fs/
kernel/
net/
vm/

And if you look into these directories, you find some more interesting files. For example, in the ‘kernel/’ directory, we can see a little bit of this:

hostname
hotplug
hung_task_check_count
hung_task_panic
hung_task_timeout_secs
hung_task_warnings
io_delay_type
kexec_load_disabled
keys
kptr_restrict
kstack_depth_to_print
max_lock_depth
modprobe
.
.
.

Now, these are looking kind of interesting. These files typically contain tunable portions of the kernel. On other unices, these values might be controlled through the sysctl() function call. On Linux, the sysctl command just reads and writes the contents of these files. So, why not just use lj2procfs to do the same?

Let’s take a look at a few relatively simple tasks. First, I want to get the version of the OS running on my machine. This can be obtained through the file /proc/sys/kernel/version

local procfs = require("lj2procfs.procfs")
print(procfs.sys.kernel.version)

$ #15-Ubuntu SMP Thu Apr 16 23:32:37 UTC 2015

This is the same string returned from the call ‘uname -v’

And, to get the hostname of the machine:

print(procfs.sys.kernel.hostname)
$ ubuntu

Which is what the ‘hostname’ command returns on my machine.

And what about setting the hostname? First of all, you’ll want to do this as root, but it’s equally simple:

procfs.sys.kernel.hostname = 'alfredo'

Keep in mind that setting the hostname in this way is transient, and it will seriously mess up things, like your ability to sudo after this. But, there you have it.

Any value under /proc/sys can be retrieved or set using this fairly simple mechanism. I find this to be very valuable for two reasons. First of all, spelunking these values makes for great discovery. More importantly, being able to capture and set the values makes for a fairly easily tunable system.
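For comparison, here is roughly what the same read and write look like with nothing but plain file I/O. This is only a minimal sketch: the helper names sysctlPath, readSysctl and writeSysctl are made up for this illustration and are not part of lj2procfs, and the dotted-name convention is borrowed from the sysctl command.

```lua
-- Hypothetical helpers for illustration; not part of lj2procfs.

-- Turn a dotted sysctl-style name into a path under /proc/sys.
-- Note: names whose components themselves contain dots are not handled.
local function sysctlPath(name)
	return "/proc/sys/" .. (name:gsub("%.", "/"))
end

-- Read a value as a string, without the trailing newline.
local function readSysctl(name)
	local f, err = io.open(sysctlPath(name), "r")
	if not f then return nil, err end
	local value = f:read("*a")
	f:close()
	if not value then return nil, "read failed" end
	return (value:gsub("%s+$", ""))
end

-- Write a value; this normally requires root.
local function writeSysctl(name, value)
	local f, err = io.open(sysctlPath(name), "w")
	if not f then return nil, err end
	f:write(tostring(value))
	f:close()
	return true
end

print(sysctlPath("kernel.hostname"))
print(readSysctl("kernel.hostname"))
```

The property-style access in lj2procfs is doing the same kind of thing behind the scenes, just with the path built from the table fields you walk through.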

As an example of how this can be used for system diagnostics and tuning, you can capture the kernel values using a simple command that just dumps what you want into a table, then send that table to anyone else for analysis. Similarly, if someone has come up with a system configuration that is great for a particular task (tuning the VM allocations, networking values, and the like), they can send you the configuration (just a string value that is a lua table) and you can apply it to your system.
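To make the "configuration as a string" idea concrete, here is a small sketch of the round trip. The serialize and deserialize helpers and the captured values are hypothetical, chosen only for illustration; lj2procfs has its own printing utilities.

```lua
-- Hypothetical sketch: round-trip a flat table of settings through a string.
local function serialize(tbl)
	local keys = {}
	for k in pairs(tbl) do keys[#keys + 1] = k end
	table.sort(keys)   -- sorted output is easier to diff

	local lines = { "return {" }
	for _, k in ipairs(keys) do
		-- values are stored as strings, since procfs files are text anyway
		lines[#lines + 1] = string.format("    [%q] = %q;", k, tostring(tbl[k]))
	end
	lines[#lines + 1] = "}"
	return table.concat(lines, "\n")
end

local function deserialize(str)
	local loader = loadstring or load   -- Lua 5.1/LuaJIT vs. Lua 5.2+
	local chunk, err = loader(str)
	if not chunk then return nil, err end
	return chunk()
end

-- Made-up example values
local captured = {
	["vm.swappiness"] = 60,
	["kernel.hostname"] = "ubuntu",
}

local text = serialize(captured)
print(text)

local restored = deserialize(text)
print(restored["kernel.hostname"])
```

One caution with this approach: loading a lua table from a string means executing code, so only apply configurations from people you trust.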

This is a tad better than simply trying to look at system logs to determine after the fact what might be going on with a system. Perhaps the combination of these live values, as well as correlation with system logs, makes it easier to automate the process of diagnosing and tuning a system.

Well, there you have it. The lj2procfs thing is getting more concise, as well as becoming more usable at the same time.


Spelunking Linux – procfs, and a bag of chips

Recently, as a way to further understand all things Linux, I’ve been delving into procfs.  This is one of those virtual file systems on linux, meaning, the ‘files’ and ‘directories’ are not located on some real media anywhere, they are conjured up in realtime from within the kernel.  If you take a look at the ‘/proc’ directory on any linux machine, you’ll find a couple of things.  First, there are a bunch of directories with numeric values as their names.

1	     10			100	      1003	     1004
10075	     101		10276	      10683	     10695
10699	     1071		10746	      10756	     10757
1081	     11			11927	      12	     1236
12527	     12549		12563	      1296	     13

Yes, true to the unix philosophy, all the things are but files/directories. Each one of these numbers represents a process that is running on the system at the moment. Each one of these directories contains additional directories, and files. The files contain interesting data, and the directories lead into even more places where you can find more interesting things about the process.

Here are the contents of the directory ‘1’, which is the first process running on the system:

attr/	    autogroup	     auxv	 cgroup     clear_refs	cmdline
comm	    coredump_filter  cpuset	 cwd@	    environ	exe@
fd/	    fdinfo/	     gid_map	 io	    limits	loginuid
map_files/  maps	     mem	 mountinfo  mounts	mountstats
net/	    ns/		     numa_maps	 oom_adj    oom_score	oom_score_adj
pagemap     personality      projid_map  root@	    sched	schedstat
sessionid   setgroups	     smaps	 stack	    stat	statm
status	    syscall	     task/	 timers     uid_map	wchan

Some actual files, some more directories, some symbolic links. To find out the details of what each of these contains, and their general meaning, you need to consult the procfs man page, as well as the source code of the linux kernel, or various utilities that use them.

Backing up a bit, the /proc directory itself contains some very interesting files as well:

acpi/	      asound/		 buddyinfo     bus/	      cgroups
cmdline       consoles		 cpuinfo       crypto	      devices
diskstats     dma		 driver/       execdomains    fb
filesystems   fs/		 interrupts    iomem	      ioports
irq/	      kallsyms		 kcore	       keys	      key-users
kmsg	      kpagecount	 kpageflags    loadavg	      locks
mdstat	      meminfo		 misc	       modules	      mounts@
mtrr	      net@		 pagetypeinfo  partitions     sched_debug
schedstat     scsi/		 self@	       slabinfo       softirqs
stat	      swaps		 sys/	       sysrq-trigger  sysvipc/
thread-self@  timer_list	 timer_stats   tty/	      uptime
version       version_signature  vmallocinfo   vmstat	      zoneinfo

Again, the meanings of each of these is buried in the various documentation and source code files that surround them, but let’s take a look at a couple of examples. How about that uptime file?

8099.41 31698.74

OK. Two numbers. What do they mean? The first one is how many seconds the system has been running. The second one is the number of seconds all cpus on the system have been idle since the system came up. Yes, on a multi-proc system, the second number can be greater than the first. And thus begins the actual journey into spelunking procfs. If you’re like me, you occasionally need to know this information. Of course, if you want to know it from the command line, you just run the ‘uptime’ command, and you get…

 06:38:22 up  2:18,  2 users,  load average: 0.17, 0.25, 0.17

Well, hmmm, I get the ‘up’ part, but what’s all that other stuff, and what happened to the idle time thing? As it turns out, the uptime command does show the up time, but it also shows the logged in users, and the load average numbers, which actually come from different files.
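The load averages, for instance, live in /proc/loadavg. As a rough sketch of how the pieces fit together, the following rebuilds the ‘up’ and ‘load average’ parts of that line from the two files. The strings are sample contents rather than live reads, the loadavg line is made up for the example, and formatUp is a name invented here (the real uptime command also handles days, which this does not).

```lua
-- Sample contents; on a live system these would be read from
-- /proc/uptime and /proc/loadavg.
local uptimeText = "8099.41 31698.74"
local loadavgText = "0.17 0.25 0.17 1/345 12563"   -- made-up sample line

-- Format seconds as hours:minutes (no day handling in this sketch).
local function formatUp(seconds)
	local hours = math.floor(seconds / 3600)
	local minutes = math.floor((seconds % 3600) / 60)
	return string.format("%d:%02d", hours, minutes)
end

local seconds = tonumber(uptimeText:match("^(%S+)"))
-- the first three fields are the 1, 5 and 15 minute load averages
local one, five, fifteen = loadavgText:match("^(%S+)%s+(%S+)%s+(%S+)")

print(string.format("up %s, load average: %s, %s, %s",
	formatUp(seconds), one, five, fifteen))
```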

It’s like this. Whatever you want to know about the system is probably available, but you have to know where to look for it, and how to interpret the data from the files. Often times there’s either a libc function you can call, or a command line utility, if you can discover and remember them.

What about a different way? Since I’m spelunking, I want to discover things in a more random fashion, and of course I want easy lua programmatic access to what I find. In steps the lj2procfs project.

In lj2procfs, I try to provide a more manageable interface to the files in /proc.  Most often, the information is presented as lua tables.  If the information is too simple (like /proc/version), then it is presented as a simple string.  Here is a look at that uptime example, done using lj2procfs:

return {
    ['uptime'] = {
        seconds = 19129.39,
        idle = 74786.86,
    };
}

You can see that the simple two numbers in the uptime file are converted to meaningful fields within the table. In this case, I use a simple utility program to turn any of the files into simple lua value output, suitable for reparsing, or transmitting. First, what does the ‘parsing’ look like?

--[[
	seconds idle 

	The first value is the number of seconds the system has been up.
	The second number is the accumulated number of seconds all processors
	have spent idle.  The second number can be greater than the first
	in a multi-processor system.
--]]
local function decoder(path)
	path = path or "/proc/uptime"
	local f, err = io.open(path)
	if not f then
		-- the file may be missing or unreadable
		return nil, err
	end
	local str = f:read("*a")
	f:close()

	local seconds, idle = str:match("(%d*%.?%d+)%s+(%d*%.?%d+)")
	return {
		seconds = tonumber(seconds);
		idle = tonumber(idle);
	}
end

return {
	decoder = decoder;
}

In most cases, the output of the /proc files is meant to be human readable. At least with Linux. Other platforms might prefer these files to be more easily machine readable (binary). As such, most of them are readily parseable with simple string patterns.

So, this decoder is one of many. There is one for each of the file types in the /proc directory, or at least the list is growing.

They are in turn accessed using the Decoders class.

local Decoders = {}
local function findDecoder(self, key)
	local path = "lj2procfs.codecs."..key;

	-- try to load the intended codec file
	local success, codec = pcall(function() return require(path) end)
	if success and codec.decoder then
		return codec.decoder;
	end

	-- if we didn't find a decoder, use the generic raw file loading
	-- decoder.
	-- Caution: some of those files can be very large!
	return getRawFile;
end
setmetatable(Decoders, {
	__index = findDecoder;

})

This is a fairly simple forwarding mechanism. You could use this in your code by doing the following:

local procfs = require("Decoders")
local uptime = procfs.uptime

printValue(uptime)

When you try to access the procfs.uptime field of the Decoders class, it will go: “Hmmm, I don’t have a field in my table with that name, I’ll defer to whatever was set as my __index value, which so happens to be a function, so I’m going to call that function and see what it comes up with”. The findDecoder function will in turn look in the codecs directory for something with that name. It will find the code in uptime.lua, and execute it, handing it the path specified. The uptime decoder will read the file, parse the values, and return a table.
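If the __index forwarding is new to you, here is the mechanism boiled down to a self-contained sketch. It is not the lj2procfs code: the codecs table stands in for the codecs directory (the real thing uses require), rawDecoder is a made-up fallback, and this version calls the decoder right inside __index.

```lua
-- Stand-in for the codecs directory: name -> decoder function.
-- (Hypothetical; the real code locates these with require().)
local codecs = {
	uptime = function()
		return { seconds = 8099.41, idle = 31698.74 }
	end,
}

-- Made-up fallback for names that have no decoder.
local function rawDecoder(name)
	return "raw contents of " .. name
end

local procfs = setmetatable({}, {
	-- called only when the key is not already present in the table
	__index = function(self, key)
		local decoder = codecs[key] or rawDecoder
		return decoder(key)
	end,
})

print(procfs.uptime.seconds)
print(procfs.version)
```

Nothing is stored in the procfs table itself here, so every access runs the decoder again, which is what you want for values that change from moment to moment.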

And thus magic is practiced!

It’s actually pretty nice because having things as lua tables and lua values such as numbers and strings, makes it really easy to do programmatic things from there.

Here’s meminfo.lua

local function meminfo(path)
	path = path or "/proc/meminfo"
	local tbl = {}
	local pattern = "(%g+):%s+(%d+)%s+(%g+)"

	for str in io.lines(path) do
		local name, size, units = str:match(pattern)
		if name then
			tbl[name] = {
				size = tonumber(size), 
				units = units;
			}
		end
	end

	return tbl
end

return {
	decoder = meminfo;
}

The raw ‘/proc/meminfo’ file output looks something like this:

MemTotal:        2045244 kB
MemFree:          273464 kB
MemAvailable:     862664 kB
Buffers:           72188 kB
Cached:           629268 kB
SwapCached:            0 kB
.
.
.

And the parsed output might be something like this:

    ['meminfo'] = {
        ['Active'] = {
            size = 1432756,
            units = [[kB]],
        };
        ['DirectMap2M'] = {
            size = 1992704,
            units = [[kB]],
        };
        ['MemFree'] = {
            size = 284604,
            units = [[kB]],
        };
        ['MemTotal'] = {
            size = 2045244,
            units = [[kB]],
        };
.
.
.

Very handy.
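One thing to watch for: a few lines in /proc/meminfo, such as HugePages_Total, are plain counts with no unit, so a pattern that insists on three fields will skip them. Here is a hedged sketch of a variant that treats the unit as optional. It parses a sample string rather than the live file, and parseMeminfo is a name made up for this example.

```lua
local sample = [[
MemTotal:        2045244 kB
MemFree:          273464 kB
HugePages_Total:       0
]]

-- Parse meminfo-style text; the unit is optional in this variant.
local function parseMeminfo(text)
	local tbl = {}
	for line in text:gmatch("[^\n]+") do
		local name, size, units = line:match("^(%S+):%s+(%d+)%s*(%a*)")
		if name then
			tbl[name] = {
				size = tonumber(size),
				-- an empty capture means the line carried no unit
				units = (units ~= "") and units or nil,
			}
		end
	end
	return tbl
end

local info = parseMeminfo(sample)
print(info.MemTotal.size, info.MemTotal.units)
print(info.HugePages_Total.size, info.HugePages_Total.units)
```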

In some cases, the output can be a bit tricky. Since it’s trying to be human readable, there might be header lines and a variable number of columns. You have the full power of lua to do the parsing, though, including using something like lpeg if you so choose. Here’s the parser for the ‘/proc/interrupts’ file, for example:

local strutil = require("lj2procfs.string-util")

local namepatt = "(%g+):(.*)"
-- the leading run of whitespace-separated per-cpu counters
local numberspatt = "[%s%d]+"
-- whatever follows the counters is the description
local descpatt = "[%s%d]+(.*)"

local function interrupts(path)
	path = path or "/proc/interrupts"

	local tbl = {}
	for str in io.lines(path) do
		local name, remainder = str:match(namepatt)
		if name then
			local numbers = remainder:match(numberspatt)
			local desc = remainder:match(descpatt)

			local cpu = 0;
			local valueTbl = {}
			for number in numbers:gmatch("%s*(%d+)") do
				--print("NUMBER: ", number)
				valueTbl["cpu"..cpu] = tonumber(number);
				cpu = cpu + 1;
			end
			valueTbl.description = desc
			tbl[name] = valueTbl

		end
	end

	return tbl
end

return {
	decoder = interrupts;
}

And it deals with raw file data that looks like this:

          CPU0       CPU1       CPU2       CPU3       
  0:         38          0          0          0   IO-APIC-edge      timer
  1:      25395          0          0          0   IO-APIC-edge      i8042
  8:          1          0          0          0   IO-APIC-edge      rtc0
  9:      70202          0          0          0   IO-APIC-fasteoi   acpi
.
.
.

In this case, I’m running on a VM which was configured with 4 cpus. I had run previously with a VM with only 3 CPUs, and there were only 3 CPU columns. So, in this case, the patterns first isolate the interrupt number from the remainder of the line, then the numeric columns are isolated from the interrupt description field, then the numbers themselves are matched off using an iterator (gmatch). The table generated looks something like this:

    ['interrupts'] = {
        ['SPU'] = {
            cpu2 = 0,
            cpu3 = 0,
            cpu1 = 0,
            cpu0 = 0,
            description = [[Spurious interrupts]],
        };
        ['22'] = {
            cpu2 = 0,
            cpu3 = 0,
            cpu1 = 0,
            cpu0 = 0,
            description = [[IO-APIC  22-fasteoi   virtio1]],
        };
        ['NMI'] = {
            cpu2 = 0,
            cpu3 = 0,
            cpu1 = 0,
            cpu0 = 0,
            description = [[Non-maskable interrupts]],
        };
.
.
.

Nice.
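Once the counters are in a table, small questions become one-liners. As a sketch, here is how you might total an interrupt across all cpus. The entry is shaped like the decoder’s output, using the i8042 line from the raw sample; totalCount is a helper name invented for this example.

```lua
-- One entry shaped like the decoder's output (values from the raw sample).
local entry = {
	cpu0 = 25395, cpu1 = 0, cpu2 = 0, cpu3 = 0,
	description = "IO-APIC-edge      i8042",
}

-- Sum every 'cpuN' field, however many cpus there happen to be.
local function totalCount(valueTbl)
	local total = 0
	for key, value in pairs(valueTbl) do
		if type(key) == "string" and key:match("^cpu%d+$") then
			total = total + value
		end
	end
	return total
end

print(totalCount(entry))
```

Because the sum keys off the field names rather than a fixed column count, it works the same on the 3 cpu and 4 cpu configurations.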

To make spelunking easier, I’ve created a simple script which just calls the procfs thing, given a command line argument of the name of the file you’re interested in looking at.

#!/usr/bin/env luajit

--[[
	procfile

	This is a general purpose /proc/<file> interpreter.
	Usage:  
		$ sudo ./procfile filename

	Here, 'filename' is any one of the files listed in the
	/proc directory.

	In the cases where a decoder is implemented in Decoders.lua
	the file will be parsed, and an appropriate value will be
	returned and printed in a lua form appropriate for reparsing.

	When there is no decoder implemented, the value returned is
	"NO DECODER AVAILABLE"

	example:
		$ sudo ./procfile cpuinfo
		$ sudo ./procfile partitions

--]]

package.path = "../?.lua;"..package.path;

local procfs = require("lj2procfs.procfs")
local putil = require("lj2procfs.print-util")

if not arg[1] then
	print ([[

USAGE: 
	$ sudo ./procfile <filename>

where <filename> is the name of a file in the /proc
directory.

Example:
	$ sudo ./procfile cpuinfo
]])

	return 
end


local filename = arg[1]

print("return {")
putil.printValue(procfs[filename], "    ", filename)
print("}")

Once you have these basic tools in hand, you can begin to look at the various utilities that are used within linux, and try to emulate them. For example, the ‘free’ command will show you roughly how memory currently sits on your system, in terms of how much is physically available, how much is used, and the like. Its typical output, without any parameters, might look like:

             total       used       free     shared    buffers     cached
Mem:       2045244    1760472     284772      28376      73276     635064
-/+ buffers/cache:    1052132     993112
Swap:      1046524          0    1046524

Here’s the code to generate something similar, using lj2procfs

#!/usr/bin/env luajit

--[[
	This lua script acts similar to the 'free' command, which will
	display some interesting information about how much memory is being
	used in the system.
--]]
--memfree.lua
package.path = "../?.lua;"..package.path;

local procfs = require("lj2procfs.procfs")

local meminfo = procfs.meminfo;

local memtotal = meminfo.MemTotal.size
local memfree = meminfo.MemFree.size
local memused = memtotal - memfree
local memshared = meminfo.Shmem.size
local membuffers = meminfo.Buffers.size
local memcached = meminfo.Cached.size

local swaptotal = meminfo.SwapTotal.size
local swapfree = meminfo.SwapFree.size
local swapused = swaptotal - swapfree

print(string.format("%18s %10s %10s %10s %10s %10s",'total', 'used', 'free', 'shared', 'buffers', 'cached'))
print(string.format("Mem: %13d %10d %10d %10d %10d %10d", memtotal, memused, memfree, memshared, membuffers, memcached))
--print(string.format("-/+ buffers/cache: %10d %10d", 1, 2))
print(string.format("Swap: %12d %10d %10d", swaptotal, swapused, swapfree))

The working end of this is simply ‘local meminfo = procfs.meminfo’

The code generates the following output.

             total       used       free     shared    buffers     cached
Mem:       2045244    1768692     276552      28376      73304     635100
Swap:      1046524          0    1046524

I couldn’t quite figure out where the -/+ buffers/cache: values come from yet. I’ll have to look at the actual code for the ‘free’ program to figure it out. But, the results are otherwise the same.
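Going purely by the numbers in the sample ‘free’ output above, that row is consistent with taking buffers and cached out of ‘used’, and adding them to ‘free’. The sketch below only checks that arithmetic against the sample; it is an inference from the output, not a reading of the ‘free’ source.

```lua
-- Values taken from the sample 'free' output shown earlier (in kB).
local used, free = 1760472, 284772
local buffers, cached = 73276, 635064

-- Assumption inferred from the sample: buffers and cache are counted
-- as reclaimable, so they move from 'used' to 'free'.
local usedMinus = used - buffers - cached
local freePlus = free + buffers + cached

print(string.format("-/+ buffers/cache: %10d %10d", usedMinus, freePlus))
```

Both results match the sample row (1052132 and 993112), so the same two expressions could stand in for the commented-out line in memfree.lua.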

Some of these files can be quite large, like kallsyms, which might argue for an iterator interface instead of a table interface. But, some of the files have meta information, as well as a list of fields. Since the number of large files is fairly small, it made more sense to cover the broader cases instead, and tables do that fine. Even kallsyms, large as it is, still fits nicely into a table.

So, what can you do with that?

--findsym.lua
package.path = "../?.lua;"..package.path;

local sym = arg[1]
assert(sym, "must specify a symbol")

local putil = require("lj2procfs.print-util")
local sutil = require("lj2procfs.string-util")
local fun = require("lj2procfs.fun")
local procfs = require("lj2procfs.procfs")

local kallsyms = procfs.kallsyms

local function isDesiredSymbol(name, tbl)
    return sutil.startswith(name, sym)
end

local function printSymbol(name, value)
	putil.printValue(value)
end

fun.each(printSymbol, fun.filter(isDesiredSymbol, kallsyms))

This is a little utility which traverses the symbols, looking for anything that starts with whatever the user specified on the command line. So, I can use it like this:

luajit findsym.lua mmap

And get the following output:

{
    location = [[0000000000000000]],
    name = [[mmap_init]],
    kind = [[T]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_rnd]],
    kind = [[t]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_zero]],
    kind = [[t]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_min_addr_handler]],
    kind = [[T]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_vmcore_fault]],
    kind = [[t]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_region]],
    kind = [[T]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_vmcore]],
    kind = [[t]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_kset.19656]],
    kind = [[b]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_min_addr]],
    kind = [[B]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_mem_ops]],
    kind = [[r]],
};
{
    location = [[0000000000000000]],
    name = [[mmap_mem]],
    kind = [[t]],
};

Of course, you’re not limited to simply printing to stdout. In fact, that’s the least valuable thing you could be doing. What you really have is programmatic access to all these values. If you had run this command as root, you would get the actual addresses of these routines.
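For the curious, each line of /proc/kallsyms is an address, a one-letter type, and a name, optionally followed by a module name in square brackets. Here is a self-contained sketch of that parse plus the prefix search, run against sample text instead of the live file. The function names, the example_sym line and the ‘module’ field are made up for this illustration and may not match what the lj2procfs codec does.

```lua
-- Sample lines in the kallsyms layout; the last symbol is invented.
local sample = [==[
0000000000000000 T mmap_init
0000000000000000 t mmap_rnd
0000000000000000 t example_sym [example_mod]
]==]

-- Split one line into its fields; returns nil for lines that do not fit.
local function parseLine(line)
	local location, kind, name = line:match("^(%x+)%s+(%S)%s+(%S+)")
	if not location then return nil end
	return {
		location = location,
		kind = kind,
		name = name,
		module = line:match("%[(.-)%]%s*$"),   -- nil when absent
	}
end

-- Collect every symbol whose name starts with the given prefix.
local function findByPrefix(text, prefix)
	local found = {}
	for line in text:gmatch("[^\n]+") do
		local sym = parseLine(line)
		if sym and sym.name:sub(1, #prefix) == prefix then
			found[#found + 1] = sym
		end
	end
	return found
end

for _, sym in ipairs(findByPrefix(sample, "mmap")) do
	print(sym.kind, sym.name)
end
```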

And so it goes. lj2procfs gives you easy programmatic access to all the great information that is hidden in the procfs file system on linux machines. These routines make it relatively easy to gain access to the information, and utilize tools such as luafun to manage it. Once again, the linux system is nothing more than a very large database. Using a tool such as lua makes it relatively easy to access all the information in that database.

So far, lj2procfs just covers reading the values. In this article I did not cover the fact that you can also get information on individual processes. Aside from this, procfs actually allows you to set some values as well. This is why I structured the code as ‘codecs’. You can encode, and decode. So, in future, setting a value will be as simple as ‘procfs.something.somevalue = newvalue’. This will take the guesswork out of doing command line ‘echo …’ commands for esoteric values, which are seldom used. It also makes it easy to achieve great things programmatically through script, without even relying on various libraries that are meant to do the same.

And there you have it. procfs wrapped up in lua goodness.


Spelunking Linux – what is this auxv thing anyway

While spelunking Linux, trying to find an easier way to do this or that, I ran across this vDSO thing (virtual ELF Dynamic Shared Object). What?

Ok, it’s like this. I was implementing direct calling of syscalls on Linux, and reading up on how the C libraries do things. On Linux, when you want to talk to the kernel, you typically go through syscalls, or ioctl calls, or netlink sockets. A syscall is actually a fairly expensive process: switching from userspace to kernel space, issuing and responding to interrupts, etc. In some situations this could be a critical performance hit. So, to make things easier/faster, some of the system calls are implemented in this little ELF package (vDSO). This little ELF package (a dynamic link library) is loaded into every application on Linux. Then, the C library can decide to make calls into that library, instead of syscalls, thus saving a lot of overhead and speeding things up. Not all systems have this capability, but many do.

Alright, so how does the C runtime know whether the capability is there or not, where this little library is, and how to get at the functions therein? In steps our friend auxv. In the GNU C library, there is a lone call:

unsigned long getauxval(unsigned long);

What values can you get out of this thing? Well, the constants can be found in the elf.h file, and look like:

#define AT_PLATFORM 15
#define AT_PAGESZ 6

And about 30 others. How you use each of these depends on the type of the value that you are looking up. For instance, the AT_PLATFORM returns a pointer to a null terminated string. The AT_PAGESZ returns an integer which represents the memory page size of the machine you’re running on.

OK, so what’s the lua version?

local ffi = require("ffi")

ffi.cdef[[
static const int AT_NULL = 0;
static const int AT_IGNORE = 1;
static const int AT_EXECFD = 2;
static const int AT_PHDR = 3;
static const int AT_PHENT = 4;
static const int AT_PHNUM = 5;
static const int AT_PAGESZ = 6;
static const int AT_BASE = 7;
static const int AT_FLAGS = 8;
static const int AT_ENTRY = 9;
static const int AT_NOTELF = 10;
static const int AT_UID = 11;
static const int AT_EUID = 12;
static const int AT_GID = 13;
static const int AT_EGID = 14;
static const int AT_PLATFORM = 15;
static const int AT_HWCAP = 16;
static const int AT_CLKTCK = 17;
static const int AT_FPUCW = 18;
static const int AT_DCACHEBSIZE = 19;
static const int AT_ICACHEBSIZE = 20;
static const int AT_UCACHEBSIZE = 21;
static const int AT_IGNOREPPC = 22;
static const int AT_SECURE = 23;
static const int AT_BASE_PLATFORM = 24;
static const int AT_RANDOM = 25;
static const int AT_HWCAP2 = 26;
static const int AT_EXECFN = 31;
static const int AT_SYSINFO = 32;
static const int AT_SYSINFO_EHDR = 33;
static const int AT_L1I_CACHESHAPE = 34;
static const int AT_L1D_CACHESHAPE = 35;
static const int AT_L2_CACHESHAPE = 36;
]]

ffi.cdef[[
unsigned long getauxval(unsigned long);
]]

With this, I can then write code that looks like the following:

local function getStringAuxVal(atype)
	local res = libc.getauxval(atype)

	if res == 0 then 
		return false, "type not found"
	end

	local str = ffi.string(ffi.cast("char *", res));
	return str
end

local function getIntAuxValue(atype)
	local res = libc.getauxval(atype)

	if res == 0 then 
		return false, "type not found"
	end

	return tonumber(res);
end

local function getPtrAuxValue(atype)
	local res = libc.getauxval(atype)

	if res == 0 then 
		return false, "type not found"
	end

	return ffi.cast("intptr_t", res);
end


-- convenience functions
local function getExecPath()
	return getStringAuxVal(libc.AT_EXECFN);
end

local function getPlatform()
	return getStringAuxVal(libc.AT_PLATFORM);
end

local function getPageSize()
	return getIntAuxValue(libc.AT_PAGESZ);
end

local function getRandom()
	return getPtrAuxValue(libc.AT_RANDOM);
end

--[[
	Some test cases
--]]
print(" Platform: ", getPlatform());
print("Exec Path: ", getExecPath());
print("Page Size: ", getPageSize());
print("   Random: ", getRandom());


And so on and so forth, assuming you have the proper ‘libc’ luajit ffi binding, which gives you access to constants through the ‘libc.’ mechanism.

OK, that’s fine if I’m a C programmer and I just want to port some code that’s already doing this sort of thing. By the way, the one value that we’re interested in is AT_SYSINFO_EHDR. That contains a pointer to the beginning of our vDSO. Then you can call functions directly from there (there’s an API for that).

But, if I’m a lua programmer, I’ve come to expect more out of my environment, largely because I’m lazy and don’t like so much typing. Upon further examination, you can get this information yourself. If you’re hard core, you can look at the top of memory in your program, and map that location to a pointer you can fiddle with directly. Otherwise, you can actually get this information from a ‘file’ on Linux.

Turns out that you can get this info from ‘/proc/self/auxv’, if you’re running this command about your current process (which you most likely are). So, now what can I do with that? Well, the lua way would be the following:

-- auxv_iter.lua
local ffi = require("ffi")
local libc = require("libc")

local E = {}

-- This table maps the constant values for the various
-- AT_* types to their symbolic names.  This table is used
-- to both generate cdefs, as well as hand back symbolic names
-- for the keys.
local auxtbl = {
	[0] =  "AT_NULL";
	[1] =  "AT_IGNORE";
	[2] = "AT_EXECFD";
	[3] = "AT_PHDR";
	[4] = "AT_PHENT";
	[5] = "AT_PHNUM";
	[6] = "AT_PAGESZ";
	[7] = "AT_BASE";
	[8] = "AT_FLAGS";
	[9] = "AT_ENTRY";
	[10] = "AT_NOTELF";
	[11] = "AT_UID";
	[12] = "AT_EUID";
	[13] = "AT_GID";
	[14] = "AT_EGID";
	[15] = "AT_PLATFORM";
	[16] = "AT_HWCAP";
	[17] = "AT_CLKTCK";
	[18] = "AT_FPUCW";
	[19] = "AT_DCACHEBSIZE";
	[20] = "AT_ICACHEBSIZE";
	[21] = "AT_UCACHEBSIZE";
	[22] = "AT_IGNOREPPC";
	[23] = "AT_SECURE";
	[24] = "AT_BASE_PLATFORM";
	[25] = "AT_RANDOM";
	[26] = "AT_HWCAP2";
	[31] = "AT_EXECFN";
	[32] = "AT_SYSINFO";
	[33] = "AT_SYSINFO_EHDR";
	[34] = "AT_L1I_CACHESHAPE";
	[35] = "AT_L1D_CACHESHAPE";
	[36] = "AT_L2_CACHESHAPE";
}

-- Given a auxv key(type), and the value returned from reading
-- the file, turn the value into a lua specific type.
-- string pointers --> string
-- int values -> number
-- pointer values -> intptr_t

local function auxvaluefortype(atype, value)
	if atype == libc.AT_EXECFN or atype == libc.AT_PLATFORM then
		return ffi.string(ffi.cast("char *", value))
	end

	if atype == libc.AT_UID or atype == libc.AT_EUID or
		atype == libc.AT_GID or atype == libc.AT_EGID or 
		atype == libc.AT_FLAGS or atype == libc.AT_PAGESZ or
		atype == libc.AT_HWCAP or atype == libc.AT_CLKTCK or 
		atype == libc.AT_PHENT or atype == libc.AT_PHNUM then

		return tonumber(value)
	end

	if atype == libc.AT_SECURE then
		if value == 0 then 
			return false
		else
			return true;
		end
	end


	return ffi.cast("intptr_t", value);
end

-- iterate over the auxv values at the specified path
-- if no path is specified, use '/proc/self/auxv' to get
-- the values for the currently running program
local function auxviterator(path)
	path = path or "/proc/self/auxv"
	local fd = libc.open(path, libc.O_RDONLY);

	local params = {
		fd = fd;
		keybuff = ffi.new("intptr_t[1]");
		valuebuff = ffi.new("intptr_t[1]");
		buffsize = ffi.sizeof(ffi.typeof("intptr_t"));
	}


	local function gen_value(param, state)
		local res1 = libc.read(param.fd, param.keybuff, param.buffsize)
		local res2 = libc.read(param.fd, param.valuebuff, param.buffsize)
		if param.keybuff[0] == 0 then
			libc.close(param.fd);
			return nil;
		end

		local atype = tonumber(param.keybuff[0])
		return state, atype, auxvaluefortype(atype, param.valuebuff[0])
	end

	return gen_value, params, 0

end

-- generate ffi.cdef calls to turn the symbolic type names
-- into constant integer values
local cdefsGenerated = false;

local function gencdefs()
	for k,v in pairs(auxtbl) do		
		-- since we don't know if this is already defined, we wrap
		-- it in a pcall to catch the error
		pcall(function() ffi.cdef(string.format("static const int %s = %d;", v,k)) end)
	end
	cdefsGenerated = true;
end

-- get a single value for specified key.  A path can be specified
-- as well (default is '/proc/self/auxv')
-- this is most like the gnuC getauxval() function
local function getOne(key, path)
	-- iterate over the values, looking for the one we want
	for _, atype, value in auxviterator(path) do
		if atype == key then
			return value;
		end
	end

	return nil;
end

E.gencdefs = gencdefs;
E.keyvaluepairs = auxviterator;	
E.keynames = auxtbl;
E.getOne = getOne;

setmetatable(E, {
	-- we allow the user to specify one of the symbolic constants
	-- when doing a 'getOne()'.  This indexing allows for the creation
	-- and use of those constants if they haven't already been specified
	__index = function(self, key)
		if not cdefsGenerated then
			gencdefs();
		end

		local success, value = pcall(function() return ffi.C[key] end)
		if success then
			rawset(self, key, value);
			return value;
		end

		return nil;
	end,

})

return E

In a nutshell, this is all you need for all the lua based auxv goodness in your life. Here are a couple of examples of usage in action:

local init = require("test_setup")()
local auxv_util = require("auxv_iter")
local apairs = auxv_util.keyvaluepairs;
local keynames = auxv_util.keynames;
local auxvGetOne = auxv_util.getOne;


--auxv_util.gencdefs();
print("==== Iterate All ====")
local function printAll()
	-- no path given, so the iterator falls back to /proc/self/auxv
	for _, key, value in apairs() do
		io.write(string.format("%20s[%2d] : ", keynames[key], key))
		print(value);
	end
end

-- print all the entries
printAll();

-- try to get a specific one
print("==== Get Singles ====")
print(" Platform: ", auxvGetOne(auxv_util.AT_PLATFORM))
print("Page Size: ", auxvGetOne(auxv_util.AT_PAGESZ))

The output from printAll() might look like this:

==== Iterate All ====
     AT_SYSINFO_EHDR[33] : 140721446887424LL
            AT_HWCAP[16] : 3219913727
           AT_PAGESZ[ 6] : 4096
           AT_CLKTCK[17] : 100
             AT_PHDR[ 3] : 4194368LL
            AT_PHENT[ 4] : 56
            AT_PHNUM[ 5] : 10
             AT_BASE[ 7] : 140081410764800LL
            AT_FLAGS[ 8] : 0
            AT_ENTRY[ 9] : 4208720LL
              AT_UID[11] : 1000
             AT_EUID[12] : 1000
              AT_GID[13] : 1000
             AT_EGID[14] : 1000
           AT_SECURE[23] : false
           AT_RANDOM[25] : 140721446787113LL
           AT_EXECFN[31] : /usr/local/bin/luajit
         AT_PLATFORM[15] : x86_64

The printAll() function uses the auxv iteration function, which in turn reads the key/value pairs directly from the /proc/self/auxv file. No need for the GNU C lib function at all. It goes further and turns the raw ‘unsigned long’ values into the appropriate data type based on what kind of data the key specified represents. So, you get a lua string, and not just a pointer to a C string.
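To make the file format a little more concrete, here is a minimal pure-Lua sketch of the decoding step. It assumes a 64-bit little-endian machine, where each auxv entry is a pair of 8-byte unsigned integers and the list ends with an AT_NULL (0) key. The names readU64LE, auxvPairs and u64le are made up for this illustration and are not part of auxv_iter.lua. It runs against a synthetic buffer rather than the real /proc/self/auxv, and it leaves out the per-key type conversion described above.

```lua
-- Decode one little-endian unsigned 64-bit integer starting at the
-- 1-based byte offset 'pos'. Values above 2^53 lose precision when
-- Lua numbers are doubles, which is acceptable for a sketch.
local function readU64LE(buf, pos)
    local value = 0
    for i = 7, 0, -1 do
        value = value * 256 + buf:byte(pos + i)
    end
    return value
end

-- Iterate over (key, value) pairs until AT_NULL (0) or the end of the buffer.
local function auxvPairs(buf)
    local pos = 1
    return function()
        if pos + 15 > #buf then return nil end
        local key = readU64LE(buf, pos)
        local value = readU64LE(buf, pos + 8)
        pos = pos + 16
        if key == 0 then return nil end   -- AT_NULL terminates the vector
        return key, value
    end
end

-- Helper used only to build the synthetic test buffer.
local function u64le(n)
    local bytes = {}
    for i = 1, 8 do
        bytes[i] = string.char(n % 256)
        n = math.floor(n / 256)
    end
    return table.concat(bytes)
end

-- AT_PAGESZ (6) = 4096, AT_CLKTCK (17) = 100, then the AT_NULL terminator
local sample = u64le(6) .. u64le(4096) .. u64le(17) .. u64le(100) .. u64le(0) .. u64le(0)

local decoded = {}
for key, value in auxvPairs(sample) do
    decoded[key] = value
end
print(decoded[6], decoded[17])
```

To try it on a live system you could read /proc/self/auxv in binary mode with io.open and hand the contents to auxvPairs, keeping in mind the 64-bit little-endian assumption.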

In the second example, getting singles, the output is simply this:

 Platform: 	x86_64
Page Size: 	4096

The code for this goes into a bit of trickery that’s possible with luajit. First of all, notice the use of ‘auxv_util.AT_PAGESZ’. There is nothing in the auxv_iter.lua file that supports this value directly. There is the table of names, and then there’s that ‘setmetatable’ at the end of things. Here’s where the trickery happens. Basically, the ‘__index’ function is called whenever you put a ‘.’ after a table to try and access something, and that something isn’t in the table. You get a chance to make something up and return it. In this case, we first call ‘gencdefs()’ if that hasn’t already been called. This will generate ‘static const int XXX’ for all the values in the table of names, so that we can then do a lookup of the values in the ‘ffi.C.’ namespace, using the name. If we find a name, then we add it to the table, so next time the lookup will succeed, and we won’t end up calling the __index function.

At any rate, we now have the requisite value to lookup. Then we just roll through the iterator, and return when we’ve got the value we were looking for. The conversion to the appropriate lua type is automatic.
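The caching behaviour of that __index function is easy to check in isolation. What follows is a minimal sketch in plain Lua, with an ordinary table (knownConstants, a name invented here) standing in for the ffi.C namespace and a counter added purely to show how often the slow path runs. It is not the code from auxv_iter.lua, just the same idea with the ffi parts removed.

```lua
-- Stand-in for the ffi.C namespace: a plain table of known constants.
local knownConstants = { AT_PAGESZ = 6, AT_PLATFORM = 15 }

local lookups = 0   -- how many times the __index slow path was taken

local E = {}
setmetatable(E, {
    __index = function(self, key)
        lookups = lookups + 1
        local value = knownConstants[key]
        if value ~= nil then
            -- cache the value so the next access never reaches __index
            rawset(self, key, value)
        end
        return value
    end,
})

local first = E.AT_PAGESZ      -- resolved through __index
local second = E.AT_PAGESZ     -- served straight from the table
local lookupsAfterTwoReads = lookups

local missing = E.NOT_A_KEY    -- unknown keys are not cached
print(first, second, lookupsAfterTwoReads, missing)
```

Note that unknown keys are never cached in this sketch, so every miss goes through the slow path again.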

And there you have it! From relative obscurity, to complete usability, in one iterator. Being able to actually get function pointers in the vDSO is the next step. That will require another API wrapper, or worst case, an all-encompassing ELF parser…


Spelunking Linux – Decomposing systemd

Honestly, I don’t know what all the fuss is about. What is systemd?  It’s that bit of code that gets things going on your Linux machine once the kernel has loaded itself.  You know, dealing with bringing up services, communicating between services, running the udev and dbus stuff, etc.

So, I wrote an ffi wrapper for the libsystemd.so library. This has proven to be handy. As usual, I can essentially write what looks like standard C code, but it’s actually LuaJIT goodness.

--[[
	Test using SDJournal as a cursor over the journal entries

	In this case, we want to try the various cursor positioning
	operations to ensure they work correctly.
--]]
package.path = package.path..";../src/?.lua"

local SDJournal = require("SDJournal")
local sysd = require("systemd_ffi")

local jnl = SDJournal()

-- move forward a few times
for i=1,10 do
	jnl:next();
end

-- get the cursor label for this position
local label10 = jnl:positionLabel()
print("Label 10: ", label10)

-- seek to the beginning, print that label
jnl:seekHead();
jnl:next();
local label1 = jnl:positionLabel();
print("Label 1: ", label1);


-- seek to label 10 again
jnl:seekLabel(label10)
jnl:next();
local label3 = jnl:positionLabel();
print("Label 3: ", label3)
print("label 3 == label 10: ", label3 == label10)

In this case, we have a simple journal object which makes it relatively easy to browse through the systemd journals that are lying about. That’s handy. Combined with the luafun functions, browsing through journals suddenly becomes a lot easier, with the full power of lua to form very interesting queries, or other operations.

--[[
	Test cursoring over journal, turning each entry
	into a lua table to be used with luafun filters and whatnot
--]]
package.path = package.path..";../src/?.lua"

local SDJournal = require("SDJournal")
local sysd = require("systemd_ffi")
local fun = require("fun")()

-- Feed this routine a table with the names of the fields
-- you are interested in seeing in the final output table
local function selection(fields, aliases)
	return function(entry)
		local res = {}
		for _, k in ipairs(fields) do
			if entry[k] then
				res[k] = entry[k];
			end
		end
		return res;
	end
end

local function  printTable(entry)
	print(entry)
	each(print, entry)
end

local function convertCursorToTable(cursor)
	return cursor:currentValue();
end


local function printJournalFields(selector, flags)
	flags = flags or 0
	local jnl1 = SDJournal();

	if selector then
		each(printTable, map(selector, map(convertCursorToTable, jnl1:entries())))
	else
		each(printTable, map(convertCursorToTable, jnl1:entries()))	
	end
end

-- print all fields, but filter the kind of journal being looked at
--printJournalFields(nil, sysd.SD_JOURNAL_CURRENT_USER)
--printJournalFields(nil, sysd.SD_JOURNAL_SYSTEM)

-- printing specific fields
--printJournalFields(selection({"_HOSTNAME", "SYSLOG_FACILITY"}));
printJournalFields(selection({"_EXE", "_CMDLINE"}));

-- to print all the fields available per entry
--printJournalFields();

In this case, we have a simple journal printer, which will take a list of fields, as well as a selection of the kinds of journals to look at. That’s quite useful as you can easily generate JSON or XML, or Lua tables on the output end, without much work. You can easily select which fields you want to display, and you could even change the names along the way. You have the full power of lua at your disposal to do whatever you want with the data.
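The selection() function in the listing accepts an ‘aliases’ argument but never uses it. Here is one way the renaming could work, as a hedged sketch: the alias handling is my guess at the intent, and the sample entry is made-up data rather than real journal output.

```lua
-- Build a selector that keeps only the named fields, optionally
-- renaming them through an 'aliases' table (original name -> new name).
local function selection(fields, aliases)
    aliases = aliases or {}
    return function(entry)
        local res = {}
        for _, k in ipairs(fields) do
            if entry[k] ~= nil then
                res[aliases[k] or k] = entry[k]
            end
        end
        return res
    end
end

-- Made-up journal entry, shaped like the tables the cursor returns
local sampleEntry = {
    _EXE = "/usr/bin/sudo",
    _CMDLINE = "sudo ls",
    _HOSTNAME = "ubuntu",
}

local pick = selection({"_EXE", "_CMDLINE"}, { _EXE = "exe", _CMDLINE = "cmdline" })
local row = pick(sampleEntry)
print(row.exe, row.cmdline, row._HOSTNAME)
```

Fields without an alias keep their original name, so passing no aliases at all behaves the same as the selector in the listing.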

In this case, the SDJournal object is pretty straightforward. It simply wraps the various ‘sd_xxx’ calls within the library to get its work done. What about some other cases? Does the systemd library need to be used for absolutely everything that it does? The answer is ‘no’, you can do a lot of the work yourself, because at the end of the day, the passive part of systemd is just a bunch of file system manipulation.

Here’s where it gets interesting in terms of decomposition.

Within the libsystemd library, there is the sd_get_machine_names() function:

_public_ int sd_get_machine_names(char ***machines) {
        char **l = NULL, **a, **b;
        int r;

        assert_return(machines, -EINVAL);

        r = get_files_in_directory("/run/systemd/machines/", &l);
        if (r < 0)
                return r;

        if (l) {
                r = 0;

                /* Filter out the unit: symlinks */
                for (a = l, b = l; *a; a++) {
                        if (startswith(*a, "unit:") || !machine_name_is_valid(*a))
                                free(*a);
                        else {
                                *b = *a;
                                b++;
                                r++;
                        }
                }

                *b = NULL;
        }

        *machines = l;
        return r;
}

The lua wrapper for this would simply be:

ffi.cdef("int sd_get_machine_names(char ***machines)")

Great, for those who already know this call, you can allocate a char * array, get the array of string values, and party on. But what about the lifetime of those strings, and if you’re doing it as an iterator, when do you ever free stuff, and isn’t this all wasteful?

So, looking at that code in the library, you might think, ‘wait a minute, I could just replicate that in Lua, and get it done without doing any ffi stuff at all!’

local fun = require("fun")

local function isNotUnit(name)
	return not strutil.startswith(name, "unit:")
end

function SDMachine.machineNames(self)
	return fun.filter(isNotUnit, fsutil.files_in_directory("/run/systemd/machines/"))
end

OK, that looks simple. But what’s happening with that ‘files_in_directory()’ function? Well, that’s the meat and potatoes of this operation.

local function nil_iter()
    return nil;
end

-- This is a 'generator' which will continue
-- the iteration over files
local function gen_files(dir, state)
    local de = nil

    while true do
       de = libc.readdir(dir)
    
        -- if we've run out of entries, then return nil
        if de == nil then return nil end

    -- check the entry to see if it's an actual file, and not
    -- a directory or link
        if dirutil.dirent_is_file(de) then
            break;
        end
    end

    
    local name = ffi.string(de.d_name);

    return de, name
end

local function files_in_directory(path)
    local dir = libc.opendir(path)

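    -- opendir() returns NULL on failure, and a NULL pointer
    -- compares equal to nil in LuaJIT
    if dir == nil then return nil_iter, nil, nil; end

    -- make sure to do the finalizer
    -- for garbage collection
    ffi.gc(dir, libc.closedir);

    -- no initial control value is needed
    return gen_files, dir, nil;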
end

In this case, files_in_directory() takes a string path, like “/run/systemd/machines”, and just iterates over the directory, returning only the files found there. It’s convenient in that it will skip so-called ‘hidden’ files, and things that are links. This simple technique/function can be the cornerstone of a lot of things that view files in Linux. The function leverages the libc opendir() and readdir() functions, so there’s nothing new here, but it wraps it all up in this convenient iterator, which is nice.
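The shape of that iterator (a generator function, an invariant state, and a control value) can be tried out without libc at all. Below is a minimal sketch in plain Lua where a table of fake entries stands in for the directory stream. openFakeDir, fakeReaddir and the isFile flag are inventions for this example and only mimic what opendir(), readdir() and dirent_is_file() do in the real code.

```lua
-- Hypothetical stand-in for a directory stream: a list of entries plus a cursor.
local function openFakeDir(entries)
    return { entries = entries, pos = 0 }
end

-- Stand-in for readdir(): returns the next entry, or nil at the end.
local function fakeReaddir(dir)
    dir.pos = dir.pos + 1
    return dir.entries[dir.pos]
end

-- Generator: keeps reading until it finds a regular, non-hidden file.
-- Returning nil as the first value ends the generic 'for' loop.
local function genFiles(dir, _)
    while true do
        local de = fakeReaddir(dir)
        if de == nil then return nil end
        if de.isFile and de.name:sub(1, 1) ~= "." then
            return de, de.name
        end
    end
end

local function filesInFakeDirectory(entries)
    return genFiles, openFakeDir(entries), nil
end

local names = {}
for _, name in filesInFakeDirectory({
    { name = ".", isFile = false },
    { name = ".hidden", isFile = true },
    { name = "container1", isFile = true },
    { name = "unit:foo.service", isFile = false },   -- pretend symlink
    { name = "container2", isFile = true },
}) do
    names[#names + 1] = name
end
print(table.concat(names, ", "))
```

Because the generator returns the entry first and the name second, the loop variable that drives the iteration is the entry itself, which is the same arrangement gen_files() uses with ‘de, name’.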

systemd is about a whole lot more than just browsing directories, but that’s certainly a big part of it. When you break it down like this, you begin to realize that you don’t actually need to use a ton of stuff from the library. In fact, it’s probably better and less resource intensive to just ‘go direct’ where it makes sense. In this case, it was just implementing a few choice routines to make file iteration work the same as it does in systemd. As this binding evolves, I’m sure there is other low-hanging fruit that I’ll be able to pluck to make it even more interesting, useful, and independent of the libsystemd library.


Experiences of a Confessed LuaJIT Binder – The all in one

It seems I’ve been writing binders, wrappers, frameworks, and the like for the entirety of my 35 years of programming.  Why is that?  Well, because the tools I use to program are often times not the same as those used to create various libraries, frameworks, and the like.  Sometimes it’s just that there is no universal bond between all those disparate libraries that you want to use, so you end up writing some wrappers, to make things look and behave similar.

I’ve written quite a lot about LuaJIT bindings, and produced rather a lot of them myself.  Most recently, I was writing a binding for the D-Bus library on Linux.  D-Bus is a communications protocol and mechanism meant to support Interprocess Communications.  As such, it’s about low latency, speed on read and write, and relative simplicity.  You can do everything from popping up an alert, to logging commands in a persistent log.

Here I want to show an approach that I’ve slowly grown and evolved over the past few years.  This approach to writing bindings is written with a few constraints and aspirations in mind:

  • Provide a low level interface that is as true to the underlying framework as possible.
  • Porting typical C code to the wrapping should be straightforward and obvious
  • Provide an interface that supports and leverages Lua idioms and practices
  • Provide a wrapper that does not require a separate compilation step

That list of constraints is brief, but can cause enough trouble depending on how seriously you take each one of them.  So, let’s get to it.

Here I will use the LJIT2dbus project, because it gives a chance to exhibit all of the constraints listed above.  Here’s a bit of code:

 

--dbus.lua
local ffi = require("ffi")

if not DBUS_INCLUDED then

local C = {}

require ("dbus-arch-deps");
require ("dbus-address");
require ("dbus-bus");
require ("dbus-connection");
require ("dbus-errors");
require ("dbus-macros");
require ("dbus-message");
require ("dbus-misc");
require ("dbus-pending-call");
require ("dbus-protocol");
require ("dbus-server");
require ("dbus-shared");
require ("dbus-signature");
require ("dbus-syntax");
require ("dbus-threads");
require ("dbus-types");

C.TRUE = 1;
C.FALSE = 0;

Yah, ok, not so exciting, just a bunch of ‘require()’ statements pulling in other modules. What’s in one of these modules?

-- dbus-misc.lua

local ffi = require("ffi")

require("dbus-types")
require("dbus-errors")

ffi.cdef[[
char*       dbus_get_local_machine_id  (void);

void        dbus_get_version           (int *major_version_p,
                                        int *minor_version_p,
                                        int *micro_version_p);
]]

Yes, again, more require() statements. But then, there are those couple of C functions that are defined. The other files have similar stuff in them. This is the basis of the constraint related to making the code familiar to a C programmer.

Let’s look at another file which might turn out to be more illustrative:

local ffi = require("ffi")


require("dbus-macros")
require("dbus-types")
require("dbus-protocol")




ffi.cdef[[
/** Mostly-opaque type representing an error that occurred */
typedef struct DBusError DBusError;
]]

ffi.cdef[[
/**
 * Object representing an exception.
 */
struct DBusError
{
  const char *name;    /**< public error name field */
  const char *message; /**< public error message field */

  unsigned int dummy1 : 1; /**< placeholder */
  unsigned int dummy2 : 1; /**< placeholder */
  unsigned int dummy3 : 1; /**< placeholder */
  unsigned int dummy4 : 1; /**< placeholder */
  unsigned int dummy5 : 1; /**< placeholder */

  void *padding1; /**< placeholder */
};
]]


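local function DBUS_ERROR_INIT()
  -- NULL and TRUE are not defined in Lua, and a table initializer
  -- stops at the first nil element, so zero-fill the struct and set
  -- the one non-zero field (dummy1 = TRUE in the C macro) explicitly.
  local err = ffi.new("struct DBusError");
  err.dummy1 = 1;
  return err;
end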


ffi.cdef[[
void        dbus_error_init      (DBusError       *error);
void        dbus_error_free      (DBusError       *error);
void        dbus_set_error       (DBusError       *error,
                                  const char      *name,
                                  const char      *message,
                                  ...);
void        dbus_set_error_const (DBusError       *error,
                                  const char      *name,
                                  const char      *message);
void        dbus_move_error      (DBusError       *src,
                                  DBusError       *dest);
dbus_bool_t dbus_error_has_name  (const DBusError *error,
                                  const char      *name);
dbus_bool_t dbus_error_is_set    (const DBusError *error);
]]

So more require() statements, a data structure, and some C functions. And how to use it? For that, let’s look at the bottom of the dbus.lua file.

 

local Lib_dbus = ffi.load("dbus-1")


local exports = {
	Lib_dbus = Lib_dbus;
}
setmetatable(exports, {
	__index = function(self, key)
		local value = nil;
		local success = false;

		-- try looking in table of constants
		value = C[key]
		if value then
			rawset(self, key, value)
			return value;
		end


		-- try looking in the library for a function
		success, value = pcall(function() return Lib_dbus[key] end)
		if success then
			rawset(self, key, value);
			return value;
		end

		-- try looking in the ffi.C namespace, for constants
		-- and enums
		success, value = pcall(function() return ffi.C[key] end)
		--print("looking for constant/enum: ", key, success, value)
		if success then
			rawset(self, key, value);
			return value;
		end

		-- Or maybe it's a type
		success, value = pcall(function() return ffi.typeof(key) end)
		if success then
			rawset(self, key, value);
			return value;
		end

		return nil;
	end,
})

DBUS_INCLUDED = exports
end

return DBUS_INCLUDED

And just for reference, a typical usage of the same:

local dbus = require("dbus")
  local err = dbus.DBusError();
  dbus.dbus_error_init(err);
  local bus = dbus.dbus_bus_get(dbus.DBUS_BUS_SESSION, err);

  if (dbus.dbus_bus_name_has_owner(bus, SYSNOTE_NAME, err) == 0) then
    io.stderr:write("Name has no owner on the bus!\n");
    return EXIT_FAILURE;
  end

First case is the creation of the ‘local err’ value.

  local err = dbus.DBusError();
  dbus.dbus_error_init(err);

The dbus.lua file does not have a function called ‘DBusError()’. All it has done is load a bunch of type and function declarations, to be used by the LuaJIT ffi mechanism. So, how do we get a function of that name? It doesn’t even exist in one of the required modules.

The trick here is in the ‘__index’ function of the dbus.lua table. The way the Lua language works, any time you make what looks like an access to the member of a table, if it can’t be found in the table, and if the __index function is implemented, it will get called, with the key passed in as a parameter.

In this case, the ‘__index’ function implements a series of lookups, trying to find the value that is associated with the specified key. First it tries looking in a table of constants. Then it tries looking in the actual library, for a function with the specified name. If it doesn’t find the value as a function, it will try to find it as a constant in the C namespace. This will find any enum, or static const int values that have been defined in ffi.cdef[[]] blocks. Finally, if it doesn’t find the key as one of the constants, it tries to figure out if maybe it’s a type (‘ffi.typeof’). In the case of DBusError, this last lookup will succeed, and we’ll get back a type.

As it turns out, the types returned from the LuaJIT ffi can be used as constructors.
So, what we really have is this:

local errType = dbus.DBusError;
local err = errType();

The fact that you can just do it all on one line is a short hand.

This is a great convenience. Also, once the value is found, it is stuck into the dbus table itself, so that the next time it’s used, this long lookup won’t occur. The type is already known, and will just be retrieved from the dbus (exports) table directly.
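The staged lookup can be modelled in plain Lua, which makes it easy to play with outside of the ffi. In this sketch, three ordinary tables stand in for the constants table, the loaded library, and the set of known C types. The strictLookup helper raises an error on a miss, which imitates how a lookup of an unknown name in an ffi namespace fails and is the reason each stage is wrapped in pcall. All of the names here (strictLookup, stages, the fake dbus_error_init) are assumptions made for the example, not the contents of dbus.lua.

```lua
-- Plain tables standing in for the three places a name might live.
local constants = { TRUE = 1, FALSE = 0 }

local library = {
    -- fake function; the real one lives in the shared library
    dbus_error_init = function(err) err.name = nil; err.message = nil end,
}

local types = {
    -- a "type" that can be called like a constructor
    DBusError = function() return { name = "unset", message = "unset" } end,
}

-- Wrap a table so that a miss raises an error, like an ffi namespace does.
local function strictLookup(tbl)
    return function(key)
        local value = tbl[key]
        if value == nil then error("missing symbol: " .. tostring(key)) end
        return value
    end
end

local stages = { strictLookup(constants), strictLookup(library), strictLookup(types) }

local exports = setmetatable({}, {
    __index = function(self, key)
        for _, lookup in ipairs(stages) do
            local ok, value = pcall(lookup, key)
            if ok then
                rawset(self, key, value)   -- remember it for next time
                return value
            end
        end
        return nil
    end,
})

local err = exports.DBusError()    -- found in the last stage, then called
exports.dbus_error_init(err)       -- found in the second stage
print(exports.TRUE, err.name, exports.no_such_symbol)
```

Keeping the stages in a list makes the search order explicit, and adding another source of names is a one-line change.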

Well, this works for the other kinds of types as well. For example, the ‘dbus_error_init()’ function is defined in a ffi.cdef[[]] block, and that’s about all. So, when we reference it through dbus.dbus_error_init(), we’re going to look it up in the library, and try to use the function found there. And again, once it’s found, it will be stuffed into the dbus (exports) table, for future reference.

This works out great, in that it’s a fairly minimal amount of work to get all interesting things defined in ffi.cdef blocks, then just use this __index lookup trick to actually return the values.

I’ve come to this style because otherwise you end up doing a lot more work trying to make the constants, types, enums, and functions universally accessible from anywhere within your program, without resorting to global values. Of course you can make the __index lookup as complex as you like. You can lazily load in modules, for example.

That’s it for this round. An all in one interface that gives you access to constants, enums, and functions in a ffi.cdef wrapped library.


Spelunking Linux – Yes, the system truly is a database

In this article: Isn’t the whole system just a database? – libdrm, I explored a little bit of the database nature of Linux by using libudev to enumerate and open libdrm devices.  After that, I spent some time bringing up a USB module: LJIT2libusb.  libusb is a useful cross platform library that makes it relatively easy to gain access to the usb functions on multiple platforms.  It can enumerate devices, deal with hot plug notifications, open up, read, write, etc.

At its core, on Linux at least, libusb tries to leverage the udev capabilities of the target system, if those capabilities are there.  This means that device enumeration and hot plugging actually use the libudev stuff.  In fact, the code for enumerating those usb devices in libusb looks like this:

 

	udev_enumerate_add_match_subsystem(enumerator, "usb");
	udev_enumerate_add_match_property(enumerator, "DEVTYPE", "usb_device");
	udev_enumerate_scan_devices(enumerator);
	devices = udev_enumerate_get_list_entry(enumerator);

There’s more stuff of course, to turn that into data structures which are appropriate for use within the libusb view of the world. But, here’s the equivalent using LLUI and the previously developed UVDev stuff:

local function isUsbDevice(dev)
	if dev.IsInitialized and dev:getProperty("subsystem") == "usb" and
		dev:getProperty("devtype") == "usb_device" then
		return true;
	end

	return false;
end

each(print, filter(isUsbDevice, ctxt:devices()))

It’s just illustrative, but it’s fairly simple to understand I think. The ‘ctxt:devices()’ is an iterator over all the devices in the system. The ‘filter’ function is part of the luafun functional programming routines available to Lua. The ‘isUsbDevice’ function is a predicate, which returns ‘true’ when the device in question matches what it believes makes a device a ‘usb’ device. In this case, it’s the subsystem and devtype properties which are used.

Being able to easily query devices like this makes life a heck of a lot easier. No funky code polluting my pure application. Just these simple query predicates written in Lua, and I’m all set. So, instead of relying on libusb to enumerate my usb devices, I can just enumerate them directly using udev, which is what the library does anyway. Enumeration and hotplug handling are one part of the library. The other part is the actual sending and receiving of data. For that, the libusb library is still primarily important, as replacing that code will take some time.

Where else can this great query capability be applied? Well, libudev is just a nice wrapper atop sysfs, which is that virtual file system built into Linux for gaining access to device information and control of the same. There’s all sorts of stuff in there. So, let’s say you want to list all the block devices?

local function isBlockDevice(dev)
	if dev.IsInitialized and dev:getProperty("subsystem") == "block" then
		return true;
	end

	return false;
end

That will get all the devices which are in the subsystem “block”. That includes physical disks, virtual disks, partitions, and the like. If you’re after just the physical ones, then you might use something like this:

local function isPhysicalBlockDevice(dev)
	if dev.IsInitialized and dev:getProperty("subsystem") == "block" and
		dev:getProperty("devtype") == "disk" and
		dev:getProperty("ID_BUS") ~= nil then
		return true;
	end

	return false;
end

Here, a physical device is indicated by subsystem == ‘block’ and devtype == ‘disk’ and the ‘ID_BUS’ property exists, assuming any physical disk would show up on one of the system’s buses. This won’t catch an SD card though. For that, you’d use the first one, and then look for a property related to being an SD card. Same goes for ‘cd’ vs ramdisk, or whatever. You can make these queries as complex or simple as you want.
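Since each query is just a Lua function, small predicates can be combined into bigger ones instead of writing every combination by hand. Here is a minimal sketch of that idea. It runs against mock devices rather than real udev ones, and the helper names (mockDevice, propertyIs, propertyExists, allOf) are made up for the illustration. The only thing it borrows from the binding is the dev:getProperty(name) call shape used in the predicates above.

```lua
-- Mock device exposing the same getProperty() shape as the predicates use.
local function mockDevice(props)
    return {
        IsInitialized = true,
        getProperty = function(self, name) return props[name] end,
    }
end

-- Predicate builders
local function propertyIs(name, expected)
    return function(dev) return dev:getProperty(name) == expected end
end

local function propertyExists(name)
    return function(dev) return dev:getProperty(name) ~= nil end
end

-- Combine any number of predicates; all of them must hold.
local function allOf(...)
    local preds = {...}
    return function(dev)
        for _, pred in ipairs(preds) do
            if not pred(dev) then return false end
        end
        return true
    end
end

local isPhysicalBlockDevice = allOf(
    propertyIs("subsystem", "block"),
    propertyIs("devtype", "disk"),
    propertyExists("ID_BUS"))

local devices = {
    mockDevice({ subsystem = "block", devtype = "disk", ID_BUS = "ata", name = "sda" }),
    mockDevice({ subsystem = "block", devtype = "partition", ID_BUS = "ata", name = "sda1" }),
    mockDevice({ subsystem = "block", devtype = "disk", name = "loop0" }),
    mockDevice({ subsystem = "usb", devtype = "usb_device", name = "1-1" }),
}

local matches = {}
for _, dev in ipairs(devices) do
    if isPhysicalBlockDevice(dev) then
        matches[#matches + 1] = dev:getProperty("name")
    end
end
print(table.concat(matches, ", "))
```

A combined predicate built this way is still an ordinary function, so it can be handed to a filter in exactly the same way as the hand-written ones.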

Once you have a device, you can simply open it using the “SysName” parameter, handed to an fopen() call.

I find this to be a great way to program. It makes the creation of utilities such as ‘lsblk’ relatively easy. You would just look for all the block devices and their partitions, and put them into a table. Then separately, you would have a display routine, which would consume the table and generate whatever output you want. I find this much better than the typical Linux tools which try to do advanced display using the terminal window. That’s great as far as it goes, but not so great if what you really want is a nice html page generated for some remote viewing.
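As a rough sketch of that split between gathering and displaying, here is a tiny example with the query step replaced by a hard-coded table. The field names (name, sizeGiB, partitions) and both render functions are assumptions made for the example. A real version would fill the table from the device queries, and would need to escape the strings before putting them into HTML.

```lua
-- Pretend result of the query step: plain data, no display concerns.
local blockDevices = {
    { name = "sda", sizeGiB = 238, partitions = { "sda1", "sda2" } },
    { name = "sdb", sizeGiB = 16, partitions = {} },
}

-- One display routine: indented text, roughly in the spirit of lsblk.
local function renderText(devices)
    local lines = {}
    for _, dev in ipairs(devices) do
        lines[#lines + 1] = string.format("%s (%d GiB)", dev.name, dev.sizeGiB)
        for _, part in ipairs(dev.partitions) do
            lines[#lines + 1] = "  " .. part
        end
    end
    return table.concat(lines, "\n")
end

-- Another display routine over the very same table: a simple HTML list.
local function renderHtml(devices)
    local out = { "<ul>" }
    for _, dev in ipairs(devices) do
        out[#out + 1] = string.format("<li>%s (%d GiB)</li>", dev.name, dev.sizeGiB)
    end
    out[#out + 1] = "</ul>"
    return table.concat(out)
end

print(renderText(blockDevices))
print(renderHtml(blockDevices))
```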

At any rate, this whole libudev exploration is a great thing. You can list all devices easily, getting every bit of information you care to examine. Since it’s all scriptable, it’s fairly easy to tailor your queries on the fly, looking at, discovering, and the like. I discovered that the thumb print reader in my old laptop was made by Broadcom, and my webcam by 3M? It’s just so much fun.

Well there you have it. The more you spelunk, the more you know, and the more you can fiddle about.


Functional Programming with libevdev

Previously, I wrote about the LuaJIT binding to libevdev: LJIT2libevdev – input device tracking on Linux

In that case, I went over the basics, and it all works out well, and you can get at keyboard and mouse events from Lua, with very little overhead.  Feels natural, light weight iterators, life is good.

Recently, I wanted to go one step further though.  Once you go down the path of using iterators, you begin to think about functional programming.  At least that’s how I’m wired.  So, I’ve pushed it a bit further, and incorporated the fun.lua module to make things a bit more interesting.  fun.lua provides things like each, any, all, map, filter, and other stuff which makes dealing with streams of data really brainlessly simple.

Here’s an example of spewing out the stream of events for a device:

 

package.path = package.path..";../?.lua"

local EVEvent = require("EVEvent")
local dev = require("EVDevice")(arg[1])
assert(dev)

local utils = require("utils")
local fun = require("fun")

-- print out the device particulars before 
-- printing out the stream of events
utils.printDevice(dev);
print("===== ===== =====")

-- perform the actual printing of the event
local function printEvent(ev)
    print(string.format("{'%s', '%s', %d};",ev:typeName(),ev:codeName(),ev:value()));
end

-- decide whether an event is interesting enough to 
-- print or not
local function isInteresting(ev)
	return ev:typeName() ~= "EV_SYN" and ev:typeName() ~= "EV_MSC"
end

-- convert from a raw 'struct input_event' to the EVEvent object
local function toEVEvent(rawev)
    return EVEvent(rawev)
end


fun.each(printEvent, 
    fun.filter(isInteresting, 
        fun.map(toEVEvent,
            dev:rawEvents())));

 
The last line there, where it starts with “fun.each”, can be read from the inside out.

dev:rawEvents() – is an iterator, it returns a steady stream of events coming off of the specified device.

fun.map(toEVEvent,…) – the map function will take some input, run it through the provided function, and send that as the output. In functional programming, this would be transformation.

fun.filter(isInteresting, …) – filter takes an input, applies the specified predicate, and allows that item to pass through or not based on the predicate returning true or false.

fun.each(printEvent, …) – the ‘each’ function takes each of the items coming from the stream of items, and applies the specified function, in this case the printEvent function.

This is a typical pull chain. The events are pulled out of the lowest level iterator as they are needed by subsequent operations in the chain. If the chain stops, then nothing further is pulled.

This is a great way to program because doing different things with the stream of events is simply a matter of replacing items in the chain. For example, if we just want the first 100 items, we could write

each(printEvent, 
    take_n(100, 
      filter(isInteresting, 
        map(toEVEvent, dev:rawEvents()))));

You can create tees, send data elsewhere, construct new streams by combining streams. There are a few key operators, and with them you can do a ton of stuff.
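To see the pull behaviour for yourself, here is a small self-contained sketch that does not need fun.lua or an input device. It uses simple closure-based versions of map, filter, take and each (much simpler than what fun.lua actually does), plus an endless number source that counts how many items were pulled from it. Everything in it is written for this illustration.

```lua
local pulled = 0   -- how many items the source has handed out

-- Source: an endless stream of integers 1, 2, 3, ...
local function numbers()
    local n = 0
    return function()
        n = n + 1
        pulled = pulled + 1
        return n
    end
end

local function map(fn, source)
    return function()
        local item = source()
        if item == nil then return nil end
        return fn(item)
    end
end

local function filter(pred, source)
    return function()
        while true do
            local item = source()
            if item == nil then return nil end
            if pred(item) then return item end
        end
    end
end

local function take(count, source)
    local taken = 0
    return function()
        if taken >= count then return nil end   -- stop without pulling more
        taken = taken + 1
        return source()
    end
end

local function each(fn, source)
    for item in source do fn(item) end
end

local results = {}
each(function(x) results[#results + 1] = x end,
    take(3,
        filter(function(x) return x % 20 == 0 end,
            map(function(x) return x * 10 end, numbers()))))

print(table.concat(results, ", "), pulled)
```

Even though the source never ends, only six numbers are pulled from it: once take() has its three items it stops asking, and nothing further down the chain runs.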

At the very heart, the EVDevice:rawEvents() iterator looks like this:

function EVDevice.rawEvents(self)
	local function iter_gen(params, flags)
		local flags = flags or ffi.C.LIBEVDEV_READ_FLAG_NORMAL;
		local ev = ffi.new("struct input_event");
		--local event = EVEvent(ev);
		local rc = 0;


		repeat
			rc = evdev.libevdev_next_event(params.Handle, flags, ev);
		until rc ~= -libc.EAGAIN
		
		if (rc == ffi.C.LIBEVDEV_READ_STATUS_SUCCESS) or (rc == ffi.C.LIBEVDEV_READ_STATUS_SYNC) then
			return flags, ev;
		end

		return nil, rc;
	end

	-- no initial control value; iter_gen falls back to the default read flag
	return iter_gen, {Handle = self.Handle}, nil
end

This is better than the previous version where we could pass in a predicate. In this case, we don’t even convert to the EVEvent object at this early low level stage, because we’re not sure what subsequent links in the chain will want to do, so we leave it out. This simplifies the code at the lowest levels, and makes it more composable, which is a desirable effect.

And so it goes. This binding started out as a simple ffi.cdef hack job, then objects were added to encapsulate some stuff, and now the iterators are being used in a functional programming style, which makes the whole thing that much more useful and integratable.


LJIT2pixman – Drawing on Linux

On the Linux OS, libpixman is at the heart of doing some low level drawing. It’s very simple stuff like compositing, rendering of gradients and the like. It takes up the slack where hardware acceleration doesn’t exist. So, graphics libraries such as Cairo leverage libpixman at the bottom to take care of the basics. LJIT2pixman is a little project that delivers a fairly decent LuaJIT binding to that library.

Here’s one demo of what it can do.

[Image: checkerboard]

Of course, given the title of this post, you know LuaJIT is involved, so you can expect that there’s some way of doing this in LuaJIT.

package.path = package.path..";../?.lua"

local ffi = require("ffi")
local bit = require("bit")
local band = bit.band

local pixman = require("pixman")()
local pixlib = pixman.Lib_pixman;
local ENUM = ffi.C
local utils = require("utils")
local save_image = utils.save_image;

local function D2F(d) return (pixman_double_to_fixed(d)) end

local function main (argc, argv)

	local WIDTH = 400;
	local HEIGHT = 400;
	local TILE_SIZE = 25;

    local trans = ffi.new("pixman_transform_t", { {
	    { D2F (-1.96830), D2F (-1.82250), D2F (512.12250)},
	    { D2F (0.00000), D2F (-7.29000), D2F (1458.00000)},
	    { D2F (0.00000), D2F (-0.00911), D2F (0.59231)},
	}});

    local checkerboard = pixlib.pixman_image_create_bits (ENUM.PIXMAN_a8r8g8b8,
					     WIDTH, HEIGHT,
					     nil, 0);

    local destination = pixlib.pixman_image_create_bits (ENUM.PIXMAN_a8r8g8b8,
					    WIDTH, HEIGHT,
					    nil, 0);

    for i = 0, (HEIGHT / TILE_SIZE)-1  do
		for j = 0, (WIDTH / TILE_SIZE)-1 do
	    	local u = (j + 1) / (WIDTH / TILE_SIZE);
	    	local v = (i + 1) / (HEIGHT / TILE_SIZE);
	    	local black = ffi.new("pixman_color_t", { 0, 0, 0, 0xffff });
	    	local white = ffi.new("pixman_color_t", {
				v * 0xffff,
				u * 0xffff,
				(1 - u) * 0xffff,
				0xffff });
	    	
	    	local c = white;

	    	if (band(j, 1) ~= band(i, 1)) then
				c = black;
	    	end

	    	local fill = pixlib.pixman_image_create_solid_fill (c);

	    	pixlib.pixman_image_composite (ENUM.PIXMAN_OP_SRC, fill, nil, checkerboard,
				    0, 0, 0, 0, j * TILE_SIZE, i * TILE_SIZE,
				    TILE_SIZE, TILE_SIZE);
		end
    end

    pixlib.pixman_image_set_transform (checkerboard, trans);
    pixlib.pixman_image_set_filter (checkerboard, ENUM.PIXMAN_FILTER_BEST, nil, 0);
    pixlib.pixman_image_set_repeat (checkerboard, ENUM.PIXMAN_REPEAT_NONE);

    pixlib.pixman_image_composite (ENUM.PIXMAN_OP_SRC,
			    checkerboard, nil, destination,
			    0, 0, 0, 0, 0, 0,
			    WIDTH, HEIGHT);

	save_image (destination, "checkerboard.ppm");

    return true;
end


main(#arg, arg)


With a couple of exceptions, the code looks almost exactly like its C based counterpart. I actually think this is a very good thing, because you can rapidly prototype something from a C coding example, but have all the support and protection that a dynamic language such as lua provides.
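One detail worth spelling out is the D2F helper, which wraps pixman_double_to_fixed. pixman stores these values in 16.16 fixed point, so the conversion amounts to multiplying by 65536 and dropping the fraction. Here is a pure-Lua sketch of that arithmetic, with names of my own choosing (FIXED_ONE, doubleToFixed, fixedToDouble). Real code should keep calling the function the binding provides.

```lua
local FIXED_ONE = 65536   -- 1.0 in 16.16 fixed point

-- Convert a number to 16.16 fixed point, truncating toward zero
-- the way a C cast from double to a 32-bit integer does.
local function doubleToFixed(d)
    local scaled = d * FIXED_ONE
    if scaled >= 0 then
        return math.floor(scaled)
    end
    return math.ceil(scaled)
end

-- Convert a 16.16 fixed point value back to an ordinary number.
local function fixedToDouble(f)
    return f / FIXED_ONE
end

print(doubleToFixed(1.0), doubleToFixed(0.5), doubleToFixed(-1.5))
print(fixedToDouble(doubleToFixed(0.25)))
```

Values that are not multiples of 1/65536, such as the -1.96830 in the transform, get truncated to the nearest representable step, so a round trip through fixed point is only approximate for them.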

And here’s another:

[Image: conical-test]

In this case, it is the conical-test.lua demo doing the work.

package.path = package.path..";../?.lua"

local ffi = require("ffi")
local bit = require("bit")
local band = bit.band
local lshift, rshift = bit.lshift, bit.rshift

local pixman = require("pixman")()
local pixlib = pixman.Lib_pixman;
local ENUM = ffi.C
local utils = require("utils")
local save_image = utils.save_image;
local libc = require("libc")

local SIZE = 128
local GRADIENTS_PER_ROW = 7
local NUM_GRADIENTS = 35

-- Lua's '/' is floating point division; floor it to match the C integer math
local NUM_ROWS = math.floor((NUM_GRADIENTS + GRADIENTS_PER_ROW - 1) / GRADIENTS_PER_ROW)
local WIDTH = (SIZE * GRADIENTS_PER_ROW)
local HEIGHT = (SIZE * NUM_ROWS)

local function double_to_color(x)
    return (x*65536) - rshift( (x*65536), 16)
end

local function PIXMAN_STOP(offset,r,g,b,a)		
   return ffi.new("pixman_gradient_stop_t", { pixman_double_to_fixed (offset),		
	{					
	    double_to_color (r),		
		double_to_color (g),		
		double_to_color (b),		
		double_to_color (a)		
	}					
    });
end

local stops = ffi.new("pixman_gradient_stop_t[4]",{
    PIXMAN_STOP (0.25,       1, 0, 0, 0.7),
    PIXMAN_STOP (0.5,        1, 1, 0, 0.7),
    PIXMAN_STOP (0.75,       0, 1, 0, 0.7),
    PIXMAN_STOP (1.0,        0, 0, 1, 0.7)
});

local  NUM_STOPS = (ffi.sizeof (stops) / ffi.sizeof (stops[0]))


local function create_conical (index)
    local c = ffi.new("pixman_point_fixed_t")
    c.x = pixman_double_to_fixed (0);
    c.y = pixman_double_to_fixed (0);

    local angle = (0.5 / NUM_GRADIENTS + index / NUM_GRADIENTS) * 720 - 180;

    return pixlib.pixman_image_create_conical_gradient (c, pixman_double_to_fixed (angle), stops, NUM_STOPS);
end


local function main (argc, argv)

    local transform = ffi.new("pixman_transform_t");

    local dest_img = pixlib.pixman_image_create_bits (ENUM.PIXMAN_a8r8g8b8,
					 WIDTH, HEIGHT,
					 nil, 0);
 
    utils.draw_checkerboard (dest_img, 25, 0xffaaaaaa, 0xff888888);

    pixlib.pixman_transform_init_identity (transform);

    pixlib.pixman_transform_translate (nil, transform,
				pixman_double_to_fixed (0.5),
				pixman_double_to_fixed (0.5));

    pixlib.pixman_transform_scale (nil, transform,
			    pixman_double_to_fixed (SIZE),
			    pixman_double_to_fixed (SIZE));
    pixlib.pixman_transform_translate (nil, transform,
				pixman_double_to_fixed (0.5),
				pixman_double_to_fixed (0.5));

    for i = 0, NUM_GRADIENTS-1 do
    
	   local column = i % GRADIENTS_PER_ROW;
	   local row = math.floor(i / GRADIENTS_PER_ROW);

	   local src_img = create_conical (i); 
	   pixlib.pixman_image_set_repeat (src_img, ENUM.PIXMAN_REPEAT_NORMAL);
   
	   pixlib.pixman_image_set_transform (src_img, transform);
	
	   pixlib.pixman_image_composite32 (
	       ENUM.PIXMAN_OP_OVER, src_img, nil,dest_img,
	       0, 0, 0, 0, column * SIZE, row * SIZE,
	       SIZE, SIZE);
	
	   pixlib.pixman_image_unref (src_img);
    end

    save_image (dest_img, "conical-test.ppm");

    pixlib.pixman_image_unref (dest_img);

    return true;
end


main(#arg, arg)
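
One thing to watch when porting arithmetic like NUM_ROWS and row from C: assuming the C demo computes them with int operands, `/` truncates there, while in Lua it always gives a floating point result. Here's a minimal sketch of the difference in plain Lua. The idiv and cell helpers are my own names for illustration, not part of the binding.

```lua
-- Plain Lua sketch: reproducing C's truncating integer division.
local SIZE = 128
local GRADIENTS_PER_ROW = 7
local NUM_GRADIENTS = 35

-- Lua's '/' produces a fractional result here (41 / 7).
local float_rows = (NUM_GRADIENTS + GRADIENTS_PER_ROW - 1) / GRADIENTS_PER_ROW

-- math.floor matches C integer division for non-negative operands.
local function idiv(a, b)
    return math.floor(a / b)
end

local num_rows = idiv(NUM_GRADIENTS + GRADIENTS_PER_ROW - 1, GRADIENTS_PER_ROW)

-- Grid cell (column, row) for the i-th gradient.
local function cell(i)
    return i % GRADIENTS_PER_ROW, idiv(i, GRADIENTS_PER_ROW)
end

print(float_rows, num_rows, SIZE * num_rows)
print(cell(8))
```

With whole rows, the image height comes out as 5 * 128 = 640, and gradient 8 lands in column 1 of row 1 rather than somewhere between rows.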

linear-gradient

Linear Gradient demo

screen-test

screen demo (transparency).

Perhaps this is the easiest one of all. All the interesting functions are placed into the global namespace, so they can be accessed easily, just like everything in C is globally available.

package.path = package.path..";../?.lua"

local ffi = require("ffi")
local bit = require("bit")
local band = bit.band
local lshift, rshift = bit.lshift, bit.rshift

local pixman = require("pixman")()
local pixlib = pixman.Lib_pixman;
local ENUM = ffi.C
local utils = require("utils")
local save_image = utils.save_image;
local libc = require("libc")


local function main (argc, argv)

    local WIDTH = 40
    local HEIGHT = 40
    
    local src1 = ffi.cast("uint32_t *", libc.malloc (WIDTH * HEIGHT * 4));
    local src2 = ffi.cast("uint32_t *", libc.malloc (WIDTH * HEIGHT * 4));
    local src3 = ffi.cast("uint32_t *", libc.malloc (WIDTH * HEIGHT * 4));
    local dest = ffi.cast("uint32_t *", libc.malloc (3 * WIDTH * 2 * HEIGHT * 4));

    for i = 0, (WIDTH * HEIGHT)-1 do
	   src1[i] = 0x7ff00000;
	   src2[i] = 0x7f00ff00;
	   src3[i] = 0x7f0000ff;
    end

    for i = 0, (3 * WIDTH * 2 * HEIGHT)-1 do
	   dest[i] = 0x0;
    end

    local simg1 = pixman_image_create_bits (ENUM.PIXMAN_a8r8g8b8, WIDTH, HEIGHT, src1, WIDTH * 4);
    local simg2 = pixman_image_create_bits (ENUM.PIXMAN_a8r8g8b8, WIDTH, HEIGHT, src2, WIDTH * 4);
    local simg3 = pixman_image_create_bits (ENUM.PIXMAN_a8r8g8b8, WIDTH, HEIGHT, src3, WIDTH * 4);
    local dimg  = pixman_image_create_bits (ENUM.PIXMAN_a8r8g8b8, 3 * WIDTH, 2 * HEIGHT, dest, 3 * WIDTH * 4);

    pixman_image_composite (ENUM.PIXMAN_OP_SCREEN, simg1, NULL, dimg, 0, 0, 0, 0, WIDTH, HEIGHT / 4, WIDTH, HEIGHT);
    pixman_image_composite (ENUM.PIXMAN_OP_SCREEN, simg2, NULL, dimg, 0, 0, 0, 0, (WIDTH/2), HEIGHT / 4 + HEIGHT / 2, WIDTH, HEIGHT);
    pixman_image_composite (ENUM.PIXMAN_OP_SCREEN, simg3, NULL, dimg, 0, 0, 0, 0, (4 * WIDTH) / 3, HEIGHT, WIDTH, HEIGHT);

    save_image (dimg, "screen-test.ppm");
    
    return true;
end

main(#arg, arg)
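
The `require("pixman")()` call at the top of these demos is what makes the functions globally visible. The binding's internals aren't shown here, so take the following as a sketch of one way to get that behavior, not as what pixman.lua actually does. I'm assuming a module table with a `__call` metamethod, and the `exports` table is a hypothetical stand-in with just two functions in it.

```lua
-- Hypothetical stand-in for a binding module; not the real pixman.lua.
local exports = {
    -- 16.16 fixed point conversion, named after pixman's C macro.
    -- (The C macro truncates toward zero; math.floor differs for negatives.)
    pixman_double_to_fixed = function(d)
        return math.floor(d * 65536)
    end,
    pixman_fixed_to_double = function(f)
        return f / 65536
    end,
}

-- Calling the module table copies every export into a target table
-- (the global table by default) and returns the module itself.
setmetatable(exports, {
    __call = function(self, target)
        target = target or _G
        for name, value in pairs(self) do
            target[name] = value
        end
        return self
    end,
})

local pixman = exports()   -- same shape as: require("pixman")()

print(pixman_double_to_fixed(0.5))   --> 32768
```

Passing a table to the call instead of relying on the default would keep the names out of the global namespace, which is handy once the prototype grows up.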

I really like this style of rapid prototyping. The challenge I have otherwise is that it’s just too time consuming to work with things in their raw C form. Things like build systems, compiler versions, and other forms of magic always seem to get in the way. And if it’s not that stuff, it’s memory management, and figuring out the inevitable crashes.

Once you wrap a library up in a bit of lua goodness though, it becomes much more approachable. It may or may not be the most performant thing in the world, but you can worry about that later.

Having this style of rapid prototyping available saves tremendous amounts of time. Since you’re not wasting your time on the mundane (memory management, build system, compiler extensions and the like), you can spend much more time on doing mashups of the various pieces of technology at hand.

In this case it was libpixman. Previously I’ve tackled everything from hot plugging usb devices, to consuming keystrokes, and putting a window on the screen.

What’s next? Well, someone inquired as to whether I would be doing a Wayland binding. My response was essentially, “since I now have all the basics, there’s no need for Wayland, I can just call what it was going to call directly.”

And so it goes. One more library in the toolbox, more interesting demos generated.
 


LJIT2libevdev – input device tracking on Linux

Once you start going down into the rabbit hole that is UI on Linux, there seems to be no end. I wanted to get to the bottom of the stack, as it were, because I just want to get raw keyboard and mouse events, and do stuff with them. There is a library that helps you do that, called libevdev. Here is the luajit binding to it:

LJIT2libevdev

As it turns out, getting keyboard and mouse activity is highly dependent on what environment you’re in, of course. Are you sitting at a terminal? In that case ncurses or similar might be your best choice. If you’re looking at a graphics display, then something related to X, or the desktop manager, might be appropriate. At the very bottom of it all though is the kernel, and its ability to read the keyboard and mouse, and report what it finds up the chain to interested parties. Down there at the very bottom is a userspace library, libevdev, which takes care of making the ioctl calls into the kernel to get the raw values. Great! The only caveat is that you need to be set up with the proper permissions to do it, because you’re getting ALL of the keyboard and mouse events on the system. Great for key loggers…

Alright, so what does this mean in the context of Lua? Well, libevdev is a straight up C interface which is a very thin veneer atop the ioctl calls. It would not be that hard to replace the library with ioctl calls made from luajit directly, but the maintainers of libevdev seem to have it covered quite nicely, so ffi calls to the library are sufficient. The library provides some conveniences, like tables of strings to convert from the integer values of things to their string name equivalents. These could probably be replaced with the same within lua land, saving the round trips and string conversions.

As a low level interface, it does not provide management of the various input devices. You cannot ask it “give me the logitech mouse”. You have to know which device is the mouse in the first place before you can start asking for input. Similarly, it’s giving you a ton of raw data that you may not be interested in. Things like the sync signals, indicating the end of an event train. Or the skipped data events, so you can catch up if you prefer not to lose any. How to manage it all?

Let’s start at the beginning.

I have found it challenging to find appropriate discussions relating to UI on Linux. Linux has such a long history, and for most of it, the UI subsystems have been evolving and changing in fundamental ways. So, as soon as you find that juicy article dated from 2002, it’s been replaced by something in 2006, and then again in 2012. It also depends on whether you’re talking about X11, Wayland, Qt, Gnome, SDL, terminal, or some other context.

Recently, I was trying to track down the following scenario: I want to start reading input from the attached logitech mouse on my laptop. Not the track pad under my thumbs, and not the little red nubby stick in the middle of the keyboard, but that mouse specifically. How do I do that?

libevdev is the right library to use, but in order to use it, you need a file descriptor for the specific device. The interwebs tell me you simply open up the appropriate /dev/input/eventxxx file and start reading from it? Right. And how do I know which is the correct ‘eventxxx’ file I should be reading from? You can simply do:

$ cat /proc/bus/input/devices

Look at the output, find the device you’re interested in, look at which event it indicates it’s attached to, then go open up that event…

And how do I do that programmatically, and consistently, such that it will be the same when I move the mouse to a different system? Ah yes, there’s a library for that, and why don’t you just use Python, and…

Or, how about this:

local EVContext = require("EVContext")

local function isLogitech(dev)
    return dev:name():lower():find("logitech") ~= nil
end

local dev = EVContext:getMouse(isLogitech);

assert(dev, "no mouse found")

print(string.format("Input device name: \"%s\"", dev:name()));
print(string.format("Input device ID: bus %#x vendor %#x product %#x\n",
        dev:busType(),
        dev:vendorId(),
        dev:productId()));

-- print out a constant stream of events
for _, ev in dev:events() do
    print(string.format("Event: %s %s %d",
        ev:typeName(),
        ev:codeName(),
        ev:value()));
end

How can I get to this state? First, how about that EVContext thing, and the ‘getMouse()’ call?

EVContext is a convenience class which wraps up all the things in libevdev which aren’t related to a specific instance of a device. So, doing things like iterating over devices, setting the logging level, getting a specific device, etc. Device iteration is a core piece of the puzzle. So, here it is.

function EVContext.devices(self)
    local function dev_iter(param, idx)
        local devname = "/dev/input/event"..tostring(idx);
        local dev, err = EVDevice(devname)

        if not dev then
            return nil; 
        end

        return idx+1, dev
    end

    return dev_iter, self, 0
end

That’s a quick and dirty iterator that will get the job done. Basically, just construct a string of the form ‘/dev/input/eventxxx’, and vary the ‘xxx’ with numbers until you can no longer open up devices. For each one, create an EVDevice object from that name. A bit wasteful, since every device gets opened just to look at it, but highly beneficial. Once we can iterate all the input devices, we can leverage this for greater mischief.
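
If the iterator protocol is unfamiliar, here is the same shape in a form you can run without libevdev or root. The `fake_nodes` table and `open_device` function are made-up stand-ins for EVDevice, so treat this as a sketch of the pattern only.

```lua
-- Stand-in for EVDevice: pretend only event0..event2 can be opened.
local fake_nodes = {
    ["/dev/input/event0"] = "Power Button",
    ["/dev/input/event1"] = "AT Keyboard",
    ["/dev/input/event2"] = "Logitech USB Optical Mouse",
}

local function open_device(path)
    local name = fake_nodes[path]
    if not name then
        return nil, "cannot open " .. path
    end
    return { path = path, name = name }
end

-- Same shape as EVContext.devices(): a stateless iterator.
-- The generic 'for' calls dev_iter(state, control) repeatedly, feeding
-- back the first return value as the next control value.
local function devices()
    local function dev_iter(_, idx)
        local dev = open_device("/dev/input/event" .. tostring(idx))
        if not dev then
            return nil      -- a nil control value ends the loop
        end
        return idx + 1, dev
    end
    return dev_iter, nil, 0
end

for nextidx, dev in devices() do
    print(nextidx, dev.path, dev.name)
end
```

Note that the loop ends at the first index that fails to open, whatever the reason, so a gap in the numbering or a permissions failure would hide any devices after it.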

Looking back at our code, there was this bit to get the mouse:

local function isLogitech(dev)
    return dev:name():lower():find("logitech") ~= nil
end

local dev = EVContext:getMouse(isLogitech);

It looks like we could just call the ‘EVContext:getMouse()’ function and be done with it. What’s with the extra ‘isLogitech()’ part? Well, on its own, getMouse() will simply return the first device which reportedly is like a mouse. That code looks like this:

function EVDevice.isLikeMouse(self)
    if (self:hasEventType(EV_REL) and
        self:hasEventCode(EV_REL, REL_X) and
        self:hasEventCode(EV_REL, REL_Y) and
        self:hasEventCode(EV_KEY, BTN_LEFT)) then

        return true;
    end

    return false;
end

It’s basically saying, a ‘mouse’ is something that has relative movements, at least an x and y axis, and a ‘left’ button. On my laptop, the little mouse nub on the keyboard qualifies, and since it has a lower /dev/input/event number (3), it will be reported before any other mouse on my laptop. So, I need a way to filter on anything that reports to be a mouse, as well as having “logitech” in its name. The code for that is the following from EVContext:

function EVContext.getDevice(self, predicate)
	for _, dev in self:devices() do
		if predicate then
			if predicate(dev) then
				return dev
			end
		else
			return dev
		end
	end

	return nil;
end

function EVContext.getMouse(self, predicate)
	local function isMouse(dev)
		if dev:isLikeMouse() then
			if predicate then
				return predicate(dev);
			end
			
			return true;
		end

		return false;
	end

	return self:getDevice(isMouse);
end

As you can see, ‘EVContext:getDevice()’ takes a predicate (a function that returns true or false). It will iterate through all the devices, applying the predicate to each device in turn. When it finds a device matching the predicate, it will return that device. Of course, you could easily change this to return ALL the devices that match the predicate, but that’s a different story.

The ‘predicate’ in this case is the internal ‘isMouse’ function within ‘getMouse()’, which in turn applies two filters. The first is calling the ‘isLikeMouse()’ function on the device. If that’s satisfied, then it will call the predicate that was passed in, which in this case would be our ‘isLogitech()’ function. If that is satisfied, then the device is returned.
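
As for that “different story” of returning ALL the matching devices, a sketch might look like the following. It runs against mock devices so it needs no hardware. `getDevices`, `both`, and `mock` are hypothetical names of mine and are not in the binding, and the device names are made up.

```lua
-- Mock devices standing in for EVDevice objects.
local function mock(name, mouselike)
    return {
        name = function(self) return name end,
        isLikeMouse = function(self) return mouselike end,
    }
end

local all = {
    mock("AT Translated Set 2 keyboard", false),
    mock("TrackPoint", true),
    mock("Logitech USB Optical Mouse", true),
    mock("Logitech Wireless Mouse", true),
}

-- Hypothetical variant of getDevice(): collect every match, not just the first.
local function getDevices(devices, predicate)
    local found = {}
    for _, dev in ipairs(devices) do
        if not predicate or predicate(dev) then
            found[#found + 1] = dev
        end
    end
    return found
end

-- Combine two predicates, the way getMouse() layers the caller's
-- filter on top of isLikeMouse().
local function both(p, q)
    return function(dev) return p(dev) and q(dev) end
end

local function isMouse(dev) return dev:isLikeMouse() end
local function isLogitech(dev)
    return dev:name():lower():find("logitech", 1, true) ~= nil
end

for _, dev in ipairs(getDevices(all, both(isMouse, isLogitech))) do
    print(dev:name())
end
```

Returning a table keeps the caller simple. An iterator would avoid opening every device up front, which might matter more with real hardware than it does here.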

In the end, here’s some output:

Input device name: "Logitech USB Optical Mouse"
Input device ID: bus 0x3 vendor 0x46d product 0xc018

Event: EV_REL REL_Y -1
Event: EV_SYN SYN_REPORT 0
Event: EV_REL REL_Y -1
Event: EV_SYN SYN_REPORT 0
Event: EV_REL REL_X -1
Event: EV_REL REL_Y -2
Event: EV_SYN SYN_REPORT 0
Event: EV_REL REL_Y -1
Event: EV_SYN SYN_REPORT 0
Event: EV_MSC MSC_SCAN 589827
Event: EV_KEY BTN_MIDDLE 1
Event: EV_SYN SYN_REPORT 0
Event: EV_MSC MSC_SCAN 589827
Event: EV_KEY BTN_MIDDLE 0
Event: EV_SYN SYN_REPORT 0
Event: EV_REL REL_Y -2
Event: EV_SYN SYN_REPORT 0
Event: EV_REL REL_X 1
Event: EV_SYN SYN_REPORT 0
Event: EV_REL REL_Y -1

Some relative movements, a middle button press/release, some more movement.
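
Those SYN_REPORT lines are the sync signals mentioned earlier: each one closes out a group of events that belong together. Here is a minimal sketch of grouping a stream into such frames, using plain tables in place of the binding's event objects. The `frames` function is my own illustration, and the values are copied from the output above.

```lua
-- Events as plain tables rather than real event objects.
local events = {
    { type = "EV_REL", code = "REL_X",      value = -1 },
    { type = "EV_REL", code = "REL_Y",      value = -2 },
    { type = "EV_SYN", code = "SYN_REPORT", value = 0 },
    { type = "EV_MSC", code = "MSC_SCAN",   value = 589827 },
    { type = "EV_KEY", code = "BTN_MIDDLE", value = 1 },
    { type = "EV_SYN", code = "SYN_REPORT", value = 0 },
}

-- Collect events into frames; each SYN_REPORT closes the current frame.
-- Events after the last SYN_REPORT are left out, since that frame is
-- not complete yet.
local function frames(evlist)
    local result, current = {}, {}
    for _, ev in ipairs(evlist) do
        if ev.type == "EV_SYN" and ev.code == "SYN_REPORT" then
            result[#result + 1] = current
            current = {}
        else
            current[#current + 1] = ev
        end
    end
    return result
end

for i, frame in ipairs(frames(events)) do
    local parts = {}
    for _, ev in ipairs(frame) do
        parts[#parts + 1] = ev.code .. "=" .. tostring(ev.value)
    end
    print(i, table.concat(parts, " "))
end
```

The first frame carries the diagonal movement (REL_X and REL_Y together), the second the middle button press along with its scan code.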

The libevdev library represents some pretty low level stuff, and for the moment it seems to be the ‘correct’ way to deal with system level input device event handling. The LJIT2libevdev binding provides both the fundamental access to the library as well as the higher level device access which is sorely needed in this environment. I’m sure over time it will be beneficial to pull some of the conveniences that libevdev provides directly into the binding, further shrinking the required size of the library. For now though, I am simply happy that I can get my keyboard and mouse events into my application without too much fuss.