Spelunking Windows – Exploring the file system

I’m working on programs that generally make your stuff available to you “from any device anywhere”. While some of the work is of the generic high-performance internet server variety, some of it is much more esoteric. A lot of your “stuff” is in files located on your machine. Windows has a wild array of filesystem-related APIs, and it can be a daunting task to wrap your head around them to achieve a given task.

I set out a task for myself to turn the file system into a relatively easy to query “database” to get lists of files that meet certain criteria based on their various attributes. Some queries that I’m interested in:

List all directories on my machine
List all files that contain “.lua”
List all files which are hidden
List all system files
List all compressed files

Of course, the standard search capabilities built into Windows can perform some of these tasks. What I’m after is also different from the generally useful ‘find’ command of the command shell, as that command searches the contents of files rather than their names and attributes.

Yes, there are numerous tools and ways to perform these tasks, so what I am exploring is how you would actually create such a tool from scratch if you were so inclined.

I will begin at the beginning. Exploring the file system doesn’t require that much. Just a few data structures, and 3 function calls. The key components are the following:
(Full Source Available here: https://github.com/Wiladams/TINN/tree/master/tests FileSystemItem.lua, test_filesystem.lua)

ffi.cdef[[
typedef struct _WIN32_FIND_DATAW {
    DWORD dwFileAttributes;
    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    DWORD nFileSizeHigh;
    DWORD nFileSizeLow;
    DWORD dwReserved0;
    DWORD dwReserved1;
    WCHAR  cFileName[ MAX_PATH ];
    WCHAR  cAlternateFileName[ 14 ];
} WIN32_FIND_DATAW, *PWIN32_FIND_DATAW, *LPWIN32_FIND_DATAW;

HANDLE
FindFirstFileExW(
  LPCWSTR lpFileName,
  FINDEX_INFO_LEVELS fInfoLevelId,
  LPVOID lpFindFileData,
  FINDEX_SEARCH_OPS fSearchOp,
  LPVOID lpSearchFilter,
  DWORD dwAdditionalFlags);

BOOL
FindNextFileW(HANDLE hFindFile,
  LPWIN32_FIND_DATAW lpFindFileData);

BOOL
FindClose(HANDLE hFindFile);
]]

The three functions, FindFirstFileExW, FindNextFileW, and FindClose, combine to form an ‘iteration’ set. The iteration begins with FindFirstFileExW, which also returns the first result, and continues with FindNextFileW. After all is done, you finish up with FindClose to recover the system resources that were allocated for this little search. Although there are flags that can be set to enhance the search capabilities, I don’t actually want to use them, as I can create much more interesting search filters from the Lua side.
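
To make the first/next/close shape concrete without touching Windows at all, here is a minimal sketch of the same protocol. The fakeFindFirst, fakeFindNext and fakeFindClose functions are hypothetical stand-ins I made up for illustration, backed by a plain Lua table; they are not part of any real API. The only point is the control flow: the first call already carries a result, and the handle gets closed once the results run out.

```lua
-- Hypothetical stand-ins for FindFirstFileExW / FindNextFileW / FindClose,
-- backed by a plain Lua table so the control flow can run anywhere.
local function fakeFindFirst(names)
  if #names == 0 then
    return nil              -- plays the role of INVALID_HANDLE_VALUE
  end
  local handle = { names = names, index = 1, open = true }
  return handle, names[1]   -- the first call already yields a result
end

local function fakeFindNext(handle)
  handle.index = handle.index + 1
  return handle.names[handle.index]  -- nil once exhausted
end

local function fakeFindClose(handle)
  handle.open = false
end

-- Wrap the three calls as a Lua iterator that closes the handle
-- as soon as the results run out.
local function entries(names)
  local handle, pending = fakeFindFirst(names)
  return function()
    if not handle then
      return nil
    end
    local current = pending
    if current == nil then
      fakeFindClose(handle)
      handle = nil
      return nil
    end
    pending = fakeFindNext(handle)
    return current
  end
end

local seen = {}
for name in entries({ ".", "..", "notes.txt" }) do
  seen[#seen + 1] = name
end
```

Closing eagerly at the end of the loop is one possible design choice; a loop that exits early would still need some other cleanup mechanism.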

First things first though. The ‘handle’ that is created when you call ‘FindFirstFileExW’ must be cleaned up with a matching ‘FindClose’. If you don’t do this, you’ll end up with a leaked handle, which is essentially a resource leak. You don’t want that, so a ‘smart pointer’ is created to deal with the lifetime of that thing.

ffi.cdef[[
typedef struct {
  HANDLE Handle;
} FsFindFileHandle;
]]
local FsFindFileHandle = ffi.typeof("FsFindFileHandle");
local FsFindFileHandle_mt = {
  __gc = function(self)
    -- Only close handles that were actually opened
    if self.Handle ~= INVALID_HANDLE_VALUE then
      core_file.FindClose(self.Handle);
    end
  end,

  __index = {
    isValid = function(self)
      return self.Handle ~= INVALID_HANDLE_VALUE;
    end,
  },
};
ffi.metatype(FsFindFileHandle, FsFindFileHandle_mt);

This little wrapper ensures that whenever the handle is no longer being referenced, it will automatically get cleaned up because the __gc method will call ‘FindClose()’, which is exactly what we want. Here is how it can be used:

local rawHandle = core_file.FindFirstFileExW(lpFileName,
  fInfoLevelId,
  lpFindFileData,
  fSearchOp,
  lpSearchFilter,
  dwAdditionalFlags);

local handle = FsFindFileHandle(rawHandle);

Alright, so there’s a nicely wrapped handle to the beginning of the iterator. The full iterator looks like this:

-- Iterate over the subitems this item might contain
FileSystemItem.items = function(self, pattern)
	pattern = pattern or self:getFullPath().."\\*";
	local lpFileName = core_string.toUnicode(pattern);
	--local fInfoLevelId = ffi.C.FindExInfoStandard;
	local fInfoLevelId = ffi.C.FindExInfoBasic;
	local lpFindFileData = ffi.new("WIN32_FIND_DATAW");
	local fSearchOp = ffi.C.FindExSearchNameMatch;
	local lpSearchFilter = nil;
	local dwAdditionalFlags = 0;

	local rawHandle = core_file.FindFirstFileExW(lpFileName,
		fInfoLevelId,
		lpFindFileData,
		fSearchOp,
		lpSearchFilter,
		dwAdditionalFlags);

	local handle = FsFindFileHandle(rawHandle);
	local firstone = true;

	local closure = function()
		if not handle:isValid() then 
			return nil;
		end

		if firstone then
			firstone = false;
			return FileSystemItem({
				Parent = self;
				Attributes = lpFindFileData.dwFileAttributes;
				Name = core_string.toAnsi(lpFindFileData.cFileName);
				Size = (lpFindFileData.nFileSizeHigh * (MAXDWORD+1)) + lpFindFileData.nFileSizeLow;
				});
		end

		local status = core_file.FindNextFileW(handle.Handle, lpFindFileData);

		if status == 0 then
			return nil;
		end

		return FileSystemItem({
				Parent = self;
				Attributes = lpFindFileData.dwFileAttributes;
				Name = core_string.toAnsi(lpFindFileData.cFileName);
				Size = (lpFindFileData.nFileSizeHigh * (MAXDWORD+1)) + lpFindFileData.nFileSizeLow;
				});

	end
	
	return closure;
end

Refer to the full source to see it in context.
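
The Size calculation in the iterator is worth a second look, since the file size arrives as two 32-bit halves. Here is a small standalone sketch of that arithmetic. The fileSize helper is a name I am introducing purely for illustration. It leans on the fact that Lua numbers are doubles, which hold integers exactly up to 2^53, so the result should be exact for any file size you are likely to meet.

```lua
-- Combine the two 32-bit halves reported in WIN32_FIND_DATAW into one number.
-- 4294967296 is 2^32, i.e. MAXDWORD + 1.
local TWO_POW_32 = 4294967296

-- Hypothetical helper: 'high' and 'low' correspond to
-- nFileSizeHigh and nFileSizeLow.
local function fileSize(high, low)
  return high * TWO_POW_32 + low
end

local small = fileSize(0, 1024)  -- high half is zero for files under 4 GiB
local large = fileSize(1, 0)     -- exactly 4 GiB
```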

Here is how I would use it:

local depthQuery = {}

depthQuery.traverseItems = function(starting, indentation, filterfunc)
  indentation = indentation or "";

  starting = starting or FileSystemItem({Name="c:"});

  for item in starting:items() do
    if filterfunc then
      if filterfunc(item) then
        if item.Name ~= '.' and item.Name ~= ".." then
          io.write(indentation, item.Name, '\n');
        end
      end
    else
      if item.Name ~= '.' and item.Name ~= ".." then
        io.write(indentation, item.Name, '\n');
      end
    end
		
    if item:isDirectory() and item.Name ~= "." and item.Name ~= ".." then
      depthQuery.traverseItems(item, indentation.."  ", filterfunc);
    end
  end
end

-- Iterate all files/directories on 'c:' drive
depthQuery.traverseItems(FileSystemItem({Name="c:"}));

Scanning the entire c: drive on my fairly new Lenovo X1 Carbon takes about 6 seconds, and there are a few hundred thousand files on it. Most of the time is spent on string creation and I/O. On smaller directory searches, the scan is essentially instantaneous.
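
Printing is fine for poking around, but if the goal is a queryable “database”, it can be handier to collect the matches into a table. What follows is only a sketch: collectItems is a hypothetical variation on traverseItems, and makeItem is a made-up in-memory stand-in for FileSystemItem that offers just Name, items() and isDirectory(), so the traversal can run without a real file system.

```lua
-- Hypothetical in-memory stand-in for FileSystemItem: just enough
-- surface (Name, items(), isDirectory()) to exercise the traversal.
local function makeItem(name, children)
  local item = { Name = name, Children = children }
  function item:isDirectory()
    return self.Children ~= nil
  end
  function item:items()
    local i = 0
    return function()
      i = i + 1
      return self.Children and self.Children[i]
    end
  end
  return item
end

-- Depth-first walk that appends matching items to 'results'
-- instead of printing them.
local function collectItems(starting, filterfunc, results)
  results = results or {}
  for item in starting:items() do
    if item.Name ~= "." and item.Name ~= ".." then
      if not filterfunc or filterfunc(item) then
        results[#results + 1] = item
      end
      if item:isDirectory() then
        collectItems(item, filterfunc, results)
      end
    end
  end
  return results
end

local root = makeItem("c:", {
  makeItem("."), makeItem(".."),
  makeItem("src", { makeItem("main.lua"), makeItem("readme.txt") }),
  makeItem("init.lua"),
})

local matches = collectItems(root, function(item)
  return item.Name:find(".lua", 1, true) ~= nil
end)
```

With the results in a table, sorting, counting, or further filtering becomes ordinary Lua table work.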

With these basics in hand, I can now do some more interesting queries.

local function passLua(item)
  return item.Name:find(".lua", 1, true);
end

depthQuery.traverseItems(FileSystemItem({Name="c:"}), "", passLua);

In this case, I want to find all the files on my system that have ‘.lua’ in their name. The ‘passLua()’ function is very simple, just doing a plain substring search on the name. Of course, you have the full power of Lua and any libraries at your disposal. You could even open up the file if you like, read the contents, and decide whether you want to pass it along or not. Your filter just returns a true value to pass the item along, or ‘false’ (or nil) to block it.
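
One thing to keep in mind is that a substring search also matches names like ‘init.lua.bak’. If what you really want is files whose extension is ‘.lua’, a slightly stricter filter could look like the sketch below. The hasExtension and passLuaExtension names are my own inventions for illustration, and the case-insensitive comparison is an assumption based on how Windows file names typically behave.

```lua
-- True when 'name' ends with the extension 'ext' (e.g. ".lua"),
-- ignoring case, since Windows file names are typically case-insensitive.
local function hasExtension(name, ext)
  return name:sub(-#ext):lower() == ext:lower()
end

-- Filter in the same shape as passLua: takes an item, returns a boolean.
local function passLuaExtension(item)
  return hasExtension(item.Name, ".lua")
end
```

It plugs into the traversal the same way, as the third argument to depthQuery.traverseItems.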

The FileSystemItem object has some of the file’s properties readily available, so they can be a part of the filtering as well. If I wanted a list of all the directories on my ‘c:’ drive, I would do the following:

local function passDirectory(item)
  return item:isDirectory();
end

depthQuery.traverseItems(FileSystemItem({Name="c:"}), "", passDirectory);
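
The hidden, system, and compressed queries from my list at the top can be written in the same style, by testing bits in the Attributes field that the iterator copies from dwFileAttributes. This is a sketch rather than what the full source does: the flag values are the documented Windows file attribute constants, while hasFlag and the pass* helpers are hypothetical names of my own. I use plain arithmetic for the bit test so the snippet does not depend on a bit library; under LuaJIT, bit.band would do the same job.

```lua
-- File attribute flags as documented for the Windows API.
local FILE_ATTRIBUTE_HIDDEN     = 0x2
local FILE_ATTRIBUTE_SYSTEM     = 0x4
local FILE_ATTRIBUTE_COMPRESSED = 0x800

-- Test a single-bit flag with plain arithmetic so the sketch does not
-- depend on a particular bit library.
local function hasFlag(attributes, flag)
  return math.floor(attributes / flag) % 2 == 1
end

-- Filters in the same shape as passDirectory: item in, boolean out.
local function passHidden(item)
  return hasFlag(item.Attributes, FILE_ATTRIBUTE_HIDDEN)
end

local function passSystem(item)
  return hasFlag(item.Attributes, FILE_ATTRIBUTE_SYSTEM)
end

local function passCompressed(item)
  return hasFlag(item.Attributes, FILE_ATTRIBUTE_COMPRESSED)
end
```

Any of these can be handed to depthQuery.traverseItems as the filter, or combined inside a single filter function with ‘and’ and ‘or’.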

Of course, if you were using the .NET Framework, or Java, or Python, or any number of mature libraries in the world, you’d be thinking this was very simple work indeed, and what’s all the fuss? No fuss, really. The key here is showing how simple it really is to do these things and create your own. I find it useful to do it in Lua because it’s easier than trying to do it in some more involved environments. Once the basic wrappers are in place, spelunking around becomes much easier.

Windows is a vast landscape of mature APIs which have evolved over the years to meet the needs of diverse consumers. The APIs are raw, and at times daunting. With a little bit of wrapping though, exploring them becomes much easier. In this particular case, by doing everything from the low-level ffi to the higher-level iterator, I’ve put in place some rope ladders, pitons, and other core exploration equipment. Now I can reap the benefits and do some more exploration with relative ease and safety.
