Optimize At the Right Time and Level
Posted: October 1, 2013
It takes a long time to do the simplest things correctly. You want to receive some bytes on a socket?
int recv(SOCKET s,char *buf,int len,int flags);
No problem, right? How about that socket? How did it get set up? Is it configured for low latency, big buffers, small buffers, or what? And what about that buffer? And lord knows which flags are the best to use. The first question, I guess, is ‘for what purpose?’
For the longest time, I have known that my IP sockets were not the most performant. Through rounds of getting everything from low-level networking up through HTTP handling to work correctly, I deliberately kept some things brain-dead simple.
One of those things was my readline() function. This is the workhorse of HTTP processing. Previously, I would literally read a single character at a time, looking for the line terminators, which meant making that recv() call a bazillion times during normal HTTP processing.
Well, it has worked great. It reduced the surface area, which helped flush out various bugs, and it allowed me to finally implement the IO Completion Port based networking stack that I now have. It works well enough. But it’s time to optimize. I am about to add various types of streams, such as compressed and encrypted ones. For those I need to buffer up chunks at a time, not read single characters.
So, I finally optimized this code.
function IOCPNetStream:refillReadingBuffer()
    print("IOCPNetStream:RefillReadingBuffer(): ", self.ReadingBuffer:BytesReadyToBeRead());

    -- if the buffer already has data in it
    -- then just return the number of bytes that
    -- are currently ready to be read.
    if self.ReadingBuffer:BytesReadyToBeRead() > 0 then
        return self.ReadingBuffer:BytesReadyToBeRead();
    end

    -- If there are currently no bytes ready, then we need
    -- to refill the buffer from the source
    local err
    local bytesread

    bytesread, err = self.Socket:receive(self.ReadingBuffer.Buffer, self.ReadingBuffer.Length)

    print("-- LOADED BYTES: ", bytesread, err);

    if not bytesread then
        return false, err;
    end

    if bytesread == 0 then
        return false, "eof"
    end

    self.ReadingBuffer:Reset()
    self.ReadingBuffer.BytesWritten = bytesread

    return bytesread
end
Basically, whenever some bytes are needed within the stream, the refillReadingBuffer() function is called. If there are still bytes sitting in the buffer, then those bytes are used. If it’s empty, then new bytes are read in. The buffer can be whatever size you want, so if your app is better off reading 1500 bytes at a time, then make it so. If you find performance is best at 4096 bytes per read, then set it up that way.
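The ReadingBuffer object itself is not shown in this post, so here is a minimal sketch of what such a buffer might look like. It assumes a plain Lua table as storage so it runs without the FFI; the Cursor field and the Fill() helper are names invented for this illustration, not part of the actual stream code.

```lua
-- Hypothetical stand-in for the post's ReadingBuffer.
-- A plain Lua table holds the bytes; the real one presumably wraps FFI memory.
local ReadingBuffer = {}
ReadingBuffer.__index = ReadingBuffer

function ReadingBuffer.new(length)
    return setmetatable({
        Buffer = {},       -- byte storage, 1-based
        Length = length,   -- capacity: the most a single refill may load
        BytesWritten = 0,  -- bytes loaded by the last refill
        Cursor = 0,        -- bytes already handed to the reader (assumed field)
    }, ReadingBuffer)
end

function ReadingBuffer:BytesReadyToBeRead()
    return self.BytesWritten - self.Cursor
end

-- Forget the old contents so the next refill starts from the beginning.
function ReadingBuffer:Reset()
    self.BytesWritten = 0
    self.Cursor = 0
end

-- Assumed helper: load up to Length bytes from a string,
-- standing in for what a socket receive would deliver.
function ReadingBuffer:Fill(data)
    self:Reset()
    local n = math.min(#data, self.Length)
    for i = 1, n do
        self.Buffer[i] = data:byte(i)
    end
    self.BytesWritten = n
    return n
end

-- Return the next unread byte, or nil plus a reason when the buffer is drained.
function ReadingBuffer:ReadByte()
    if self:BytesReadyToBeRead() < 1 then
        return nil, "empty"
    end
    self.Cursor = self.Cursor + 1
    return self.Buffer[self.Cursor]
end
```

The size passed to new() is the knob described above: 1500, 4096, or whatever measures best for your app.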
Now that pre-fetching is occurring, all operations related to reading bytes from the source MUST go through this buffer, or bytes will be missed. So readByte() and readOneLine() look like this now:
-- Assumes ffi (require("ffi")) and the CR and LF byte constants
-- are defined earlier in the module.
function IOCPNetStream:readByte()
    -- First see if we can get a byte out of the
    -- Reading buffer
    local abyte, err = self.ReadingBuffer:ReadByte()
    if abyte then
        return abyte
    end

    -- If we did not get a byte out of the reading buffer
    -- try refilling the buffer, and try again
    local bytesread, err = self:refillReadingBuffer()
    if bytesread then
        abyte, err = self.ReadingBuffer:ReadByte()
        return abyte, err
    end

    -- If there was an error
    -- then return that error immediately
    print("-- IOCPNetStream:readByte, ERROR: ", err)
    return false, err
end

function IOCPNetStream:readOneLine(buffer, size)
    local nchars = 0;
    -- View the caller's buffer as raw bytes
    -- (the original cast an undefined 'buff' and never used the result)
    local ptr = ffi.cast("uint8_t *", buffer);
    local byteread
    local err

    while nchars < size do
        byteread, err = self:readByte();

        if not byteread then
            -- err is either "eof" or some other socket error
            break
        elseif byteread == 0 then
            break
        elseif byteread == LF then
            --io.write("LF]\n")
            break
        elseif byteread == CR then
            -- swallow it and continue on
            --io.write("[CR")
        else
            -- Just a regular character, so save it
            ptr[nchars] = byteread
            nchars = nchars + 1
        end
    end

    if err and err ~= "eof" then
        return nil, err
    end

    return nchars
end
readOneLine() is still reading a byte at a time, but this time, instead of going to the underlying socket directly, it’s calling the stream’s version of readByte(), which in turn reads a single byte from the pre-fetched buffer. That’s a lot faster than making the recv() system call, and when the buffer runs out of bytes, it is automatically refilled, until there is a socket error or the source reports end of file.
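To make the difference concrete, here is a rough sketch that counts how many receive calls each approach makes. It assumes a mock socket that serves a fixed string; newMockSocket, newLineReader, and countCalls are hypothetical names for this illustration and have nothing to do with the real IOCP socket.

```lua
-- Hypothetical mock socket: hands out a fixed payload and counts
-- how many times receive() is called.
local function newMockSocket(payload)
    local sock = { calls = 0, pos = 1 }
    function sock:receive(maxbytes)
        self.calls = self.calls + 1
        if self.pos > #payload then
            return nil, "eof"
        end
        local chunk = payload:sub(self.pos, self.pos + maxbytes - 1)
        self.pos = self.pos + #chunk
        return chunk
    end
    return sock
end

-- Returns a readLine function that pulls bytes through a prefetch buffer.
-- A chunksize of 1 reproduces the old one-call-per-character behavior.
local function newLineReader(sock, chunksize)
    local buf, cursor = "", 1

    local function readByte()
        if cursor > #buf then
            -- buffer drained: go back to the socket for another chunk
            local chunk, err = sock:receive(chunksize)
            if not chunk then
                return nil, err
            end
            buf, cursor = chunk, 1
        end
        local b = buf:byte(cursor)
        cursor = cursor + 1
        return b
    end

    return function()
        local chars = {}
        while true do
            local b, err = readByte()
            if not b then
                if #chars == 0 then
                    return nil, err
                end
                break                    -- return the partial last line
            end
            if b == 10 then              -- LF ends the line
                break
            end
            if b ~= 13 then              -- swallow CR, keep everything else
                chars[#chars + 1] = string.char(b)
            end
        end
        return table.concat(chars)
    end
end

local request = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

-- Read header lines until the blank line, and report the receive() count.
local function countCalls(chunksize)
    local sock = newMockSocket(request)
    local readLine = newLineReader(sock, chunksize)
    local lines = {}
    repeat
        local line = readLine()
        lines[#lines + 1] = line
    until not line or line == ""
    return sock.calls, lines
end
```

With the 37-byte request above, countCalls(1) makes 37 receive() calls, while countCalls(4096) makes one. Both return the same lines.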
Well, that’s just great. The performance boost is one thing, but there is another benefit. Now that I am reading in controllable chunks, I could raise the size to, say, 16K for the first chunk read of an HTTP request. That would likely pull all of the request’s headers into the buffer up front. From there I could use simple pointer offset/size objects instead of creating an actual string for every line, which copies the data. That would be a great optimization.
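As a rough sketch of that offset/size idea, the function below walks a buffer that already holds the whole header block and records where each line sits, creating a string only when one is asked for. It assumes the buffer is a Lua string for simplicity (the real stream buffer is FFI memory), and indexHeaderLines and spanToString are hypothetical names.

```lua
-- Scan a buffer holding a complete header block and record where each
-- line sits, without creating a string per line. Offsets are 1-based.
local function indexHeaderLines(buf)
    local spans = {}
    local start = 1
    while start <= #buf do
        local lf = buf:find("\n", start, true)
        if not lf then
            break                          -- incomplete last line: stop here
        end
        local stop = lf - 1
        if stop >= start and buf:byte(stop) == 13 then
            stop = stop - 1                -- drop the CR of a CRLF pair
        end
        local size = stop - start + 1
        if size == 0 then
            break                          -- blank line ends the headers
        end
        spans[#spans + 1] = { offset = start, size = size }
        start = lf + 1
    end
    return spans
end

-- Materialize a string only for the line that is actually needed.
local function spanToString(buf, span)
    return buf:sub(span.offset, span.offset + span.size - 1)
end
```

A caller could index the headers once, then compare or copy only the few lines it cares about.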
I can wait for further optimizations. Getting the pre-fetch working, and continuing to work with IOCP is a more important optimization at this time.
And so it goes. Don’t optimize until you really understand the performance characteristics of your system. Then do only the minimum needed to reach the specific goal in front of you before moving on.