Thursday, January 8, 2009

Buffered java.io.RandomAccessFile

Now for some total randomness. I'm posting this because I know that someday, someone, somewhere out there will come across this exact same problem and Google it. The solutions I found via Google were meh, and I know I would've liked to have had this. So I post this for the children of the future (unless Sun gets its act together).

/**
*  A subclass of RandomAccessFile to enable basic buffering to a byte array
*  Copyright (C) 2009 minddumped.blogspot.com

*  This program is free software: you can redistribute it and/or modify
*  it under the terms of the GNU General Public License as published by
*  the Free Software Foundation, either version 3 of the License, or
*  (at your option) any later version.

*  This program is distributed in the hope that it will be useful,
*  but WITHOUT ANY WARRANTY; without even the implied warranty of
*  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
*  GNU General Public License for more details.

*  You should have received a copy of the GNU General Public License
*  along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

package ed.javatools;

import java.io.RandomAccessFile;
import java.io.FileNotFoundException;
import java.io.File;
import java.io.IOException;

/**
*
* @author minddumped.blogspot.com
*/
public class BufferedRaf extends RandomAccessFile {

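 // Read buffer; the underlying file pointer always rests at the offset of bytebuffer[0].
 private final byte[] bytebuffer;
 private final int bufferlength;
 // Number of valid bytes in bytebuffer: 0 before the first fill, -1 once end of file is hit.
 private int maxread;
 // Index of the next unread byte in bytebuffer.
 private int buffpos;
 // Reused by readLine2() so that no new builder is allocated per line.
 private final StringBuilder sb;

 public BufferedRaf(File file, String mode) throws FileNotFoundException {
  super(file, mode);
  bufferlength = 65536;
  bytebuffer = new byte[bufferlength];
  maxread = 0;
  buffpos = 0;
  sb = new StringBuilder();
 }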

 public int getbuffpos() {
  return buffpos;
 }

 @Override
 public int read() throws IOException {
  if (buffpos >= maxread) {
   maxread = readchunk();
   if (maxread == -1) {
    return -1;
   }
  }
  buffpos++;
  return bytebuffer[buffpos - 1] & 0xFF;
 }

 public String readLine2() throws IOException {
  sb.delete(0, sb.length());
  int c = -1;
  boolean eol = false;
  while (!eol) {
   switch (c = read()) {
   case -1:
   case '\n':
    eol = true;
    break;
   case '\r':
    eol = true;
    long cur = getFilePointer();
    if ((read()) != '\n') {
     seek(cur);
    }
    break;
   default:
    sb.append((char) c);
    break;
   }
  }

  if ((c == -1) && (sb.length() == 0)) {
   return null;
  }
  return sb.toString();
 }

 @Override
 public long getFilePointer() throws IOException {
  return super.getFilePointer() + buffpos;
 }

 @Override
 public void seek(long pos) throws IOException {
  long start = super.getFilePointer(); // file offset of bytebuffer[0]
  if (maxread != -1 && pos >= start && pos < start + maxread) {
   // Target is inside the current buffer: just move the buffer index.
   // The cast is safe because the difference is smaller than maxread.
   buffpos = (int) (pos - start);
  } else {
   // Target is outside the buffer: reposition the file and refill.
   buffpos = 0;
   super.seek(pos);
   maxread = readchunk();
  }
 }

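 // Refills bytebuffer starting at the current logical position and returns the
 // number of bytes read (-1 at end of file). The underlying file pointer is put
 // back at the start of the chunk so that getFilePointer() stays correct.
 private int readchunk() throws IOException {
  long pos = super.getFilePointer() + buffpos;
  super.seek(pos);
  int read = super.read(bytebuffer);
  super.seek(pos);
  buffpos = 0;
  return read;
 }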
}
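
To see how the line-terminator handling in readLine2() behaves without touching a file, here is a minimal sketch that runs the same rules against an in-memory stream. The class name LineSplitSketch and its helpers are made up for illustration, and PushbackInputStream.unread() stands in for the seek(cur) call that readLine2() uses to step back one byte after a lone '\r'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for readLine2(); not part of the original class.
public class LineSplitSketch {

    // "\n", "\r" and "\r\n" each end a line; returns null only at end of input.
    static String readLine(PushbackInputStream in, StringBuilder sb) throws IOException {
        sb.setLength(0); // reuse the builder instead of allocating a new one
        int c;
        while (true) {
            c = in.read();
            if (c == -1 || c == '\n') {
                break;
            }
            if (c == '\r') {
                int next = in.read();
                if (next != '\n' && next != -1) {
                    in.unread(next); // lone '\r': give the byte back
                }
                break;
            }
            sb.append((char) c); // one byte per char, as in readLine2()
        }
        return (c == -1 && sb.length() == 0) ? null : sb.toString();
    }

    // Splits a string into lines by feeding it through readLine().
    static List<String> split(String text) {
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        StringBuilder sb = new StringBuilder();
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = readLine(in, sb)) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(split("one\ntwo\r\nthree\rfour"));
    }
}
```

Note that a final line without a terminator is still returned, and an empty line comes back as an empty string rather than null.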

Some notes:
1) This is only for buffered reading.

2) Most read-type methods in RAF end up calling read(), so read() is the only method that really needs to be overridden. The exceptions are read(byte b[]), read(byte b[], int off, int len), readFully(byte b[]) and readFully(byte b[], int off, int len), which end up calling a private readBytes method. It's probably not a big deal, since in those methods you're asking for a byte array anyway, but they bypass the buffer: they read from the underlying file position rather than from buffpos, so mixing them with read() will throw off the file pointer!

3) If you're calling readLine() a lot, the readLine() in the original RAF sucks. It creates a new StringBuffer on every call, which is unnecessary and can really slow things down. In my class, I reuse a StringBuilder and just clear its contents. Unfortunately, for some dumbass reason, readLine() is final, so my method is readLine2(). In a simple test of reading through a file line by line, I found the performance to be nearly the same as using BufferedReader.

4) people on the Sun Forums are total fucking douchebags. Don't ever go there looking for help.
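
One possible way to close the gap described in note 2 is to override read(byte[], int, int) as well, so that bulk reads drain the same buffer as read(). The sketch below is a stripped-down, read-only illustration and not the class from this post: BulkReadSketch, its field names and the deliberately tiny 8-byte buffer are all assumptions made for the example, and seek() is left out. It relies on readFully() delegating to read(byte[], int, int), which is how the OpenJDK sources I know of implement it.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch: buffered single-byte AND bulk reads sharing one buffer.
public class BulkReadSketch extends RandomAccessFile {
    private final byte[] buf = new byte[8]; // tiny on purpose, to force refills
    private int len = 0; // valid bytes in buf (plays the role of maxread)
    private int pos = 0; // next unread index (plays the role of buffpos)

    public BulkReadSketch(File f) throws IOException {
        super(f, "r");
    }

    // Refill from the logical position, then park the file pointer at the buffer start.
    private int fill() throws IOException {
        long start = super.getFilePointer() + pos;
        super.seek(start);
        int n = super.read(buf, 0, buf.length); // non-virtual call, so no recursion
        super.seek(start);
        pos = 0;
        return n;
    }

    @Override
    public int read() throws IOException {
        if (pos >= len && (len = fill()) == -1) {
            return -1;
        }
        return buf[pos++] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int n) throws IOException {
        if (n == 0) {
            return 0;
        }
        if (pos >= len && (len = fill()) == -1) {
            return -1;
        }
        int count = Math.min(n, len - pos); // may be a short read; readFully() loops
        System.arraycopy(buf, pos, b, off, count);
        pos += count;
        return count;
    }

    @Override
    public int read(byte[] b) throws IOException {
        return read(b, 0, b.length);
    }

    @Override
    public long getFilePointer() throws IOException {
        return super.getFilePointer() + pos;
    }

    // Mixes read() and readFully() on a temp file and reports what was seen.
    static String demo() {
        try {
            File f = File.createTempFile("bulkread", ".txt");
            f.deleteOnExit();
            Files.write(f.toPath(), "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
            try (BulkReadSketch raf = new BulkReadSketch(f)) {
                StringBuilder out = new StringBuilder();
                out.append((char) raf.read());
                byte[] chunk = new byte[12];
                raf.readFully(chunk); // spans a buffer refill
                out.append(new String(chunk, StandardCharsets.US_ASCII));
                out.append((char) raf.read());
                out.append('@').append(raf.getFilePointer());
                return out.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 0123456789abcd@14
    }
}
```

For reads much larger than the buffer, copying through the buffer costs an extra pass over the data, so a real implementation might prefer to read directly into the caller's array in that case.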

7 comments:

  1. Hey,
    I needed this! Can you clarify your licensing?

    Thank you for thinking about the future.

    -guy from the future

  2. Hi jshook, I've put it under the GNU GPL v3 license, so feel free to use or modify it. Cheers!

  3. I created a sample application that logs text data at a rate of 1 line per 10 ms (1 line has 300 characters) and I need to monitor it in real time. I used your class and found it takes 50% of the CPU. Then I used RandomAccessFile.readFully with a buffer size of 4096 and found it takes 3-4% of the CPU. It's a huge difference! Is there any way you can optimize your class? Thanks

  4. Hey Rahul,
    What are you calling and how?

  5. Update for you. This doesn't work for Extended ASCII (ISO-8859-1). It's very simple to fix, though. Change the line

    return bytebuffer[buffpos - 1];

    into

    return bytebuffer[buffpos - 1] & 0xFF;

    The reason is that a byte is a signed primitive, but you basically want an unsigned int for the character representation.

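
The sign-extension problem from the last comment can be reproduced in isolation. This is a small sketch with made-up names (SignedByteSketch, unmasked, masked); it only shows what happens when a byte above 0x7F is widened to an int with and without the & 0xFF mask.

```java
// Hypothetical demo of why read() has to mask the buffered byte.
public class SignedByteSketch {

    // Plain widening sign-extends, so bytes above 0x7F come out negative.
    static int unmasked(byte b) {
        return b;
    }

    // Masking keeps only the low 8 bits, giving the 0..255 range read() promises.
    static int masked(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte eAcute = (byte) 0xE9;     // e-acute in ISO-8859-1
        byte yDiaeresis = (byte) 0xFF; // y-diaeresis in ISO-8859-1
        System.out.println(unmasked(eAcute));     // -23
        System.out.println(masked(eAcute));       // 233
        System.out.println(unmasked(yDiaeresis)); // -1
        System.out.println(masked(yDiaeresis));   // 255
    }
}
```

The 0xFF case is the nasty one: without the mask it comes back as -1, which callers such as readLine2() would treat as end of file.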