Sunday, December 21, 2008

Blogger.com not a success

I’ve had a lot of trouble getting the previous entry to display properly.  I’m using raw HTML, but when I publish, the server somehow inserts a lot of <br/> tags into my text.

I think I’ll switch to wordpress.com (http://fkieviet.wordpress.com).

The size of Java objects

Here's a blog entry / research note / note-to-self that I wrote months ago, but never found the time to publish.

At JavaOne there was an interesting BOF about Efficient XML. It made me wonder how efficient the use of DOM in Java is. To find out, I wrote a small program that measures how many bytes of RAM a Java object uses by counting how many objects it can allocate before it runs out of memory. Here's the class:

    public static class H {
        public H next;
        byte[] b = new byte[8];
    }
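The full test program is not shown in this entry. As a rough sketch of the counting loop described above (the names AllocationCounter, countAllocations and the cap argument are my own, not from the original program), it could look like this:

```java
// Hypothetical harness: class name, method name and the cap are illustrative.
class AllocationCounter {

    /** The object under test, the same shape as the H class above. */
    static class H {
        H next;
        byte[] b = new byte[8];
    }

    /**
     * Allocates H instances, chained together so none can be collected,
     * until the heap is exhausted or cap objects have been created.
     * Returns the number of objects that were successfully allocated.
     */
    static long countAllocations(long cap) {
        H head = null;
        long count = 0;
        try {
            while (count < cap) {
                H h = new H();
                h.next = head; // keep every object reachable
                head = h;
                count++;
            }
        } catch (OutOfMemoryError e) {
            head = null; // drop the whole chain so the VM can recover
        }
        return count;
    }

    public static void main(String[] args) {
        // Default cap keeps a casual run short; pass a huge cap together
        // with a small -Xmx to actually run into the heap limit.
        long cap = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        System.out.println(countAllocations(cap) + " objects allocated");
    }
}
```

Running it with a small, fixed heap (for example -Xmx64m) and a very large cap makes the count depend only on the size of each allocation.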

By changing the size of the byte array and running the program again, the size of an H instance can be calculated, as well as the total memory space available for object allocation in the test program. This number can be used in subsequent runs. Running the program on the following trivial class on a Windows machine yielded a size of 16 bytes for each allocation:

    public static class H {
        public H next;
    }

16 bytes
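The calculation from two runs with different array sizes is not spelled out above. One way to do it, assuming each object costs a fixed number of bytes s plus the array length k, and that both runs exhaust the same heap of M bytes, is to solve n1 * (s + k1) = M and n2 * (s + k2) = M. The class name and the counts below are made up for illustration; they are not my actual measurements.

```java
// Hypothetical helper: solves the two-run equations for s and M.
class HeapSizeSolver {

    /**
     * Fixed bytes per object (everything except the k array elements),
     * derived from n1 * (s + k1) = n2 * (s + k2).
     */
    static double fixedBytesPerObject(long n1, long k1, long n2, long k2) {
        return (double) (n2 * k2 - n1 * k1) / (n1 - n2);
    }

    /** Total bytes available for allocation, from one run and the solved s. */
    static double heapBytes(long n, long k, double s) {
        return n * (s + k);
    }

    public static void main(String[] args) {
        // Made-up counts: 300000 objects with byte[8], 100000 with byte[88].
        double s = fixedBytesPerObject(300_000, 8, 100_000, 88);
        double m = heapBytes(300_000, 8, s);
        System.out.println("fixed bytes per object: " + s);
        System.out.println("heap bytes available:   " + m);
    }
}
```

Once M is known, a later run that manages to allocate n objects of some other type gives a size of roughly M / n per object.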

The VM apparently uses 8-byte alignment: when another reference is added to the class above, the size does not increase, but as more members are added it grows in steps of 8 bytes:

    public static class H {
        public H next;
        public H next1;
    }

16 bytes

    public static class H {
        public H next;
        public H next1;
        public H next2;
    }

24 bytes

    public static class H {
        public H next;
        public H next1;
        public H next2;
        public H next3;
    }

24 bytes

    public static class H {
        public H next;
        public H next1;
        public H next2;
        public H next3;
        public H next4;
    }

32 bytes

Hence an empty object takes 8 bytes, and each object reference takes 4 bytes. This is much better than what I intuitively expected: underneath there should be at least a pointer to a virtual method table (4 bytes), and there should be a pointer to this object from some VM list of objects for memory management. I'd expected that list to be a linked list, and I'd expected that there would perhaps be some additional flags for garbage collection, etc. Clearly the VM does it much more efficiently than I would have done.
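The measurements above fit a simple model: an 8-byte header, 4 bytes per reference, and the total rounded up to a multiple of 8. The sketch below only restates that arithmetic (the class and method names are mine); it does not measure anything and only reflects the VM I tested on.

```java
// Hypothetical model of the measured sizes; not a measurement tool.
class ObjectSizeModel {
    static final int HEADER_BYTES = 8;     // size of an object without fields
    static final int REFERENCE_BYTES = 4;  // size of one object reference
    static final int ALIGNMENT = 8;        // allocation granularity

    /** Rounds bytes up to the next multiple of ALIGNMENT. */
    static int roundUp(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Predicted size of an object with the given number of reference fields. */
    static int sizeWithReferences(int referenceFields) {
        return roundUp(HEADER_BYTES + REFERENCE_BYTES * referenceFields);
    }

    public static void main(String[] args) {
        for (int refs = 0; refs <= 5; refs++) {
            System.out.println(refs + " reference(s): "
                    + sizeWithReferences(refs) + " bytes");
        }
    }
}
```

For one to five references this gives 16, 16, 24, 24 and 32 bytes, the same as the measured values.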

Similarly, the size of arrays can be measured. A char[] comes down to 12 bytes plus two times the number of characters, rounded up to an 8-byte boundary. The size of a String object is more interesting: it is 32 bytes plus two times the number of characters; the minimum size is 40 bytes. Hence, a string of 3 characters is 48 bytes, and an eight-character string is 56 bytes.
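The three measured String sizes (40, 48 and 56 bytes) can be reproduced by treating a String as two allocations: the String object itself plus its separately allocated char[]. The sketch below assumes the String object is 24 bytes (an 8-byte header, one reference to the char[], and three int fields), which matches the String class in the JDK of that era but is my assumption, not something I measured directly. Note that this model is slightly different from the "32 bytes plus two per character" rule of thumb.

```java
// Hypothetical model: class name, constants and method names are illustrative.
class StringSizeModel {
    static final int ARRAY_HEADER_BYTES = 12;  // measured overhead of a char[]
    static final int STRING_OBJECT_BYTES = 24; // assumed: header + char[] ref + 3 ints

    /** Rounds bytes up to the next multiple of 8. */
    static int roundUp8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Predicted size of a char[] holding the given number of characters. */
    static int charArraySize(int chars) {
        return roundUp8(ARRAY_HEADER_BYTES + 2 * chars);
    }

    /** Predicted total size of a String: the object plus its char[]. */
    static int stringSize(int chars) {
        return STRING_OBJECT_BYTES + charArraySize(chars);
    }

    public static void main(String[] args) {
        for (int chars : new int[] {0, 3, 8}) {
            System.out.println(chars + " chars: " + stringSize(chars) + " bytes");
        }
    }
}
```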

Now let's take a look at XML, a DOM structure to be exact. How much does an element with a text node take, as in this XML snippet?

    <root>
        <ElementName>1000000</ElementName>
        <ElementName>1000001</ElementName>
        <ElementName>1000002</ElementName>
        ...

(line breaks and indentation added for clarity; "ElementName" is a string constant)

Measuring this yields 144 bytes per node. Let's add another child element:

    <root>
        <ElementName>
            <SubEl>1000000</SubEl>
        </ElementName>
        <ElementName>
            <SubEl>1000001</SubEl>
        </ElementName>
        <ElementName>
            <SubEl>1000002</SubEl>
        </ElementName>
        ...

Measuring this yields 200 bytes per repeating element. When we add an attribute the size increases again:

    <root>
        <ElementName>
            <SubEl last="false">1000000</SubEl>
        </ElementName>
        <ElementName>
            <SubEl last="false">1000001</SubEl>
        </ElementName>
        <ElementName>
            <SubEl last="false">1000002</SubEl>
        </ElementName>
        ...

This increases the size to 312 bytes per repeating element.
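The code that built these documents is not included here. A minimal sketch using the standard JAXP DOM API could look like the following; the class name DomSizeProbe, its method names, and the Runtime-based heap estimate are assumptions for illustration. The heap estimate is noisy and is not the out-of-memory counting method used for the numbers above. The attribute is placed on the innermost element, as in the summary below.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical probe: builds the repeating structure directly in a DOM tree.
class DomSizeProbe {

    /** Builds a root element with count repeating ElementName children. */
    static Document build(int count, boolean withSubEl, boolean withAttribute) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (int i = 0; i < count; i++) {
            Element element = doc.createElement("ElementName");
            Element textHolder = element;
            if (withSubEl) {
                textHolder = doc.createElement("SubEl");
                element.appendChild(textHolder);
            }
            if (withAttribute) {
                textHolder.setAttribute("last", "false");
            }
            textHolder.appendChild(doc.createTextNode(Integer.toString(1000000 + i)));
            root.appendChild(element);
        }
        return doc;
    }

    /** Rough used-heap reading; gc() is only a hint, so expect noise. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        int count = 100_000;
        long before = usedHeap();
        Document doc = build(count, true, true);
        long after = usedHeap();
        System.out.println("approx. bytes per repeating element: "
                + (after - before) / count);
        // Touch the document afterwards so it stays reachable while measuring.
        System.out.println("children: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The numbers from such a run depend on the VM and on the DOM implementation, so they need not match the figures in this entry.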

Summarizing the results:

    Item                                                              Size
    new Object()                                                      8 bytes
    ""                                                                40 bytes
    "123"                                                             48 bytes
    "12345678"                                                        56 bytes
    int data member                                                   4 bytes
    byte data member                                                  1 byte
    Object reference data member                                      4 bytes
    char[n]                                                           12 + 2*n bytes
    <ElementName>1234567</ElementName>                                144 bytes
    <ElementName><SubEl>1234567</SubEl></ElementName>                 200 bytes
    <ElementName><SubEl last="false">1234567</SubEl></ElementName>    312 bytes

Indeed, DOM is not very efficient with respect to memory usage. The last XML snippet was only 62 single-byte characters. The information content is actually a lot less: since ElementName, SubEl, and last are constant, they could be replaced with a reference. Without using schema information, the information content can be encoded with about 25 bytes. With JAXB it can be done with 88 bytes.
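To illustrate what "replaced with a reference" could mean, here is a sketch that writes the last snippet with one-byte tokens in place of the constant names. The token values and the layout are invented for this example; it is not the encoding behind the 25-byte figure above, and it is not Efficient XML.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical token encoding: all token values are made up for illustration.
class TokenEncodingSketch {
    static final int END = 0;           // closes the most recently opened element
    static final int ELEMENT_NAME = 1;  // stands for <ElementName>
    static final int SUB_EL = 2;        // stands for <SubEl>
    static final int ATTR_LAST = 3;     // stands for the attribute name "last"

    static final String XML =
            "<ElementName><SubEl last=\"false\">1234567</SubEl></ElementName>";

    /** Size of the textual snippet in single-byte characters. */
    static int xmlBytes() {
        return XML.getBytes(StandardCharsets.US_ASCII).length;
    }

    /** Encodes one repeating element; names become tokens, values stay text. */
    static byte[] encode(String lastValue, String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(ELEMENT_NAME);
            out.writeByte(SUB_EL);
            out.writeByte(ATTR_LAST);
            out.writeUTF(lastValue); // 2-byte length prefix plus the characters
            out.writeUTF(text);
            out.writeByte(END);      // closes SubEl
            out.writeByte(END);      // closes ElementName
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XML text: " + xmlBytes() + " bytes");
        System.out.println("tokenized: " + encode("false", "1234567").length + " bytes");
    }
}
```

With these made-up tokens the record shrinks from 62 bytes to 21 bytes, which is in the same range as the estimate above even though the encoding is not the same.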

Hello world

This is my first blog entry on blogger.com. It's really a continuation of my other blog, the one that I started at http://blogs.sun.com/fkieviet. Why did I start a new blog? Blogs at blogs.sun.com are tied to Sun Microsystems, my employer. With the world in economic turmoil, Sun included, I felt it was a good idea to have a blog not tied to my employer.

A few words about me: I'm a software engineer. Building software has been my hobby since the Commodore 64. I'm glad that I've been able to turn this hobby into a full-time profession. I currently work as a Senior Staff Engineer at Sun Microsystems. There I work as a lead in the SOA/Business Integration group, and am a contributor to OpenESB, an open source platform for Integration and SOA.

It's been a long time since my last blog entry on blogs.sun.com. I've been busy: work has been piling up, and it's difficult to justify taking time to write a blog when people are waiting on me for work to be finished. (And that's another reason to start this blog instead of continuing on blogs.sun.com.) How did I find the time to write this, then? I'm on vacation this week!