MattHicks.com: January 2008

Smarter Beans?

java

personal

programming

In Java, beans are an important part of good programming practice and good object-oriented coding. There has been a lot of discussion recently about monitoring changes on beans and other functionality that is not built into Java. With my Magic Beans project I've already done a lot along these lines, but recently I've been thinking more and more about how powerful beans could be if they extended beyond the boundaries of the norm. What if you could deal with beans the way you deal with Connections in SQL? What if you could create a transaction that gives you a new copy of a bean, and when you are done with it you "commit" that bean back to the original? What if you had transactional monitoring of beans that goes beyond the normal Observer/Observable concepts?
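Magic Beans itself isn't shown in this post, so here is only a minimal sketch of the copy-and-commit idea using nothing but the JDK's `java.beans.PropertyChangeSupport`. `BeanTransaction`, `Person` and `begin()` are hypothetical names for illustration, and the copier is hand-written for a single bean type rather than being generic.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.UnaryOperator;

public class BeanTransactionDemo {

    // An ordinary JavaBean that notifies observers through PropertyChangeSupport.
    static class Person {
        private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
        private String name;

        Person(String name) { this.name = name; }

        String getName() { return name; }

        void setName(String name) {
            String old = this.name;
            this.name = name;
            changes.firePropertyChange("name", old, name);
        }

        void addPropertyChangeListener(PropertyChangeListener listener) {
            changes.addPropertyChangeListener(listener);
        }
    }

    // Hypothetical transaction: edits go to a private copy and only reach
    // the original bean (and its listeners) when commit() is called.
    static class BeanTransaction<T> {
        private final T original;
        private final T working;
        private final BiConsumer<T, T> copyInto; // (source, target)

        BeanTransaction(T original, UnaryOperator<T> copier, BiConsumer<T, T> copyInto) {
            this.original = original;
            this.working = copier.apply(original);
            this.copyInto = copyInto;
        }

        // The copy callers are free to modify.
        T bean() { return working; }

        // Push the working copy's state back onto the original bean.
        void commit() { copyInto.accept(working, original); }

        // Rolling back is simply discarding this transaction; the original was never touched.
    }

    // Hand-written copier for Person; the copy starts with no listeners attached.
    static BeanTransaction<Person> begin(Person p) {
        return new BeanTransaction<>(p,
                src -> new Person(src.getName()),
                (src, dst) -> dst.setName(src.getName()));
    }

    public static void main(String[] args) {
        Person person = new Person("Alice");
        List<String> events = new ArrayList<>();
        person.addPropertyChangeListener(e -> events.add(e.getOldValue() + " -> " + e.getNewValue()));

        BeanTransaction<Person> tx = begin(person);
        tx.bean().setName("Bob");
        System.out.println(person.getName() + " " + events); // Alice []

        tx.commit();
        System.out.println(person.getName() + " " + events); // Bob [Alice -> Bob]
    }
}
```

Because the working copy carries no listeners, observers of the original only hear about the change at commit time, which is one simple reading of "transactional monitoring."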

I keep coming back to beans as one of my biggest stumbling blocks for writing good and efficient code. Inevitably I'm going to have to revisit Magic Beans and finally create the end-all-be-all for Bean handling.

Multithreaded Unzip?

java

programming

A co-worker and I were discussing the possible performance gain from extracting ZIP files with multiple threads rather than the single-threaded extraction that most (if not all) mainstream archive utilities use. Given Java's great built-in ZIP support, it seemed rather trivial to give this a shot, and thus UberZip was born.

UberZip is my simple little sample Java command-line application that extracts files using however many threads you specify. My biggest headache is extracting Eclipse from the ZIP file you download, so I decided that would be an ideal test for my little program. I downloaded the Eclipse J2EE bundle (3.1.1), which is a happy 132 meg with thousands of files.
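The UberZip source lives in the repository linked below rather than in this post, so the following is a minimal sketch of the approach described, not the actual UberZip code: `java.util.zip.ZipFile` plus a fixed-size thread pool whose size is the tunable thread count. The class name `ParallelUnzip` and its argument order are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ParallelUnzip {

    // Extracts every entry of zipPath into destDir using a pool of `threads` workers.
    // Returns the number of files (not directories) written.
    public static int extract(Path zipPath, Path destDir, int threads) throws Exception {
        Path root = destDir.toAbsolutePath().normalize();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            List<Future<?>> tasks = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zip.entries())) {
                Path target = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside the destination.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                // Each file entry is inflated and written by whichever worker picks it up.
                tasks.add(pool.submit(() -> {
                    Files.createDirectories(target.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    return null;
                }));
            }
            // Wait for every worker before the ZipFile is closed; get() rethrows failures.
            for (Future<?> task : tasks) {
                task.get();
            }
            return tasks.size();
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical usage: java ParallelUnzip <zip> <destDir> [threads]
    public static void main(String[] args) throws Exception {
        int threads = args.length > 2 ? Integer.parseInt(args[2]) : 1;
        long start = System.nanoTime();
        int files = extract(Paths.get(args[0]), Paths.get(args[1]), threads);
        System.out.printf("%d files in %.2f s with %d thread(s)%n",
                files, (System.nanoTime() - start) / 1e9, threads);
    }
}
```

The pool size is the single knob here, mirroring the thread-count option described above; how much extra threads help will depend heavily on the CPU and disk, as the numbers that follow suggest.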

I need to test further on a machine that can make better use of multiple threads, but here are the stats for my AMD Athlon 64 3200+ (Hyperthreaded, so I actually get a very slight benefit) running on Windows XP:

14.7 seconds with 1 thread

13.4 seconds with 5 threads

11.6 seconds with 30 threads

Anything above 30 threads seems to actually create more overhead than gain.

Now, to be fair, I matched this up against the fastest unzipper I am aware of, 7zip (http://www.7-zip.org). It took about 14 seconds to unzip the file in 7zip, which falls nearly perfectly in line with UberZip's single-threaded run.

For those of you that would be interested in taking a look at this very simple example, I have committed the source to my public repository:

svn://captiveimagination.com/public/uberzip/trunk

Further, if you'd simply rather download the JAR or EXE, I have uploaded copies for those of you who would like to unzip files "uber" fast. :)

uberzip.exe

uberzip.jar

After trying this out on my work machine (dual Xeon dual-core 3.2 GHz processors + 4 gig of memory = 8 processors in Windows), I got the following stats:

7zip - 25 seconds
UberZip (1 thread) - 22.17 seconds
UberZip (5 threads) - 19.5 seconds
UberZip (30 threads) - 21.6 seconds

Oddly, about 5 threads seems to be the "sweet spot" for this machine. Some of the results are a bit strange, and the hard drives on this machine don't seem to perform quite as well as those in my home machine, but it seems clear that with the right configuration, multithreaded extraction can gain some good performance over the single-threaded applications out there.
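To compare the two machines directly, speedup relative to the single-threaded run is just t(1 thread) / t(N threads). This small snippet uses only the timings reported above; nothing is re-measured, and `SpeedupTable` is a hypothetical name.

```java
public class SpeedupTable {

    // Speedup relative to a baseline: baseline time / measured time (> 1 means faster).
    static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    public static void main(String[] args) {
        // UberZip timings in seconds, as reported in this post.
        int[] threads = {1, 5, 30};
        double[] home = {14.7, 13.4, 11.6};   // Athlon 64 3200+
        double[] work = {22.17, 19.5, 21.6};  // dual Xeon work machine

        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%2d threads: home %.2fx, work %.2fx%n",
                    threads[i], speedup(home[0], home[i]), speedup(work[0], work[i]));
        }
    }
}
```

Worked out, that's roughly 1.10x and 1.27x at 5 and 30 threads on the home machine, versus 1.14x and 1.03x on the work machine, which is why 5 threads looks like the sweet spot there.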

Perhaps someone else will find this information useful and turn UberZip into a product people can use. ;)

Update (2017.05.31)

I've completely rewritten this functionality in Scala and posted it on GitHub: https://github.com/outr/uberzip. It's faster than ever and more cleanly written. Tested on the latest Eclipse ZIP (320 meg), it can unzip 2,981 files in 0.73 seconds. Doing the same test on the same machine with Linux unzip took 1.7 seconds.
