Toward more efficient ArrayLists  
Author Message
d94-rol





PostPosted: 2003-6-28 17:55:00 Top

java-programmer, Toward more efficient ArrayLists Roedy Green (email***@***.com) wrote:
: IIRC Arraylists are about 10 times slower than plain arrays. This is
: quite a penalty to pay for not knowing ahead of time how many items
: you will have.

Can you give a pointer to a benchmark that shows this?
Or can you build a benchmark that shows this?

Can you give proof that this really is your hot spot? Most guesses
about what to optimize is wrong.

Anyway I find it hard to believe that arraylist is 10 times slower
since the methods (get and set) does something like
{ RangeCheck(index); return elementData[index]; }.
Ok the range check is unneccasry, it will be checked by the jvm also.
If you use add a lot then it will check the size of the array and
expand if neccessary. If you know in advance how many objects you will
store this should not be a problem (you do create the list with the
size), if you dont know in advance it will be hard to write something
that is a magnitude faster...

/robo
 
xarax





PostPosted: 2003-6-28 23:16:00 Top

java-programmer >> Toward more efficient ArrayLists email***@***.com (Robert Olofsson) wrote in message news:<bdjol7$274$email***@***.com>...
> Roedy Green (email***@***.com) wrote:
> : IIRC Arraylists are about 10 times slower than plain arrays. This is
> : quite a penalty to pay for not knowing ahead of time how many items
> : you will have.
>
> Can you give a pointer to a benchmark that shows this?
> Or can you build a benchmark that shows this?
>
> Can you give proof that this really is your hot spot? Most guesses
> about what to optimize is wrong.
>
> Anyway I find it hard to believe that arraylist is 10 times slower
> since the methods (get and set) does something like
> { RangeCheck(index); return elementData[index]; }.
> Ok the range check is unneccasry, it will be checked by the jvm also.
/snip/

wrong. the range check IS necessary, because elementData[]
can have excess space on the end. A successful indexing of
elementData doesn't always mean that valid data was accessed.
The range check verifies that the index is within the
range of valid data elements.

As to the other remarks about ArrayList performance, I
agree with Robert. ArrayList is one of the most efficient
implementations of the List interface when performing
random get(). The internal array grows by a capacity factor
so that adding many elements to the ArrayList will amortize
the growth cost. Removing elements from high index to low
index is much faster than removing elements from low index
to high index, using the remove(int) method.

Having said that, the remove(Object) method is probably
one of the slowest implementations, because it devolves
to the abstract parent class implementation that uses an
iterator to search for the object. A better ArrayList
implementation would override remove(Object), and a few
other inherited method implementations, for better
performance.

The remarks about casting will be addressed with generic java
in version 1.5, for those programmers that don't like to
write casts.
 
Roedy Green





PostPosted: 2003-7-1 7:57:00 Top

java-programmer >> Toward more efficient ArrayLists On 28 Jun 2003 08:16:27 -0700, email***@***.com (xarax) wrote or quoted
:

>As to the other remarks about ArrayList performance, I
>agree with Robert.

The 10 time probably refers to Vector rather than ArrayList. I can't
remember where I read this or the details. I just remember the number
10. The penalty should be less now the synchronized has less penalty.

 
 
Adam Maass





PostPosted: 2003-7-1 22:16:00 Top

java-programmer >> Toward more efficient ArrayLists
"Roedy Green" <email***@***.com> wrote:
> On 28 Jun 2003 08:16:27 -0700, email***@***.com (xarax) wrote or quoted:
>
> >As to the other remarks about ArrayList performance, I
> >agree with Robert.
>
> The 10 time probably refers to Vector rather than ArrayList. I can't
> remember where I read this or the details. I just remember the number
> 10. The penalty should be less now the synchronized has less penalty.
>

From personal experience in 1.0 and 1.1 JVMs (yes, it's been a long time),
"synchronized" had a 2-5x penalty.

It's probably better now, but I haven't bothered to check.

-- Adam Maass


 
 
Jon Skeet





PostPosted: 2003-7-1 23:19:00 Top

java-programmer >> Toward more efficient ArrayLists Adam Maass <email***@***.com> wrote:
> > The 10 time probably refers to Vector rather than ArrayList. I can't
> > remember where I read this or the details. I just remember the number
> > 10. The penalty should be less now the synchronized has less penalty.
> >
>
> From personal experience in 1.0 and 1.1 JVMs (yes, it's been a long time),
> "synchronized" had a 2-5x penalty.
>
> It's probably better now, but I haven't bothered to check.

And this is where performance worries *really* need testing. Here we
have one vaguely remembered figure, and one which is irrelevant due to
the world having moved on.

Now, what operation are we interested in? Reading or writing? Let's
assume reading, for the moment. I've tested the following situations:

o Fetching elements from an ArrayList
o Fetching elements from a Vector
o Fetching elements from a String array (no casting)
o Fetching elements from an Object array and casting

For each iteration I took the length of the string element and added
that to a running total, so that the VM couldn't optimise away the
element access.

Here's the JBench (http://www.pobox.com/~skeet/java/jbench) code for
the ArrayList test. The other classes are incredibly similar.

import uk.org.skeet.jbench.tasks.*;
import uk.org.skeet.jbench.*;
import java.util.*;

public class ArrayListTest extends SimpleTask
{
private ArrayList list;
private int iterations;
private long result;

public void setIterations (int iterations)
{
this.iterations = iterations;
}

public void prepareTest()
{
int size = (int) getSize();
list = new ArrayList (size);
for (int i=0; i < size; i++)
list.add ("hello");
}

public void runTest()
{
long totalLength=0;

int size = (int) getSize();

for (int i=0; i < iterations; i++)
{
for (int j=0; j < size; j++)
totalLength += ((String)list.get(j)).length();
}
result = totalLength;
}

public void checkResults() throws TaskException
{
if (result != 5*(int)getSize()*iterations)
throw new TaskException ("Invalid result");
}
}

Here's the properties file:
global.size = 10000
global.iterations = 10000

task.1.class = ArrayListTest
task.2.class = VectorTest
task.3.class = ArrayTestNoCast
task.4.class = ArrayTestWithCast

And here are the results:

--- JBench log starts Tue Jul 01 16:05:16 BST 2003 ---
General Configuration:
Number of test runs per task: 5
Abort on task configuration error: no
Excluding worst result and best result from stats.
System information:
VM: Java HotSpot(TM) Client VM;1.4.2-b28;Sun Microsystems Inc.
OS: Windows XP;x86;5.1
Timer: uk.org.skeet.jbench.ClockTimer; granularity: 10ms
Task configuration:
Configured 1: ArrayListTest; size=10,000
Configured 2: VectorTest; size=10,000
Configured 3: ArrayTestNoCast; size=10,000
Configured 4: ArrayTestWithCast; size=10,000
4 tasks successfully configured
----------
ArrayListTest; size=10,000
Testing.....All tests passed. Results:
8903ms
8913ms
8883ms
8993ms [excluded]
8863ms [excluded]
Mean=8899; variance=156; standard deviation=12
----------
VectorTest; size=10,000
Testing.....All tests passed. Results:
9313ms [excluded]
9144ms
9093ms
9153ms
9053ms [excluded]
Mean=9130; variance=698; standard deviation=26
----------
ArrayTestNoCast; size=10,000
Testing.....All tests passed. Results:
2664ms
2684ms
2654ms [excluded]
2674ms
2684ms [excluded]
Mean=2674; variance=66; standard deviation=8
----------
ArrayTestWithCast; size=10,000
Testing.....All tests passed. Results:
3816ms
3845ms
3826ms
3866ms [excluded]
3815ms [excluded]
Mean=3829; variance=144; standard deviation=12
--- JBench log ends Tue Jul 01 16:07:21 BST 2003 ---


So:

o Vector and ArrayList are roughly equal - synchronization is pretty
cheap now

o When the array is of the right type, accessing an element is at least
3-4x faster than using an ArrayList. (Don't forget we have the overhead
of calling String.length() 100 million times here.)

o When casting is required, the advantage falls quiet significantly -
nearly a third of the time spent by the ArrayTestWithCast test seemed
to be taken up casting.


Now, whether or not the difference between ArrayList and a plain array
is actually *significant* or not will depend on what you're planning to
do with the element - I chose the simplest thing I could think of in
order to minimise the impact on the results. (Taking the call out makes
a big difference; adding more calls doesn't change things much. Make of
that what you will...)

Personally I will almost always use a List when I've got something
which is probably going to be resized. The performance difference just
isn't worth the candle. Of course, I use an array for large sequences
of primitives - that's a different kettle of fish.

Anyway, still just microbenchmarking, but hopefully more useful than
the previous figures...

--
Jon Skeet - <email***@***.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
 
 
Jon Skeet





PostPosted: 2003-7-2 2:31:00 Top

java-programmer >> Toward more efficient ArrayLists Roedy Green <email***@***.com> wrote:
> This is good news. The generics should make quite a speed boost
> getting rid of needless casts.

They don't though - they only remove them from the source code, not the
byte code, as I understand it.

--
Jon Skeet - <email***@***.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
 
 
timjowers





PostPosted: 2003-7-3 20:13:00 Top

java-programmer >> Toward more efficient ArrayLists Jon Skeet <email***@***.com> wrote in message news:<email***@***.com>...
> Adam Maass <email***@***.com> wrote:
> > > The 10 time probably refers to Vector rather than ArrayList. I can't
> > > remember where I read this or the details. I just remember the number
> > > 10. The penalty should be less now the synchronized has less penalty.
> > >

JBench did not compile due to needing some outside classes. I wrote a
quick test. The whole issue is really how the arrays are used and
maybe the best "test" would be one that allows this to be configured
so we can test based on the expected/measured usage patterns. I heard
a talk on a GC that used statistical analysis...OT.

Results:

No deletes: array timing 681
No deletes: array w/Exc timing 682
No deletes: ArrayList w/Exc timing 782
No deletes: ArrayList timing 793
No deletes: Vector w/Exc timing 921
No deletes: Vector timing 1062
No deletes: ArrayList synched w/Exc timing 963
Deletes: array timing 704
Deletes: array w/Exc timing 725
Deletes: ArrayList w/Exc timing 3572
Deletes: ArrayList timing 3603
Deletes: Vector w/Exc timing 3726
Deletes: Vector timing 3869
Deletes: ArrayList synched w/Exc timing 4876
Interspered: array timing 656
Interspered: array w/Exc timing 671
Interspered: ArrayList w/Exc timing 801
Interspered: ArrayList timing 787
Interspered: Vector w/Exc timing 975
Interspered: Vector timing 1074
Interspered: ArrayList synched w/Exc timing 1043
Highlights:
1) Synchronization is a penalty up to about 4x
2) Synchronization is no real penalty for read-only/read-mostly
arrays.
3) The recommended synchronized array Collections.synchronizedList(new
ArrayList()); may be even slower than Vector.

Notes:
1. w/Exc means it used the old C++ way of doing dynamic arrays... when
an exception occurs then the array is grown.
2. Some analysis of casting needs to be added such as highlighted with
the JBench test. I preferred to use native types if possible as this
would be standard coding practice.
3. The test probably has some sources of errors.

My free test below, Tim

/**
* all code is in this one class. run main to test.
* Test to see which array performs the best for steady state size and
just doing lookups.
* todo... see how performance of basic arrays differs if Float is
stored rather than float.
* More like the Java Array class but not how you'd write real code.
* Also, test with collections and other data structures that may
handle sparse lookups better.
*
* @author twj
* @version 1
* @date 5:19:39 PM
*/
import java.util.*;

public class ArrayTest {

List list = Collections.synchronizedList(new ArrayList());
ArrayList alSyn = new ArrayList();
Vector vec = new Vector();
ArrayList al = new ArrayList();
float af[] = new float[0];
int iaf=0; // count of how many elements are valid in the array
int type=0;
private static final int TYPE_array = 0;
private static final int TYPE_array_unmanaged = 1;
private static final int TYPE_ArrayList = 2;
private static final int TYPE_ArrayList_safer = 3;
private static final int TYPE_Vector = 4;
private static final int TYPE_Vector_safer = 5;
private static final int TYPE_List_synched = 6;
private static final int TYPE_COUNT = 7;
private static String TypesHuman[] = {"array","array w/Exc",
"ArrayList w/Exc", "ArrayList", "Vector w/Exc", "Vector", "ArrayList
synched w/Exc"};

private void arraycopy( float []a1, int start, float []a2, int num)
{ for( int i=start; i<(num+start); i++ )
a2[i-start] = a1[i];
}
public void grow( int size)
{
switch( type )
{ case( TYPE_array ) :
if( iaf < size )
{ float anew[] = new float[size];
arraycopy(af,0,anew,iaf );
af = anew;
while( iaf<af.length )
af[iaf++] = (float) Math.random();
}
break;
case( TYPE_array_unmanaged ) : // nothing to do... if a lookup
occurs then an exception will trigger growth
case( TYPE_ArrayList ) : // nothing to do as ArrayList size is
managed internally
case( TYPE_ArrayList_safer ) :
case( TYPE_Vector ) :
case( TYPE_Vector_safer ) :
case( TYPE_List_synched ) :
break;
default:
System.err.println( "bad type " + type );
break;
}
}

public float delete( int times, int size )
{ float fsum=0F;
for( int t=0; t< times; t++ )
{ int index = (int)((float)size * Math.random());
// do the lookup
switch( type )
{ case( TYPE_array ) :
af[index]=-1F;
break;
case( TYPE_array_unmanaged ) :
try{
af[index]=-1F;
} catch( Exception e )
{ // past end so already deleted so ignore it
}
break;
case( TYPE_ArrayList ) :
case( TYPE_ArrayList_safer ) :
try{
al.remove(index); // could be array out of bounds excpetion
} catch( Exception e)
{}
break;
case( TYPE_Vector ) :
case( TYPE_Vector_safer ) :
try{
vec.remove(index);
} catch( Exception e)
{}
break;
case( TYPE_List_synched ) :
try{
list.remove(index); // could be array out of bounds excpetion
} catch( Exception e)
{}
break;
default:
System.err.println( "bad type " + type );
System.exit(-1);
break;
}

}
return fsum;
}

public float lookup( int times, int size )
{ float fsum=0F;
for( int t=0; t< times; t++ )
{ int index = (int)((float)size * Math.random());
// do the lookup
switch( type )
{ case( TYPE_array ) :
float fvalid = af[index];
if( fvalid >= 0F )
fsum += fvalid;
break;
case( TYPE_array_unmanaged ) :
try{
float fvalid2 = af[index];
if( fvalid2 >= 0F )
fsum += fvalid2;
} catch( Exception e )
{ // past end so increase size
float anew[] = new float[index+1];
arraycopy(af,0,anew,iaf );
af = anew;
while( iaf<af.length )
af[iaf++] = (float) Math.random();
fsum += af[index];
}
break;
case( TYPE_ArrayList ) :
Float b=null;
try{
b = (Float) al.get(index); // could be array out of bounds
excpetion
} catch( Exception e)
{
if( b == null )
for( int ial=al.size(); ial<=index; ial++)
al.add( new Float( (float) Math.random() ) );
b = (Float) al.get(index);
}
if( b != null )
fsum += b.floatValue();
break;
case( TYPE_ArrayList_safer ) :
Float b2=null;
if( al.size() <= index )
for( int ial2=al.size(); ial2<=index; ial2++)
al.add( new Float( (float) Math.random() ) );
b = (Float) al.get(index);
if( b != null )
fsum += b.floatValue();
break;
case( TYPE_Vector ) :
Float bv=null;
try{
bv = (Float) vec.get(index);
} catch( Exception e)
{
if( bv == null )
for( int ial=vec.size(); ial<=index; ial++)
vec.add( new Float( (float) Math.random() ) );
bv = (Float) vec.get(index);
}
if( bv != null )
fsum += bv.floatValue();
break;
case( TYPE_Vector_safer ) :
Float bv2=null;
if( vec.size() <= index )
for( int ial2=vec.size(); ial2<=index; ial2++)
vec.add( new Float( (float) Math.random() ) );
bv2 = (Float) vec.get(index);
if( bv2 != null )
fsum += bv2.floatValue();
break;
case( TYPE_List_synched ) :
Float b3=null;
try{
b3 = (Float) list.get(index); // could be array out of bounds
excpetion
} catch( Exception e)
{
if( b3 == null )
for( int ial=list.size(); ial<=index; ial++)
list.add( new Float( (float) Math.random() ) );
b3 = (Float) list.get(index);
}
if( b3 != null )
fsum += b3.floatValue();
break;
default:
System.err.println( "bad type " + type );
System.exit(-1);
break;
}

}
return fsum;
}


public static void main(String[] args) {
ArrayTest at = new ArrayTest();
// determine the testing suite
int icreates[] = {1024,30,-200,8096,-500,20,300,920,1,5}; // 10
times
int ilookups = 102400; // assume 100 to 1 or so lookups
int iAvg[] = new int[TYPE_COUNT];
for( int dodeletes=0; dodeletes<2; dodeletes++, System.out.println(
"\t\t-------------------------------" ) )
{ for(int iRuns=0; iRuns < 6; iRuns++ )
for( int type=0; type<TYPE_COUNT; type++ )
{ at.type = type;
int cursize=0;
at.iaf=0; // for the two vanilla arrays
at.vec = new Vector();
at.al = new ArrayList();
at.af = new float[0];
at.list = Collections.synchronizedList(new ArrayList());
int ideletes= ilookups / 20; // 5% as many
long time = System.currentTimeMillis();
for( int c=0; c<icreates.length; c++ )
{ cursize += icreates[c];
at.grow(cursize);
at.lookup( ilookups, cursize );
if( dodeletes>0 )
{ at.delete( ideletes, cursize );
}
}
time = System.currentTimeMillis() - time;
iAvg[type] = iAvg[type]*iRuns + (int) time;
iAvg[type] /= (iRuns+1);
System.out.println( "Type " + type + " : " + time );
}
for( int type=0; type<TYPE_COUNT; type++ )
System.out.println( ((dodeletes==0)?"No deletes":"Deletes") + ": "
+ TypesHuman[type] + " timing " + iAvg[type] );
}

// Interspersed: redo but with creates, lookups, and deletes
interspersed
for(int iRuns=0; iRuns < 6; iRuns++ )
for( int type=0; type<TYPE_COUNT; type++ )
{ at.type = type;
int cursize=0;
at.iaf=0; // for the two vanilla arrays
at.vec = new Vector();
at.al = new ArrayList();
at.af = new float[0];
at.list = Collections.synchronizedList(new ArrayList());
int ideletes= ilookups / 20; // 5% as many
long time = System.currentTimeMillis();
for( int c=0; c<icreates.length; c++ )
{ cursize += icreates[c];
// since ilookups is the larger than cursize or ideletes the use
// it to control the looping and intersperse based on dividing it
up.
for( int looks=0; looks< ilookups; looks+=ideletes )
{ int grow = ilookups/cursize;
if( grow <= ideletes ) grow = ideletes+1;
at.grow(grow);
at.lookup( ideletes, grow );
at.delete( 1, grow );
}
}
time = System.currentTimeMillis() - time;
iAvg[type] = iAvg[type]*iRuns + (int) time;
iAvg[type] /= (iRuns+1);
System.out.println( "Type " + type + " : " + time );
}
for( int type=0; type<TYPE_COUNT; type++ )
System.out.println( "Interspered: " + TypesHuman[type] + " timing "
+ iAvg[type] );


}
}
 
 
Jon Skeet





PostPosted: 2003-7-3 21:18:00 Top

java-programmer >> Toward more efficient ArrayLists Tim Jowers <email***@***.com> wrote:
> Jon Skeet <email***@***.com> wrote:
> > Adam Maass <email***@***.com> wrote:
> > > > The 10 time probably refers to Vector rather than ArrayList. I can't
> > > > remember where I read this or the details. I just remember the number
> > > > 10. The penalty should be less now the synchronized has less penalty.
> > > >
>
> JBench did not compile due to needing some outside classes.

Did you not just download the jar file, JBench.jar? That contains
everything it needs to. I believe it also says on the website where to
get the other classes which are needed.

> I wrote a quick test. The whole issue is really how the arrays are used and
> maybe the best "test" would be one that allows this to be configured
> so we can test based on the expected/measured usage patterns. I heard
> a talk on a GC that used statistical analysis...OT.

Indeed, actual usage would (I believe) show interesting results - like
that the performance difference between arrays and ArrayList probably
has no significant bearing on performance in most circumstances.

--
Jon Skeet - <email***@***.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too