Infinite loops in hashCode() and equals()  
Author Message
Mike Schilling





PostPosted: 2004-6-10 7:14:00 Top

java-programmer, Infinite loops in hashCode() and equals() What do you think the following fragment prints?:

List l = new ArrayList();
l.add(l);
System.out.println(l.hashCode());

Trick question. The result is an infinite recursion (i.e. a stack
overflow), since the hash code for a List is calculated from the hash codes
of its members.

Likewise, the following causes an infinite recursion, since lists are
defined to be equal when they have the same members.

List l1 = new ArrayList();
List l2 = new ArrayList();

l1.add(l2);
l2.add(l1);

System.out.println(l1.equals(l2));

These cases are simple to spot, but the same situation could occur with any
cyclic graph of objects that compute hash code and/or equality based on the
values of their members. Also, equals() and hashCode() are often called
implicitly, for instance when adding an object to a HashSet or HashMap.

I don't see any way this could be addressed other than by keeping track of
which objects have already been processed by the current hashCode() or
equals() call, and there's no way to do that without changing the method
signatures, to, say, add a Set:

in Object:

public final int hashCode() {
return hashCode(new HashSet());
}

public final int hashCode(Set processed)
{
if (processed.contains(this)) {
return System.identityHashCode(this);
}
else {
processed.add(this);
return this.calculateHashCode(processed);
}
}

public int calculateHashCode(processed) {
return return System.identityHashCode(this);
}

with calculateHashCode(Set) being the method that classes can override and
hashCode(Set) the method List et al. call on their members. And that is a
pipe dream, since it isn't possible to make this sort of incompatible change
at this late date.



 
megagurka





PostPosted: 2004-6-10 16:23:00 Top

java-programmer >> Infinite loops in hashCode() and equals() "Mike Schilling" <email***@***.com> wrote in message news:<gRMxc.68838$email***@***.com>...
> What do you think the following fragment prints?:
>
> List l = new ArrayList();
> l.add(l);
> System.out.println(l.hashCode());
>
> Trick question. The result is an infinite recursion (i.e. a stack
> overflow), since the hash code for a List is calculated from the hash codes
> of its members.
>
> Likewise, the following causes an infinite recursion, since lists are
> defined to be equal when they have the same members.
>
> List l1 = new ArrayList();
> List l2 = new ArrayList();
>
> l1.add(l2);
> l2.add(l1);
>
> System.out.println(l1.equals(l2));
>
> These cases are simple to spot, but the same situation could occur with any
> cyclic graph of objects that compute hash code and/or equality based on the
> values of their members. Also, equals() and hashCode() are often called
> implicitly, for instance when adding an object to a HashSet or HashMap.

Hmmm, I can't think of a case where this problem would occur, so I
don't think there is a need for a solution. If object A reference
another object B you don't include B in A.equals() (or A.hashCode()),
unless B is part of A (composition). And in that case B is clearly not
a part of A, so there is no cycle. Can you provide a real world
example where this problem occurs?

/Jesper Nordenberg
 
Andy Fish





PostPosted: 2004-6-10 18:39:00 Top

java-programmer >> Infinite loops in hashCode() and equals()
"Jesper Nordenberg" <email***@***.com> wrote in message
news:email***@***.com...
> "Mike Schilling" <email***@***.com> wrote in message
news:<gRMxc.68838$email***@***.com>...
> > What do you think the following fragment prints?:
> >
> > List l = new ArrayList();
> > l.add(l);
> > System.out.println(l.hashCode());
> >
> > Trick question. The result is an infinite recursion (i.e. a stack
> > overflow), since the hash code for a List is calculated from the hash
codes
> > of its members.
> >
> > Likewise, the following causes an infinite recursion, since lists are
> > defined to be equal when they have the same members.
> >
> > List l1 = new ArrayList();
> > List l2 = new ArrayList();
> >
> > l1.add(l2);
> > l2.add(l1);
> >
> > System.out.println(l1.equals(l2));
> >
> > These cases are simple to spot, but the same situation could occur with
any
> > cyclic graph of objects that compute hash code and/or equality based on
the
> > values of their members. Also, equals() and hashCode() are often called
> > implicitly, for instance when adding an object to a HashSet or HashMap.
>
> Hmmm, I can't think of a case where this problem would occur, so I
> don't think there is a need for a solution. If object A reference
> another object B you don't include B in A.equals() (or A.hashCode()),
> unless B is part of A (composition). And in that case B is clearly not
> a part of A, so there is no cycle. Can you provide a real world
> example where this problem occurs?

Although mike's examples aren't necessarily real-world ones, I don't see any
reason they couldn't appear in real life

having an arrayList contain itself seems a perfectly reasonable thing to do
when manipulating complex data structures (though the only example I can
think of off hand is trying to compute an empirical solution to russell's
paradox ;-). And computing the hashcode of an ArrayList also seems entirely
reasonable.

I would say it's a bug in the implementation of ArrayList.HashCode()

>
> /Jesper Nordenberg


 
 
IkRhcmlvIChkcmlua2luZyBjb++sgGVlIGluIHRoZSBv76yDY2XigKYpIg





PostPosted: 2004-6-11 0:26:00 Top

java-programmer >> Infinite loops in hashCode() and equals() Andy Fish wrote:

> I would say it's a bug in the implementation of ArrayList.HashCode()

The Java List specification says:
Note: While it is permissible for lists to contain
themselves as elements, extreme caution is advised:
the equals and hashCode methods are no longer
well defined on a such a list.

- Dario
 
 
John C. Bollinger





PostPosted: 2004-6-11 0:48:00 Top

java-programmer >> Infinite loops in hashCode() and equals() Andy Fish wrote:
> Although mike's examples aren't necessarily real-world ones, I don't see any
> reason they couldn't appear in real life
>
> having an arrayList contain itself seems a perfectly reasonable thing to do
> when manipulating complex data structures (though the only example I can
> think of off hand is trying to compute an empirical solution to russell's
> paradox ;-). And computing the hashcode of an ArrayList also seems entirely
> reasonable.
>
> I would say it's a bug in the implementation of ArrayList.HashCode()

I would say that the following excerpt from List's class-level Javadocs
is precisely to the point:

====
Note: While it is permissible for lists to contain themselves as
elements, extreme caution is advised: the equals and hashCode methods
are no longer well defined on a such a list.
====

List defines the algorithms for List implementations' hashCode() and
equals() methods in terms of their elements, which significantly eases
some applications. This is purposeful. I think it is in general _not_
a very reasonable thing to do to add a List to itself, and I would be
very interested in even a hypothetical scenario where it would make
sense. I would be especially interested in such a scenario where in
addition there is no good workaround. Until I see such I'll agree with
Jesper that this looks like a theoretical problem that need not occur in
practice.


John Bollinger
email***@***.com
 
 
Mike Schilling





PostPosted: 2004-6-11 3:43:00 Top

java-programmer >> Infinite loops in hashCode() and equals()
"John C. Bollinger" <email***@***.com> wrote in message
news:caa3bs$b9r$email***@***.com...

>
> List defines the algorithms for List implementations' hashCode() and
> equals() methods in terms of their elements, which significantly eases
> some applications. This is purposeful. I think it is in general _not_
> a very reasonable thing to do to add a List to itself, and I would be
> very interested in even a hypothetical scenario where it would make
> sense. I would be especially interested in such a scenario where in
> addition there is no good workaround. Until I see such I'll agree with
> Jesper that this looks like a theoretical problem that need not occur in
> practice.

First, note that this behavior doesn't depend upon a List being a member of
itself. Create a graph of objects that have value-like semantics for
equality (and thus for hash code). This includes all Lists, Sets and Maps.
If there are any cycles in the graph, the result of hashCode() is an
infinite loop. Arguing that there will never be cycles in a well-designed
application is a bit like arguing that reference counting works as well as a
true GC, since cycles, which is where reference counting breaks down, should
be avoided.

Anyway, this came up while investigating automated generation of classes to
represent types in SOAP messages. These classes would have no notion of
identity, so value semantics are indicated: two objects received in
different messages are equal if all of their values are equal. hashcode()
and equals() should be generated in the obvious way, by iterating over the
members of the class. But what happens if there's a cycle (which is
possible in encoded SOAP messages, though not in literal ones)? How do you
avoid the infinite loop? Well, let's see how containers avoid it...


 
 
Mike Schilling





PostPosted: 2004-6-11 3:44:00 Top

java-programmer >> Infinite loops in hashCode() and equals()
"Dario (drinking co铿?ee in the o铿fce??" <email***@***.com> wrote in
message news:caa29s$pkt$email***@***.com...
> Andy Fish wrote:
>
> > I would say it's a bug in the implementation of ArrayList.HashCode()
>
> The Java List specification says:
> Note: While it is permissible for lists to contain
> themselves as elements, extreme caution is advised:
> the equals and hashCode methods are no longer
> well defined on a such a list.

Less weaselly wording would have been: "Doing so can lead to unavaoidable
infinite recursion."


 
 
John C. Bollinger





PostPosted: 2004-6-11 7:10:00 Top

java-programmer >> Infinite loops in hashCode() and equals() Mike Schilling wrote:

> "John C. Bollinger" <email***@***.com> wrote in message
> news:caa3bs$b9r$email***@***.com...
>
>
>>List defines the algorithms for List implementations' hashCode() and
>>equals() methods in terms of their elements, which significantly eases
>>some applications. This is purposeful. I think it is in general _not_
>>a very reasonable thing to do to add a List to itself, and I would be
>>very interested in even a hypothetical scenario where it would make
>>sense. I would be especially interested in such a scenario where in
>>addition there is no good workaround. Until I see such I'll agree with
>>Jesper that this looks like a theoretical problem that need not occur in
>>practice.
>
>
> First, note that this behavior doesn't depend upon a List being a member of
> itself. Create a graph of objects that have value-like semantics for
> equality (and thus for hash code). This includes all Lists, Sets and Maps.
> If there are any cycles in the graph, the result of hashCode() is an
> infinite loop.

Acknowledged.

> Arguing that there will never be cycles in a well-designed
> application is a bit like arguing that reference counting works as well as a
> true GC, since cycles, which is where reference counting breaks down, should
> be avoided.

Hold on a moment there. I certainly wouldn't argue for a minute that a
well-designed application wouldn't have cycles of references. I would,
however, claim that I currently see no use for such a cycle in which all
members have the kind of value-like semantics that we're discussing.
Whether a List is directly or indirectly a member of itself makes little
difference -- neither makes sense in any practical application as far as
I can see.

> Anyway, this came up while investigating automated generation of classes to
> represent types in SOAP messages. These classes would have no notion of
> identity, so value semantics are indicated: two objects received in
> different messages are equal if all of their values are equal. hashcode()
> and equals() should be generated in the obvious way, by iterating over the
> members of the class. But what happens if there's a cycle (which is
> possible in encoded SOAP messages, though not in literal ones)?

I know enough about SOAP to find that confusing, but not enough to be
certain whether I really ought to be confused. Are you talking about
types defined via XML Schema, to be used in schemas for SOAP message
bodies? IIRC, type _definitions_ can be recursive. Type
_implementations_, on the other hand, cannot be recursive because XML
documents are flat. (Or they're tree-shaped, if you prefer; you still
can't get recursion.)

> How do you
> avoid the infinite loop?

If I understand correctly (and quite possibly I don't) then you don't
need to worry about it.

> Well, let's see how containers avoid it...

Gotcha. The point that we have now beaten to death being that they
don't. My further point being that they probably don't need to do for
any sensible application I can imagine. The same applies to your
analogous XML type classes as far as I can tell from your brief description.


John Bollinger
email***@***.com
 
 
xarax





PostPosted: 2004-6-11 8:54:00 Top

java-programmer >> Infinite loops in hashCode() and equals() "Mike Schilling" <email***@***.com> wrote in message
news:gRMxc.68838$email***@***.com...
> What do you think the following fragment prints?:
>
> List l = new ArrayList();
> l.add(l);
> System.out.println(l.hashCode());
>
> Trick question. The result is an infinite recursion (i.e. a stack
> overflow), since the hash code for a List is calculated from the hash codes
> of its members.
>
> Likewise, the following causes an infinite recursion, since lists are
> defined to be equal when they have the same members.
>
> List l1 = new ArrayList();
> List l2 = new ArrayList();
>
> l1.add(l2);
> l2.add(l1);
>
> System.out.println(l1.equals(l2));
>
> These cases are simple to spot, but the same situation could occur with any
> cyclic graph of objects that compute hash code and/or equality based on the
> values of their members. Also, equals() and hashCode() are often called
> implicitly, for instance when adding an object to a HashSet or HashMap.
>
> I don't see any way this could be addressed other than by keeping track of
> which objects have already been processed by the current hashCode() or
> equals() call, and there's no way to do that without changing the method
> signatures, to, say, add a Set:
>
> in Object:
>
> public final int hashCode() {
> return hashCode(new HashSet());
> }
>
> public final int hashCode(Set processed)
> {
> if (processed.contains(this)) {
> return System.identityHashCode(this);
> }
> else {
> processed.add(this);
> return this.calculateHashCode(processed);
> }
> }
>
> public int calculateHashCode(processed) {
> return return System.identityHashCode(this);
> }
>
> with calculateHashCode(Set) being the method that classes can override and
> hashCode(Set) the method List et al. call on their members. And that is a
> pipe dream, since it isn't possible to make this sort of incompatible change
> at this late date.
>


Not infinite recursion, because you don't have
infinite memory. You will elicit a stack overflow
exception.


 
 
megagurka





PostPosted: 2004-6-11 16:14:00 Top

java-programmer >> Infinite loops in hashCode() and equals() "Andy Fish" <email***@***.com> wrote in message news:<QSWxc.543$email***@***.com>...
> Although mike's examples aren't necessarily real-world ones, I don't see any
> reason they couldn't appear in real life
>
> having an arrayList contain itself seems a perfectly reasonable thing to do
> when manipulating complex data structures (though the only example I can
> think of off hand is trying to compute an empirical solution to russell's
> paradox ;-). And computing the hashcode of an ArrayList also seems entirely
> reasonable.
>
> I would say it's a bug in the implementation of ArrayList.HashCode()

Although ArrayList could be designed to solve this problem by using
thread local variables and sets, it's not worth the effort (and
performance hit) because I can't think of, and you haven't presented,
a case where cyclic hash code calculation would be useful. It's better
to document that object cycles are not supported in the current
implementation.

/Jesper Nordenberg
 
 
megagurka





PostPosted: 2004-6-11 16:27:00 Top

java-programmer >> Infinite loops in hashCode() and equals() "Mike Schilling" <email***@***.com> wrote in message news:<8R2yc.69301$email***@***.com>...
> "John C. Bollinger" <email***@***.com> wrote in message
> news:caa3bs$b9r$email***@***.com...
>
> >
> > List defines the algorithms for List implementations' hashCode() and
> > equals() methods in terms of their elements, which significantly eases
> > some applications. This is purposeful. I think it is in general _not_
> > a very reasonable thing to do to add a List to itself, and I would be
> > very interested in even a hypothetical scenario where it would make
> > sense. I would be especially interested in such a scenario where in
> > addition there is no good workaround. Until I see such I'll agree with
> > Jesper that this looks like a theoretical problem that need not occur in
> > practice.
>
> First, note that this behavior doesn't depend upon a List being a member of
> itself. Create a graph of objects that have value-like semantics for
> equality (and thus for hash code). This includes all Lists, Sets and Maps.
> If there are any cycles in the graph, the result of hashCode() is an
> infinite loop. Arguing that there will never be cycles in a well-designed
> application is a bit like arguing that reference counting works as well as a
> true GC, since cycles, which is where reference counting breaks down, should
> be avoided.

That is not a valid comparison, I can easily give an example in which
a object reference cycle occurs naturally (parent-child) and a
reference counting GC would fail. Can you give an example where a
cyclic hash code calculation would occur?

> Anyway, this came up while investigating automated generation of classes to
> represent types in SOAP messages. These classes would have no notion of
> identity, so value semantics are indicated: two objects received in
> different messages are equal if all of their values are equal. hashcode()
> and equals() should be generated in the obvious way, by iterating over the
> members of the class.

Not all class members should be included in the calculation of a hash
code, some "weak" references are created for traversal and
optimization (for example a parent reference in a child). Your cycle
can probably be broken by removing some member that shouldn't be
included in your hash code calculation. You should ask yourself if the
members you include in your hash code calculation for an object really
is a part of the object.

/Jesper Nordenberg
 
 
Mike Schilling





PostPosted: 2004-6-12 12:47:00 Top

java-programmer >> Infinite loops in hashCode() and equals()
"John C. Bollinger" <email***@***.com> wrote in message
news:caapo4$k7l$email***@***.com...
> Mike Schilling wrote:
>
> > "John C. Bollinger" <email***@***.com> wrote in message
> > news:caa3bs$b9r$email***@***.com...
> >
> >
> >>List defines the algorithms for List implementations' hashCode() and
> >>equals() methods in terms of their elements, which significantly eases
> >>some applications. This is purposeful. I think it is in general _not_
> >>a very reasonable thing to do to add a List to itself, and I would be
> >>very interested in even a hypothetical scenario where it would make
> >>sense. I would be especially interested in such a scenario where in
> >>addition there is no good workaround. Until I see such I'll agree with
> >>Jesper that this looks like a theoretical problem that need not occur in
> >>practice.
> >
> >
> > First, note that this behavior doesn't depend upon a List being a member
of
> > itself. Create a graph of objects that have value-like semantics for
> > equality (and thus for hash code). This includes all Lists, Sets and
Maps.
> > If there are any cycles in the graph, the result of hashCode() is an
> > infinite loop.
>
> Acknowledged.
>
> > Arguing that there will never be cycles in a
well-designed
> > application is a bit like arguing that reference counting works as well
as a
> > true GC, since cycles, which is where reference counting breaks down,
should
> > be avoided.
>
> Hold on a moment there. I certainly wouldn't argue for a minute that a
> well-designed application wouldn't have cycles of references. I would,
> however, claim that I currently see no use for such a cycle in which all
> members have the kind of value-like semantics that we're discussing.
> Whether a List is directly or indirectly a member of itself makes little
> difference -- neither makes sense in any practical application as far as
> I can see.
>
> > Anyway, this came up while investigating automated generation of classes
to
> > represent types in SOAP messages. These classes would have no notion of
> > identity, so value semantics are indicated: two objects received in
> > different messages are equal if all of their values are equal.
hashcode()
> > and equals() should be generated in the obvious way, by iterating over
the
> > members of the class. But what happens if there's a cycle (which is
> > possible in encoded SOAP messages, though not in literal ones)?
>
> I know enough about SOAP to find that confusing, but not enough to be
> certain whether I really ought to be confused. Are you talking about
> types defined via XML Schema, to be used in schemas for SOAP message
> bodies? IIRC, type _definitions_ can be recursive. Type
> _implementations_, on the other hand, cannot be recursive because XML
> documents are flat. (Or they're tree-shaped, if you prefer; you still
> can't get recursion.)

No, encoded SOAP messages can express graphs with cycles. Attributes are
used to express directed references from one object to another, and the
graph can be arbitrarily complex.