What is a type error?  
Author Message
Joe Marshall





PostPosted: 2006-7-15 3:47:00 Top

java-programmer, What is a type error?
Marshall wrote:
> Joe Marshall wrote:
> > Marshall wrote:
> > >
> > > Consider the following Java fragment:
> > >
> > > void foo() {
> > > int i = 0;
> > > int j = 0;
> > >
> > > // put any code here you want
> > >
> > > j = 1;
> > > i = 2;
> > > // check value of j here. It is still 1, no matter what you filled in
> > > above.
> > > // The assignment to i cannot be made to affect the value of j.
> > >
> > > }
> >
> > True, but you have hidden the pointers. Semantically, the identifiers
> > i and j refer not to integers but to locations that hold integers. The
> > assignment modifies the location.
>
> What the implementation looks like shouldn't affect how we speak
> of the logical model. In the above code, there are no pointers.
>
> By your definition, "pointer" and "variable" are synonyms. That doesn't
> seem like a good idea to me. (What if i and j end up in registers?
> I have not heard it said that registers have addresses.)

You probably won't like this, but....

The example you give above is a bit ambiguous as to whether or not it
shows `true' mutation. So what do I mean by `true' mutation? The code
above could be transformed by alpha-renaming into an equivalent version
that does no mutation:

void foo() {
int i = 0;
int j = 0;

// put any code here you want

int jj = 1;
int ii = 2;
// rename all uses of i to ii and j to jj after this point.

}

Once I've done the renaming, we're left with a pure function. I claim
that this sort of `mutation' is uninteresting because it can be
trivially eliminated through renaming. Since we can eliminate it
through renaming, we can use a simplified semantics that does not
involve a `store', and use a substitution model of evaluation.

The more interesting kind of mutation cannot be eliminated through
renaming. The semantic model requires the notion of a stateful
`store', and the model is not referentially transparent: substitution
does not work. These qualities are what make mutation problematic. My
claim is that this kind of mutation cannot be had without some model of
pointers and references.

There is a further distinguishing characteristic: Can I write a
program that detects the mutation (and acts differently depending on
whether there is mutation or not)? It is unclear from your example
above whether you intended this to be possible, but certainly there is
no way for a caller of foo to tell that foo reassigns an internal
variable. If there a procedure is extrinsically a pure function, then
there is no real meaning to any `mutation' that goes on inside; there
is always a way to turn the internal mutation into a pure function
through alpha-renaming.

>
> Clearly there is *some* difference between a language which allows
> explicit pointers and thus aliasing and one that doesn't. What term
> would you use? First-class variables?

My argument to you in re this statement:
> Again, I disagree: it is posible to have mutability without
> pointers/identity/objects.

is that mutability without a notion of pointers, identity, or objects
is trivially eliminated through alpha renaming and thus isn't the
`mutability' of interest.

 
Joe Marshall





PostPosted: 2006-7-15 3:47:00 Top

java-programmer >> What is a type error?
Marshall wrote:
> Joe Marshall wrote:
> > Marshall wrote:
> > >
> > > Consider the following Java fragment:
> > >
> > > void foo() {
> > > int i = 0;
> > > int j = 0;
> > >
> > > // put any code here you want
> > >
> > > j = 1;
> > > i = 2;
> > > // check value of j here. It is still 1, no matter what you filled in
> > > above.
> > > // The assignment to i cannot be made to affect the value of j.
> > >
> > > }
> >
> > True, but you have hidden the pointers. Semantically, the identifiers
> > i and j refer not to integers but to locations that hold integers. The
> > assignment modifies the location.
>
> What the implementation looks like shouldn't affect how we speak
> of the logical model. In the above code, there are no pointers.
>
> By your definition, "pointer" and "variable" are synonyms. That doesn't
> seem like a good idea to me. (What if i and j end up in registers?
> I have not heard it said that registers have addresses.)

You probably won't like this, but....

The example you give above is a bit ambiguous as to whether or not it
shows `true' mutation. So what do I mean by `true' mutation? The code
above could be transformed by alpha-renaming into an equivalent version
that does no mutation:

void foo() {
int i = 0;
int j = 0;

// put any code here you want

int jj = 1;
int ii = 2;
// rename all uses of i to ii and j to jj after this point.

}

Once I've done the renaming, we're left with a pure function. I claim
that this sort of `mutation' is uninteresting because it can be
trivially eliminated through renaming. Since we can eliminate it
through renaming, we can use a simplified semantics that does not
involve a `store', and use a substitution model of evaluation.

The more interesting kind of mutation cannot be eliminated through
renaming. The semantic model requires the notion of a stateful
`store', and the model is not referentially transparent: substitution
does not work. These qualities are what make mutation problematic. My
claim is that this kind of mutation cannot be had without some model of
pointers and references.

There is a further distinguishing characteristic: Can I write a
program that detects the mutation (and acts differently depending on
whether there is mutation or not)? It is unclear from your example
above whether you intended this to be possible, but certainly there is
no way for a caller of foo to tell that foo reassigns an internal
variable. If there a procedure is extrinsically a pure function, then
there is no real meaning to any `mutation' that goes on inside; there
is always a way to turn the internal mutation into a pure function
through alpha-renaming.

>
> Clearly there is *some* difference between a language which allows
> explicit pointers and thus aliasing and one that doesn't. What term
> would you use? First-class variables?

My argument to you in re this statement:
> Again, I disagree: it is posible to have mutability without
> pointers/identity/objects.

is that mutability without a notion of pointers, identity, or objects
is trivially eliminated through alpha renaming and thus isn't the
`mutability' of interest.

 
Joachim Durchholz





PostPosted: 2006-7-15 5:43:00 Top

java-programmer >> What is a type error? Marshall schrieb:
> Joachim Durchholz wrote:
>> Marshall schrieb:
>>> What about my example of SQL? Mutation, no pointers, no aliasing.
>>> Yet: useful.
>> Sorry, but SQL does have aliasing.
>
> Well. I suppose we do not have an agreed upon definition
> of aliasing, so it is hard to evaluate either way. I would
> propose using the same one you used for identity:
>
> if there are two variables and modifying one also modifies
> the other, then there is aliasing between them.

I think that's an adequate example.
For a definition, I'd say it's a bit too narrow unless we use a fairly
broad definition for "variable". I.e. in a C context, I'd say that a->b
is a variable, too, as would be foo(blah)->x.

> I avoided mentioning equality to include, for example,
> having an array i that is an alias to a subset of array j.

This would mean that there's aliasing between, say, a list of
transaction records and the balance of the account (since modifying the
list of transactions will change the balance, unless the object isn't
properly encapsulated).
For purposes of this discussion, it's probably fair to say "that's a
form of aliasing, too", even though it's quite indirect.

>> E.g. if you have records that have name="John", surname="Doe", the
>> statements
>> SELECT * FROM persons WHERE name = "John"
>> and
>> SELECT * FROM persons WHERE name = "Doe"
>> are aliases of each other.

Arrrrrgh... I made a most braindamaged, stupid mistake here. The second
SELECT should have used the *surname* field, so it would be

>> SELECT * FROM persons WHERE surname = "Doe"

Then, if there's a record that has name = "John", surname = "Doe", the
two WHERE clauses have aliasing in the form of overlapping result sets.

>> The alias is actually in the WHERE clause.
>
> Not by my definition, because there is only one variable here.

Sorry, my wording was sloppy.

I meant to say that 'in SQL, you identify records via clauses like WHERE
name = "John", i.e. WHERE clauses are a kind of identity'.

This is still not precise - the identity of an SQL record isn't
explicitly accessible (except the nonstandard OID facility that some SQL
engines offer). Nevertheless, they do have an identity, and there's a
possibility of aliasing - if you change all Johns, you may also change a
Doe.

>> And this *can* get you into
>> trouble if you have something that does
>> UPDATE ... WHERE name = "John"
>> and
>> UPDATE ... WHERE surname = "Doe"
>> e.g. doing something with the Johns, then updating the names of all
>> Does, and finally restoring the Johns (but not recognizing that changing
>> the names of all Does might have changed your set of Johns).
>
> The fact that some person might get confused about the
> semantics of what they are doing does not indicate aliasing.
> It is easy enough to do an analysis of your updates and
> understand what will happen; this same analysis is impossible
> with two arbitrary pointers, unless one has a whole-program
> trace. That strikes me as a significant difference.

Sure. I said that aliases in SQL aren't as bad as in other programs.

Once you get abstraction mixed in, the analysis becomes less
straightforward. In the case of SQL, views are such an abstraction
facility, and they indeed can obscure what you're doing and make the
analysis more difficult. If it's just SQL we're talking about, you
indeed have to look at the whole SQL to check whether there's a view
that may be involved in the queries you're analysing, so the situation
isn't *that* different from pointers - it's just not a problem because
the amount of code is so tiny!

>> Conceptually, this is just the same as having two different access path
>> to the same memory cell. Or accessing the same global variable through a
>> call-by-reference parameter and via its global name.
>
> There are similarities, but they are not the same.

What are the relevant differences? How does the semantics of a WHERE
clause differ from that of a pointer, in terms of potential aliasing?

My line of argument would be this:
Pointers can be simulated using arrays, so it's no surprise that arrays
can emulate all the aliasing of pointers.
Arrays can be simulated using WHERE (with SELECT and UPDATE), so I'd say
that SQL can emulate all the aliasing of arrays, and hence that of pointers.

>> BTW with views, you get not just aliasing but what makes aliasing really
>> dangerous. Without views, you can simply survey all the queries that you
>> are working with and lexically compare table and field names to see
>> whether there's aliasing. With views, the names that you see in a
>> lexical scope are not enough to determine aliasing.
>> E.g. if you use a view that builds upon the set of Johns but aren't
>> aware of that (possibly due to abstraction barriers), and you change the
>> name field of all Does, then you're altering the view without a chance
>> to locally catch the bug. That's just as bad as with any pointer
>> aliasing problem.
>
> It is certainly aliasing, but I would not call it "just as bad." There
> are
> elaborate declarative constraint mechanisms in place, for example.

Yes, but they can give you unexpected results.
It smells a bit like closing the gates after the horses are out.

On the other hand, *if* there is identity and updates, no matter how we
handle them, there must be *some* way to deal with the problems that
ensue. Having declarative constraints doesn't sound like the worst way
to do that.

> And the definition of the view itsef is a declarative, semantic
> entity. Whereas a pointer is an opaque, semantics-free address.

Oh, that's a very BCPL view. It was still somewhat true for K&R C, but
it's not really applicable to ANSI C, not at all to C++, and even less
to languages that associate well-encapsulated types with pointers (such
as Ada, Eiffel, or Java).
Unless, of course, you mean "address" if you say "pointer" ;-)

Regards,
Jo
 
 
Joachim Durchholz





PostPosted: 2006-7-15 6:01:00 Top

java-programmer >> What is a type error? Marshall schrieb:
> Joachim Durchholz wrote:
>> You can have aliasing without pointers; e.g. arrays are fully sufficient.
>> If i = j, then a [i] and a [j] are aliases of the same object.
>
> I am having a hard time with this very broad definition of aliasing.
> Would we also say that a[1+1] and a[2] are aliases? It seems
> to me, above, that we have only a, and with only one variable
> there can be no aliasing.

a[1] is a variable, too. You can assign values to it.

Inside function foo when called as foo(a[1]), it may even be referenced
with yet another name and updated in isolation; foo may not even know
it's an array element. (This is sort-of true in C, and definitely true
in languages that don't have pointer arithmetic. If you pass a[1] to a
call-by-reference parameter in Pascal, the callee has no way to access
the parameter as if it were an array element.)

> A further question:
>
> given a 32 bit integer variable x, and offsets i and j (equal as in
> the above example) would you say that
>
> x &= (1 << i)
> and
> x &= (1 << j)
>
> are aliased expressions for setting a particular bit in x?

Yes.

> I am not being facetious; I am trying to understand the limits
> of your definition for aliasing.

Understood and appreciated.

>> (After I observed that, I found it no longer a surprise that array
>> optimizations are what Fortran compiler teams sink most time into. The
>> aliasing problems are *exactly* the same as those with pointers in C -
>> though in practice, the Fortranistas have the advantage that the
>> compiler will usually know that a [i] and b [j] cannot be aliases of
>> each other, so while they have the same problems, the solutions give the
>> Fortranistas more leverage.)
>
> I don't understand this paragraph. On the one hand, it seems you
> are saying that C and Fortran are identically burdened with the
> troubles caused by aliasing, and then a moment later it seems
> you are saying the situation is distinctly better with Fortran.

That's exactly the situation.
Aliasing is present in both language, but in Fortran, the guarantee that
two names aren't aliased are easier to come by.
Even more non-aliasing guarantees exist if the language has a more
expressive type system. In most cases, if you know that two expressions
have different types, you can infer that they cannot be aliases.

Of course, the life of compiler writers if of less importance to us; I
added them only because the reports of Fortran compilers having aliasing
problems due to array indexes made me aware that pointers aren't the
only source of aliasing. That was quite a surprise to me then.

> Now with this, it appears you are agreeing that SQL has an advantage
> vis-a-vis aliasing compared to OO languages. Yes?

Sure.
I think that making an aliasing bug in SQL is just as easy as in any
pointer-ridden language, but transactions and constraints (plus the
considerably smaller typical size of SQL code) make it far easier to
diagnose any such bugs.

> If so, we are agreeing on the part I care about, and the specifics of
> just what we call aliasing are not so important to me.

I'm not sure whether I'm getting the same mileage out of this, but the
relevance of a problem is obviously a function on what one is doing on a
day-to-day basis, so I can readily agree with that :-)

Regards,
Jo
 
 
Chris Smith





PostPosted: 2006-7-15 8:54:00 Top

java-programmer >> What is a type error? Marshall wrote...
> I am having a hard time with this very broad definition of aliasing.
> Would we also say that a[1+1] and a[2] are aliases? It seems
> to me, above, that we have only a, and with only one variable
> there can be no aliasing.

The problem with this (and with the relational one as well) is that the
practical problems don't go away by redefining "variable" to mean larger
collections of values.

> given a 32 bit integer variable x, and offsets i and j (equal as in
> the above example) would you say that
>
> x &= (1 << i)
> and
> x &= (1 << j)
>
> are aliased expressions for setting a particular bit in x?

I don't know about Joachim, but yes, I would.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
 
 
Chris F Clark





PostPosted: 2006-7-16 3:53:00 Top

java-programmer >> What is a type error? Joachim Durchholz wrote:
>
> You can have aliasing without pointers; e.g. arrays are fully sufficient.
> If i = j, then a [i] and a [j] are aliases of the same object.

"Marshall" <email***@***.com> writes:

> I am having a hard time with this very broad definition of aliasing.
> Would we also say that a[1+1] and a[2] are aliases? It seems
> to me, above, that we have only a, and with only one variable
> there can be no aliasing.

As you can see from the replies, all these things which you do not
consider aliasing are considered aliasing (by most everyone else, in
fact these are exactly what is meant by aliasing). There is good
reason for this. Let us explain this by looking again at the SQL
example that has been kicked around some.

>> SELECT * FROM persons WHERE name = "John"
>> and
>> SELECT * FROM persons WHERE surname = "Doe"

Some set of records are shared by both select clauses. Those records
are the aliased records. If the set is empty there is no problem. If
we do not update the records from one of the select clauses, we also
have no problem. However, if we update the records for one of the
select clauses and the set of aliased records is not empty, the
records returned by the other select clause have changed. Therefore,
everything we knew to be true about the "other" select clause may or
may not still be true. It's the may not case we are worried about.

For example, in the "imperative Mi-5" Q asks Moneypenny to run the
first query (name = John) to get a list of agents for infiltrating
Spectre. In another department, they're doing sexual reassignments
and updating the database with the second query (surname = Doe) and
making the name become Jane (along with other changes to make the
transformation correct). So, when Q select "John Doe" from the first
list and sends that agent on a mission Q is really sending "Jane Doe"
and Spectre realizes that the agent is one and kills her. Not a happy
result.

In the "functional Mi-5", the sexual reassignment group, makes clones
of the agents reassigned, so that there is now both a "John Doe" and a
"Jane Doe". Of course, the payroll is a little more bloated, so that
when Q says "John Doe" isn't needed for any further assignments, the
"Garbage Collection" department does a reduction in force and gets rid
of "John Doe". The good news is that they didn't send Jane Doe to her
death.

The problem with aliasing, we want the things we knew true in one part
of our program, i.e. the Johns on Q's list are still Johns and not
Janes, to be true after we've run another part of our program. If the
second part of our program can change what was true after the first
part of our program, we can't depend on the results from the first
part of our program, i.e. Q can send Jane Doe into certain death.

The problem is compounded by the complexities of our programs. When Q
selected his list, he didn't know of the department doing the
reassignments (and couldn't know because they were part of a
top-secret project). So, since bureaucracies grow to meet the
increasing needs of the bureaucracy, often the solution is increased
complexity, more regulations and paperwork to fill out, semaphores and
locks to hold on critical data, read-only data, status requests, etc.
All to keep Q from killing Jane by accident. Sometimes they work. the
reassignment department has to wait 30-days for the clearance to
perform their operation in which time John Doe completes the
infiltration of Spectre saves the world from destruction and is ready
for his next assignment.

The key point is that each record (and even each field in each record)
if it can be known by two names, is an alias. It is not sufficient to
talk about "whole" variables as not being aliased if there is some way
to refer to some part of the variable and change that part of the
variable. Thus, a[1+1] is an alias for a[2] if you can have one part
of the code talking about one and another part of the code talking
about the other.

To put it one still final way, consider the following code;
assert(sorted(array a));
a[1] = 2;
assert(sorted(array a)); // is this still true?

Because a[1] is an alias of array a, you cannot be certain that the
2nd assert will not fire (fail) without additional analysis.

assert(sorted(array a));
assert(a[0] <= 2);
assert(2 <= a[2]);
a[1] = 2;
assert(sorted(array a)); // this is still true!

-Chris
 
 
Marshall





PostPosted: 2006-7-16 4:56:00 Top

java-programmer >> What is a type error? Joachim Durchholz wrote:
> Marshall schrieb:
>
> >> In some cases, you need an additional level of conceptual indirection -
> >> instead of *doing* the updates, you write a function that *describes* them.
> >
> > But then what do you do with that function?
>
> I pass it to an engine that's imperative.
>
> However, that engine doesn't need to do aliasing anymore.
>
> In other words, I'm separating mutability and aliasing, so that they
> can't combine and explode.

This would seem to be identical to what I am proposing.


> > Let's say I have an
> > employee database. John Smith just got hired on 1/1/2006 with
> > a salary of $10,000. I need to record this fact somewhere. How
> > do I do that without variables? Current-employees is a variable.
> > Even if I have the space to keep all historical data, so I'm not
> > deleting anything, I still have to have a variable for the latest
> > version of the accumulated data. I can solve this without
> > pointers, but I can't solve it without variables.
>
> As I said elsewhere, the record has an identity even though it isn't
> explicit in SQL.

Hmmmm. What can this mean?

In general, I feel that "records" are not the right conceptual
level to think about.

In any event, I am not sure what you mean by non-explicit
identity. I would say, records in SQL have value, and their
identity is exactly their value. I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?

(The situation is even more dire when one considers that
SQL actually uses bag semantics and not set semantics.)


[...]

> > I should like to learn more about these. I have some vague
> > perception of the existence of linear logic, but not much
> > else. However, I also already have an excellent solution
> > to the pointer aliasing problem, so I'm less motivated.
>
> Pointers are just the most obvious form of aliasing. As I said
> elsewhere, there are others: array indexing, by-reference parameters, or
> even WHERE clauses in SQL statements. In fact, anything that selects an
> updatable piece of data from a collection is an alias, and incurs the
> same problems as pointers, though they may come in different disguises.

Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
That is so general that I am concerned it has lost its ability to
identify problems specific to pointers.


> >> Actually SQL has references - they are called "primary keys", but they
> >> are references nevertheless.
> >
> > I strongly object; this is quite incorrect. I grant you that from the
> > 50,000 foot level they appear identical, but they are not. To
> > qualify as a reference, there need to be reference and dereference
> > operations on the reference datatype; there is no such operation
> > is SQL.
> >
> > Would you say the relational algebra has references?
>
> Yes.
>
> > Or, consider the classic prolog ancestor query. Let's say we're
> > setting up as follows
> >
> > father(bob, joe).
> > father(joe, john).
> >
> > Is "joe" a reference, here? After all, when we do the ancestor
> > query for john, we'll see his father is joe and then use that to
> > find joe's father. Keys in SQL are isomorphic to joe in the
> > above prolog.
>
> Agreed. They are both references ;-P

Again, by generalizing the term this far, I am concerned with a
loss of precision. If "joe" in the prolog is a references, then
"reference" is just a term for "data" that is being used in a
certain way. The conection with a specfic address space
has been lost in favor of the full domain of the datatype.


> >> (Some SQL dialects also offer synthetic
> >> "ID" fields that are guaranteed to remain stable over the lifetime of a
> >> record.
> >
> > Primary keys are updatable; there is nothing special about them.
>
> Right - I was wrong with identifying keys and identities. In fact the
> identity of an SQL record cannot be directly obtained (unless via those
> ID fields).
> The records still have identities. It's possible to have two WHERE
> clauses that refer to the same record, and if you update the record
> using one WHERE clause, the record returned by the other WHERE clause
> will have changed, too.

Is this any different from saying that an expression that includes
a variable will produce a different value if the variable changes?

In other words, "i+1" will return a different value if we update i.

It seems odd to me to suggest that "i+1" has identity. I can see
that i has identity, but I would say that "i+1" has only value.

But perhaps the ultimate upshoot of this thread is that my use
of terminology is nonstandard.


> >> With a "repeatable read" isolation level, you actually return to a
> >> declarative view of the database: whatever you do with it, you won't see
> >> it until you commit the transaction. (As soon as you commit, the
> >> declarative peace is over and you better watch out that your data
> >> doesn't become inconsistent due to aliasing.)
> >
> > Alas, transaction isolation levels are a performance hack.
> > I cannot defend them on any logical basis. (Aside: did you mean
> > serializable, rather than repeatable read?)
>
> Possibly. There are so many isolation levels that I have to look them up
> whenever I want to get the terminology 100% correct.

Hmmm. Is it that there are so many, or that they are simply not
part of our daily concern? It seems to me there are more different
styles of parameter passing than there are isolation levels, but
I don't usually see (competent) people (such as yourself) getting
call-by-value confused with call-by-reference.


> >> The only thing that can really be done about it is not adding it
> >> artificially into a program. In those cases where aliasing is part of
> >> the modelled domain, you really have to carefully inspect all
> >> interactions and never, never, never dream about abstracting it away.
> >
> > Yes, aliasing introduces a lot of problems. This is one reason
> > why closures make me nervous.
>
> Me too.
> On the other hand, they are so immensely useful.
> Simply don't create closures with mutable data in them. Or, to the very
> least, make sure that no aliases outside the closure exist.

Thank you for the interesting exchange.


Marshall

 
 
Joachim Durchholz





PostPosted: 2006-7-16 6:37:00 Top

java-programmer >> What is a type error? Marshall schrieb:
> Joachim Durchholz wrote:
>> As I said elsewhere, the record has an identity even though it isn't
>> explicit in SQL.
>
> Hmmmm. What can this mean?
>
> In general, I feel that "records" are not the right conceptual
> level to think about.

They are, when it comes to aliasing of mutable data. I think it's
justified by the fact that aliased mutable data has a galling tendency
to break abstraction barriers. (More on this on request.)

> In any event, I am not sure what you mean by non-explicit
> identity.

The identity isn't visible from inside SQL. (Unless there's an OID
facility available, which *is* an explicit identity.)

> I would say, records in SQL have value, and their
> identity is exactly their value.

Definitely not. You can have two equal records and update just one of
them, yielding non-equal records; by my definition (and by intuition),
this means that the records were equal but not identical.

> I do not see that they have
> any identity outside of their value. We can uniquely identify
> any particular record via a key, but a table may have more
> than one key, and an update may change the values of one
> key but not another. So it is not possible in general to
> definitely and uniquely assign a mapping from each record
> of a table after an update to each record of the table before
> the update, and if you can't do that, then where
> is the record identity?

Such a mapping is indeed possible. Simply extend the table with a new
column, number the columns consecutively, and identify the records via
that column.

But even if you don't do that, there's still identity. It is irrelevant
whether the programs can directly read the value of the identity field;
the adverse effects happen because updates are in-place. (If every
update generated a new record, then we'd indeed have no identity.)

> Okay. At this point, though, the term aliasing has become extremely
> general. I believe "i+1+1" is an alias for "i+2" under this definition.

No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

There's another aspect here. If two expressions are always aliases to
the same mutable, that's usually easy to determine; this kind of
aliasing is usually not much of a problem.
What's more of a problem are those cases where there's occasional
aliasing. I.e. a[i] and a[j] may or may not be aliases of each other,
depending on the current value of i and j, and *that* is a problem
because the number of code paths to be tested doubles. It's even more of
a problem because testing with random data will usually not uncover the
case where the aliasing actually happens; you have to go around and
construct test cases specifically for the code paths that have aliasing.
Given that references may cross abstraction barriers (actually that's
often the purpose of constructing a reference in the first place), this
means you have to look for your test cases at multiple levels of
software abstraction, and *that* is really, really bad.

> That is so general that I am concerned it has lost its ability to
> identify problems specific to pointers.

If the reference to "pointers" above means "references", then I don't
know about any pointer problems that cannot be reproduced, in one form
or the other, in any of the other aliasing mechanisms.

> Again, by generalizing the term this far, I am concerned with a
> loss of precision. If "joe" in the prolog is a references, then
> "reference" is just a term for "data" that is being used in a
> certain way. The conection with a specfic address space
> has been lost in favor of the full domain of the datatype.

Aliasing is indeed a more general idea that goes beyond address spaces.

However, identity and aliasing can be defined in fully abstract terms,
so I welcome this opportunity to get rid of a too-concrete model.

>> The records still have identities. It's possible to have two WHERE
>> clauses that refer to the same record, and if you update the record
>> using one WHERE clause, the record returned by the other WHERE clause
>> will have changed, too.
>
> Is this any different from saying that an expression that includes
> a variable will produce a different value if the variable changes?

Yes.
Note that the WHERE clause properly includes array indexing (just set up
a table that has continuous numeric primary key, and a single other column).

I.e. I'm not talking about how a[i] is an alias of a[i+1] after updating
i, I'm talking about how a[i] may be an alias of a[j].

> It seems odd to me to suggest that "i+1" has identity.

It doesn't (unless it's passed around as a closure, but that's
irrelevant to this discussion).
"i" does have identity. "a[i]" does have identity. "a[i+1]" does have
identity.
Let me say that for purposes of this discussion, if it can be assigned
to (or otherwise mutated), it has identity. (We *can* assign identity to
immutable things, but it's equivalent to equality and not interesting
for this discussion.)

> I can see
> that i has identity, but I would say that "i+1" has only value.

Agreed.

> But perhaps the ultimate upshoot of this thread is that my use
> of terminology is nonstandard.

It's somewhat restricted, but not really nonstandard.

>> Possibly. There are so many isolation levels that I have to look them up
>> whenever I want to get the terminology 100% correct.
>
> Hmmm. Is it that there are so many, or that they are simply not
> part of our daily concern?

I guess it's the latter. IIRC there are four or five isolation levels.

> It seems to me there are more different
> styles of parameter passing than there are isolation levels, but
> I don't usually see (competent) people (such as yourself) getting
> call-by-value confused with call-by-reference.

Indeed.
Though the number of parameter passing mechanisms isn't that large
anyway. Off the top of my head, I could recite just three (by value, by
reference, by name aka lazy), the rest are just combinations with other
concepts (in/out/through, most notably) or a mapping to implementation
details (by reference vs. "pointer by value" in C++, for example).

Regards,
Jo
 
 
Marshall





PostPosted: 2006-7-16 10:36:00 Top

java-programmer >> What is a type error? Joachim Durchholz wrote:
> Marshall schrieb:
> > Joachim Durchholz wrote:
> >> As I said elsewhere, the record has an identity even though it isn't
> >> explicit in SQL.
> >
> > Hmmmm. What can this mean?
> >
> > In general, I feel that "records" are not the right conceptual
> > level to think about.
>
> They are, when it comes to aliasing of mutable data. I think it's
> justified by the fact that aliased mutable data has a galling tendency
> to break abstraction barriers. (More on this on request.)

I can see how pointers, or other kinds of aliasing
(by my more restricted definition) break abstraction barries;
it is hard to abstract something that can change out from
under you uncontrollably.


> > In any event, I am not sure what you mean by non-explicit
> > identity.
>
> The identity isn't visible from inside SQL. (Unless there's an OID
> facility available, which *is* an explicit identity.)

Agreed about OIDs.


> > I would say, records in SQL have value, and their
> > identity is exactly their value.
>
> Definitely not. You can have two equal records and update just one of
> them, yielding non-equal records; by my definition (and by intuition),
> this means that the records were equal but not identical.

This sort of thing comes up when one has made the mistake of
not defining any keys on one's table and needs to rectify the
situation. It is in fact considered quite a challenge to do.
My preferred technique for such a situation is to create
a new table with the same columns and SELECT DISTINCT ...
INTO ... then recreate the original table. So that way doesn't
fit your model.

How else might you do it? I suppose there are nonstandard
extensions, such as UPDATE ... LIMIT. And in fact there
might be some yucky thing that made it in to the standard
that will work.

But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree? Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

[The above paragraph is the part of this post that I
really care about; feel free to ignore the rest if it's naive
or annoying or whatever, but please do respond to this.]


> > I do not see that they have
> > any identity outside of their value. We can uniquely identify
> > any particular record via a key, but a table may have more
> > than one key, and an update may change the values of one
> > key but not another. So it is not possible in general to
> > definitely and uniquely assign a mapping from each record
> > of a table after an update to each record of the table before
> > the update, and if you can't do that, then where
> > is the record identity?
>
> Such a mapping is indeed possible. Simply extend the table with a new
> column, number the columns consecutively, and identify the records via
> that column.

That doesn't work for me. It is one thing to say that for all
tables T, for all elements e in T, e has identity. It is a different
thing to say that for all tables T, there exists a table T' such
that for all elements e in T', e has identity.


> But even if you don't do that, there's still identity. It is irrelevant
> whether the programs can directly read the value of the identity field;
> the adverse effects happen because updates are in-place. (If every
> update generated a new record, then we'd indeed have no identity.)
>
> > Okay. At this point, though, the term aliasing has become extremely
> > general. I believe "i+1+1" is an alias for "i+2" under this definition.
>
> No, "i+1+1" isn't an alias in itself. It's an expression - to be an
> alias, it would have to be a reference to something.
>
> However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
> important - 1+1 is replacable by 2 in every context, so this is
> essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
> vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].
A query is simply an expression that makes mention of a variable.
However I would have to admit that if SQL table elements have
identity, it would be more like the array example.

(The model of SQL I use in my head is one with set semantics,
and I am careful to avoid running in to bag semantics by, for
example, always defining at least one key, and using SELECT
DISTINCT where necessary. But it is true that the strict set
semantics in my head are not the wobbly bag semantics in
the standard.)


> There's another aspect here. If two expressions are always aliases to
> the same mutable, that's usually easy to determine; this kind of
> aliasing is usually not much of a problem.
> What's more of a problem are those cases where there's occasional
> aliasing. I.e. a[i] and a[j] may or may not be aliases of each other,
> depending on the current value of i and j, and *that* is a problem
> because the number of code paths to be tested doubles. It's even more of
> a problem because testing with random data will usually not uncover the
> case where the aliasing actually happens; you have to go around and
> construct test cases specifically for the code paths that have aliasing.
> Given that references may cross abstraction barriers (actually that's
> often the purpose of constructing a reference in the first place), this
> means you have to look for your test cases at multiple levels of
> software abstraction, and *that* is really, really bad.
>
> > That is so general that I am concerned it has lost its ability to
> > identify problems specific to pointers.
>
> If the reference to "pointers" above means "references", then I don't
> know about any pointer problems that cannot be reproduced, in one form
> or the other, in any of the other aliasing mechanisms.

It seems we agree that, for example, raw C pointers can easily
cause aliasing problems. To me, Java references, which are much
more tightly controlled, are still subject to aliasing problems,
although
in practice the situation is significantly less severe. (It is possible
to have multiple references to a single Java mutable object and
not know it.) I would assume the same is posible with, say, SML
refs.

However the situation in SQL strikes me as being different.
If one is speaking of base tables, and one has two queries,
one has everything one needs to know simply looking at
the queries. If views are involved, one must further examine
the definition of the view.

For example, we might ask in C, if we update a[i], will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables. But even so in Fortran, if we ask
if we update a[i], will a[j] change, and we know we
can't tell unless we know the values of i and j.

In SQL, if we update table T, will a SELECT on table
S change? We know it won't; they are different variables.
But if we update T, will a select on T change? It
certainly might, but I wouldn't call it aliasing, because
the two variables involved are the same variable, T.


> > Again, by generalizing the term this far, I am concerned with a
> > loss of precision. If "joe" in the prolog is a references, then
> > "reference" is just a term for "data" that is being used in a
> > certain way. The conection with a specfic address space
> > has been lost in favor of the full domain of the datatype.
>
> Aliasing is indeed a more general idea that goes beyond address spaces.
>
> However, identity and aliasing can be defined in fully abstract terms,
> so I welcome this opportunity to get rid of a too-concrete model.
>
> >> The records still have identities. It's possible to have two WHERE
> >> clauses that refer to the same record, and if you update the record
> >> using one WHERE clause, the record returned by the other WHERE clause
> >> will have changed, too.
> >
> > Is this any different from saying that an expression that includes
> > a variable will produce a different value if the variable changes?
>
> Yes.
> Note that the WHERE clause properly includes array indexing (just set up
> a table that has continuous numeric primary key, and a single other column).

I am still not convinced. It is not sufficient to show how sometimes
you can establish identity, or how you could establish identity based
on some related table; it must be shown how the identity can be
established in general, and I don't think it can. If it can't, then
I claim we are back to expressions that include a variable, and
not aliasing, even by the wider definition.



> I.e. I'm not talking about how a[i] is an alias of a[i+1] after updating
> i, I'm talking about how a[i] may be an alias of a[j].
>
> > It seems odd to me to suggest that "i+1" has identity.
>
> It doesn't (unless it's passed around as a closure, but that's
> irrelevant to this discussion).
> "i" does have identity. "a[i]" does have identity. "a[i+1]" does have
> identity.
> Let me say that for purposes of this discussion, if it can be assigned
> to (or otherwise mutated), it has identity. (We *can* assign identity to
> immutable things, but it's equivalent to equality and not interesting
> for this discussion.)
>
> > I can see
> > that i has identity, but I would say that "i+1" has only value.
>
> Agreed.

Cool.


> > But perhaps the ultimate upshoot of this thread is that my use
> > of terminology is nonstandard.
>
> It's somewhat restricted, but not really nonstandard.

Whew!


> >> Possibly. There are so many isolation levels that I have to look them up
> >> whenever I want to get the terminology 100% correct.
> >
> > Hmmm. Is it that there are so many, or that they are simply not
> > part of our daily concern?
>
> I guess it's the latter. IIRC there are four or five isolation levels.

Yes, four.


> > It seems to me there are more different
> > styles of parameter passing than there are isolation levels, but
> > I don't usually see (competent) people (such as yourself) getting
> > call-by-value confused with call-by-reference.
>
> Indeed.
> Though the number of parameter passing mechanisms isn't that large
> anyway. Off the top of my head, I could recite just three (by value, by
> reference, by name aka lazy), the rest are just combinations with other
> concepts (in/out/through, most notably) or a mapping to implementation
> details (by reference vs. "pointer by value" in C++, for example).

Fair enough. I could only remember three, but I was sure someone
else could name more. :-)


Marshall

 
 
Joachim Durchholz





PostPosted: 2006-7-16 17:01:00 Top

java-programmer >> What is a type error? Marshall schrieb:
> Joachim Durchholz wrote:
>> Marshall schrieb:
>>> I would say, records in SQL have value, and their
>>> identity is exactly their value.
>>
>> Definitely not. You can have two equal records and update just one of
>> them, yielding non-equal records; by my definition (and by intuition),
>> this means that the records were equal but not identical.
>
> This sort of thing comes up when one has made the mistake of
> not defining any keys on one's table and needs to rectify the
> situation. It is in fact considered quite a challenge to do.
> My preferred technique for such a situation is to create
> a new table with the same columns and SELECT DISTINCT ...
> INTO ... then recreate the original table. So that way doesn't
> fit your model.

I was mentioning multiple equal records just as an example where the
records have an identity that's independent of the record values.

> How else might you do it? I suppose there are nonstandard
> extensions, such as UPDATE ... LIMIT. And in fact there
> might be some yucky thing that made it in to the standard
> that will work.

Right, it might be difficult to update multiple records that are exactly
the same. It's not what SQL was intended for, and difficult to do anyway
- I was thinking of LIMIT (I wasn't aware that it's nonstandard), and I
agree that there may be other ways to do it.

However, I wouldn't overvalue that case. The Jane/John Doe example
posted elsewhere in this thread is strictly within the confines of what
SQL was built for, yet there is aliasing.

> But what you descrbe is certainly *not* possible in the
> relational algebra; alas that SQL doesn't hew closer
> to it. Would you agree?

Yup, SQL (particularly its update semantics) aren't relational semantics.
Still, it's SQL we've been talking about.

> Would you also agree that
> if a language *did* adhere to relation semantics,
> then relation elements would *not* have identity?
> (Under such a language, one could not have two
> equal records, for example.)

Any identity that one could define within relational semantics would be
just equality, so no, it wouldn't be very helpful (unless as part of a
greater framework).
However, that's not really a surprise. Aliasing becomes relevant (even
observable) only when there's updates, and relational algebra doesn't
have updates.

>> > I do not see that they have
>>> any identity outside of their value. We can uniquely identify
>>> any particular record via a key, but a table may have more
>>> than one key, and an update may change the values of one
>>> key but not another. So it is not possible in general to
>>> definitely and uniquely assign a mapping from each record
>>> of a table after an update to each record of the table before
>>> the update, and if you can't do that, then where
>>> is the record identity?
>> Such a mapping is indeed possible. Simply extend the table with a new
>> column, number the columns consecutively, and identify the records via
>> that column.
>
> That doesn't work for me. It is one thing to say that for all
> tables T, for all elements e in T, e has identity. It is a different
> thing to say that for all tables T, there exists a table T' such
> that for all elements e in T', e has identity.

Let me continue that argument:
Records in T' have identity.
From an SQL point-of-view, there's no difference between the records in
T' and records in other tables, so they must share all properties, in
particular that of having an identity.

(I agree that's not as convincing as seeing aliasing in action. See below.)

>> But even if you don't do that, there's still identity. It is irrelevant
>> whether the programs can directly read the value of the identity field;
>> the adverse effects happen because updates are in-place. (If every
>> update generated a new record, then we'd indeed have no identity.)
>>
>>> Okay. At this point, though, the term aliasing has become extremely
>>> general. I believe "i+1+1" is an alias for "i+2" under this definition.
>> No, "i+1+1" isn't an alias in itself. It's an expression - to be an
>> alias, it would have to be a reference to something.
>>
>> However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
>> important - 1+1 is replacable by 2 in every context, so this is
>> essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
>> vacuously true.
>
> To me, the SQL with the where clause is like i+2, not like a[2].

I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.

> A query is simply an expression that makes mention of a variable.
> However I would have to admit that if SQL table elements have
> identity, it would be more like the array example.

OK.

> (The model of SQL I use in my head is one with set semantics,
> and I am careful to avoid running in to bag semantics by, for
> example, always defining at least one key, and using SELECT
> DISTINCT where necessary. But it is true that the strict set
> semantics in my head are not the wobbly bag semantics in
> the standard.)

Set-vs.-bag doesn't affect SQL's aliasing in general, it was a specific
property of the example that I chose. So you don't have to worry wether
that might be the cause of the misunderstanding.

>>> That is so general that I am concerned it has lost its ability to
>>> identify problems specific to pointers.
>>
>> If the reference to "pointers" above means "references", then I don't
>> know about any pointer problems that cannot be reproduced, in one form
>> or the other, in any of the other aliasing mechanisms.
>
> It seems we agree that, for example, raw C pointers can easily
> cause aliasing problems. To me, Java references, which are much
> more tightly controlled, are still subject to aliasing problems,
> although
> in practice the situation is significantly less severe. (It is possible
> to have multiple references to a single Java mutable object and
> not know it.)

Exactly.

> I would assume the same is posible with, say, SML refs.

So do I.

> However the situation in SQL strikes me as being different.
> If one is speaking of base tables, and one has two queries,
> one has everything one needs to know simply looking at
> the queries. If views are involved, one must further examine
> the definition of the view.

Sure. Usually, SQL itself is enough abstract that no additional
abstraction happens; so even if you have aliasing, you usually know
about it.
I.e. when compared to what you were saying about Java: "It is possible
to have multiple references to a single Java mutable object and
not know it.", the "and not know it" part is usually missing.

The picture changes as soon as the SQL statements get hidden behind
layers of abstraction, say result sets and prepared query handles.

... Um... let me also restate the preconditions for aliasing become a
problem. You need three elements for that:
1) Identities that can be stored in multiple places.
2) Updates.
3) Abstraction (of updatable entities).

Without identities, there cannot be aliasing.
Without updates, the operation that makes aliasing problematic is excluded.
Without abstraction, you may have aliasing but can clearly identify it,
and don't have a problem.

I think the roadblock you're hitting is that you keep looking for
nonlocal trouble to identify spots where SQL might have aliasing; since
SQL isn't used with abstraction often, you come up empty and conclude
there's no aliasing.
My position is that not all aliasing is evil, and that there's a
technical definition: two program entities (l-expressions in C parlance)
are aliases if changing the referenced data through one also changes the
other.

> For example, we might ask in C, if we update a[i], will
> the value of b[j] change? And we can't say, because
> we don't know if there is aliasing. But in Fortran, we
> can say that it won't, because they are definitely
> different variables.

Even that may be the case.
There's an EQUIVALENCE statement that can explicitly alias array sections.
Also, I suspect that newer versions of Fortran have reference parameters.

> But even so in Fortran, if we ask
> if we update a[i], will a[j] change, and we know we
> can't tell unless we know the values of i and j.

In fact, that's the usual form of aliasing in Fortran.

> In SQL, if we update table T, will a SELECT on table
> S change? We know it won't; they are different variables.
> But if we update T, will a select on T change? It
> certainly might, but I wouldn't call it aliasing, because
> the two variables involved are the same variable, T.

I fail to see how this is different from the aliasing in a[i] and a[j].

It's just the same kind of aliasing that I have with the statements
SELECT * FROM T WHERE key = $i
and
SELECT * FROM T WHERE key = $j
(where $i and $j are placeholders for variables from program variables).

After running
UPDATE T SET f = 5 WHERE key = $i
not only the first SELECT may give different results, the second may,
too (in those cases where $i and $j happen to have the same value).


I can also construct pointer-like aliasing in SQL. If I have
SELECT * FROM employees WHERE country = "US"
and
SELECT * FROM employees WHERE name = "Doe"
then somebody with a grudge against all US employees running
UPDATE employees SET salary = 0 WHERE country = "US"
will (on purpose or not) also modify the results of the second query.

Not a problem while all three queries are in plain view, but imagine
them wrapped in some opaque object or behind a module interface.

> I am still not convinced. It is not sufficient to show how sometimes
> you can establish identity, or how you could establish identity based
> on some related table; it must be shown how the identity can be
> established in general, and I don't think it can. If it can't, then
> I claim we are back to expressions that include a variable, and
> not aliasing, even by the wider definition.

See above.

I admit that identity cannot be reliably established in SQL. The
identity of a record isn't available (unless via OID extensions).
However, it is there and it can have effect.

Translated to real life, "my best friend" and "my employer" may be
referring to the same identity, even if these descriptions may change in
the future.

>> > It seems to me there are more different
>>> styles of parameter passing than there are isolation levels, but
>>> I don't usually see (competent) people (such as yourself) getting
>>> call-by-value confused with call-by-reference.
>> Indeed.
>> Though the number of parameter passing mechanisms isn't that large
>> anyway. Off the top of my head, I could recite just three (by value, by
>> reference, by name aka lazy), the rest are just combinations with other
>> concepts (in/out/through, most notably) or a mapping to implementation
>> details (by reference vs. "pointer by value" in C++, for example).
>
> Fair enough. I could only remember three, but I was sure someone
> else could name more. :-)

I got a private mail naming more parameter passing mechanisms. It's a
bit difficult define what's a parameter passing mechanism and what's
just an attribute for such a mechanism. If you count all combinations as
an extra mechanism, you can easily get dozens.

Regards,
Jo
 
 
Chris F Clark





PostPosted: 2006-7-17 0:02:00 Top

java-programmer >> What is a type error? "Marshall" <email***@***.com> wrote:
> In general, I feel that "records" are not the right conceptual
> level to think about.

Unfortunately, they are the right level. Actually,the right level
might even be lower, the fields within a record, but that's moving
even farther away from the direction you wish to go. The right level
is the lowest level at which the programmer can see and manipulate
values. Thus, since a programmer can see and manipulate fields within
records in SQL that is the level we need to concern ourselves with
when talking about aliasing in SQL. In other words, since a SELECT
(or UPDATE) statement can talk about specific fields of specific
records within the table (not directly, but reliably, which I'll
explain in a moment) then those are the "primitive objects" in SQL
that can be aliased.

What I meant by "not directly, but reliably":

If you have a fixed database, and you do two selects which specify the
same sets of fields to be selected and the same keys to select records
with, one expects the two selects to return the same values. You do
not necessarily expect the records to be returned in the same order,
but you do expect the same set of records to be returned and the
records to have the same values in the same fields. Further, if you
do an update, you expect certain fields of certain records to change
(and be reflected in subsequent selects).

However, therein lies the rub, if you do a select on some records, and
then an update that changes those records, the records you have from
the select have either changed or show outdated values. If there is
some way to refer to the records you first selected before the update,
then you have an aliasing problem, but maybe one can't do that. One
could also have an aliasing problem, if one were allowed to do two
updates simultaneously, so that one update could changed records in
the middle of the other changing the records. However, I don't know
if that's allowed in the relational algebra either. (I think I finally
see your point, and more on that later.)

> I do not see that they have
> any identity outside of their value. We can uniquely identify
> any particular record via a key, but a table may have more
> than one key, and an update may change the values of one
> key but not another. So it is not possible in general to
> definitely and uniquely assign a mapping from each record
> of a table after an update to each record of the table before
> the update, and if you can't do that, then where
> is the record identity?

I don't know if your point of having two keys within one table amkes a
difference. If one can uniquely identify a record, then it has
identity. The fact that there is another mapping where one may not be
able to uniquely identify the record is not necessarily relevant.

> But what you descrbe is certainly *not* possible in the
> relational algebra; alas that SQL doesn't hew closer
> to it. Would you agree? Would you also agree that
> if a language *did* adhere to relation semantics,
> then relation elements would *not* have identity?
> (Under such a language, one could not have two
> equal records, for example.)

Ok, here I think I will explain my understanding of your point. If we
treat each select statement and each update statements as a separate
program, i.e. we can't save records from one select statement to a
later one (and I think in the relational algebra one cannot), then
each single (individual) select statement or update statement is a
functional program. That is we can view a single select statement as
doing a fold over some incoming data, but it cannot change the data.
Similarly, we can view an update statement as creating a new copy of
the table with mutations in it, but the update statement can only
refer to the original immutable table that was input. (Well, that's
atleast how I recall the relational algebra.) It is the second point,
about the semantics of the update statement which is important. There
are no aliasing issues because in the relational algebra one cannot
refer to the mutated values in an update statement, every reference in
an update statement is refering to the unmutated input (again, that's
my recollection). (Note, if my recollection is off, and one can refer
to the mutated records in an update statement, perhaps they can
explain why aliasing is not an issue in it.)

Now for some minor points:

> For example, we might ask in C, if we update a[i], will
> the value of b[j] change? And we can't say, because
> we don't know if there is aliasing. But in Fortran, we
> can say that it won't, because they are definitely
> different variables.

Unfortunately, your statement about FORTRAN is not true, if a and b
are parameters to a subroutine (or members of a common block, or
equivalenced, or ...), then an update of a[i] might change b[j].

This is in fact, an important point, it is in subroutines (and
subroutine like things), where we have local names for things
(bindings) that are actually names for things which may or may not be
distinct, that aliasing is the primary issue. I don't recall the
relational algebra having subroutines, and even if it did they would
be unimportant, given that one could not refer within a single update
statement to the records that a subroutine changed (since one cannot
refer in an update statement to any record which has been changed, see
the caveats above).

Joachim Durchholz wrote: > >
> > Indeed.
> > Though the number of parameter passing mechanisms isn't that large
> > anyway. Off the top of my head, I could recite just three (by value, by
> > reference, by name aka lazy), the rest are just combinations with other
> > concepts (in/out/through, most notably) or a mapping to implementation
> > details (by reference vs. "pointer by value" in C++, for example).

Marshall again:
> Fair enough. I could only remember three, but I was sure someone
> else could name more. :-)

There actual are some more, but very rare, for example there was call-
by-text-string, which is sort of like call-by-name, but allows a
parameter to reach into the calling routine and mess with local
variables therein. Most of the rare ones have really terrible
semantics, and are perhaps best forgotten except to keep future
implementors from choosing them. For example, call-by-text-string is
really easy to implement in a naive interpreter that reparses each
statement at every invocation, but but it exposes the implementation
properties so blatantly that writing a different interpretor is nearly
impossible.

If I recall correctly, there are also some call-by- methods that take
into account networks and addressing space issues--call-by-reference
doesn't work if one cannot see the thing being referenced. Of coruse,
one must then ask whether one considers "remote-procedure-call" and/or
message-passing to be the same thing as a local call.

One last nit, Call-by-value is actually call-by-copy-in-copy-out when
generalized.

There are some principles that one can use to organize the parameter
passing methods. However, the space is much richer than people
commonly know about (and that is actually a good thing, that most
people aren't aware of the richness, simplicity is good).

-Chris
 
 
Dr.Ruud





PostPosted: 2006-7-17 0:51:00 Top

java-programmer >> What is a type error? Chris F Clark schreef:

> If you have a fixed database, and you do two selects which specify the
> same sets of fields to be selected and the same keys to select records
> with, one expects the two selects to return the same values.

When your "fixed" means read-only, or (fully) locked, then yes.

Modern databases also have modes without locks, so without special
attention (like maybe your "fixed") the two selects do not necessarily
return the same set of records.


> Further, if you
> do an update, you expect certain fields of certain records to change
> (and be reflected in subsequent selects).

Doesn't your "fixed" block updates?


> However, therein lies the rub, if you do a select on some records, and
> then an update that changes those records, the records you have from
> the select have either changed or show outdated values.

Not necessarily outdated: values can be fixed in time for a purpose.
Example: calculating interest is done once per day, but the whole
process takes more than a day.

Some systems implemented a lock plus requery just before the update, to
check for unacceptable changes in the stored data; this to prevent
having to keep locks while waiting.


> If there is
> some way to refer to the records you first selected before the update,
> then you have an aliasing problem, but maybe one can't do that. One
> could also have an aliasing problem, if one were allowed to do two
> updates simultaneously, so that one update could changed records in
> the middle of the other changing the records.

Some databases allow you to travel back in time: run this query on the
data of 1 year ago. All previous values are kept "behind" the current
value.

--
Affijn, Ruud

"Gewoon is een tijger."


 
 
Marshall





PostPosted: 2006-7-17 1:29:00 Top

java-programmer >> What is a type error? Joachim Durchholz wrote:
> Marshall schrieb:
>
> > But what you descrbe is certainly *not* possible in the
> > relational algebra; alas that SQL doesn't hew closer
> > to it. Would you agree?
>
> Yup, SQL (particularly its update semantics) aren't relational semantics.
> Still, it's SQL we've been talking about.
>
> > Would you also agree that
> > if a language *did* adhere to relation semantics,
> > then relation elements would *not* have identity?
> > (Under such a language, one could not have two
> > equal records, for example.)
>
> Any identity that one could define within relational semantics would be
> just equality, so no, it wouldn't be very helpful (unless as part of a
> greater framework).
> However, that's not really a surprise. Aliasing becomes relevant (even
> observable) only when there's updates, and relational algebra doesn't
> have updates.

Good point. Perhaps I should have said "relational algebra +
variables with assignment." It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete. I think I see what you mean about the restricted
update operators working on identity.


> >>> Okay. At this point, though, the term aliasing has become extremely
> >>> general. I believe "i+1+1" is an alias for "i+2" under this definition.
> >> No, "i+1+1" isn't an alias in itself. It's an expression - to be an
> >> alias, it would have to be a reference to something.
> >>
> >> However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
> >> important - 1+1 is replacable by 2 in every context, so this is
> >> essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
> >> vacuously true.
> >
> > To me, the SQL with the where clause is like i+2, not like a[2].
>
> I'd say WHERE is like [i+2]: neither is valid on its own, it's the
> "selector" part of a reference into a table.

Is it possible you're being distracted by the syntax? WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you? (Always a difficult question because often
times A and B share some qualities and not others, and
what does one call that?)


> > However the situation in SQL strikes me as being different.
> > If one is speaking of base tables, and one has two queries,
> > one has everything one needs to know simply looking at
> > the queries. If views are involved, one must further examine
> > the definition of the view.
>
> Sure. Usually, SQL itself is enough abstract that no additional
> abstraction happens; so even if you have aliasing, you usually know
> about it.
> I.e. when compared to what you were saying about Java: "It is possible
> to have multiple references to a single Java mutable object and
> not know it.", the "and not know it" part is usually missing.

Yes!


> The picture changes as soon as the SQL statements get hidden behind
> layers of abstraction, say result sets and prepared query handles.
>
> ... Um... let me also restate the preconditions for aliasing become a
> problem. You need three elements for that:
> 1) Identities that can be stored in multiple places.
> 2) Updates.
> 3) Abstraction (of updatable entities).
>
> Without identities, there cannot be aliasing.
> Without updates, the operation that makes aliasing problematic is excluded.
> Without abstraction, you may have aliasing but can clearly identify it,
> and don't have a problem.

Aha! That description works very well for me.


> I admit that identity cannot be reliably established in SQL. The
> identity of a record isn't available (unless via OID extensions).
> However, it is there and it can have effect.

Yes, I think I see what you mean now. This is in part a consequence
of the restricted update operators.

Thanks,

Marshall

 
 
Marshall





PostPosted: 2006-7-17 1:36:00 Top

java-programmer >> What is a type error? Chris F Clark wrote:
> "Marshall" <email***@***.com> wrote:
> > In general, I feel that "records" are not the right conceptual
> > level to think about.
>
> Unfortunately, they are the right level. Actually,the right level
> might even be lower, the fields within a record, but that's moving
> even farther away from the direction you wish to go. The right level
> is the lowest level at which the programmer can see and manipulate
> values.

But how is this not always "the bit"?



> > I do not see that they have
> > any identity outside of their value. We can uniquely identify
> > any particular record via a key, but a table may have more
> > than one key, and an update may change the values of one
> > key but not another. So it is not possible in general to
> > definitely and uniquely assign a mapping from each record
> > of a table after an update to each record of the table before
> > the update, and if you can't do that, then where
> > is the record identity?
>
> I don't know if your point of having two keys within one table amkes a
> difference. If one can uniquely identify a record, then it has
> identity. The fact that there is another mapping where one may not be
> able to uniquely identify the record is not necessarily relevant.

The issue with two keys is that the keys may not *agree* on the
mapping before vs. after the update.


> > For example, we might ask in C, if we update a[i], will
> > the value of b[j] change? And we can't say, because
> > we don't know if there is aliasing. But in Fortran, we
> > can say that it won't, because they are definitely
> > different variables.
>
> Unfortunately, your statement about FORTRAN is not true, if a and b
> are parameters to a subroutine (or members of a common block, or
> equivalenced, or ...), then an update of a[i] might change b[j].

Ah, common blocks. How that takes me back!

But your point is a good one; I oversimplified.


> Marshall again:
> > Fair enough. I could only remember three, but I was sure someone
> > else could name more. :-)
>
> There actual are some more, but very rare, for example there was call-
> by-text-string, which is sort of like call-by-name, but allows a
> parameter to reach into the calling routine and mess with local
> variables therein. Most of the rare ones have really terrible
> semantics, and are perhaps best forgotten except to keep future
> implementors from choosing them. For example, call-by-text-string is
> really easy to implement in a naive interpreter that reparses each
> statement at every invocation, but but it exposes the implementation
> properties so blatantly that writing a different interpretor is nearly
> impossible.
>
> If I recall correctly, there are also some call-by- methods that take
> into account networks and addressing space issues--call-by-reference
> doesn't work if one cannot see the thing being referenced. Of coruse,
> one must then ask whether one considers "remote-procedure-call" and/or
> message-passing to be the same thing as a local call.
>
> One last nit, Call-by-value is actually call-by-copy-in-copy-out when
> generalized.
>
> There are some principles that one can use to organize the parameter
> passing methods. However, the space is much richer than people
> commonly know about (and that is actually a good thing, that most
> people aren't aware of the richness, simplicity is good).


Fair enough.


Marshall

 
 
Chris Smith





PostPosted: 2006-7-17 2:29:00 Top

java-programmer >> What is a type error? We Chris's stick together, as always.

Marshall <email***@***.com> wrote:
> > Unfortunately, they are the right level. Actually,the right level
> > might even be lower, the fields within a record, but that's moving
> > even farther away from the direction you wish to go. The right level
> > is the lowest level at which the programmer can see and manipulate
> > values.
>
> But how is this not always "the bit"?

First of all, we should be consistent in speaking about the logical
language semantics, which may not include bits anyway.

That said, if bits were the primitive concept of data in a language,
then we'd be able to get away with talking about higher-level concepts
because we agree to always manipulate a larger structure (a byte, for
example) as a pure value. If we were using bitwise operators, then we
wouldn't be able to get away with it, for example. That said, it's also
quite possible to consider aliasing on higher levels as well; it's just
not possible to point out the lack of aliasing for higher levels of
abstraction, and thus conclude that no aliasing exists. Aliasing is
still possible for entities within those layers of abstraction.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
 
 
Chris Smith





PostPosted: 2006-7-17 2:45:00 Top

java-programmer >> What is a type error? Marshall <email***@***.com> wrote:
> Is it possible you're being distracted by the syntax? WHERE is a
> binary operation taking a relation and a filter function. I don't
> think of filters as being like array indexing; do they appear
> analogous to you? (Always a difficult question because often
> times A and B share some qualities and not others, and
> what does one call that?)

This is definitely a good question. Nevertheless, I think we're fooling
ourselves if we decide that we've made progress by defining part of the
problem out of existence. The original problem was that when there are
several ways of getting access to something that is identical (rather
than just equal), this causes problems for invariant checking. These
several ways of accessing data are being called 'aliasing' -- I have no
comment on whether that's standard usage or not, because I don't know.
I suspect that aliasing is normally used for something closer to the
physical level. The point is that whatever "aliasing" really means,
what we're discussing is having two paths by which one may access
identical data.

So there are infinitely complex ways in which this can occur. I can
have pointers that are aliased. I can have integers, which by
themselves are just plain values, but can alias each other as indices
into an array. I can have strings that do the same thing for an
associative array. I can, of course, have arbitrarily more complex data
structures with exactly the same issue. I can also have two different
WHERE clauses, which when applied to the same relation yield exactly the
same set of tuples. The problem arises when code is written to update
some value (or set of tuples, in the SQL case, since definitions differ
there) using one pointer/int/string/WHERE clause/etc, and at the same
time an invariant was written using the other pointer/int/WHERE/etc, the
result is that either of:

a) The compiler has to be able to prove that the two are not aliased

b) The compiler has to re-evaluate the invariant from scratch when this
operation is performed.

(Yeah, there's actually a whole continuum between a and b, based on more
limited abilities of the compiler to prove things. I'm oversimplifying,
and I think it's rather benign in this case.)

So in this sense, a WHERE clause is a binary operation in exactly the
same way that an array indexing expression is a binary operation, and
both are capable of aliasing in at least this logical sense.

Now it's certainly true that languages with unrestricted pointers are
less successful at limiting the scope of re-evaluation of invariants.
In the array or SQL relation cases, there's a limited set of value/tuple
modifications that might cause the invariant to need rechecking
(specifically those that point to the same array, or to the relation or
one of its views). A language with unrestricted untyped pointers, if it
doesn't get lucky with the capture analysis, may have to re-evaluate all
invariants in the application on any given assignment. Nevertheless,
the re-evaluation of the invariant still needs to be done.

--
Chris Smith - Lead Software Developer /Technical Trainer
MindIQ Corporation
 
 
Joachim Durchholz





PostPosted: 2006-7-17 4:00:00 Top

java-programmer >> What is a type error? Marshall schrieb:
>
> Good point. Perhaps I should have said "relational algebra +
> variables with assignment." It is interesting to consider
> assignment vs. the more restricted update operators: insert,
> update, delete.

Actually I see it the other way round: assignment is strictly less
powerful than DML since it doesn't allow creating or destroying
variables, while UPDATE does cover assignment to fields.
(However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
at which point there is not much of a difference.)

>>>>> Okay. At this point, though, the term aliasing has become extremely
>>>>> general. I believe "i+1+1" is an alias for "i+2" under this definition.
>>>> No, "i+1+1" isn't an alias in itself. It's an expression - to be an
>>>> alias, it would have to be a reference to something.
>>>>
>>>> However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
>>>> important - 1+1 is replacable by 2 in every context, so this is
>>>> essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
>>>> vacuously true.
>>>
>>> To me, the SQL with the where clause is like i+2, not like a[2].
>> I'd say WHERE is like [i+2]: neither is valid on its own, it's the
>> "selector" part of a reference into a table.
>
> Is it possible you're being distracted by the syntax?

I think that's very, very unlikely ;-)

> WHERE is a
> binary operation taking a relation and a filter function. I don't
> think of filters as being like array indexing; do they appear
> analogous to you?

Yes.

Filters are just like array indexing: both select a subset of variables
from a collection. In SQL, you select a subset of a table, in a
programming language, you select a subset of an array.

(The SQL selection mechanism is far more flexible in the kinds of
filtering you can apply, while array indexing allows filtering just by
ordinal position. However, the relevant point is that both select things
that can be updated.)

>> I admit that identity cannot be reliably established in SQL. The
>> identity of a record isn't available (unless via OID extensions).
>> However, it is there and it can have effect.
>
> Yes, I think I see what you mean now. This is in part a consequence
> of the restricted update operators.

I don't think so. You can update SQL records any way you want.
The unavailability of a record identity is due to the fact that, well,
it's unavailable in standard SQL.

Regards,
Jo
 
 
Marshall





PostPosted: 2006-7-17 7:08:00 Top

java-programmer >> What is a type error? Joachim Durchholz wrote:
> Marshall schrieb:
> >
> > Good point. Perhaps I should have said "relational algebra +
> > variables with assignment." It is interesting to consider
> > assignment vs. the more restricted update operators: insert,
> > update, delete.
>
> Actually I see it the other way round: assignment is strictly less
> powerful than DML since it doesn't allow creating or destroying
> variables, while UPDATE does cover assignment to fields.

Oh, my.

Well, for all table variables T, there exists some pair of
values v and v', such that we can transition the value of
T from v to v' via assignment, but not by any single
insert, update or delete. The reverse is not true. I
would consider that a solid demonstration of which
one is more powerful. Further, it is my understanding
that your claim of row identity *depends* on the restricted
nature of DML; if the only mutator operation is assignment,
then there is definitely no record identity.


> (However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
> at which point there is not much of a difference.)

I am not sure what this means. Delete can be expressed in
terms of assignment. (As can insert and update.) (Assignment
can also be expressed in terms of insert and delete.) I
don't know what "new" would be in a value-semantics, relational
world. So I would say it's assignment vs. INSERT+UPDATE+DELETE,
but perhaps I'm not understanding what you mean.

Assignment by itself is complete. Insert and delete together
are complete.


> >>>>> Okay. At this point, though, the term aliasing has become extremely
> >>>>> general. I believe "i+1+1" is an alias for "i+2" under this definition.
> >>>> No, "i+1+1" isn't an alias in itself. It's an expression - to be an
> >>>> alias, it would have to be a reference to something.
> >>>>
> >>>> However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
> >>>> important - 1+1 is replacable by 2 in every context, so this is
> >>>> essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
> >>>> vacuously true.
> >>>
> >>> To me, the SQL with the where clause is like i+2, not like a[2].
> >> I'd say WHERE is like [i+2]: neither is valid on its own, it's the
> >> "selector" part of a reference into a table.
> >
> > Is it possible you're being distracted by the syntax?
>
> I think that's very, very unlikely ;-)

Well, I just thought I'd ask. :-)


> > WHERE is a
> > binary operation taking a relation and a filter function. I don't
> > think of filters as being like array indexing; do they appear
> > analogous to you?
>
> Yes.
>
> Filters are just like array indexing: both select a subset of variables
> from a collection.

I can't agree with this wording. A filter produces a collection
value from a collection value. I don't see how variables
enter in to it. One can filter either a collection constant or
a collection variable; if one speaks of filtering a collection
variable, on is really speaking of filtering the collection value
that the variable currently contains; filtering is not an operation
on the variable as such, the way the "address of" operator is.
Note you can't update the result of a filter.


> In SQL, you select a subset of a table, in a
> programming language, you select a subset of an array.
>
> (The SQL selection mechanism is far more flexible in the kinds of
> filtering you can apply, while array indexing allows filtering just by
> ordinal position. However, the relevant point is that both select things
> that can be updated.)

When you have been saying "select things that can be updated"
I have been assuming you meant that one can derive values
from variables, and that some other operation can update that
variable, causing the expression, if re-evaluated, to produce
a different value. However the phrase also suggests that
you mean that the *result* of the select can *itself* be
updated. Which one do you mean? (Or is there a third
possibility?)


> >> I admit that identity cannot be reliably established in SQL. The
> >> identity of a record isn't available (unless via OID extensions).
> >> However, it is there and it can have effect.
> >
> > Yes, I think I see what you mean now. This is in part a consequence
> > of the restricted update operators.
>
> I don't think so. You can update SQL records any way you want.
> The unavailability of a record identity is due to the fact that, well,
> it's unavailable in standard SQL.

I disagree. The concept of record-identity is quite tenuous, and
depends on specifics of SQL semantics. (In fact I'm not entirely
convinced it's definitely there even in SQL.) It certainly does not
hold in, say, set theory. Set elements do not have identity; they
only have value. If we have a set-semantics language that
supports set variables with assignment, we still do not have
element-identity.


Marshall

 
 
Chris Smith





PostPosted: 2006-7-17 7:19:00 Top

java-programmer >> What is a type error? David Hopwood <email***@***.com> wrote:
> Chris Smith wrote:
> > If checked by execution, yes. In which case, I am trying to get my head
> > around how it's any more true to say that functional languages are
> > compilable postconditions than to say the same of imperative languages.
>
> A postcondition must, by definition [*], be a (pure) assertion about the
> values that relevant variables will take after the execution of a subprogram.

Okay, my objection was misplaced, then, as I misunderstood the original
statement. I had understood it to mean that programs written in pure
functional languages carry no additional information except for their
postconditions.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
 
 
Chris Smith





PostPosted: 2006-7-17 8:24:00 Top

java-programmer >> What is a type error? Darren New <email***@***.com> wrote:
> I'm not sure what linear or uniqueness typing is. It's typestate, and if
> I remember correctly the papers I read 10 years ago, the folks at
> TJWatson that invented Hermes also invented the concept of typestate.
> They at least claim to have coined the term.

Coining the term is one thing, but I feel pretty confident that the idea
was not invented in 1986 with the Hermes language, but rather far
earlier. Perhaps they may have invented the concept of considering it
any different from other applications of types, though. I still have
trouble delineating how to consider "typestate" different from ordinary
types in formal terms, unless I make the formal model complex enough to
include some kind of programmer-defined identifiers such as variables.
The distinction is not at all relevant to common type system models
based on the lambda calculus.

While acknowledging, on the one hand, that the word "typestate" is used
to describe this, I also object that types have *always* been assigned
to expressions in differing type environments. Otherwise, it would be
impossible to assign types to lambda abstractions in the simply typed
lambda calculus, the simplest of all generally studied type systems.
What is being named here is the overcoming of a limitation that
programming language designers imposed upon themselves, whether from not
understanding the theoretical research or not believing it important, I
don't know.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
 
 
Zoltan Somogyi





PostPosted: 2006-7-17 12:46:00 Top

java-programmer >> What is a type error? Andreas Rossberg <email***@***.com> writes:
>Uh, aliasing all over the place! Actually, I think that logic
>programming, usually based on deep unification, brings by far the worst
>incarnation of aliasing issues to the table.

I agree that deep unification, as implemented in Prolog, brings with it
lots of aliasing problems. However, these problems have been recognized
from the seventies, and people have tried to solve them. There have
been a huge variety of mode systems added to various dialects of Prolog
over the years, which each attempt to tackle (various parts of) the aliasing
problem, none of them fully successfully in my view, since their designers
usually actually *wanted* to retain at least *some* of the expressive power
of the logic variable.

Some non-Prolog logic languages have departed from this approach, the earliest
being the Relational Language. To tie this message to another thread, the
Relational Language had a strict mode system that is as identical as possible
to the notion of typestate in NIL and Hermes given the limitations of
comparing declarative apples with imperative oranges, but significantly
earlier, 1979 vs 1986 IIRC.

Marshall wrote:
>I have explored the OO path to its bitter end and am
>convinced it is not the way. So what is left? Uniqueness
>types and logic programming, I suppose. I enjoy logic
>programming but it doesn't seem quite right. But notice:
>no pointers there! And it doesn't seem to suffer from the
>lack.

Of course, the main logic programming language today that disallows the use
of unification for arbitrary aliasing is Mercury. It enforces this through
a strong mode system. Our motivation for the design of this mode system
was precisely to eliminate the problems Andreas identified above. However
it has also turned out to be an excellent foundation for Mercury's notion of
unique modes, its equivalent of uniqueness types, which Mercury uses to
express I/O.

Zoltan Somogyi <email***@***.com> http://www.cs.mu.oz.au/~zs/
Department of Computer Science and Software Engineering, Univ. of Melbourne