[Prevayler-discussion] Prevayler and thread-safety

Discussion:

Sergey Didenko

2009-01-25 10:19:40 UTC

Moving our discussion to the mail list...

Hi,
I'm considering using Prevayler for a web application. Could you answer a question about Prevayler and thread-safety?
--------------------------------------------------------------
Prevayler guarantees that all writes (through its transactions) are synchronized. But what about reads?
Is it right that dirty reads are possible if no explicit synchronization is used in user code? For example:
// get the 3rd account
Account account = ((Bank) prevayler.prevalentSystem()).getAccounts().get(2);
If so, what synchronization strategies are good for user code? Consider a business object A that contains a collection of business objects B. Should I:
1. use a thread-safe collection (of Bs inside A), for example a synchronized one or one from the java.util.concurrent package? or
2. synchronize collection reads outside transactions with the collection writes inside transactions, for example by putting "synchronized (collection)" around both reads and writes?
--------------------------------------------------------------
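A minimal sketch of the second option, assuming two hypothetical business classes ObjectA and ObjectB (not from Prevayler). Collections.synchronizedList makes each single call thread-safe, but the JDK documentation requires callers to synchronize on the list manually while iterating over it. Here ObjectB is immutable; if it were mutable, the Bs themselves would still be shared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical business object B: immutable, so sharing it is harmless.
class ObjectB {
    final String name;

    ObjectB(String name) { this.name = name; }
}

// Hypothetical business object A that owns a collection of Bs.
class ObjectA {
    // Each single call (add, get, size) on this list is synchronized...
    private final List<ObjectB> bs = Collections.synchronizedList(new ArrayList<>());

    // Intended to be called from inside a Transaction.
    void addB(ObjectB b) { bs.add(b); }

    // ...but iteration is a compound action, so it must hold the list's own
    // lock; otherwise a concurrent addB() can throw ConcurrentModificationException.
    List<String> namesSnapshot() {
        synchronized (bs) {
            List<String> names = new ArrayList<>();
            for (ObjectB b : bs) {
                names.add(b.name);
            }
            return names;
        }
    }
}
```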
Cheers, Sergey
Hi Sergey,
Take a look at the javadoc for Prevayler.execute(Query).
See you, Klaus.

Sergey Didenko

2009-01-25 10:21:17 UTC

First of all, thanks for your answer!

I wrote a simple PoC for sensitive queries, and it clearly shows that
dirty reads are still possible.

That happens because JMatch does not make deep copies of the matched
objects. It only copies the references to them. So if a user gets anything
other than atomic values or immutable objects, he can observe dirty reads.

I've attached my PoC (test code).

William Pietri

2009-01-25 10:46:59 UTC

Hi, Sergey. You are right that access like in your examples is indeed
unsynchronized. Klaus is right that if you want synchronized reads,
you execute queries. This is the method to use:

http://docs.rakeshv.org/java/prevayler/org/prevayler/Prevayler.html#execute(org.prevayler.Query)

The simple way to think of it is that Prevayler provides transactional
isolation by executing commands one at a time. So if you need a query
that is isolated from all writes, package it up as a command object and
feed it to the Prevayler object to execute.
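To make "executing commands one at a time" concrete, here is a minimal sketch using toy interfaces loosely modeled on Prevayler's Transaction and Query. ToyTransaction, ToyQuery and ToyPrevayler are hypothetical names, not the org.prevayler types, and the real Prevayler does more (journaling, snapshots); only the locking idea is shown.

```java
import java.util.Date;

// Toy interfaces loosely modeled on Prevayler's Transaction and Query.
// They are NOT the org.prevayler types.
interface ToyTransaction<S> {
    void executeOn(S system, Date executionTime);
}

interface ToyQuery<S, R> {
    R query(S system, Date executionTime);
}

// Toy executor: runs every transaction and every sensitive query one at a
// time by synchronizing on the prevalent system, so a query never observes
// a half-applied transaction.
class ToyPrevayler<S> {
    private final S system;

    ToyPrevayler(S system) { this.system = system; }

    void executeTransaction(ToyTransaction<S> transaction) {
        synchronized (system) {
            // Real Prevayler also writes the transaction to its journal.
            transaction.executeOn(system, new Date());
        }
    }

    <R> R executeQuery(ToyQuery<S, R> query) {
        synchronized (system) {
            return query.query(system, new Date());
        }
    }
}
```

Note that the lock only covers what happens inside query(); whatever the query returns is used after the lock has been released.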

Does that help?

William

------------------------------------------------------------------------------
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
To unsubscribe go to the end of this page: http://lists.sourceforge.net/lists/listinfo/prevayler-discussion
_______________________________________________
"Databases in Memoriam" -- http://www.prevayler.org

Sergey Didenko

2009-01-25 12:35:10 UTC

Hi William,

My point is:

1. A sensitive query is indeed synchronized with commands (Transactions).
2. The code that accesses the query result is not synchronized with
Transactions.

So after a user safely executes a sensitive query, she goes on to
unsafely access the query results (unless they are atomic values or
immutable objects).
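The gap can be shown deterministically, without threads, in a minimal sketch using hypothetical classes (not Prevayler code). AccountSystem's synchronized methods stand in for Prevayler running transactions and sensitive queries one at a time; liveAccount() returns a reference that escapes the lock, while balanceOf() copies a value while the lock is held.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable domain object.
class MutableAccount {
    long balance;

    MutableAccount(long balance) { this.balance = balance; }
}

// Hypothetical prevalent system. Its synchronized methods stand in for
// Prevayler running transactions and sensitive queries one at a time.
class AccountSystem {
    private final List<MutableAccount> accounts = new ArrayList<>();

    synchronized void open(long balance) { accounts.add(new MutableAccount(balance)); }

    synchronized void deposit(int index, long amount) { accounts.get(index).balance += amount; }

    // Leaks the live object: the lock is released on return, but the caller
    // keeps a reference that later transactions will keep mutating.
    synchronized MutableAccount liveAccount(int index) { return accounts.get(index); }

    // Copies the value while the lock is held, so the caller owns the result.
    synchronized long balanceOf(int index) { return accounts.get(index).balance; }
}
```

In a multithreaded program, a later read of live.balance happens without any lock, so it may see a stale value or, for a non-volatile long, even a torn one; the copied long does not have that problem.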

So the solution is either
1. to implement deep object cloning in JMatch (I hope I haven't missed
it if it's already there :) ),
or
2. to wrap object accesses in additional "synchronized (workingObject)"
blocks (or use java.util.concurrent features).

Or are you proposing to
3. move complex read-only logic into separate Transactions?

Is there anything suitable in JMatch that I missed?

Cheers, Sergey

William Pietri

2009-01-25 13:09:41 UTC

Hi, Sergey. Thanks for explaining further.

I've never used JMatch, so I can't speak to that.

Looking through the Prevayler code I've worked on, I see four patterns
in our query response objects:

* returning immutable domain objects (say 30% of queries)
* building data transfer objects (40%, generally as display-layer
objects)
* building meaningful result objects (20%)
* not caring (10%)
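For the data-transfer-object pattern, a minimal sketch with hypothetical classes Member and MemberView (not code from the thread): the query copies what the display layer needs into an immutable object while the lock is held, so later transactions cannot change what the caller sees.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mutable domain object living in the prevalent system.
class Member {
    String displayName;
    List<String> posts = new ArrayList<>();

    Member(String displayName) { this.displayName = displayName; }
}

// Immutable data transfer object: final fields plus a defensive,
// unmodifiable copy of the list.
final class MemberView {
    final String displayName;
    final List<String> recentPosts;

    MemberView(String displayName, List<String> recentPosts) {
        this.displayName = displayName;
        this.recentPosts = Collections.unmodifiableList(new ArrayList<>(recentPosts));
    }

    // Meant to be called inside a sensitive query, while the lock is held.
    static MemberView of(Member m, int maxPosts) {
        int from = Math.max(0, m.posts.size() - maxPosts);
        return new MemberView(m.displayName, m.posts.subList(from, m.posts.size()));
    }
}
```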

I was expecting to find some deep copies, but didn't see any.

In the "not caring" case, it's because displaying updates would be
either harmless or beneficial.

I suspect that's not much help to you, but it's the best I've got
myself. Perhaps others more familiar with JMatch will have better advice.

William

Sergey Didenko

2009-01-25 13:53:50 UTC

Hi William,

Thanks a lot for your detailed answer.

As for "real-world" cases, I suppose these dirty reads would be
observed only rarely, under high concurrent load. So it's still OK
for a lot of applications. However, they can lead to hard-to-catch bugs
in more delicate applications.

William Pietri

2009-01-25 17:54:26 UTC

Post by Sergey Didenko
Hi William,
Thank you a lot for your detailed answer.
As for the "real-world" cases I suppose these dirty-reads can be
observed in rare cases under high concurrent load. So it's still ok
for a lot of applications. However that can lead to hard-to-catch bugs
for more delicate applications.

Yeah, the 10% or so where we passed out mutable domain objects were
cases where we expected concurrent changes to be harmless.

To take a made-up example, consider a simple web forum, where the
MemberProfile object might have some strings like display name,
location, and favorite quote. If somebody updated all three fields at
once, it's possible that another user looking at that profile might see
only one of them updated on that viewing. But nobody cares, so in this
case it's not worth copying the data while in the query.

Naturally, this isn't something you'd do most of the time; as you
say, it can lead to subtle bugs.
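If the partially updated profile did matter, one alternative to copying inside the query (not something proposed in the thread) is to keep the three fields in a single immutable object and replace it wholesale. A minimal sketch with hypothetical class names, assuming writes happen only inside Prevayler transactions, so there is one writer at a time.

```java
// Hypothetical immutable snapshot of the three profile fields.
final class ProfileData {
    final String displayName;
    final String location;
    final String favoriteQuote;

    ProfileData(String displayName, String location, String favoriteQuote) {
        this.displayName = displayName;
        this.location = location;
        this.favoriteQuote = favoriteQuote;
    }
}

// Hypothetical profile that publishes all three fields as one unit.
class MemberProfile {
    // volatile: a reader in any thread sees the most recently published snapshot.
    private volatile ProfileData data;

    MemberProfile(ProfileData initial) { data = initial; }

    // Called from a Transaction: one reference write replaces all three fields.
    void update(String displayName, String location, String favoriteQuote) {
        data = new ProfileData(displayName, location, favoriteQuote);
    }

    // Callers get a consistent trio, never a mix of old and new fields.
    ProfileData read() { return data; }
}
```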

William

Justin T. Sampson

2009-01-25 19:10:31 UTC

Replying to the whole thread...

Sergey, yes, everything you've said is true. Unfortunately, the demos
included with Prevayler are not threadsafe, so they're no help here. I
almost always use Query and make sure it returns an immutable result.
And I do advocate your #3: "Move complex read-only logic into separate
Transactions." Well, Queries rather than Transactions. The point is, I
think that just doing "CRUD"-style Queries and Transactions with
complex logic on the outside is the wrong way to go; it kind of misses
the point of Prevayler. Your business logic should be in your business
objects, which should be in your prevalent system.

The options William described are perfect. I can't really endorse the
"not caring" case because it's easy to go just a little too far: It
*might* be okay if all the fields in question are strings or
primitives (except longs and doubles), but the problem with unsynchronized access
to shared state is that you really don't know what you're going to
get: In general, you might see things that synchronized access would
*never* see (not just out-of-date values). For example, if you try to
read from a HashMap in one thread while it's being updated in another
thread, you could easily get a NullPointerException or even go into an
infinite loop.
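If a map really must be read outside a Query, one option (not proposed in the thread) is java.util.concurrent.ConcurrentHashMap, whose reads may run alongside writes without corrupting the table. A minimal sketch with a hypothetical MemberIndex; readers may still see values older or newer than a synchronized query would.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical index of display names by member id.
class MemberIndex {
    // ConcurrentHashMap allows get() concurrently with put(): a reader may
    // see an older or newer entry, but never a corrupted table, an NPE from
    // a half-linked bucket, or an infinite loop.
    private final Map<Integer, String> namesById = new ConcurrentHashMap<>();

    // Called from inside a Transaction (one writer at a time).
    void rename(int id, String name) { namesById.put(id, name); }

    // Safe to call from any thread, outside a Query.
    String nameOf(int id) { return namesById.get(id); }

    int size() { return namesById.size(); }
}
```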

As for running multiple queries concurrently, Prevayler doesn't
currently (as of 2.3) support that, but probably will soon since it
gets requested so often. (I did implement it on the java5_experiment
branch.) However, even so, I wouldn't want a Query to actually be
doing output; it should still really only be accessing the prevalent
system and getting out as quickly as possible. I do occasionally go
with the style of rendering the HTTP response within a Query; however,
I would do that by writing to a StringBuilder and returning that from
the Query, not by writing out to the client directly.
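A minimal sketch of that style, with hypothetical Forum and PostListPage classes and synchronized (forum) standing in for Prevayler.execute(Query): the page is built in memory while the lock is held, and only the finished text leaves it. It returns a String rather than the StringBuilder itself, since a String is immutable; HTML escaping is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical prevalent system holding forum posts.
class Forum {
    final List<String> posts = new ArrayList<>();
}

class PostListPage {
    // Only reads the system and builds the page in memory, so the lock is
    // held briefly and no slow client I/O happens while holding it.
    static String render(Forum forum) {
        StringBuilder html = new StringBuilder("<ul>");
        for (String post : forum.posts) {
            html.append("<li>").append(post).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    // Stand-in for executing render() as a sensitive query; the caller
    // writes the returned String to the client after the lock is released.
    static String renderSafely(Forum forum) {
        synchronized (forum) {
            return render(forum);
        }
    }
}
```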

I once did implement something very close to what Klaus described,
wrapping each HTTP request in a Prevayler transaction or query. (We
did it at NewEdu, where William and I worked together.) We saw it as
the first step in adding Prevayler to a system that had no persistence
at all yet. We were able to drop Prevayler into the system in a few
days, most of which time was testing. For total correctness, we
actually started by wrapping *all* requests into Transactions, just in
case some GETs did modify the system. Then we gradually converted most
over to Queries for performance. Over time, we ended up factoring
various parts of the code out of those Transactions and Queries, as
they didn't *quite* make sense to be exactly the same layer of code as
the requests coming in.

Cheers,
Justin

Sergey Didenko

2009-01-27 21:19:36 UTC

Guys, I've been studying Prevayler further and see other cases where a
false sense of (thread-)safety can arise.

Consider TransactionWithQuery. There is no warning, either in the API or
in the documentation, that returning a mutable business object can lead
to dirty reads in a multithreaded application. Maybe it would be good
to have a special class that deep-clones the result; maybe it would be
enough to write a warning in the javadoc; or maybe a dedicated article
on the site, like the one for the "baptism problem", would suffice.
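One way to avoid that trap, sketched with hypothetical classes that only mimic the shape of a TransactionWithQuery (apply a change, then return a result) and are not Prevayler types: return an immutable receipt built while the transaction runs, instead of the live business object.

```java
import java.util.Date;

// Hypothetical mutable domain object.
class LedgerAccount {
    long balance;

    LedgerAccount(long balance) { this.balance = balance; }
}

// Immutable result handed back to the caller. The time is stored as a long
// because java.util.Date itself is mutable.
final class DepositReceipt {
    final long newBalance;
    final long executedAtMillis;

    DepositReceipt(long newBalance, long executedAtMillis) {
        this.newBalance = newBalance;
        this.executedAtMillis = executedAtMillis;
    }
}

// Shaped like a TransactionWithQuery: meant to run under the system lock,
// where it applies the change and captures the result in one step.
class Deposit {
    private final long amount;

    Deposit(long amount) { this.amount = amount; }

    DepositReceipt executeAndQuery(LedgerAccount account, Date executionTime) {
        account.balance += amount;
        return new DepositReceipt(account.balance, executionTime.getTime());
    }
}
```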

I hope I will come up with good ideas about this later, as I study and
use Prevayler. In the meantime, I want to draw your attention to this
problem.

Cheers, Sergey

Klaus Wuestefeld

2009-01-27 22:00:53 UTC

Our hand-holding responsibilities only go so far.

The baptism problem is something introduced by serialization, so we document it.

The fact that multithreaded code is tricky is not introduced by
Prevayler. The issues would still exist even if you had an
invulnerable VM in RAM, without Prevayler.

The question you have to ask yourself before using Prevayler is:

"If I had an invulnerable VM, would that be cool? Would I be capable
of using it? Or am I too DBMS-atrophied?"

With great power comes great responsibility. :P

See you, Klaus.

m***@ikanos.se

2009-01-28 19:22:28 UTC

Hi.

If you are using multiple threads in your application, you have a
responsibility to synchronize carefully. This is nothing new.
At least Prevayler helps you with the updates. But reading shared data in
concurrent threads may require locks of some kind.

I still think Prevayler enables something not even possible with many
other frameworks: actually using the "original" object, for good and bad.
The performance, simplicity and small code overhead are simply unrivaled in
my eyes.

I fully understand your concern about how to successfully address
concurrent access to the data model. It is no simple problem.

I can tell you that we've considered different solutions in (one of) my
current projects, and have concluded that we should deep clone most
results from our queries.

We actually started with the idea of maximizing the use of immutable objects,
but it is a pain having immutable objects in the data model if they are
not "naturally" immutable or "almost primitives". Having multiple references
to an immutable object is impractical when it needs updating, etc.
So my recommendation is: Use immutable for "leaf" objects, and objects
that actually shouldn't change. (Kind of obvious if you think about it.)
Do not use immutable for objects like users, preferences etc.

We deep clone our objects using serialization, because we're lazy. It's a
very simple way to make deep clones, and (often) one that people feel
comfortable with and know possible side-effects of.
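A minimal sketch of serialization-based deep cloning using only java.io; DeepCopy is a hypothetical helper name. Every object reachable from the original must implement Serializable, transient fields come back with default values, and each copy costs a full serialize/deserialize round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class DeepCopy {
    private DeepCopy() {}

    // Deep-clones an object graph by serializing it to a byte array and
    // reading it back, so the copy shares no mutable state with the original.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T of(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deep copy failed", e);
        }
    }
}
```

Called at the end of a query (for example `return DeepCopy.of(account);`), it hands the caller a private copy of the result.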

Other ways to avoid exposing the data model outside of Prevayler are to
have custom-built responses for queries (I find this a bit "unproductive"),
or to wrap everything inside Prevayler (in which case you are in practice
single-threaded).

Good luck!

/Mikael

Klaus Wuestefeld

2009-01-28 21:45:03 UTC

Nice comments :)

Post by m***@ikanos.se
or
wrap everything inside Prevayler (in which case you are in practice
single-threaded).

Actually you can have queries running in parallel with Prevayler using
a system-wide read/write lock for queries/transactions.

Justin and I have already independently implemented that in
experimental future versions of Prevayler. We just have to back-port
that. Justin can do it in 3 mins, I in about 47.

Then, you just have to make sure you locally synchronize the
lazy-inits/evals, which is pretty simple.
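The thread does not show that code, so here is a minimal sketch under stated assumptions: RwToyPrevayler is a hypothetical toy built on java.util.concurrent.locks.ReentrantReadWriteLock, not the experimental Prevayler implementation mentioned above, and Catalog is a hypothetical domain object whose lazily computed total is locally synchronized because queries may now run in parallel.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Toy executor, NOT Prevayler's implementation: transactions take the
// exclusive write lock, queries share the read lock and may run in parallel.
class RwToyPrevayler<S> {
    private final S system;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    RwToyPrevayler(S system) { this.system = system; }

    void executeTransaction(BiConsumer<S, Date> transaction) {
        lock.writeLock().lock();
        try {
            transaction.accept(system, new Date());
        } finally {
            lock.writeLock().unlock();
        }
    }

    <R> R executeQuery(BiFunction<S, Date, R> query) {
        lock.readLock().lock();
        try {
            return query.apply(system, new Date());
        } finally {
            lock.readLock().unlock();
        }
    }
}

// Hypothetical domain object with a lazily computed value.
class Catalog {
    private final List<Integer> prices = new ArrayList<>();
    private Integer cachedTotal; // guarded by "this"

    // Only called from transactions, i.e. under the exclusive write lock.
    synchronized void addPrice(int price) {
        prices.add(price);
        cachedTotal = null; // invalidate the lazily computed value
    }

    // Called from queries, which may run concurrently, so the lazy
    // evaluation is locally synchronized.
    synchronized int total() {
        if (cachedTotal == null) {
            int sum = 0;
            for (int p : prices) sum += p;
            cachedTotal = sum;
        }
        return cachedTotal;
    }
}
```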

See you, Klaus.

Sergey Didenko

2009-01-29 07:23:43 UTC

Thanks Mikael!

Klaus, my main point is to make all these considerations explicit, so
that people don't have to study the Prevayler and JMatch code in depth to
write a safe, multithreaded prevalent application.

Klaus Wuestefeld

2009-01-29 15:07:30 UTC

OK

Klaus Wuestefeld

2009-01-25 13:43:38 UTC

Post by Sergey Didenko
2. The code that accesses query result is not synchronized with Transactions

Sergey Didenko

2009-01-25 13:53:31 UTC

Klaus, could you clarify? I don't quite understand your explanation.

You can run my example code on a multiprocessor system to see that a
reading thread can quite easily observe inconsistent results, even
though it takes the result from "execute(sensibleQuery)".

On Sun, Jan 25, 2009 at 3:43 PM, Klaus Wuestefeld

Post by Klaus Wuestefeld

Post by Sergey Didenko
2. The code that accesses query result is not synchronized with Transactions

Why not?
Can you not treat every http POST as a transaction and every http GET
as a sensitive query?
You can do that in a single point in your code and then forget all
about transactions and queries. Logically, you will be inside a web
app in RAM that never crashes.
See you, Klaus.

Klaus Wuestefeld

2009-01-25 14:05:11 UTC

Post by Sergey Didenko

Post by Klaus Wuestefeld
Can you not treat every http POST as a transaction and every http GET
as a sensitive query?

Klaus, could you clarify? I don't quite understand your explanation.

Wrap Prevayler around your entire web app, not only your business logic.

So now every code you execute is either a Transaction (from http
POSTs) or a synchronized query (from http GETs). There is no more
accessing business object code from "outside" because there is no more
"outside".
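A minimal sketch of that single entry point, with a hypothetical FrontController and a HashMap as a toy prevalent system; synchronized (system) stands in for Prevayler executing each POST as a Transaction and each GET as a Query. The whole response is built while the lock is held, so no business object is read from "outside".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical front controller: the single point where every request enters.
class FrontController {
    private final Map<String, String> system = new HashMap<>(); // toy prevalent system

    String handle(String method, String key, String value) {
        synchronized (system) {
            if ("POST".equals(method)) {
                // A real Prevayler Transaction would also be journaled.
                system.put(key, value);
                return "stored";
            }
            // GET: build the complete response while still holding the lock.
            String stored = system.get(key);
            return stored == null ? "not found" : "value=" + stored;
        }
    }
}
```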

See you, Klaus.

William Pietri

2009-01-25 17:31:53 UTC

For those trying this in practice, make sure your output is properly
buffered. Otherwise one user on a slow connection will hang your system
from time to time. :-)

Klaus, this discussion makes me wonder: does Prevayler execute multiple
simultaneous queries in parallel? Writes, I'm sure, are one at a time.
But given that most web apps are read-heavy, and given the
ever-increasing number of cores available, it would make sense to run
your GETs in simultaneous batches once you reach a certain level of load.

William

Sergey Didenko

2009-01-25 17:34:59 UTC

Klaus, this does not solve the "dirty reads" problem. It just lowers its
probability, because there is less time between the end of the query and
the reading of its results.

Also, it can decrease performance on a multiprocessor system, because
every HTTP POST request blocks other POST requests from the very
start. On an 8-core machine, that means 7 other cores wait for a single
POST request to finish.

I'm totally OK with writing "synchronized" blocks; I just thought
somebody might have experience with preventing dirty reads and could
suggest a safe and efficient pattern.

William Pietri

2009-01-25 20:26:37 UTC

Post by Sergey Didenko

Klaus, this does not solve "dirty-reads" problem. It just lowers its
possibility, because there is less time between query ends and query
results read.

How do you mean? It would seem to me that if you do all of your request
handling, from initial parameter processing to final output buffer
writes, inside a query or a transaction, then there is no time for dirty
reads.

Post by Sergey Didenko
Also it can decrease performance on multi-processor system, because
every http POST request blocks other POST requests from the very
start. That means that 7 other processor cores wait for a single POST
request to finish.

That is definitely a problem in theory, but for a lot of applications,
it may not matter much. Typical web applications are very read-heavy. A
prevalent system gives you such a performance boost compared with a
database-backed system that the global write lock could still be much
more efficient. When you start to push the boundaries of that, you could
invest in finer-grained locking. Or you could start to distribute your
app across multiple machines.

Unless you're pretty sure that your load will plateau at a level where a
global write lock is insufficient but you won't need multiple machines,
it may be a better use of development time to skip the fine-grained
locking and go right for splitting your app up. Either way, going with a
global write lock would buy you a lot of time with pretty low
development overhead. For a typical consumer traffic mix, I'm sure you
could get 10m dynamic pageviews/month like that on a single commodity
server, and I wouldn't be surprised if you topped 50m.
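For scale, assuming a 30-day month and traffic spread evenly (real traffic has peaks above this average), those figures correspond to modest average request rates:

$$
\frac{10\times10^{6}\ \text{pageviews}}{30 \times 86\,400\ \text{s}} = \frac{10^{7}}{2\,592\,000\ \text{s}} \approx 3.9\ \text{pageviews/s},
\qquad
\frac{50\times10^{6}\ \text{pageviews}}{2\,592\,000\ \text{s}} \approx 19.3\ \text{pageviews/s}
$$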

William

Sergey Didenko

2009-01-25 21:07:39 UTC

Thanks for explanations guys!

Now I see that you propose to extend
net.sourceforge.javamatch.query.MatchQuery for every sensitive query
that is more complex than the standard javamatch queries, and to return
only atomic values, deep object copies, or immutable objects.

Also, I see why you think this can be preferable to fine-grained
locking: using one global lock can speed up development.

BTW, it would be really good to put this info on the site.
Unfortunately, Prevayler has very low exposure in the programming community.
