[isabelle] A large mutually recursive data type

Discussion:

(too old to reply)

Victor Gomes

2016-03-11 10:26:25 UTC

Dear Isabelle experts,

As part of the Cerberus project, a formalisation of the de-facto C semantics, I was trying to import the model into Isabelle/HOL.

One of the first files (attached a simplified version) contains a very large mutually recursive data type (with 30 different types, around 5 or 10 constructors each).
After a few minutes waiting a response from Isabelle, I realised there was no hope to try to wait in JEdit.
Therefore, I decided to build an Isabelle heap file.

Isabelle took around 40 min to give me the following error:
ISABELLE_BUILD_OPTIONS=""

ISABELLE_BUILD_JAVA_OPTIONS="-Djava.awt.headless=true -Xms512m -Xmx2560m -Xss4m"

ML_PLATFORM="x86-darwin"
ML_HOME="/Applications/Isabelle2016.app/Contents/Resources/Isabelle2016/contrib/polyml-5.6-1/x86-darwin"
ML_SYSTEM="polyml-5.6"
ML_OPTIONS="-H 500"

Session Pure/Pure
Session HOL/HOL (main)
Session Unsorted/isabelle_generated
Building isabelle_generated ...
isabelle_generated: theory Cabs
Run out of store - interrupting threads
Run out of store - interrupting threads
Failed to recover - exiting
isabelle_generated FAILED

Since the default ML_OPTIONS is “-H 500”, I increased it to “-H 4000” and gave Isabelle another try.
This time I waited for 4 hours… at that point I gave up.

I’ve decided to give HOL4 a try and the same datatype was handled by HOL4 in minutes (around 2 min, I think).
I know Isabelle is quite heavy (for good reasons) compared to HOL4, but the difference is astonishing.
I was wondering if there is any optimisation issue in the Isabelle datatype package.

Isabelle probably produces a lot more lemmas "for free" when defining a datatype, but a slowdown in processing seems to be the price you pay.

The main question is:
Is there any hope of definitions like this being practicable within Isabelle/HOL in the future?

Note that:
- My machine is a MacBook Pro 2015 16GB
- I’ve used Isabelle 2016
- The version of polyml used in both cases (Isabelle and HOL4) are the same, polyml-5.6.

Best wishes,
Victor

Dmitriy Traytel

2016-03-11 11:13:45 UTC

Permalink

Hi Victor,

Isabelle’s new package for datatypes works quite differently from HOL4’s package (and from Isabelle’s old package which was much closer to HOL4’s).

The new package is known to be slower for mutual types, than the old construction. It is much faster for nested recursion (in “‘a tree = Node ‘a "‘a tree list” ‘a tree is nested in ‘a list) though, such that in most applications we saw so far the new package was actually faster. I don’t think we ever tried it with more than 15 mutually recursive types.

I’m quite curious to see you concrete datatype to say if it still might be feasible. (I don’t see any attachment in your mail.)

Cheers,
Dmitriy

Post by Victor Gomes
Dear Isabelle experts,
As part of the Cerberus project, a formalisation of the de-facto C semantics, I was trying to import the model into Isabelle/HOL.
One of the first files (attached a simplified version) contains a very large mutually recursive data type (with 30 different types, around 5 or 10 constructors each).
After a few minutes waiting a response from Isabelle, I realised there was no hope to try to wait in JEdit.
Therefore, I decided to build an Isabelle heap file.
ISABELLE_BUILD_OPTIONS=""
ISABELLE_BUILD_JAVA_OPTIONS="-Djava.awt.headless=true -Xms512m -Xmx2560m -Xss4m"
ML_PLATFORM="x86-darwin"
ML_HOME="/Applications/Isabelle2016.app/Contents/Resources/Isabelle2016/contrib/polyml-5.6-1/x86-darwin"
ML_SYSTEM="polyml-5.6"
ML_OPTIONS="-H 500"
Session Pure/Pure
Session HOL/HOL (main)
Session Unsorted/isabelle_generated
Building isabelle_generated ...
isabelle_generated: theory Cabs
Run out of store - interrupting threads
Run out of store - interrupting threads
Failed to recover - exiting
isabelle_generated FAILED
Since the default ML_OPTIONS is “-H 500”, I increased it to “-H 4000” and gave Isabelle another try.
This time I waited for 4 hours… at that point I gave up.
I’ve decided to give HOL4 a try and the same datatype was handled by HOL4 in minutes (around 2 min, I think).
I know Isabelle is quite heavy (for good reasons) compared to HOL4, but the difference is astonishing.
I was wondering if there is any optimisation issue in the Isabelle datatype package.
Isabelle probably produces a lot more lemmas "for free" when defining a datatype, but a slowdown in processing seems to be the price you pay.
Is there any hope of definitions like this being practicable within Isabelle/HOL in the future?
- My machine is a MacBook Pro 2015 16GB
- I’ve used Isabelle 2016
- The version of polyml used in both cases (Isabelle and HOL4) are the same, polyml-5.6.
Best wishes,
Victor

Jasmin Blanchette

2016-03-11 12:48:36 UTC

Permalink

Dear Victor,

Thanks for the answer. I have also tried Isabelle2013-2 (which I believe uses the old Isabelle’s package) with no success, although I didn’t wait much longer.

In principle, we know how to handle such large datatypes. Mutually recursive definitions can be rephrased as nested definitions. For example, instead of

datatype
'a tree_list = Nil | Cons "'a tree" "'a tree_list" and
'a tree = Node 'a "'a tree_list"

one can define

datatype 'a list = Nil | Cons 'a "'a list"
datatype 'a tree = Node 'a "'a tree list"

(We call this "nesting" because "'a tree" occurs recursively nested under "list", but this is not standard terminology.) The nested version scales much better than mutual one with modern versions of Isabelle (but much worse in old versions) -- nesting basically comes for free, whereas an n-ary mutual construction is O(n^3) or O(n^4) (Dmitriy might know).

Given your specification, I see two options:

1. You could introduce nesting manually, in a more or less ad hoc fashion, inspired by the "tree_list" example above. Although this sometimes leads to satisfactory results (as in the "tree list" example above), in your case the result surely wouldn't be pretty.

2. We could internally reduce large mutual specifications to nested ones. We already have much of the machinery in place to do this (see the "Nested-to-Mutual Reduction" paragraph, starting on p. 5, in our ITP 2014 paper [*]), but some more engineering would be necessary to (1) heuristically generalize some of the types in some groups (i.e. automatizing part 1) and (2) lifting all the results across isomorphisms, so that this is completely invisible to the user (except that it scales much better).

I would expect that point 2 would take us up to one week of work to get it right. It is unfortunately rather technical work. How important is this to you? What are your deadlines? And how acceptable is Option 1?

Cheers,

Jasmin

[*] http://www.loria.fr/~jablanch/co-data-impl.pdf

Victor Gomes

2016-03-11 13:33:28 UTC

Permalink

Dear Jasmin,

Thanks for the prompt reply.

Cerberus is an ongoing project written in Lem [1], which can automatically export to executable code (OCaml), and to interactive theorem provers (HOL4 and Isabelle). Currently, we are able to export the model to OCaml and execute C programs [2].

We would like to export the model to a theorem prover and prove some properties about it, but we have not decided each theorem prover yet. We would prefer to use Isabelle, because I have more experience with it.

The option 1 is clearly not viable, since it would imply change all the current Cerberus code. It would be great if we could try the second option. Since I am still deciding how to proceed, I don’t have firm deadlines yet.

Cheers,

Victor

[1] http://www.cl.cam.ac.uk/~pes20/lem/
[2] http://conf.researchr.org/event/pldi-2016/pldi-2016-papers-into-the-depths-of-c-elaborating-the-de-facto-standards

Post by Jasmin Blanchette
Dear Victor,

Thanks for the answer. I have also tried Isabelle2013-2 (which I believe uses the old Isabelle’s package) with no success, although I didn’t wait much longer.

In principle, we know how to handle such large datatypes. Mutually recursive definitions can be rephrased as nested definitions. For example, instead of
datatype
'a tree_list = Nil | Cons "'a tree" "'a tree_list" and
'a tree = Node 'a "'a tree_list"
one can define
datatype 'a list = Nil | Cons 'a "'a list"
datatype 'a tree = Node 'a "'a tree list"
(We call this "nesting" because "'a tree" occurs recursively nested under "list", but this is not standard terminology.) The nested version scales much better than mutual one with modern versions of Isabelle (but much worse in old versions) -- nesting basically comes for free, whereas an n-ary mutual construction is O(n^3) or O(n^4) (Dmitriy might know).
1. You could introduce nesting manually, in a more or less ad hoc fashion, inspired by the "tree_list" example above. Although this sometimes leads to satisfactory results (as in the "tree list" example above), in your case the result surely wouldn't be pretty.
2. We could internally reduce large mutual specifications to nested ones. We already have much of the machinery in place to do this (see the "Nested-to-Mutual Reduction" paragraph, starting on p. 5, in our ITP 2014 paper [*]), but some more engineering would be necessary to (1) heuristically generalize some of the types in some groups (i.e. automatizing part 1) and (2) lifting all the results across isomorphisms, so that this is completely invisible to the user (except that it scales much better).
I would expect that point 2 would take us up to one week of work to get it right. It is unfortunately rather technical work. How important is this to you? What are your deadlines? And how acceptable is Option 1?
Cheers,
Jasmin
[*] http://www.loria.fr/~jablanch/co-data-impl.pdf

Jasmin Blanchette

2016-03-13 13:54:24 UTC

Permalink

Dear Victor,

Post by Victor Gomes
The option 1 is clearly not viable, since it would imply change all the current Cerberus code. It would be great if we could try the second option. Since I am still deciding how to proceed, I don’t have firm deadlines yet.

The lack of scalability of the "(co)datatype" package in terms of mutual recursion has been an annoyance for many years. I believe we have 80% or 90% of the work done towards a scalable solution, thanks to the nested-to-mutual reduction I mentioned in my previous email.

I hope to be able to come back to you in the next weeks with some good news.

Cheers,

Jasmin

Jasmin Blanchette

2016-04-03 21:14:20 UTC

Permalink

Dear Victor,

Post by Jasmin Blanchette
The lack of scalability of the "(co)datatype" package in terms of mutual recursion has been an annoyance for many years. I believe we have 80% or 90% of the work done towards a scalable solution, thanks to the nested-to-mutual reduction I mentioned in my previous email.
I hope to be able to come back to you in the next weeks with some good news.

Dmitriy and I have made some progress on this front, and things look promising, but it might still take two or three weeks before we can push something to Isabelle.

Cheers,

Jasmin