[isabelle] Clone detection for Isabelle theories
Lars Hupel
2016-07-13 14:22:47 UTC
Dear list,

I'd like to announce the availability of a technology preview of code
clone detection for Isabelle theories.

This is the result of a Master's Thesis carried out by Dominik Vinan
under my supervision, with the help of Benjamin Hummel. Benjamin's
company is the vendor of ConQAT, an open-source software quality
assurance toolkit, for which the student has implemented an Isabelle
frontend.

The purpose of code clone detection is simple: to find duplicated
specification or proof fragments in Isabelle theories.
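
To illustrate the general idea only, here is a minimal sketch of
line-based clone detection. It is not the algorithm used by ConQAT or
by the Isabelle frontend; the function name `find_clones`, the `window`
parameter and the whitespace-only normalization are assumptions made
for this example.

```python
from collections import defaultdict


def find_clones(files, window=3):
    """Group locations where `window` consecutive normalized lines repeat.

    `files` maps a file name to its text. Returns a dict from the
    repeated chunk (a tuple of normalized lines) to a list of
    (file name, starting line number) pairs.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Normalize: collapse whitespace, drop blank lines, and keep
        # the original 1-based line numbers for reporting.
        lines = [(i + 1, " ".join(raw.split()))
                 for i, raw in enumerate(text.splitlines())]
        lines = [(n, line) for n, line in lines if line]
        # Slide a fixed-size window over the normalized lines.
        for start in range(len(lines) - window + 1):
            chunk = tuple(line for _, line in lines[start:start + window])
            seen[chunk].append((name, lines[start][0]))
    # Only chunks that occur more than once are clones.
    return {chunk: locs for chunk, locs in seen.items() if len(locs) > 1}
```

Called on two theory files that share a three-line lemma, this reports
one chunk with both locations. A realistic detector would work on
tokens or terms rather than raw lines.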

I will present the tool and its accompanying paper at the Isabelle
Workshop in Nancy, but it is already possible to download and use it:

Paper
<http://www.in.tum.de/~nipkow/Isabelle2016/Isabelle2016_9.pdf>

ConQAT+Isabelle bundle
<https://www21.in.tum.de/~hupel/downloads/isabelle-clones/isabelle-clones-preview.zip>

The bundle should contain all instructions for feeding theories into
ConQAT. It should work on macOS and Linux (no guarantees for Windows).
The result is an HTML report of detected clones which can be viewed in
the browser.

If you try it out, please report any problems (or that you didn't have
problems) to me via mail so that the student and I can iron out bugs
before ITP.

Cheers
Lars
Johannes Waldmann
2016-07-14 12:34:18 UTC
Post by Lars Hupel
I'd like to announce the availability of a technology preview of code
clone detection for Isabelle theories.
Interesting.

As far as I understood from the paper,
you are parsing source code with libisabelle,
which in turn requires a full Isabelle installation.
(Then - is parsing safe? Or is there danger of
code injection?)

I'd hope there is something like Language.Isabelle,
(just AST data type and parser - no semantics). cf.
https://hackage.haskell.org/package/haskell-src-exts
https://hackage.haskell.org/package/language-javascript

very hypothetically - is there a formal Isabelle(Isar) grammar
from which AST representation and parser/printer (in Haskell)
could be generated (mostly) automatically?

Best regards, J. Waldmann.
Lars Hupel
2016-07-14 12:43:32 UTC
Hi Johannes,
Post by Johannes Waldmann
As far as I understood from the paper,
you are parsing source code with libisabelle,
which in turn requires a full Isabelle installation.
(Then - is parsing safe? Or is there danger of
code injection?)
no, it is unsafe; yes, there is danger of code injection. Loading
Isabelle theories with libisabelle is equivalent to loading them into
Isabelle/jEdit.
Post by Johannes Waldmann
very hypothetically - is there a formal Isabelle(Isar) grammar
from which AST representation and parser/printer (in Haskell)
could be generated (mostly) automatically?
To quote the paper:

"Isabelle/Isar is the surface syntax for Isabelle theories. Because it
is user-extensible, it is impossible to parse statically."

Due to the presence of arbitrarily complicated parse translations (inner
syntax) and syntax parsers (outer syntax) written in ML the syntactic
analysis of Isar is equivalent to the halting problem. In that sense
Isar is very much like Perl.
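
A toy sketch of why extensible syntax defeats a fixed grammar: in the
made-up mini-language below, an `infixl` or `infixr` line changes how
every later line is parsed. The language and the names `parse_script`
and `parse_expr` are invented for this example and have nothing to do
with Isabelle's actual syntax machinery. In Isabelle the "declaration"
can be arbitrary ML code, which is what makes the general problem
undecidable.

```python
def parse_expr(tokens, assoc):
    """Parse `x op y op z ...` into nested (op, lhs, rhs) tuples.

    Simplification: the associativity of the first operator on the
    line decides the shape of the whole tree.
    """
    operands, ops = tokens[0::2], tokens[1::2]
    for op in ops:
        if op not in assoc:
            raise SyntaxError(f"undeclared operator {op!r}")
    if not ops:
        return operands[0]
    if assoc[ops[0]] == "left":
        tree = operands[0]
        for op, rhs in zip(ops, operands[1:]):
            tree = (op, tree, rhs)
        return tree
    tree = operands[-1]
    for op, lhs in zip(reversed(ops), reversed(operands[:-1])):
        tree = (op, lhs, tree)
    return tree


def parse_script(source):
    """Parse a script line by line; declarations affect later lines."""
    assoc = {}   # operator -> "left" or "right", filled in while reading
    trees = []
    for line in source.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] in ("infixl", "infixr"):
            # A declaration: update the parser state, produce no tree.
            assoc[tokens[1]] = "left" if tokens[0] == "infixl" else "right"
            continue
        trees.append(parse_expr(tokens, assoc))
    return trees
```

The same text `a + b + c` yields a left-nested tree after `infixl +`
and a right-nested tree after `infixr +`, so the parser cannot process
a line without first processing everything before it.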

Cheers
Lars
Johannes Waldmann
2016-07-14 13:46:07 UTC
Hi.
... Isar is very much like Perl.
Great! Use this for advertising.
Should open up a whole new population of users ...

User-defined syntax is perhaps
more a matter of readability
(I guess it's mostly about operator precedences,
and a few special forms - unless you actively abuse it).
But code injection really scares me.

Mind you, Haskell is scary too, because I can do

{-# language TemplateHaskell #-}
import Language.Haskell.TH
$(runIO (print 42) >> return [])
main = return ()

which will do IO during compilation.

But that needs the language pragma to be present
(at the start of the file)
and I can call ghc (the compiler) with "-XSafe"
and it will reject such code outright.

This "safe Haskell" is a recent addition
to the language (well, to GHC actually).
https://ghc.haskell.org/trac/ghc/wiki/SafeHaskell

I wonder why Isabelle plans to abandon
its "safe" flag (as Lars mentioned in
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2016-July/msg00034.html
)

Best regards, Johannes.
Manuel Eberl
2016-07-14 14:11:02 UTC
Post by Johannes Waldmann
But that needs the language pragma to be present
(at the start of the file)
and I can call ghc (the compiler) with "-XSafe"
and it will reject such code outright.
‘Safe Haskell, however, does not offer compilation safety. During
compilation time it is possible for arbitrary processes to be
launched, using for example the custom pre-processor flag. This can be
manipulated to either compromise a users system at compilation time,
or to modify the source code just before compilation to try to alter
set Safe Haskell flags.’
Also, I don't recall the details, but I think there were some problems
with Safe Haskell in the past. If you want to be (more) sure, you
probably still have to sandbox compilation and execution on the OS level.
Post by Johannes Waldmann
I wonder why Isabelle plans to abandon
its "safe" flag (as Lars mentioned in
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2016-July/msg00034.html
)
First of all, the ability to write custom tactics in ML is an integral
part of Isabelle. Preventing people from doing that is too immense a
restriction for it to be a useful feature in pretty much any context
except course homework (where writing tactics is usually not done –
although I did in fact do that for one of my homework submissions to
make my life easier).

Secondly, if I recall, Isabelle's ‘safe’ mode isn't actually very safe.
I don't know what it prevents and what it does not prevent, but even
without the ability to embed arbitrary ML code, you can do all kinds of
nasty things with code generation and the ‘value’ command. Isabelle is a
huge system and as far as I am aware, none of it was ever designed with
a ‘malicious’ user in mind.

I guess that going over the entire code base of Isabelle with the
express intent of finding possible security problems, and then ensuring
that all of these are disallowed in the safe mode, could result in a
version of Isabelle that provides a reasonable amount of confidence in
its safety, but that would be a big endeavour and the only real use case
would be automatic homework processing.

Considering that you can do the same thing with considerably less
effort, no restrictions imposed on the expressivity of Isabelle, and
with much better confidence in the safety of the approach by using
sandboxes/containers, I don't think there is any reason to invest time
and energy into a ‘safe mode’.
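
As a rough sketch of the container approach, the helper below only
assembles a `docker run` command line; it does not run anything. The
image name `example/isabelle:latest` is hypothetical, and the sketch
assumes that the `isabelle` executable is on the PATH inside the image
and that the mounted directory contains a ROOT file. The resource
limits are arbitrary example values.

```python
import os


def sandboxed_build_command(theory_dir,
                            image="example/isabelle:latest",  # hypothetical
                            memory="4g"):
    """Return an argv list that builds `theory_dir` inside a container."""
    host_dir = os.path.abspath(theory_dir)
    return [
        "docker", "run", "--rm",
        "--network", "none",              # no network access
        "--memory", memory,               # cap memory usage
        "--pids-limit", "512",            # cap the number of processes
        "-v", f"{host_dir}:/work:ro",     # theories mounted read-only
        image,
        "isabelle", "build", "-D", "/work",
    ]
```

The resulting list can be handed to `subprocess.run`. A real setup
would also need to decide where Isabelle may write its output and how
to enforce a time limit.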


Cheers,

Manuel
Lars Hupel
2016-07-14 15:46:35 UTC
Post by Johannes Waldmann
... Isar is very much like Perl.
Great! Use this for advertising.
Should open up a whole new population of users ...
This was on purpose ;-)
Post by Johannes Waldmann
User-defined syntax is perhaps
more a matter of readability
(I guess it's mostly about operator precedences,
and a few special forms - unless you actively abuse it).
But code injection really scares me.
I think your mental model of Isabelle is slightly off. There's no point
in discussing the design decisions made decades ago when we're really
discussing how clone detection works with the system we have right now.

To rephrase: Isabelle has arbitrary user-extensible syntax and it has
arbitrary executable ML code. If you want to do realistic syntax
analysis today you have to accept that.

I'm not saying that in the future there couldn't be some other
mechanism. But it is what it is right now.
Post by Johannes Waldmann
I wonder why Isabelle plans to abandon
its "safe" flag (as Lars mentioned in
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2016-July/msg00034.html
)
There are no plans to abandon it. It has already been abandoned.

But the reasoning goes as follows: Different people understand very
different things by what "secure mode" is supposed to mean. The fact
that it won't run custom ML code means neither that

a) malicious users are prevented from tricking the system into accepting
theorems which are not actually proven (e.g. by using axiomatization)

nor that

b) malicious users are prevented from affecting the operating system or
the machine running Isabelle.

But it means that many legitimate things (e.g. custom tactics, loading
AFP libraries which contain ML code) are impossible.

Because of that it wasn't really useful to begin with. Having a "secure"
flag which doesn't actually protect anything is worse than having no
flag at all.

Again, in the future, there could be different mechanisms. But before
doing so we actually need to come up with a solid threat model and a
specification on what "secure" should mean.

Cheers
Lars
Lars Hupel
2016-07-14 15:56:27 UTC
Post by Lars Hupel
I think your mental model of Isabelle is slightly off. There's no point
in discussing the design decisions made decades ago when we're really
discussing how clone detection works with the system we have right now.
To rephrase: Isabelle has arbitrary user-extensible syntax and it has
arbitrary executable ML code. If you want to do realistic syntax
analysis today you have to accept that.
Sorry, that probably came off harsher than intended. I was trying to say
that Isabelle supports all these things today and there is no reliable
way to turn them off.
