Discussion:
[isabelle] automatically grade Isabelle homework?
(too old to reply)
Johannes Waldmann
2016-07-09 09:04:16 UTC
Permalink
Raw Message
Dear all,

I wonder about the feasibility of the following.
Perhaps something similar has already been done.

I want to automatically check solutions for homework assignments.
The problem statement could be a text with some "sorry",
and the student has to replace these with actual proofs,
and is forbidden to change anything else.

I am already doing this for Haskell homework -
students have to replace "undefined",
resulting code is type-checked and then small/quick-checked.
See URL at end of this mail for implementation.

1. what is the best way to pattern-match
student's Isabelle code against problem statement?
For Haskell, I actually compare abstract syntax trees.

2. how safe is it to run Isabelle? - For Haskell,
anything dangerous would be in IO, which is prohibited
by putting type annotations in the problem statement
(which the student cannot change).
And "unsafePerformIO" is prohibited because it would need
some "import" statement - which again would not survive
the pattern matching. So, the worst that can happen
is that student's code runs forever,
and for that I just limit execution time (to a few seconds).
Is there any way to sneak an IO action into a proof?

3.1 How would I run Isabelle? Of course I don't want the GUI.
3.2 Can I avoid putting student's submission in a file,
and pipe it instead?
3.3 How would I detect that Isabelle actually verified the proof?
Ideally, just by checking the return code?

Any input appreciated.

- Johannes.


https://gitlab.imn.htwk-leipzig.de/autotool/all/tree/master/collection/src/Haskell/Blueprint

NB: I once saw an (unrelated) project
that used "maxima" to check elementary algebra homework,
and this had serious security holes -
you could easily do IO because there is no static checking.
Lars Hupel
2016-07-09 09:58:11 UTC
Permalink
Raw Message
Dear Johannes,
Post by Johannes Waldmann
1. what is the best way to pattern-match
student's Isabelle code against problem statement?
For Haskell, I actually compare abstract syntax trees.
is there any reason you require students to exactly fill out holes? For
our Haskell course we reckoned a "matching interface" is good enough.

In Isabelle you could get something similar by using three theories:

Interface.thy – definitions would go in there
Student.thy – lemma statements + sorry would go there, student has to
fill them
Checker.thy – contains the exact same lemma statements, but using a
proof "by rule".

Example:

Interface.thy

theory Interface imports Main begin

definition prime ...

Student.thy

theory Student imports Interface begin

lemma no_biggest_prime: obtains p where "prime p" "p > n"
sorry

Checker.thy

theory Checker imports Interface Student begin

lemma check_no_biggest_prime:
obtains p where "Interface.prime p" "p > n"
by (rule Student.no_biggest_prime)

Depending on where exactly a failure occurs (you can put those into
multiple sessions) you know where the student did something wrong.
Post by Johannes Waldmann
2. how safe is it to run Isabelle? - For Haskell,
anything dangerous would be in IO, which is prohibited
by putting type annotations in the problem statement
(which the student cannot change).
And "unsafePerformIO" is prohibited because it would need
some "import" statement - which again would not survive
the pattern matching. So, the worst that can happen
is that student's code runs forever,
and for that I just limit execution time (to a few seconds).
Is there any way to sneak an IO action into a proof?
Isabelle allows arbitrary code execution. There is currently a "safe"
flag but:
* with that many legitimate things don't work
* it will be gone in the next release.

I strongly recommend containerizing Isabelle for running untrusted code.
(We use unprivileged containers without network access in a VM in our
AFP submission checker.)

On the plus side, setting timeouts is possible and robust.
Post by Johannes Waldmann
3.1 How would I run Isabelle? Of course I don't want the GUI.
You have to prepare a ROOT file and put all the files into appropriate
folders. Then you can run

$ bin/isabelle build ...

See also §2 in the Isabelle system manual:
<https://isabelle.in.tum.de/dist/Isabelle2016/doc/system.pdf>
Post by Johannes Waldmann
3.2 Can I avoid putting student's submission in a file,
and pipe it instead?
You could, but that would involve you writing some nontrivial amount of
custom Scala code to drive the build process. I wouldn't recommend it.
Post by Johannes Waldmann
3.3 How would I detect that Isabelle actually verified the proof?
Ideally, just by checking the return code?
Yes, you get that for free from "isabelle build". Exit code 0 means
session built successfully, other exit codes mean timeout/failure/...

Cheers
Lars
Johannes Waldmann
2016-07-14 13:57:18 UTC
Permalink
Raw Message
Post by Lars Hupel
Post by Johannes Waldmann
1. what is the best way to pattern-match
student's Isabelle code against problem statement?
For Haskell, I actually compare abstract syntax trees.
is there any reason you require students to exactly fill out holes?
Well, you can always give a name to a "hole" inside an expression.
Then student has to add a top-level definition for that name.

But then, being top-level (module-level)
means that it cannot refer to local names
from the scope that the hole was in,
but sometimes that's the point of an exercise,
if it's about "local" programming techniques.

Also, there are other kinds of holes (than expressions),
I want to use this also for patterns and types.
Again, these could be named, but then I lose the scope information,
or have to make it explicit (by adding arguments).

I imagine the same could be useful for Isabelle homework:
"fill the gap(s) in this proof (but don't touch the non-gaps)"
the point being that the student has to respect the (local) structure.

- Johannes.

Lars Hupel
2016-07-09 13:53:47 UTC
Permalink
Raw Message
as Lars says, there is a safe mode, and we use in (in addition to
containerization using Docker) Karlsruhe where we automatically process
Isabelle submissions using the “Praktomat” checker.
The point I was trying to make is: there is a "safe" mode, but don't
ever use it! :-)

Cheers
Lars
Lars Hupel
2016-07-11 08:47:47 UTC
Permalink
Raw Message
Hi Joachim,
besides adding security (which is better handled by containerization),
.. a dubious amount at best. Using "code_printing" and "export_code"
you can already write arbitrary things to the file system, e.g.
".bashrc" or the checker theory (if those are writeable). Also let me
reiterate that the "secure" mode is gone after Isabelle2016.
the safe mode also prevents the student from adding ML code that allows
them to implement a bad oracle, i.e. their own "sorry", right? Will
there be another way of avoiding that, once the safe mode is gone?
There are more ways to cheat than that, e.g. with "axiomatization" or
introducing custom or changing existing input/output syntax. However, ML
oracles can be easily detected using "Thm.peek_status" (the "oracle"
field is set to "true").

Given that the whole system is so complex there is no bullet-proof way
you can reasonably assign grades without reserving the right to deduct
points if cheating is detected by a human. (Human review of Isabelle
sources is a necessity.)

Cheers
Lars
Ondřej Kunčar
2016-07-11 09:16:06 UTC
Permalink
Raw Message
Post by Lars Hupel
the safe mode also prevents the student from adding ML code that allows
them to implement a bad oracle, i.e. their own "sorry", right? Will
there be another way of avoiding that, once the safe mode is gone?
There are more ways to cheat than that, e.g. with "axiomatization" or
introducing custom or changing existing input/output syntax. However, ML
oracles can be easily detected using "Thm.peek_status" (the "oracle"
field is set to "true").
This is not reliable either. You can hide usage of an oracle in the
instance proof when you instantiate a type class. For example, create a
type class that assumes False and use sorry in the instance proof.

Probably the most reliable way that we can amend this is to remove the
axiomatic type classes from the kernel.

Bests,
Ondrej
Loading...