Archive for the ‘Projects’ Category

TAAC, Cleaned Up and Ready

Monday, October 18th, 2010

So I’ve finally been able to get around to moving the TAAC demos to their final location (on DIG’s demo server, dice) and also porting the SVN history to TAAC’s new home in a Mercurial repository.

Now, you should be able to run the demos on a permanent home, and have a much easier time hacking on TAAC if you so desire. (As before, it still needs python-openid and it requires that you check out the air-reasoner repository in the same place to a directory named “tmswap”).

Work Continues!

Monday, September 13th, 2010

My apologies for being relatively silent on this blog, but I tend to be rather close-lipped when it comes to social media (blogs, Twitter, Facebook, etc.). In any case, those of you visiting this site may be interested to know about certain projects I’ve worked on in the past (especially TAAC and/or AIR), so I’ll try to do my best to keep you up to date.

  • TAAC: TAAC has unfortunately been suffering from code-rot for a while, so I am in the process of updating it and cleaning up the code to work again and get it into the Mercurial repository that now hosts the AIR reasoner, Tabulator, and other projects at DIG. I hope to get around to that by the end of the month. Also, as a result of a server move, I’ve switched the location of the demos (and will be switching them one more time once I set TAAC up on our demo server).
  • AIR Reasoner: The reasoner for the semantic-web rules language AIR is another project I’ve been working on (it is also used in TAAC). We’re in the middle of cleaning up the code and trying to set the features down to actually do a proper release of the reasoner with some explanatory documentation, but that goal is still a little elusive. You can check out our progress on the “refactor” branch of the DIG Mercurial repository at http://dig.csail.mit.edu/hg/air-reasoner/. We’re hoping to get this done by December.
  • “Dprop” and Distributed Data Propagation: This was my master’s thesis, which should be on DSpace at MIT sometime in the near future. The code for that is all uploaded to the Mercurial repository at http://dig.csail.mit.edu/hg/dprop/, although it is not clearly documented. Some examples are present in the examples/ directory, which may serve as guidance on how to use dprop.

Well, that’s all I can offer for now. I may revise this post as I note other things I forgot, however, so stay tuned.

TAAC Update

Friday, January 30th, 2009

TAAC should now, with the latest update, support Apache 2 much more nicely (apparently, mod_python with it no longer nicely forwards SSL variables as environment variables, so you have to access them in a different manner). There are still some issues with Apache 2’s handling of SSL renegotiation and use of the SSLVerifyClient directive that need to be resolved, but the bottom line is that the demos should now be working again on this server (updated to Apache 2 in the past month).

In addition, public SVN access to check out the TAAC code should now be available using the username and password ‘dig’.

Quantifying cwm Variables…

Thursday, December 18th, 2008

Mostly for my benefit, but here are a few examples of how cwm’s N3Rules translate into formal logic:

  • Global universal quantification:
    @prefix : <#> .
    @forAll :x  .
    
    { :x  :a :b . } => { :x  :c :d . } .
    
    :someValue :a :b .

    ∀x (a(x, b) → c(x, d))

    Therefore the above entails the additional statement :someValue :c :d . as :x is bound to :someValue on the RHS.

  • Global existential quantification:
    @prefix : <#> .
    @forSome :x  .
    
    { :x  :a :b . } => { :x  :c :d . } .
    
    :someValue :a :b .

    ∃x (a(x, b) → c(x, d))

    Therefore the above entails no additional statements.

  • LHS universal quantification:
    @prefix : <#> .
    
    { @forAll :x  . :x  :a :b . } => { :x  :c :d . } .
    
    :someValue :a :b .

    (∀x a(x, b)) → c(x, d)

    Therefore the above entails no additional statements.

  • LHS existential quantification:
    @prefix : <#> .
    
    { @forSome :x  . :x  :a :b . } => { :x  :c :d . } .
    
    :someValue :a :b .

    (∃x a(x, b)) → c(x, d)

    Therefore the above entails the additional statement :x :c :d . as :x is unbound on the RHS.

  • RHS universal quantification:
    @prefix : <#> .
    
    { :someValue :a :b . } => { @forAll :x  . :x  :c :d . } .
    
    :someValue :a :b .

    a(someValue, b) → (∀x c(x, d))

    Therefore the above entails (generally) @forAll :z . :z :c :d .

  • RHS existential quantification:
    @prefix : <#> .
    
    { :someValue :a :b . } => { @forSome :x  . :x  :c :d . } .
    
    :someValue :a :b .

    a(someValue, b) → (∃x c(x, d))

    Therefore the above entails the additional statement [ :c :d ] .
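
For comparison, the first case (global universal quantification) can be sketched outside cwm as a single naive forward-chaining step. This is only an illustration in plain Python and not how cwm works internally; the tuple representation of triples and the apply_rule helper are assumptions made up for this example.

```python
# Triples are plain (subject, predicate, object) tuples; this is an
# illustrative stand-in for an N3 store, not cwm's data model.
facts = {(":someValue", ":a", ":b")}


def apply_rule(facts):
    """Apply { ?x :a :b } => { ?x :c :d } once, with ?x universally quantified."""
    derived = set()
    for subject, predicate, obj in facts:
        # The LHS pattern matches any subject with predicate :a and object :b,
        # binding ?x to that subject.
        if predicate == ":a" and obj == ":b":
            # The same binding for ?x is substituted into the RHS.
            derived.add((subject, ":c", ":d"))
    return facts | derived


closure = apply_rule(facts)
```

Because ?x ranges over everything, the one fact that matches the LHS is enough to produce :someValue :c :d . in the result.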

Finally, two trickier specific examples: “If there exists a foaf:Person that all (known) foaf:Persons foaf:know, then there exists a :PopularPerson” and “any foaf:Person that all (known) foaf:Persons foaf:know is a :PopularPerson” can’t be done properly without completely closing the world. cwm cannot do this without artificially closing the world through built-ins.
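
To see why the world has to be closed, here is a rough sketch of the second example in plain Python. The names and the data are made up for illustration. The point is that the check over "all persons" only means something if the set of persons is assumed to be complete, which is exactly the assumption an open-world reasoner will not make for you.

```python
# Hypothetical closed set of foaf:knows statements: (knower, known).
knows = {
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "carol"),
    ("alice", "bob"),
}

# Closing the world: we assume these are ALL the foaf:Persons there are.
persons = {"alice", "bob", "carol"}


def popular_persons(persons, knows):
    """Return every person that all listed persons know."""
    return {
        candidate
        for candidate in persons
        if all((knower, candidate) in knows for knower in persons)
    }


popular = popular_persons(persons, knows)

# Learning about one more person who knows nobody invalidates the conclusion.
popular_after_dave = popular_persons(persons | {"dave"}, knows)
```

With the three listed persons, carol comes out as popular; once dave is added to the set, nobody does. A conclusion that can be withdrawn by learning a new fact is not something a monotonic, open-world rule can safely assert.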

TAAC in Action

Friday, December 12th, 2008

TAAC Examples

I’ve posted three examples that utilize TAAC in some manner.

You can test any of these yourself if you present the proper client certificate linked to your FOAF file (otherwise, without a client certificate, you won’t be able to authenticate with FOAF+SSL.) If you don’t have a properly configured certificate or FOAF file, Henry Story has a short description of how you can set this up in Firefox 3 with some utilities in the sommer repository. In addition, this server requires you to explicitly provide a certificate (as client certificates are optional).

So How Does TAAC Work?

As mentioned previously, Henry Story has some excellent descriptions of how the FOAF+SSL protocol works in general. TAAC is merely an implementation of this, but goes further to implement an authorization framework. How does this work though? The following diagram goes a ways toward explaining TAAC’s design (especially with regard to authorization) in general.

(A diagram of TAAC)

TAAC acts as a proxy for any URI access within the directory it’s set up in (thanks to mod_python). On every access, it will check the requested URI against the list of URIs having an rein:access-policy (as populated from the file specified in the POLICY_FILE variable). If no access policy exists, TAAC gladly permits normal access without any needed authentication.

If an access policy exists, however, TAAC will immediately attempt to properly reach a successful completion of the FOAF+SSL authentication protocol. I won’t go into significant details here, as Henry Story gives an excellent overview of the protocol (in a somewhat earlier state, though the same principles still apply) on his blog.

Following this, TAAC takes the successfully authenticated URI-token and logs the attempted access to a log file (specified by LOG_FILE). Taking this generated resource describing the access, and the AIR policy attached with the rein:access-policy triple, TAAC then proceeds to run an AIR reasoner over the policy with the given log resource. If the resource describing the access is concluded to be air:compliant-with the associated access-policy, the fact that access was granted according to the policy is logged, and access is granted. Otherwise, the fact that access was denied is logged, and access is denied with a 403 response.
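
The flow described above can be summarized in a short sketch. This is not TAAC's actual code: handle_request and its parameters are hypothetical names, the reasoner is replaced by a callback, and returning 403 when authentication itself fails is my assumption for the example (the description above only specifies the 403 for a policy denial).

```python
from typing import Callable, Dict, List, Optional


def handle_request(
    uri: str,
    policies: Dict[str, str],
    authenticate: Callable[[], Optional[str]],
    is_compliant: Callable[[str, str], bool],
    log: List[str],
) -> int:
    """Return an HTTP status code for a request to `uri`."""
    policy = policies.get(uri)
    if policy is None:
        # No rein:access-policy for this URI: normal, unauthenticated access.
        return 200

    # Authentication (e.g. FOAF+SSL) only establishes who is asking.
    identity = authenticate()
    if identity is None:
        # Assumption for this sketch: a failed authentication is refused.
        log.append(f"authentication failed for {uri}")
        return 403

    # Log the attempted access before reasoning over it.
    log.append(f"{identity} requested {uri}")

    # Authorization: stand-in for running the AIR reasoner over the policy.
    if is_compliant(identity, policy):
        log.append(f"access granted to {identity} under {policy}")
        return 200

    log.append(f"access denied to {identity} under {policy}")
    return 403


policies = {"/my_file.html": "/my_file.policy.n3"}
bob = "http://www.example.com/bob.rdf#bob"
log: List[str] = []

open_status = handle_request("/other.html", policies, lambda: None, lambda w, p: False, log)
granted_status = handle_request("/my_file.html", policies, lambda: bob, lambda w, p: True, log)
denied_status = handle_request("/my_file.html", policies, lambda: bob, lambda w, p: False, log)
```

Keeping authentication and the compliance check as separate callbacks mirrors the split between establishing an identity and deciding what that identity may do.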

Authentication and Authorization on the Open Social Web with TAAC

Thursday, December 11th, 2008

Update 3: The subversion repository for tmswap has been superseded by the Mercurial repository for air-reasoner

Update 2: The subversion repository should now have public checkouts enabled with the username and password ‘dig’.

Update: The subversion repository is currently not set up for external access. I probably won’t be able to get this resolved until Monday at the earliest. For the time being, you can extract this tarball into the directory you wish to protect, and skip the first two steps.

Recent discussions on the foaf-protocols mailing list have been pushing the FOAF+SSL protocol (discussed earlier on both Henry Story’s and my blogs) towards a more finalized state, pending some clarification of issues with generating the self-signed certificates that serve as the key to the protocol. As has been mentioned on the list and the blog several times, I have been maintaining an independent Python implementation of the FOAF+SSL protocol, and I now feel that the implementation is at a stable enough state to officially offer up instructions for installing TAAC.

Before I give instructions on how to do so, however, let me digress onto an important subtopic, that being the subtle difference between authentication and authorization, as the distinction is critical to understanding how TAAC works. FOAF+SSL is fundamentally an authentication mechanism. It provides a method to confirm that the individual presenting the SSL certificate is, in fact, the person who is also in control of the FOAF resource specified in the certificate. It does not, however, specify any criteria for how access should actually be granted. It only establishes an identity.

TAAC implements FOAF+SSL as one of several authentication mechanisms tested, including a sample implementation of the RDFAuth mechanism, as well as an OpenID-based mechanism. TAAC, however, only implements these authentication mechanisms as a means to the goal of achieving a flexible Semantic-Web-friendly authorization framework. While the language and reasoning is still very much in flux, the idea of TAAC is to permit the creation of distributed access control lists and complex access control policies on top of semantic web data. Indeed, the current implementation (slowly) permits such authorization rules as “Only friends of people I specify as friends or the friends I specify can access this page” or “MIT students who are sophomores or juniors currently taking 6.805 can access this page” without having to maintain cumbersome access control lists, instead deferring to collections of data compiled by others. In effect, we can rely on MIT to maintain the list of current students, and accurately state their class year and the classes they are taking, such that we can merely reason over that data without having to compile an access control list from it.
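
As a rough illustration of the first rule quoted above (“friends or friends of friends”), here is the same decision written as plain Python over a made-up foaf:knows graph. In TAAC this would be expressed as an AIR policy and evaluated by the reasoner over data fetched from other people's FOAF files; the dictionary and the friends_within helper are assumptions for this sketch only.

```python
# Hypothetical foaf:knows data, as it might be gathered from several FOAF files.
knows = {
    "me": {"alice", "bob"},
    "alice": {"carol"},
    "bob": {"dave"},
    "carol": {"erin"},
}


def friends_within(owner, knows, max_hops):
    """Collect everyone reachable from `owner` in at most `max_hops` foaf:knows steps."""
    reached = set()
    frontier = {owner}
    for _ in range(max_hops):
        next_frontier = set()
        for person in frontier:
            next_frontier |= knows.get(person, set())
        # Only keep people not seen before (and never the owner themselves).
        frontier = next_frontier - reached - {owner}
        reached |= frontier
    return reached


allowed = friends_within("me", knows, max_hops=2)
```

Note that nobody had to write down the list of four allowed people: it falls out of statements that alice and bob maintain about their own friends.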

Installing TAAC

Before You Install: Make sure you have installed the python-openid and pyCrypto >= 2.0.1 frameworks and are running mod_python on your server. While python-openid is not absolutely necessary for FOAF+SSL, TAAC is implemented with an additional vestigial OpenID mechanism that may or may not be integrated as an alternative mechanism to FOAF+SSL for FOAF-based authentication schemes, and hence requires the library.

  1. Get the TAAC source code and copy the files and directories enclosed in the directory in which you want to protect some files. The source code is available in an SVN repository at https://svn.csail.mit.edu/dig/TAMI/2008/taac/proxy.
  2. Get the tmswap directory needed for TAAC to properly operate and copy it into the directory containing the TAAC code. The source code was originally available in an SVN repository at https://svn.csail.mit.edu/dig/TAMI/2007/cwmrete/tmswap, but the tmswap SVN repository has been superseded by the air-reasoner Mercurial repository at http://dig.csail.mit.edu/hg/air-reasoner/. Clone the repository and either:
    • Copy the tmswap directory (if it exists in the current revision)
    • Switch to the refactor branch and copy the contents of the root directory of the repository into a new tmswap directory
  3. Configure TAAC. The primary configuration for TAAC is in taac/config.py. You most probably don’t need to change any of the settings, but you should be aware of their values, as they affect the remainder of this installation process. POLICY_FILE is the relative path from proxy.py to the file that links your protected files to the corresponding policy files governing access. POLICY_TYPE is the MIME type of POLICY_FILE (‘text/rdf+n3’ or ‘application/rdf+xml’ most likely). LOG_FILE is the relative path from proxy.py to the file to log access information to. The other settings are not terribly relevant to FOAF+SSL and can be left alone.
  4. Set up your policy file. Your policy file (at the path specified by POLICY_FILE, defaulting to ‘./policies.n3’) is the key to protecting your URIs with FOAF+SSL. The policy file is an RDF file that links resources representing the protected URIs to their corresponding policy files. This is most easily done with the rein:access-policy (http://dig.csail.mit.edu/2005/09/rein/network#access-policy) property (subject to change in future TAAC releases). Here’s a very simple policies.n3 that protects my_file.html:
    @prefix rein: <http://dig.csail.mit.edu/2005/09/rein/network#> .
    
    <./my_file.html> rein:access-policy <./my_file.policy.n3> .
    
  5. Create a policy. The policy is the access-policy attached by policies.n3. This policy is written in the AIR language, which may be somewhat daunting for someone trying to write their first policy. A couple of sample policies include http://www.pipian.com/rdf/tami/juliette.policy.n3#JulietteLocationDissemPolicy, which permits any valid authentication via FOAF+SSL, and http://www.pipian.com/rdf/tami/juliette.policy.n3#JulietteFOAFDissemPolicy, which allows only friends and friends of friends of Juliette access.
  6. Create your log file with mode 0666. This is usually ‘log.n3’.
  7. Edit your .htaccess file. In order to actually enable the protection, you need to create a .htaccess file that adds proxy.py as a mod_python proxy and explicitly enables SSL client certificates to be passed to proxy.py. http://www.pipian.com/rdf/tami/htaccess is a good example for Apache 1.3 SSL servers. Apache 2.0’s mod_ssl requires somewhat different flags to enable passing SSL client certificates (Melvin Carvalho says that SSLOptions should be set to +StdEnvVars and +ExportCertData).
  8. TAAC should now be set up and running.
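
For steps 3 and 6, the following sketch shows the three settings named above with the values mentioned in these instructions, plus one way of creating the log file with mode 0666 on a POSIX system. It is not a copy of taac/config.py; the create_log_file helper and the use of a temporary directory are assumptions for the example.

```python
import os
import stat
import tempfile

# Hypothetical values mirroring the settings named above.
POLICY_FILE = "./policies.n3"   # default mentioned in step 4
POLICY_TYPE = "text/rdf+n3"     # or "application/rdf+xml"
LOG_FILE = "./log.n3"           # usual name from step 6


def create_log_file(directory, log_file=LOG_FILE):
    """Create an empty log file with mode 0666 and return its path."""
    path = os.path.join(directory, log_file)
    # Open for append so an existing log is not truncated.
    with open(path, "a"):
        pass
    # chmod explicitly: the process umask would otherwise mask off write bits.
    os.chmod(path, 0o666)
    return path


# Demonstrated in a scratch directory rather than a real web root.
demo_dir = tempfile.mkdtemp()
log_path = create_log_file(demo_dir)
log_mode = stat.S_IMODE(os.stat(log_path).st_mode)
```

The explicit chmod matters because the web server process, not you, is the one that has to write to the log.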

The above instructions should work, but I have not officially tested them on a clean server.

It is worth noting that TAAC is still very much in flux and is alpha-quality software, and tends to follow the discussions on the foaf-protocols list rather closely, so the above instructions and configuration options may change without warning. Furthermore, there are some caveats with TAAC. In particular, it only currently allows for static policies and static protected URIs. It’s my hope to extend TAAC such that it will have hooks to allow for custom policies dependent on script arguments in the URL, no longer requiring static lists of all possible URIs (so protecting scripts is currently not likely to work well, especially if they take free-form arguments like session variables).

So that hopefully wraps it up a bit, and will get you started on getting a FOAF+SSL implementation set up of your own. TAAC may be clunky now, but the hope is to streamline it such that it’s easily integrated into any Python web application.

Issues with a FOAF-based Authentication System

Friday, September 5th, 2008

As I’ve been working on TAAC, I’ve started to become concerned about a potential weakness in any FOAF-based identity authentication system (be it RDFAuth, OpenID, or FOAF+SSL): in ALL of these systems, with the possible exception of RDFAuth (due to its reliance on PKI), the weakest link is the integrity of the server hosting the FOAF file. All three systems rely on data in the FOAF file to ‘authenticate’ against, but this poses problems. Take, for example, the following scenario:

Alice runs a website that accepts an OpenID+FOAF system (it works equally well with FOAF+SSL). Bob is a client of Alice, and regularly uses the authentication scheme Alice has implemented. When authenticating, he traditionally authenticates against his FOAF URI, http://www.example.com/bob.rdf#bob. The file bob.rdf has information that links to Bob’s OpenID, http://www.example.com/bob, permitting him to authenticate with his (self-run) OpenID provider.

Eve wants to see the information that Bob gets to see on Alice’s website, and thanks to some shoddy system administration, finds a security hole that allows her to get access to the filesystem of the server hosting bob.rdf. Ignoring the other private information acquired in this way, Eve silently replaces bob.rdf with her own FOAF file that has one simple change: the OpenID associated with http://www.example.com/bob.rdf#bob is now http://www.example.com/eve, which is Eve’s OpenID provider. Eve authenticates against her own OpenID provider and gets access as Bob to Alice’s website, does her dirty work, and then quietly returns the original FOAF file so that Bob is none the wiser. There’s precious little evidence that Eve intruded, and only an alert sysadmin might note the erroneous login. Meanwhile, Alice is barely aware of any difference other than that the OpenID changed for one particular login.

In summary, as Henry Story admitted (Point 5 in the FOAF+SSL description), these methods only assert that the person accessing any protected resource has ‘write access’ to their FOAF file… But that doesn’t assert that they’re the same person.

With the common weakness of many self-hosted domains having poor security protocols, a FOAF-based Authentication System could be disastrous. The only plausible ‘stopgap measure’ might be requiring the system as a whole to cache the authentication credentials (e.g. OpenID, public key URL, or X.509 hash) and refuse access to people who present credentials that have changed. This adds a layer of complication to the mix as well, as it would require out-of-band communication to ensure that the ‘cached’ credentials are removed or replaced with new credentials manually… And even so, there is still the risk of incorrect authentication credentials being presented absent any evidence they are incorrect (e.g. Eve logs in before Bob ever does, or does so in the period where Bob’s cached credentials have been deleted, establishing her credentials in place of his own). There are ways around this, but they seem a bit kludgy to me (e.g. using the old OpenID/X.509 cert, which may not exist due to security risks, to authenticate the new one; checking against a public key server to see if there’s any indication that a public key has been revoked/replaced).
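
The caching stopgap amounts to trust-on-first-use. Here is a minimal sketch of it, with a hypothetical CredentialCache class that is not part of TAAC; the credential is just an opaque string (an OpenID, a public key URL, or a certificate hash).

```python
class CredentialCache:
    """Remember the first credential seen for each FOAF URI."""

    def __init__(self):
        self._known = {}

    def check(self, foaf_uri, credential):
        """Return True if `credential` matches what was first seen for `foaf_uri`."""
        if foaf_uri not in self._known:
            # First use: nothing to compare against, so the credential is trusted.
            # This is exactly the window in which Eve could get in before Bob.
            self._known[foaf_uri] = credential
            return True
        return self._known[foaf_uri] == credential

    def reset(self, foaf_uri):
        """Forget a cached credential; meant to be triggered out of band."""
        self._known.pop(foaf_uri, None)


cache = CredentialCache()
bob = "http://www.example.com/bob.rdf#bob"

first_login = cache.check(bob, "http://www.example.com/bob")
swapped_login = cache.check(bob, "http://www.example.com/eve")

# After an out-of-band reset, whoever shows up first wins again.
cache.reset(bob)
login_after_reset = cache.check(bob, "http://www.example.com/eve")
```

The swapped OpenID is refused as long as Bob's original one is cached, but the last line shows the caveat from above: right after a reset, Eve's credential is accepted just as readily as Bob's would be.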

Are we sure that a FOAF-based Authentication System is secure enough? At the very least, it seems like we need proactive sysadmins maintaining the system to ensure it remains secure… And can we afford that?

Back to TAAC

Wednesday, September 3rd, 2008

So I’ve finally got a chance to return to working on TAAC, an access control mechanism for the web that integrates FOAF-based identification with access control rules. I’ve been doing some more thorough testing on the slow-down issues explained two posts back, and found that the slowdown, while significant, appears to be about 13 seconds or so, on average, on this server, a Linode virtual private server which I expect typifies an average web host (if not better than average).

Several attempts at profiling (aside from significantly increasing processing times, up to 10x longer) led to the conclusion that, in fact, most of that time is spent in the second phase (post-authentication, during reasoning), which is where I’d EXPECT the slowdown to be. Granted, this now becomes a problem that can be solved in part by Moore’s Law, but even so, some speedups would be nice to allow it to be implemented today. I plan on running the same code on a relatively modern test server that’s dedicated more or less to supporting these tests, so it will likely run faster there.

It’s worth considering that this is running on a variant of the cwm reasoner on top of a re-implemented Rete reasoner, and, seeing how it’s all in interpreted Python, rewriting it in compiled C code (or even Java) would probably see a significant speed-boost, but that’s not a terribly productive line of work (except where trying to actually push out a commercial product). It might also be worth exploring other reasoning approaches to improve the speed.

Even so, I’m going to try looking at the other authentication approaches to see what the benefits and costs of them are… I think the more RESTful approach without OpenID may have some arguments in favor of it, but I doubt they’re going to be based solely on speed.

OpenID and Other Musings

Saturday, May 10th, 2008

So I’ve returned after some time at MIT where I was getting a bearing on where I’m going next with my part in the TAMI project, and I’ve come out with a few goals:

  • Finish tinkering and profiling the current TAAC setup.  This has already resulted in some interesting results, namely, that the planned OpenID setup is really slow.  To be fair, I’ve also only tested it with one physical setup, so I need to test a couple other servers, and so on. Unfortunately, it seems that the number of round-trips needed to get the FOAF file, get the OpenID identifier, and then establish a shared secret with the OpenID provider takes way too long. We can cache some of this (especially the former two), and can even avoid it all with a cookie established at the end of the first authentication, but the first sign-on takes entirely too long to process on this VPS.
  • Examine other authentication methods.  Since the key right now is shortening the time needed to authenticate against one’s FOAF URI, there are several other methods out there that may cut out the authentication issues, including RDFAuth and Toby Inkster’s FOAF+SSL.  The former has fewer round-trips (as there isn’t the cost of setting up the SSL connection), but the latter doesn’t require the maintenance of a PKI, and can be done with self-signed certs.  I hope to be talking with Toby and Henry Story, among others, to see what’s been done with FOAF+SSL, and to see how we can work that authentication method in.
  • Get a better idea of how the reasoner engines work for the AIR reasoner. Seeing as my understanding of their reasoning methods is not terribly good, I’m going to be trying my hand at reimplementing a Rete system, a TREAT system, and a backwards chainer…  In Erlang (or at least do such for a Rete).  Why Erlang?  I think it will give me a good idea not only about how the system’s productions are called (as a network of alpha and beta nodes is rather nicely done in a functional framework), but also about the problems with trying to make a Rete concurrent (and why TREAT is ostensibly better at concurrency).  With Erlang’s BUILT-IN concurrency and light-weight threads, rather than a lock-based concurrent framework like in the Python we’re currently using, there’s no additional cost to making the functions concurrent if I take the time to do it in Erlang.  Luckily for me, I’ve worked with the Mozart Programming System in my programming languages class, and Erlang isn’t too different from that…  Plus, it’s another programming paradigm/language under my belt.
  • Implement cwm built-ins into the AIR reasoner.  Yosi and I have already discussed some of the issues with doing so, so it’s just a matter of my understanding the code that’s standing in the way of my adding such.  Thus the reason for the above, and studying the existing code.

What’s Up…

Tuesday, April 15th, 2008

I suppose it’s about time for me to announce a status report of what I’m up to lately…

First: I’ve picked up my Pixonomy project again, and while I’ve JUST put it on hold again, I’ve progressed the library with a refactoring and I just need to do some cross-platform hacking (to get it to compile nicely on OS X as a universal binary), and implement a couple of search functions to actually get it to a state where I can actually start programming client software in GTK+ or wxWidgets (I haven’t decided which) to demo the library.

Second: I’m currently taking a break from Pixonomy to work on a nifty font for OpenTTD.  Since I noted that they finally implemented TrueType support and Unicode support in 0.5.0 (which I’d tried to implement before, but never really got around to), I figured I’d try my hand at something fun.  After I get all the lower case characters done preliminarily, I’ll start adjusting the bounds and kerning by testing in game…

Third: Been watching some of the subs for this season, and I think Allison & Lillia does seem to have some promise, but we’ll see where it goes.  Zettai Karen Children, though, is not so much up my alley.  We’ll see where the other series I want to check out go (namely, Library War)…  There’s a few others that might be good, too.
