Peter Van Roy's Feedback on the CEDAR Project's Review Meeting of Nov. 28, 2014


I recommend that you initiate a STREP on Scalable Semantic Web. The major problem to solve, in addition to the usual problems of distributed systems (partial failure, localized resources), is scalable distributed consistency. In addition to the inconsistency inherent in Semantic Web data, there will be an additional source of inconsistency due to the distributed structure of the graph (partitioned data and latency). Because of the graph's size, it is difficult to move the data around: the amount of data overwhelms the connection bandwidth. So you need algorithms that send much smaller amounts of information while still doing the essential work. Constraints may help here, since they are compact intensional representations. Much work has been done on scaling up constraint programming in recent years. I am no longer directly involved in this, but I know of the work through Yves Deville in our department, who works with Pascal Van Hentenryck.

For such a STREP you would need partners with different expertise: graph databases, distributed consistency, big data, and constraints. You should also have one or two partners who are close to the WWW Consortium, so that you have a path to getting your results generally accepted. (Also, the reviewers will not like it if you don't have any of the major Semantic Web players as partners.) I would guess that some of these people would be intrigued by your project and would like to be partners. There are one or two calls per year (spring and fall) where this proposal could fit; you need to look at the precise objectives of each call.

I am a partner in the SyncFree project, which works on consistency for large-scale distributed applications. Marc Shapiro is the coordinator. The project is based on a recent development called "Conflict-Free Replicated Data Type" (CRDT), which uses properties of join semilattices (monotonicity and convergence) to compute with distributed data with extremely low-cost synchronization. One of the partners in the project is Basho, which develops the Riak cloud database. Riak 2.0 has CRDTs built in. A very interesting one is called the Riak DT Map. This is very similar to a ψ-term: a dynamic nested record with labeled fields (see also this Java API). Merging two of these (which is done as part of distributed conflict resolution, where two independent results are combined) is a subset of ψ-term unification. What is missing is that they do not take cycles into account and have no concept of OSF theory. But the similarity between the two concepts is uncanny. So Riak could in fact serve as a data store for your project.
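
To make the join-semilattice idea concrete, here is a minimal Python sketch of merging two nested records with labeled fields. It is an illustration only and not the Riak DT Map implementation or its API: the function name merge, the sample field names, and the choice of grow-only sets as leaf values are all assumptions made for this example. Field removal, cycles, and OSF sorts are left out entirely.

```python
from typing import Dict, FrozenSet, Union

# A value is either a grow-only set of strings (leaf) or a nested record
# mapping field labels to values. Both choices are assumptions of this sketch.
Value = Union[FrozenSet[str], Dict[str, "Value"]]


def merge(a: Value, b: Value) -> Value:
    """Return the join (least upper bound) of two replica states."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged: Dict[str, Value] = {}
        for label in a.keys() | b.keys():
            if label in a and label in b:
                # Field present on both sides: merge the two values recursively.
                merged[label] = merge(a[label], b[label])
            else:
                # Field present on one side only: keep it unchanged.
                merged[label] = a[label] if label in a else b[label]
        return merged
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        # Leaf join is set union, which is monotonic.
        return a | b
    # A record meeting a set has no join in this sketch.
    raise TypeError("cannot merge a record with a set")


# Two replicas updated independently (hypothetical sample data).
replica_1 = {
    "project": frozenset({"SyncFree"}),
    "store": {"name": frozenset({"Riak"})},
}
replica_2 = {
    "store": {"vendor": frozenset({"Basho"})},
    "topic": frozenset({"CRDT"}),
}

merged = merge(replica_1, replica_2)

# Convergence follows from the join laws: the order and repetition of merges
# do not matter, so replicas can exchange state without coordination.
assert merge(replica_1, replica_2) == merge(replica_2, replica_1)  # commutative
assert merge(merged, merged) == merged                             # idempotent
assert merge(merged, replica_1) == merged                          # absorbs older state
```

The recursive case is where the resemblance to ψ-term unification shows up: fields present in both records are merged label by label, and fields present in only one are carried over. Raising an error on a record/set clash is just a placeholder in this sketch for the failure case of unification.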