3 August 2010
During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.
The purpose of the Graph Streaming API project, run by André Panisson, is to build a unified framework for streaming graph objects. Gephi’s data structure and visualization engine has been built with the idea that a graph is not static and might change continuously. By connecting Gephi with external data-sources, we leverage its power to visualize and monitor complex systems or enterprise data in real-time. Moreover, the idea of streaming graph data goes beyond Gephi, and a unified and standardized API could bring interoperability with other available tools for graph and network analysis, as they could start to interoperate with other tools in a distributed and cooperative fashion.
With the increasing level of connectivity and cooperation between systems, for a system that aim to be interoperable, it is imperative to comply with the available standards. Graph objects are abstractions that can represent a wide range of real-world structures, from computer networks to human interactions, and there are a lot of standards to exchange graph data in different formats, from text-based formats to xml-based formats. But the real-world structures are constantly changing, and the current formats are not suitable to exchange such type of dynamic data.
A lot of well-established systems already stream data to its users using a streaming API. Twitter for example defined a Streaming API to allow near-realtime access to its data. They are using two different formats: XML and JSON, but JSON is strongly encouraged over XML, as JSON is more compact and parsing is greatly simplified.
We are not the first to implement a Graph Streaming API, and another very interesting experience is the GraphStream Java Library. It is composed of an API that gives a way to add edges and nodes in a graph and make them evolve. The graphs are composed of nodes and edges that can appear, disappear or be modified, and these operations are called events. The sequence of operations that occur in a graph is seen as a stream of events.
So, as other people already had successful experiences with graph streaming, why not start our work based on these experiences? That’s what we are doing, and beyond finding these experiences very useful, we are also trying to be compatible with the available work. The first Gephi Graph Streaming release will use two formats: JSON for flexibility, and a text-based format, based in the GraphStream implementation.
The first version of the Graph Streaming features will be available in the next release of Gephi, but it’s already possible to taste some of these features. To illustrate how simple it will be to connect to a master, the following video shows Gephi connecting to a master and visualizing the received graph data in real time. The graph in this demo is a part of the Amazon.com library, where the nodes represent books and the edges represent their similarities. For each book, a node is added, the similar books are explored, adding the similar ones as nodes and the similarity as an edge.
The Graph Streaming specification goes beyond the simple fact that a client can pull data from a master: in fact, clients can interact with the master pushing data to it, in a REST architecture. The same data format used by the master to send graph events to the clients is used by clients to interact with the master.
In the next example, we will transform Gephi in a master to provide graph information to its clients. At the Streaming Tab in the Gephi application, you can access all the features of graph streaming. You can connect to a Master by clicking the ‘+’ button, but you can also transform your Gephi in a master by right-clicking the “Master Server” and selecting “Start” (You are not limited to a single master by host: each Gephi workspace can be available as a master). By default, the HTTP server will listen at port 8080 in plain HTTP, and at port 8443 using SSL. The server path depends on your workspace: each workspace uses a different path. You can configure these parameters (and also Basic Authentication) at the “Settings…” button:
Now, you can connect to it using some simple HTTP client. For example, you could use curl to see the data flowing. First of all, open a shell window and execute the following command:
With this, you are connecting to your workspace at Gephi. If the workspace is empty, you will receive no data, but you will remain connected, so you will receive all events from now.
Now open another shell prompt, and with the following commands, you could see a triangle appearing at Gephi:
curl "http://localhost:8080/workspace0?operation=updateGraph" -d $'
At the same time, all events will be sent to your connected client, in the other shell window.
With the following commands you can retrieve some of the data:
And you could start manipulating your graph through command line, as you like. There are other event types for changing and removing edges and nodes, for more information about them see the current status of the JSON Streaming Format, available at this page. We recall that this format is subject to changes, as the API was build to be very flexible and more requirements are being added to it.
But what about connecting two different Gephi instances together? One instance will be master, and the other client. Using the Graph Streaming API, a change in a graph at the master’s workspace should cause a change in the client’s workspace, and a change at the client’s workspace will cause it to send requests to the master to update its graph accordingly. Both instances working in a distributed mode. In fact, different people could work in a distributed mode to construct a graph: it’s the Collaborative Graph Construction.
My personal impressions about it
For me as a researcher, Gephi has the potential to become a de-facto standard for manipulating and visualizing large scale graphs. I believe that the research community is still lacking a high-quality, general-purpose, community-supported framework for exploratory analysis of large-scale dynamical graph data, and I believe that Gephi has the potential to fill this gap. I’m working also in collaboration with ISI Foundation at the SocioPatterns project, an example of research use case that currently uses Gephi for exploratory data analysis and visualization. The support for dynamic networks, the readiness of the Gephi data model for dynamical update of graph topology and attributes and, in a near future, the support for graph streaming are exciting features that suit very well the large-scale real-time data sources we are dealing with. The potential for processing live streams from our experiments is a unique feature that we are eager to see working.