Community Design ~

Guest blog post from Dr. Tominski, who agreed to review Gephi 0.7alpha4 for us.

Christian Tominski received his diploma (MCS) from the University of Rostock in 2002. In 2006 he received his doctoral degree (Dr.-Ing.) from the same university. Currently, Christian is working as a lecturer and researcher at the Institute for Computer Science at the University of Rostock. Christian has authored and co-authored several articles in the field of information visualization. His main interests concern visualization of multivariate data in time and space, visualization of graph structures, and visualization on mobile devices. In his research, a special focus is set on interactivity, including novel interaction methods and implications for software engineering.

Recently, I stumbled upon the Gephi Project – an open source graph visualization system. As I’ve done some research in the area of interactive graph visualization, I was eager to see how Gephi works and if it brings some new concepts or if it’s yet another graph visualization system. I’ll share my thoughts on Gephi from three perspectives. The first one is the user perspective. I’ll take the role of a user who is interested in getting a visual depiction of some graphs. Secondly, I’ll take the role of a developer and shed some light on the aspect of software engineering. And finally, I’ll be a scientist and try to foresee if and in which regard Gephi might have some impact on visualization research.

The User’s Perspective

Gephi has been designed with the users and their needs in mind. The system welcomes its users with a familiar look and feel. It is quite easy to load graph data into the system. Many of the known graph file formats are supported, for instance DOT, GML, GraphML, or Tulip’s file format TLP. A nice thing about the data import is that an import report provides essential information about the import process (e.g., number of nodes and edges, edge-directedness, potential problems, etc.). Once imported, the graph is shown as nodes and links in a main view, and several complementary views provide additional information.

The main view is the core for visual graph exploration. It allows users to zoom in, to select nodes, to adjust node size and color, to find shortest paths, and to access attributes of nodes and edges. In addition to letting users set sizes and colors manually, the system can also set these automatically based on attributes associated with nodes and edges. What is called “Partition” in Gephi is used to assign unique colors to nodes and edges based on qualitative data attributes (e.g., class affiliation). Quantitative data values can be mapped to size and color of nodes, edges, and labels using the “Ranking” tool. All these tools are customizable. It is worth mentioning that Gephi provides some nice user controls to parameterize the color coding.

Gephi also supports graph editing, i.e., insertion and deletion of nodes and edges as well as manipulation of attribute values. What is missing in terms of data editing is the possibility to add (and delete) attributes, for instance to generate derived data values using simple formulas.

A key aspect in graph exploration is the layout of nodes and edges. As it is usually unclear which layout will be best for a given graph, Gephi offers various layout algorithms to choose from. While a layout is being computed, the main view constantly updates itself to provide feedback on the progress made. A big plus is that users can interrupt the layout algorithm once they deem the result to be ok, or if they find it more suitable to use the current result as the initial setup for another algorithm. This way users can easily tune the layout to fit the graph and their particular needs. Users may put the finishing touches to the layout by moving nodes manually in the main view.

Once a suitable visual representation has been created, the final step is to export nice pictures of the graph. To this end, Gephi provides a dedicated export interface with many options to create high quality printouts.

People who have been working with larger graphs might know that some computations on graphs (including layout computation) are quite complex and take some time. While other systems are blocked during computation and, in the best case, provide a progress bar, Gephi is different: long running calculations run concurrently with the main application. From my point of view, this is one of Gephi’s strongest points: the system does not block during costly computations. The benefit for users is that they can always interact, for instance to initiate other computations or to cancel running ones when they recognize that a re-parameterization would yield better results.

Concurrency is also how Gephi handles the computation of statistics about the graph. Currently, Gephi supports a variety of classic graph statistics, including degree distribution, number of connected components, and others. Based on data attributes and computed statistics, the graph can be filtered to reduce the nodes and edges to those that fulfill the filter criteria. In a dynamic filtering UI, several filters can be combined using drag’n’drop, and thresholds can be manipulated easily, for instance via sliders. Besides filtering for data reduction, Gephi also provides basic support for graph clustering. However, the currently implemented MCL algorithm is still experimental. Nodes can also be grouped manually to build a hierarchical structure on top of the visualized graph, but this is quite cumbersome for larger graphs. Additional tools are needed to support the user in creating a navigable hierarchy on top of a graph. Configurable clustering pipelines that combine several clustering strategies (e.g., based on attributes or on bi-connected components), together with a clustering wizard user interface, would be helpful.

In summary, I see much potential in Gephi; the overall shape of the system impressed me, speaking as a user. I personally found it easy to work with Gephi and to explore some of my own data sets as well as some provided on Gephi’s website. Given that the version I worked with is 0.7 alpha, there is also much room for improvement. First of all, I would like to mention graph navigation. The main view provides just basic zoom and pan navigation, which is even imprecise in some situations. Navigation tools like those provided in Google Earth, and navigation based on paths through a graph, would be really helpful. Moreover, I missed the concept of linking between views: selecting an element (node or edge) in one view should highlight that element in all other views. Right now this is not really an issue, as the number of views shown in parallel is quite low. But once additional views are needed, for instance to focus on data attributes in a Parallel Coordinates Plot or to visualize the cluster hierarchy in a dedicated view, or when the same graph is shown in two or more main views to compare different analytic results, linking will be crucial for the user experience. These things are not too complex and should be easy to integrate in future versions of Gephi. Another aspect concerns highlighting in the main view: instead of marking the selected node, all non-selected nodes are faded out to focus on the selected one. This implies rather big visual changes, because all nodes but one change their appearance whenever a single node is selected or deselected.

Pros:
  • Easy graph import and export
  • Many options for visual encoding
  • Various layout algorithms to choose from
  • Support for dynamic filtering
  • Computation of graph statistics
  • Basic support for graph clustering
  • System does not block during long running computations

Cons:
  • Graph navigation can be improved
  • No linking among views
  • A few visual glitches
  • Still an alpha version with bugs here and there

The Developer’s Perspective

Now let me switch to the developer’s view. Gephi is open source software, so everybody can participate in improving the system or adapt it to personal or business needs. Gephi seems to be very well designed on the back-end. The project is based on the NetBeans platform and the Java language. It is subdivided into a number of modules that define several APIs and SPIs and that provide implementations of these interfaces. Thanks to the modular structure, Gephi can be extended quite easily. The best way to do so is to implement plugins. Plugins can be used, for instance, to add further layout or clustering algorithms, statistical computations, filter components, or export methods. The modular structure also allows for using only specific parts of the Gephi project in one’s own projects. The Gephi Toolkit is a good example. It is not an end-user desktop application, but a class library that provides all the functionality of Gephi to those who want to reuse Gephi’s functionality and data structures in different ways.

As I’ve mentioned in the user perspective, the way Gephi deals with long running computations is a big plus. Given that multi-threading has been inherent in the system from the very beginning and is manifested at the system’s core, I sincerely hope – no, I’m quite sure – that Gephi will not run into all the problems that are likely to occur when multithreading is integrated into an existing single-threaded system, as I have experienced myself. I also conjecture that others will find it much easier to implement concurrent non-blocking extensions simply by following how existing code handles things in Gephi.

As Gephi is split up into many different modules, it took me a while to get accustomed to the system and to learn which functionality can be found in which module. But I have to add that I had no prior experience in NetBeans platform development and the module concept that is used there. I also found that the code documentation could be improved in several parts of Gephi’s sources. On the other hand, the Gephi website provides informative wiki pages with various examples and tutorials.

My view from the developer’s perspective can be summarized as the following pros and cons:

Pros:
  • Open source
  • Modular structure
  • Well defined interfaces
  • Extensible via plugins
  • Inherently multithreaded

Cons:
  • In-code documentation can be improved

The Scientist’s Perspective

As a scientist I’m not so much interested in developing fully-fledged end-user software, but in developing solutions to scientific questions and publishing the results. A difficulty in interactive visualization is that one usually needs a broad basis of fundamental functionality to be able to develop such solutions. Previous attempts at establishing a common infrastructure for interactive data exploration made notable progress, but eventually did not fully succeed or are no longer actively maintained. This is because a single researcher usually does not have the time to do decent research and, at the same time, maintain a larger software project.

I personally feel that Gephi can become such a fundamental infrastructure. Maintained by an active community, the system allows researchers to focus on solutions in the form of plugins while utilizing the functionality that the system provides. Visualization researchers will be happy if they can simply plug in new visualization techniques as additional views, test new layout algorithms, and experiment with new clustering methods. Moreover, new solutions can be easily disseminated to real users in the community. This might prove beneficial when it comes to acquiring early user feedback or when more formal user evaluation is needed prior to publishing new techniques and concepts.

A big issue in visualization research is visual analytics, that is, the combination of analytical, interactive, and visual means to facilitate making sense of large volumes of data. In terms of analytic means, a goal is to break open analytic black boxes and make analysis algorithms interactively steerable. With the architecture of Gephi, where parameterizable algorithms run concurrently and provide feedback in the form of intermediate results, I believe this goal can be reached in the future. Something I’m curious about is whether it is also possible to come up with concepts that allow for plugging in new interaction techniques. As interaction is usually tightly bound to a view, I wonder whether interaction could be implemented as independent plugins as well, and whether novel interaction concepts (e.g., touch interaction) will be supported in the future. Furthermore, interactive collaboration of multiple users working on a common analysis problem could be of interest. A question related to the visual side is whether Gephi can be used with different displays and display environments such as tabletop displays, display walls, smart phones, or multi-display environments.

A facet of graph visualization that I did not mention in the user’s perspective, as I felt it better suited here, is dealing with dynamically changing graphs. Visualization of time-varying graphs is a hot research topic, and Gephi is about to face this challenge. There is preliminary support for exploring time-dependent graphs via a time slider. But there is more to this than just browsing in time: concepts have to be integrated to support easy comparison of multiple snapshots of a graph and to highlight significant changes in a graph’s history.

Let me try to put my thoughts into a pros and cons list:

Pros:
  • Potential infrastructure for visualization research
  • Researchers can focus on solutions in form of plugins
  • Potential to use community for user feedback and evaluation
  • Partial results for current research questions (graph clustering, steerable algorithms, dynamic graphs)
  • Nice playground for experimentation and testing new ideas

Cons:
  • Unclear if new and alternative technologies will be supported

Summary

Since I got my hands on Gephi, I’ve been infected. Maybe I’m dazzled by the beautiful demo video or the nice pictures that have been generated using Gephi, but in my opinion Gephi has the potential to become a big player in interactive visual graph exploration and analysis. From all perspectives that I’ve taken I see many positive things, and plenty of room for improvement or additional features. I do hope that the people behind Gephi will continue their work to the benefit of all users, developers, and researchers.

Related Stuff

There are many other systems and frameworks out there that do a great job in interactive graph visualization or in supporting it as a toolkit. I would like to give credit to these systems, because they can be the source of many ideas and much inspiration:

For more on Gephi’s design, see also this article about semiotics.


Plugins ~

Martin Škurla

During this summer, six students are working on Gephi with the Google Summer of Code. They contribute to Gephi by developing new features that will be integrated in the 0.8 version, released later this year.

My name is Martin Škurla and this summer I was working on a GSoC project called “Adding support for Neo4j in Gephi”. In this article we will look at the implemented features, including those under the hood, pictures of the dialogs, common use cases and future plans.

 

Gephi project

First, a quick introduction to the Gephi project. Gephi is an open source visualization platform built on top of the NetBeans platform. It is written in Java, so it runs on various operating systems including Windows, Linux and Mac OS. It supports many interesting graph analysis capabilities, including:

  • Real-time visualization
  • Layout
  • Metrics
  • Dynamic network analysis
  • Cartography clustering and hierarchical graphs
  • Dynamic filtering

The story so far

The main idea of my project is to add support for Neo4j in Gephi, which means the ability to transform a Neo4j graph into a Gephi graph. The two graph models differ, so the first task was to define a mapping from Neo4j graph items to Gephi graph items and vice versa.

There was also a mismatch between the types supported by Neo4j and those supported by Gephi. It was solved by adding new “List” types to Gephi, so now every Neo4j type has a corresponding type in Gephi.

There were also some changes under the hood that are not visible to the end user but had to be designed and implemented. The most interesting one is the “delegating mechanism”. This mechanism is responsible for getting values from the storage engine (Neo4j) as well as for manipulating the data. During the import process, a representation of the Neo4j graph is created in Gephi, but the values are not stored directly; instead, they are queried through the delegating mechanism.
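The delegating idea can be illustrated with a minimal sketch. `PropertyStore`, `InMemoryStore` and `DelegatingNode` are hypothetical names invented for illustration (they are not Gephi or Neo4j classes), and the in-memory map only stands in for the database so the example runs: the Gephi-side node keeps its id and forwards every read and write to the store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over the storage engine (Neo4j in the article).
interface PropertyStore {
    Object getProperty(long nodeId, String key);
    void setProperty(long nodeId, String key, Object value);
}

// In-memory stand-in for the database, only to make the sketch runnable.
class InMemoryStore implements PropertyStore {
    private final Map<Long, Map<String, Object>> data = new HashMap<>();
    int reads = 0; // counts lookups to show that values are fetched on demand

    @Override
    public Object getProperty(long nodeId, String key) {
        reads++;
        Map<String, Object> props = data.get(nodeId);
        return props == null ? null : props.get(key);
    }

    @Override
    public void setProperty(long nodeId, String key, Object value) {
        data.computeIfAbsent(nodeId, id -> new HashMap<>()).put(key, value);
    }
}

// Gephi-side node: stores only the id, never a copy of the attribute values.
class DelegatingNode {
    private final long id;
    private final PropertyStore store;

    DelegatingNode(long id, PropertyStore store) {
        this.id = id;
        this.store = store;
    }

    Object get(String key) {
        return store.getProperty(id, key); // queried from the store on every call
    }

    void set(String key, Object value) {
        store.setProperty(id, key, value); // written through to the store
    }
}
```

Because nothing is cached on the node, a value changed directly in the store is visible through the node on the next read; a real implementation would weigh that freshness against the cost of repeated queries.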

Other minor tasks were customizing the open dialogs used for importing a local Neo4j database and for debugging the imported database. The open dialog for importing accepts only valid Neo4j database directories: I defined the structure of a valid Neo4j database directory, and every valid directory is now shown with the Neo4j picture in the open dialog, so users can open only valid Neo4j directories when importing. The open dialog for debugging accepts only Java class files that can be used for the debugging process, which means they have to implement the required interface and have a public no-argument constructor. Every valid class file is shown with the Neo4j picture, and after a valid debug file is selected, the Target and Visualization options are filled in automatically from the data in the selected class file.
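The debug-file check (implements the required interface, has a public no-argument constructor) can be sketched with standard Java reflection on an already loaded class. `DebugDescription` is a hypothetical stand-in, since the article does not name the required interface, and loading the class from a `.class` file is left out.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the interface a debug class must implement.
interface DebugDescription {
    String target();
}

final class DebugClassValidator {
    private DebugClassValidator() {}

    // Accepts concrete classes that implement the interface and
    // declare a public no-argument constructor.
    static boolean isValidDebugClass(Class<?> candidate) {
        if (!DebugDescription.class.isAssignableFrom(candidate)) return false;
        if (candidate.isInterface() || Modifier.isAbstract(candidate.getModifiers())) return false;
        try {
            candidate.getConstructor(); // only finds public constructors
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}

// Example classes used to exercise the check.
class GoodDebug implements DebugDescription {
    public GoodDebug() {}
    public String target() { return "node 0"; }
}

class NoDefaultCtor implements DebugDescription {
    public NoDefaultCtor(int x) {}
    public String target() { return "x"; }
}
```

Note that `GoodDebug` declares its constructor explicitly: the implicit default constructor of a package-private class is not public, so `getConstructor()` would not find it.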

 

Open Neo4j directory dialog customization

Open Neo4j debug file dialog customization

 

Neo4j integration

Menu integration

All actions start from the menu. As we can see, it is the entry point for importing from, exporting to and debugging a Neo4j graph. Both import and export support local as well as remote Neo4j databases.

Importing

Whole graph import dialog

The import process offers two approaches:

  • whole import
  • traversal import

The whole graph import dialog is designed for importing the entire graph. We can customize the rules that decide which nodes are returned by defining filtering expressions. For example, the dialog above can be used to find all people working on the Gephi project with a maximum age of 30 years. Only people with at least 5 years of experience who hold driver’s licence types A, B and C will be included.

Let’s have a deeper look at the dialog:

  • Property key is the name of the property we want to filter on
  • Property value is the value that will be compared to the actual node property value using the chosen operator. Values are automatically converted to the appropriate types, and if a value cannot be converted, the node will not be included in the graph. All types supported by Neo4j are supported in this dialog; the last filter expression shows the support for array types.
  • Operator is applied to the final expression, and if the expression evaluates to true, the node is included
  • Match case enables case-sensitive comparison of String, char, String[] and char[] values
  • Restrict mode is used to exclude some nodes. Imagine the database stores people that have only a subset of the property names used in the filtering expressions. If Restrict mode is on, only nodes that have all the property names and for which all filtering expressions evaluate to true are included. If Restrict mode is off, every node that has any subset of the required property names (even the empty subset) is included if all the filtering expressions applicable to that subset evaluate to true.

All the filtering expressions are combined using AND, and the currently supported operators are: ==, !=, <, <=, >, >=.
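The rules above (AND combination, automatic type conversion, Match case, Restrict mode) can be summarized in a small evaluator. This is a sketch under assumptions, not the plugin’s code: `FilterExpression` and `ImportFilter` are hypothetical names, a node is modelled as a plain property map, only scalar Integer/Long/Double/Boolean/String values are handled (arrays are left out), and Java 17 is assumed for records and pattern matching.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of one row of the whole-graph import dialog.
record FilterExpression(String key, String value, String operator, boolean matchCase) {}

final class ImportFilter {
    private ImportFilter() {}

    // Returns true if a node, given as its property map, should be imported.
    static boolean accept(Map<String, Object> props, List<FilterExpression> filters, boolean restrictMode) {
        for (FilterExpression f : filters) {        // all expressions are combined with AND
            Object actual = props.get(f.key());
            if (actual == null) {
                if (restrictMode) return false;     // restrict mode: the node must have every key
                continue;                           // otherwise the expression does not apply
            }
            Integer cmp = compare(actual, f.value(), f.matchCase());
            if (cmp == null) return false;          // conversion failed: exclude the node
            if (!holds(cmp, f.operator())) return false;
        }
        return true;
    }

    // Converts the typed-in text to the property's runtime type and compares;
    // returns null if the text cannot be converted or the type is not handled.
    private static Integer compare(Object actual, String raw, boolean matchCase) {
        try {
            if (actual instanceof Integer i) return Integer.compare(i, Integer.parseInt(raw.trim()));
            if (actual instanceof Long l) return Long.compare(l, Long.parseLong(raw.trim()));
            if (actual instanceof Double d) return Double.compare(d, Double.parseDouble(raw.trim()));
            if (actual instanceof Boolean b) {
                String t = raw.trim();
                if (!t.equalsIgnoreCase("true") && !t.equalsIgnoreCase("false")) return null;
                return Boolean.compare(b, Boolean.parseBoolean(t));
            }
            if (actual instanceof String s) return matchCase ? s.compareTo(raw) : s.compareToIgnoreCase(raw);
        } catch (NumberFormatException e) {
            return null;
        }
        return null;
    }

    private static boolean holds(int cmp, String op) {
        return switch (op) {
            case "==" -> cmp == 0;
            case "!=" -> cmp != 0;
            case "<" -> cmp < 0;
            case "<=" -> cmp <= 0;
            case ">" -> cmp > 0;
            case ">=" -> cmp >= 0;
            default -> throw new IllegalArgumentException("Unknown operator: " + op);
        };
    }
}
```

With Restrict mode off, a person stored without an `experience` property still passes the expressions that do apply, which matches the “any subset (even empty)” rule described above.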

Whether it would be useful to add new operators, OR combinations and other import options is the main question behind the questionnaire that accompanies this article.

The traversal graph import dialog is designed for importing any subgraph using the traversal capabilities of Neo4j 1.1. Traversal import adds the following options:

  • Start node can be set in two ways, either by its id or by its indexing key and value pair
  • Order can be set to depth-first or breadth-first
  • Max depth can be set to a concrete number or to the end of the graph
  • Relationships can be restricted too: we can set any combination of relationship types and directions that the traversal should include. The list of relationship types is filled dynamically with the values existing in the database.
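The traversal options map onto a classic graph search. The sketch below is a generic in-memory illustration and deliberately does not use Neo4j’s traversal API: `Rel`, `Direction` and `SubgraphTraversal` are hypothetical names, a negative `maxDepth` stands for “end of graph”, and Java 17 is assumed for records.

```java
import java.util.*;

enum Direction { OUTGOING, INCOMING, BOTH }

// Hypothetical typed, directed relationship between two node ids.
record Rel(int from, int to, String type) {}

final class SubgraphTraversal {
    private SubgraphTraversal() {}

    // Returns node ids in visiting order. breadthFirst selects the order,
    // maxDepth < 0 means "to end of graph", and allowed maps each selected
    // relationship type to the direction(s) the traversal may follow.
    static List<Integer> traverse(List<Rel> rels, int start, boolean breadthFirst,
                                  int maxDepth, Map<String, Direction> allowed) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<int[]> frontier = new ArrayDeque<>(); // entries are {nodeId, depth}
        frontier.add(new int[] {start, 0});
        while (!frontier.isEmpty()) {
            // Queue behaviour gives breadth-first, stack behaviour gives depth-first.
            int[] cur = breadthFirst ? frontier.pollFirst() : frontier.pollLast();
            if (!visited.add(cur[0])) continue;
            order.add(cur[0]);
            if (maxDepth >= 0 && cur[1] >= maxDepth) continue;
            for (Rel r : rels) {
                Direction d = allowed.get(r.type());
                if (d == null) continue; // relationship type not selected
                if (r.from() == cur[0] && d != Direction.INCOMING) frontier.addLast(new int[] {r.to(), cur[1] + 1});
                if (r.to() == cur[0] && d != Direction.OUTGOING) frontier.addLast(new int[] {r.from(), cur[1] + 1});
            }
        }
        return order;
    }
}
```

In depth-first order the depth is measured along the path actually taken, so a node may be cut off by `maxDepth` even if a shorter path exists; breadth-first order always reaches nodes at their shortest depth.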

 

Traversal graph import dialog

That was a quick summary of the Gephi Neo4j import capabilities implemented in the project. We also worked on other features, one of them being export support: any loaded graph can be exported into a local or remote Neo4j database, and the export process can be customized in a similar way to importing.

Exporting

Export dialog

Exporting is the opposite of importing. The dialog above shows the exporting options as well as validation. We can customize the export process by setting:

  • From column sets the RelationshipType to the values of any Gephi edge column. When a Neo4j graph is imported, a column named “Neo4j Relationship Type” is created automatically.
  • Default value is used when the processed Gephi edge has no value in the selected From column
  • Export Node columns is the set of Gephi columns in node table which will be exported
  • Export Edge columns is the set of Gephi columns in edge table which will be exported
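The first two export settings amount to a lookup with a fallback, and the column selections to a projection of each row. A minimal sketch, assuming rows are plain maps; `Neo4jExportMapping` and the default type `RELATED_TO` are hypothetical and only illustrate the settings described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class Neo4jExportMapping {
    private Neo4jExportMapping() {}

    // Relationship type taken from the chosen "From column",
    // falling back to the dialog's default value when the edge has none.
    static String relationshipType(Map<String, Object> edgeRow, String fromColumn, String defaultValue) {
        Object v = edgeRow.get(fromColumn);
        if (v == null || v.toString().isBlank()) return defaultValue;
        return v.toString();
    }

    // Keeps only the columns selected for export (node or edge table).
    static Map<String, Object> selectColumns(Map<String, Object> row, Set<String> exported) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (exported.contains(e.getKey()) && e.getValue() != null) out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}
```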

Remote importing/exporting

The only difference between local and remote importing/exporting is the additional Remote dialog, where the following connection information must be set:

  • Remote database URL
  • Login
  • Password

All of them must be filled in order to successfully import/export remote graph.

Remote import/export dialog

Delegation process

Node values exploration

As the picture above shows, we can easily explore all node and edge values. This is exactly where the delegating mechanism is used: the values are not stored in memory in some kind of Gephi data structure; instead, the storage engine (Neo4j) is queried for the current values every time we need them.

Debugging

Debugging in action

The picture above shows debugging in action. The dialog is initialized with the data from the chosen debug class file, but all of it can also be changed at runtime. Any change in the options automatically updates the graph visualization. We can change the visibility of nodes and edges as well as the colors of both. The user proceeds to the next step of debugging/traversal by clicking the Next button.

Use cases

That concludes the summary of the implemented features; now let’s look at common use cases users may be interested in.

Visualizing Neo4j graphs

One of the main ideas of my project was to implement the ability to visualize Neo4j graphs, even big ones. As the dialog pictures show, there are many options for customizing the import process, including filtering. After the import we can use all the rich graph analysis features Gephi provides.

Analyzing only part of the whole graph

A quite common use case is analyzing only part of the graph, which is also possible in Gephi. We can take advantage of traversal, where we set the starting node and other traversal options, and then visualize and analyze only that part of the graph.

Exporting graphs stored in text files/databases into Neo4j

Another use case could be exporting graphs stored in graph text files or relational databases into Neo4j. In fact, every graph loaded into Gephi can easily be exported to a Neo4j database. The supported input formats depend on Gephi itself; currently the following formats are supported:

  • Text formats: GEXF, GDF, GML, GraphML, Pajek NET, GraphViz DOT, CSV, UCINET DL, Tulip TLP, XGMML
  • Relational databases: MySQL, PostgreSQL, SQL Server

Future plans

There are more things we want to implement, including:

  • support for the Gephi Toolkit, which is essentially a set of Gephi core libraries that you can use in your own Java projects for graph visualization and manipulation
  • implementing a proof-of-concept Web application using both the Gephi Toolkit and Neo4j to manipulate a Neo4j database and show the results (probably using GWT)
  • more features, bug fixing, performance optimizations

Questionnaire

One of the big advantages of Gephi is that it is developed as an open source project. We want to add features according to user requests and opinions. That’s why we created a questionnaire about the usefulness of the proposed additions. We will be very happy if you fill it in, because it is a very valuable source of information and lets us focus on the features Neo4j users find useful. Please fill in the questionnaire.

Conclusion

I am very happy that I can be part of the Gephi developer community and introduce integration with Neo4j. During this summer I learned a lot, and I am proud that I was chosen as a GSoC student. None of these features could have been done without the great help of my mentors, so big thanks to both of them: Mathieu Bastian & Tobias Ivarsson.

If you are interested and want to test the code, you can download the source code from my branch using:
bzr branch lp:~bujacik/gephi/support-for-neo4j

All the pictures were made with data stored in a test Neo4j database, which can be created using a Java SE project that you can download using:
bzr branch lp:~bujacik/+junk/testing-new-neo4j-traversal-api

 

Martin Škurla


Functionality ~

Cezary Bartosiak

 

Cezary Bartosiak’s project focuses on the further development of dynamic network analysis (DNA) in Gephi. The aim is to create a framework that makes it possible to build and query a dynamic graph through a proper API. This has practical applications, for instance analyzing the evolution of networks (see in particular M. Argollo de Menezes, A.-L. Barabási Fluctuations in Network Dynamics) or visualizing dynamic networks. The article shows the most important features provided by this GSoC project.

 

In the current 0.7 version we can import dynamic graphs written in GEXF syntax and then filter them using the Timeline component. Unfortunately, it only filters the graph topology, which means hiding nodes and/or edges.

The obvious next step is to make it possible to handle dynamic changes not only of the graph topology but also of the attributes attached to nodes and edges. This can be done by creating a proper API, which other modules, such as Statistics, could use to provide dynamic versions of their metrics. Computing metrics like Degree Distribution or Clustering Coefficient for each time interval in a time series is of great interest for analyzing graphs over time.

So, getting down to brass tacks, the most important tasks are:

  • A data structure to host dynamic attributes efficiently, which would make it possible to present them in the Data Laboratory module.
  • A Dynamic API with the following features: a Dynamic Graph Decorator that wraps the graph and a time interval, and returns static graph copies for given time intervals as well as arrays of attribute values for given nodes/edges and time intervals.
  • Adapting the Metrics framework to use the Dynamic API to provide dynamic versions of existing metrics.
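The decorator idea from the list can be sketched as an immutable wrapper around a dynamic edge list and a time interval. `TimedEdge` and `DynamicGraphDecorator` are hypothetical names, not the project’s API; times are plain doubles with infinities for open bounds, intervals are treated as half-open [start, end) (an assumption of this sketch), and Java 17 is assumed for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical edge that exists during [start, end).
record TimedEdge(int source, int target, double start, double end) {
    boolean overlaps(double lo, double hi) {
        return start < hi && lo < end;
    }
}

// Wraps a dynamic edge list together with a time interval.
final class DynamicGraphDecorator {
    private final List<TimedEdge> edges;
    private final double lo;
    private final double hi;

    DynamicGraphDecorator(List<TimedEdge> edges, double lo, double hi) {
        this.edges = List.copyOf(edges);
        this.lo = lo;
        this.hi = hi;
    }

    // Static copy containing only the edges alive somewhere in [lo, hi).
    List<TimedEdge> snapshot() {
        List<TimedEdge> out = new ArrayList<>();
        for (TimedEdge e : edges) {
            if (e.overlaps(lo, hi)) out.add(e);
        }
        return out;
    }

    // Same dynamic graph viewed through another interval.
    DynamicGraphDecorator withInterval(double newLo, double newHi) {
        return new DynamicGraphDecorator(edges, newLo, newHi);
    }
}
```

Keeping the decorator immutable means a snapshot can be computed in a background thread while the user moves to another interval, which fits Gephi’s non-blocking design.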

There are also additional features that will be done in the future (they will probably not be included in the upcoming release):

  • Dynamic visualization of attributes.
  • Dynamic version of the Ranking module – dynamic visualization attributes transformation.

I’ll try to briefly describe how these features are implemented.

Dynamic attributes

It is a very interesting task from a programmer’s point of view, since it requires implementing a complex data structure such as an interval tree (see also Antoine Vigneron – Segment trees and interval trees). Users will also find it necessary. The purpose is to read dynamic attributes from GEXF files and host them efficiently, so that we can get attribute values for different time intervals. It goes without saying how powerful this feature is. To show how it works, let’s consider one node (written in GEXF syntax):

<node id="1" label="Some node">
<attvalues>
<attvalue for="0" value="abcdefgh"/>
<attvalue for="2" value="1" end="2009-03-01"/>
<attvalue for="2" value="2" start="2009-03-01" end="2009-03-10"/>
<attvalue for="2" value="1" start="2009-03-10"/>
</attvalues>
</node>

As we can see, there is one dynamic attribute (id = 2) with three different values in different time intervals. The first interval starts at “negative infinity”: we simply assume that it only ends and never starts. But if there are bounds, for instance if the graph has its own start and end times, the attribute “starts” at the same moment as the graph, which is rather intuitive. The second interval runs from 2009-03-01 to 2009-03-10, and the last one runs from 2009-03-10 to “positive infinity” or the graph’s upper bound.
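The attribute from the GEXF example can be modelled as a list of (interval, value) pairs with missing bounds mapped to infinities. This is a simplified sketch: the article’s design uses an interval tree, whereas `DynamicAttribute` (a hypothetical name) scans a list linearly, which returns the same values but with O(n) queries. Dates are converted to epoch days, intervals are treated as half-open [start, end), and Java 17 is assumed for records.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// One value of a dynamic attribute, valid during [start, end).
record TimedValue<T>(double start, double end, T value) {}

// Simplified dynamic attribute: linear scan instead of an interval tree.
final class DynamicAttribute<T> {
    private final List<TimedValue<T>> values = new ArrayList<>();

    // A missing GEXF start/end (null here) maps to negative/positive infinity.
    void add(String start, String end, T value) {
        double s = start == null ? Double.NEGATIVE_INFINITY : day(start);
        double e = end == null ? Double.POSITIVE_INFINITY : day(end);
        values.add(new TimedValue<>(s, e, value));
    }

    // All values whose validity overlaps the query interval [lo, hi), in insertion order.
    List<T> valuesIn(double lo, double hi) {
        List<T> out = new ArrayList<>();
        for (TimedValue<T> v : values) {
            if (v.start() < hi && lo < v.end()) out.add(v.value());
        }
        return out;
    }

    // ISO date (yyyy-MM-dd) to a numeric time coordinate.
    static double day(String isoDate) {
        return LocalDate.parse(isoDate).toEpochDay();
    }
}
```

Querying the whole time line returns all three values 1, 2 and 1, which is exactly the situation the estimators described next have to resolve.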

After importing this into Gephi we can get the values for ANY time interval we want, for example [-inf, +inf]. But we need to decide how to estimate a final value. In the above example there are three values: 1, 2 and 1. To decide which of them should be returned, we provide a set of estimators such as AVERAGE, MEDIAN, MODE, SUM, MIN, MAX, FIRST and LAST. Each of them behaves differently depending on the attribute type; for real numbers they behave as in statistics.
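For real-valued attributes, the estimators can be sketched as an enum over the values found in the queried interval, assumed to be in chronological order. The enum mirrors the names listed above but is a hypothetical illustration, not Gephi’s implementation; it covers only numbers, and MODE breaks ties in favour of the value seen first (a choice made for this sketch). Java 14+ is assumed for the switch expression.

```java
import java.util.*;

enum Estimator {
    AVERAGE, MEDIAN, MODE, SUM, MIN, MAX, FIRST, LAST;

    // values: attribute values in the queried interval, in chronological order.
    double estimate(List<Double> values) {
        if (values.isEmpty()) throw new IllegalArgumentException("no values in the interval");
        return switch (this) {
            case AVERAGE -> sum(values) / values.size();
            case MEDIAN -> median(values);
            case MODE -> mode(values);
            case SUM -> sum(values);
            case MIN -> Collections.min(values);
            case MAX -> Collections.max(values);
            case FIRST -> values.get(0);
            case LAST -> values.get(values.size() - 1);
        };
    }

    private static double sum(List<Double> values) {
        double s = 0.0;
        for (double v : values) s += v;
        return s;
    }

    private static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2) : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // Most frequent value; ties go to the value that appears first.
    private static double mode(List<Double> values) {
        Map<Double, Integer> counts = new LinkedHashMap<>();
        for (double v : values) counts.merge(v, 1, Integer::sum);
        double best = values.get(0);
        int bestCount = 0;
        for (Map.Entry<Double, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

For the values 1, 2 and 1 from the example, MEDIAN, MODE, FIRST and LAST all give 1, MAX gives 2, SUM gives 4 and AVERAGE gives 4/3.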

So users will be able to get values for different time intervals on demand, for instance in the Data Laboratory module, or (in the future) see them on screen as part of the rendered graph. Suppose we have an attribute like priority. A user will be able to choose between several possibilities: nothing (meaning the attribute is not visualized), color, stroke, thickness, etc. This means, for instance, that if a node’s value of this attribute is close to its upper bound, its stroke thickness would be very high. On the other hand, if a node’s value is close to the lower bound, only its internal color could be visualized.
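The stroke example boils down to normalizing the attribute value between its bounds and interpolating a width. A minimal sketch, assuming a linear mapping with clamping; `StrokeMapper` is a hypothetical name, and the planned feature may well use a different transformation.

```java
// Hypothetical linear mapping from an attribute value to a stroke width.
final class StrokeMapper {
    private final double minValue;
    private final double maxValue;
    private final double minWidth;
    private final double maxWidth;

    StrokeMapper(double minValue, double maxValue, double minWidth, double maxWidth) {
        if (maxValue <= minValue) throw new IllegalArgumentException("empty value range");
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.minWidth = minWidth;
        this.maxWidth = maxWidth;
    }

    // Values near the upper bound get a thick stroke, values near the lower bound a thin one.
    double widthFor(double value) {
        double t = (value - minValue) / (maxValue - minValue);
        t = Math.max(0.0, Math.min(1.0, t)); // clamp values outside the bounds
        return minWidth + t * (maxWidth - minWidth);
    }
}
```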

Metrics framework

For now it is possible to compute a set of important metrics, but all of them consider a “static graph”. The idea of dynamic metrics is to execute the static ones in a loop, where the graph changes according to the time interval. The following screen shows that these additional metrics are used in the same way as their static counterparts:

Dynamic Metric

The screen shows only Dynamic Degree Power Law, but of course every dynamic metric will be implemented (this module was still under development while this article was written, so the final product may differ from the one presented above). The user enters the relevant information, such as the time interval, and gets a separate report for every time interval. The other results are:
  • The result for each node/edge is written in the graph, so it can be seen in the Data Laboratory.
  • The general result is also written and presented in the report.
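“Running the static metric in a loop” can be made concrete with a sliding time window: for each window, build the static snapshot and apply an ordinary metric to it. The sketch uses average degree as the static metric; `DynamicAverageDegree` and its nested `Edge` record are hypothetical, nodes are derived from the edges (isolated nodes are ignored), windows are half-open, and Java 17 is assumed.

```java
import java.util.*;

final class DynamicAverageDegree {
    private DynamicAverageDegree() {}

    // Hypothetical undirected edge alive during [start, end).
    record Edge(int a, int b, double start, double end) {}

    // Static metric: average degree of the graph induced by the given edges.
    static double averageDegree(List<Edge> edges) {
        Set<Integer> nodes = new HashSet<>();
        for (Edge e : edges) {
            nodes.add(e.a());
            nodes.add(e.b());
        }
        return nodes.isEmpty() ? 0.0 : 2.0 * edges.size() / nodes.size();
    }

    // Dynamic version: one static computation per window [t, t + window), advancing by step.
    static List<Double> overTime(List<Edge> edges, double from, double to, double window, double step) {
        List<Double> results = new ArrayList<>();
        for (double t = from; t < to; t += step) {
            double lo = t;
            double hi = t + window;
            List<Edge> snapshot = new ArrayList<>();
            for (Edge e : edges) {
                if (e.start() < hi && lo < e.end()) snapshot.add(e);
            }
            results.add(averageDegree(snapshot));
        }
        return results;
    }
}
```

Each element of the returned list corresponds to one per-interval report; any other static metric could be swapped in for `averageDegree`.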

Conclusion

Evolution of networks, network dynamics and dynamic network analysis are hot topics nowadays, and there is growing interest in studying these issues. As a result, the need for DNA tools keeps growing. In my opinion Gephi is heading towards being one of the best…

Cezary Bartosiak


Functionality ~

andre-panisson

This summer, six students are working on Gephi as part of the Google Summer of Code. They contribute by developing new features that will be integrated into version 0.8, to be released later this year.

The purpose of the Graph Streaming API project, run by André Panisson, is to build a unified framework for streaming graph objects. Gephi’s data structure and visualization engine have been built with the idea that a graph is not static and might change continuously. By connecting Gephi to external data sources, we leverage its power to visualize and monitor complex systems or enterprise data in real time. Moreover, the idea of streaming graph data goes beyond Gephi. A unified and standardized API would let other graph and network analysis tools interoperate in a distributed and cooperative fashion.

With the increasing level of connectivity and cooperation between systems, a system that aims to be interoperable must comply with the available standards. Graph objects are abstractions that can represent a wide range of real-world structures, from computer networks to human interactions. There are many standards for exchanging graph data, from text-based formats to XML-based ones. But real-world structures are constantly changing, and the current formats are not suited to exchanging this kind of dynamic data.

Many well-established systems already stream data to their users through a streaming API. Twitter, for example, defined a Streaming API that allows near-real-time access to its data. It supports two formats, XML and JSON, but JSON is strongly encouraged over XML because it is more compact and much simpler to parse.

We are not the first to implement a Graph Streaming API; another very interesting effort is the GraphStream Java library. It provides an API for adding nodes and edges to a graph and making them evolve. Its graphs are made of nodes and edges that can appear, disappear or be modified, and these operations are called events. The sequence of operations that occur in a graph is seen as a stream of events.
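To make the “stream of events” idea concrete, here is a small Python model in which a graph is rebuilt by replaying events in order. It is only a sketch of the concept. The event tuples and the `replay` function are hypothetical and do not reproduce GraphStream’s actual Java API.

```python
def replay(events):
    """Apply a sequence of graph events and return the resulting graph.

    Hypothetical event tuples:
      ("add_node", id, attrs), ("change_node", id, attrs), ("del_node", id),
      ("add_edge", id, source, target), ("del_edge", id)
    """
    nodes, edges = {}, {}
    for ev in events:
        kind = ev[0]
        if kind == "add_node":
            nodes[ev[1]] = dict(ev[2])
        elif kind == "change_node":
            nodes[ev[1]].update(ev[2])
        elif kind == "del_node":
            nodes.pop(ev[1])
            # Removing a node also removes its incident edges.
            for eid in [e for e, (s, t) in edges.items() if ev[1] in (s, t)]:
                del edges[eid]
        elif kind == "add_edge":
            edges[ev[1]] = (ev[2], ev[3])
        elif kind == "del_edge":
            edges.pop(ev[1])
        else:
            raise ValueError(f"unknown event type: {kind}")
    return nodes, edges


if __name__ == "__main__":
    stream = [
        ("add_node", "A", {}),
        ("add_node", "B", {"size": 1}),
        ("add_edge", "AB", "A", "B"),
        ("change_node", "B", {"size": 5}),
        ("del_node", "A"),
    ]
    print(replay(stream[:3]))  # the graph after the first three events
    print(replay(stream))      # the graph at the end of the stream
```

Replaying a prefix of the stream gives the state of the graph at that point. In this model, the stream itself is the history of the graph.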

Since others have already had successful experiences with graph streaming, why not build on them? That is what we are doing. Beyond drawing on these experiences, we are also trying to stay compatible with existing work. The first Gephi Graph Streaming release will support two formats: JSON, for flexibility, and a text-based format based on the GraphStream implementation.

The first version of the Graph Streaming features will be available in the next release of Gephi, but some of them can already be tried out. To illustrate how simple it will be to connect to a master, the following video shows Gephi connecting to a master and visualizing the received graph data in real time. The graph in this demo is part of the Amazon.com catalog: the nodes represent books and the edges represent their similarities. For each book, a node is added and its similar books are then explored, with each similar book added as a node and each similarity as an edge.

The Graph Streaming specification goes beyond letting a client pull data from a master: clients can also interact with the master by pushing data to it, following a REST architecture. The master sends graph events to clients in a certain data format, and clients use that same format when interacting with the master.

In the next example, we will turn Gephi into a master that provides graph information to its clients. All graph streaming features are available from the Streaming tab of the Gephi application. You can connect to a master by clicking the ‘+’ button. You can also turn your Gephi instance into a master by right-clicking “Master Server” and selecting “Start”. (You are not limited to a single master per host: each Gephi workspace can be exposed as a master.) By default, the HTTP server listens on port 8080 for plain HTTP and on port 8443 for SSL. The server path depends on your workspace: each workspace uses a different path. You can configure these parameters (as well as Basic Authentication) with the “Settings…” button:

Graph Streaming Server start

Graph Streaming Settings Panel
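The defaults above can be summarized in a small Python helper. It builds the URL of a workspace exposed as a master and, optionally, the value of an HTTP Basic Authentication header. The helper names and the `workspace1` example are hypothetical. The ports and the per-workspace path follow the defaults described above.

```python
import base64


def master_url(workspace, host="localhost", ssl=False, port=None):
    """URL of a workspace exposed as a master (defaults: 8080 HTTP, 8443 SSL)."""
    scheme = "https" if ssl else "http"
    if port is None:
        port = 8443 if ssl else 8080
    return f"{scheme}://{host}:{port}/{workspace}"


def basic_auth_header(user, password):
    """Value of the Authorization header for HTTP Basic Authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    print(master_url("workspace0"))
    print(master_url("workspace1", ssl=True))
    print(basic_auth_header("user", "pass"))
```

If authentication is enabled in the settings panel, a client would send the header value in an `Authorization` request header.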

 

Now you can connect to it with any simple HTTP client. For example, you can use curl to watch the data flowing. First, open a shell window and run the following command:

curl "http://localhost:8080/workspace0"

This connects you to your workspace in Gephi. If the workspace is empty, you will receive no data, but the connection stays open, so you will receive all events from now on.

Now open another shell and run the following command; you should see a triangle appear in Gephi:

curl "http://localhost:8080/workspace0?operation=updateGraph" -d $'
{"an":{"A":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":500,"x":70}}}\r
{"an":{"B":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":250}}}\r
{"ae":{"AB":{"source":"A","target":"B","weight":10,"r":0,"g":0,"b":0,"directed":false}}}\r
{"an":{"C":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":-90}}}\r
{"ae":{"BC":{"source":"B","target":"C","weight":10,"r":0,"g":0,"b":0,"directed":false}}}\r
{"ae":{"CA":{"source":"C","target":"A","weight":10,"r":0,"g":0,"b":0,"directed":false}}}'

At the same time, all events will be sent to your connected client, in the other shell window.
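Each line of the payload above is a JSON object whose key names the event type (`an` adds nodes, `ae` adds edges). The value of that key maps object IDs to their attributes. The following Python sketch applies such lines to an in-memory graph. It handles only the two event types used in this example, and `apply_events` is a hypothetical helper, not part of Gephi.

```python
import json


def apply_events(payload, nodes=None, edges=None):
    """Apply JSON Streaming events (one JSON object per line) to in-memory dicts.

    Only "an" (add node) and "ae" (add edge) are handled in this sketch.
    """
    nodes = {} if nodes is None else nodes
    edges = {} if edges is None else edges
    for line in payload.splitlines():  # handles \r\n and bare \r separators
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        for event_type, objects in event.items():
            for obj_id, attrs in objects.items():
                if event_type == "an":
                    nodes[obj_id] = attrs
                elif event_type == "ae":
                    edges[obj_id] = attrs
                else:
                    raise ValueError(f"unsupported event type: {event_type}")
    return nodes, edges


# The triangle payload from the curl command above.
PAYLOAD = "\r\n".join([
    '{"an":{"A":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":500,"x":70}}}',
    '{"an":{"B":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":250}}}',
    '{"ae":{"AB":{"source":"A","target":"B","weight":10,"r":0,"g":0,"b":0,"directed":false}}}',
    '{"an":{"C":{"size":10,"r":1,"g":0,"b":0,"z":0,"y":90,"x":-90}}}',
    '{"ae":{"BC":{"source":"B","target":"C","weight":10,"r":0,"g":0,"b":0,"directed":false}}}',
    '{"ae":{"CA":{"source":"C","target":"A","weight":10,"r":0,"g":0,"b":0,"directed":false}}}',
])

if __name__ == "__main__":
    nodes, edges = apply_events(PAYLOAD)
    print(sorted(nodes), sorted(edges))
```

Events that change or remove nodes and edges would need extra branches. Their exact names should be taken from the JSON Streaming Format page rather than guessed.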

With the following commands you can retrieve some of the data:

curl "http://localhost:8080/workspace0?operation=getNode&id=A"
curl "http://localhost:8080/workspace0?operation=getEdge&id=AB"
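The same requests can also be issued with Python’s standard library instead of curl. This is a sketch: the base URL assumes the default port and the `workspace0` path used in the commands above, and the helper names are hypothetical. The functions only build `urllib.request.Request` objects; passing one to `urllib.request.urlopen` sends it to a running master.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8080/workspace0"  # default master URL used above


def update_graph_request(events):
    """POST request equivalent to `curl "...?operation=updateGraph" -d ...`."""
    body = "\r\n".join(json.dumps(e, separators=(",", ":")) for e in events)
    url = BASE + "?" + urllib.parse.urlencode({"operation": "updateGraph"})
    # Supplying data makes urllib send a POST, as curl -d does.
    return urllib.request.Request(url, data=body.encode("utf-8"))


def get_object_request(operation, object_id):
    """GET request for operation=getNode or operation=getEdge."""
    query = urllib.parse.urlencode({"operation": operation, "id": object_id})
    return urllib.request.Request(BASE + "?" + query)


if __name__ == "__main__":
    req = get_object_request("getNode", "A")
    print(req.full_url)
    # To actually send it to a running Gephi master:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```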

From here you can manipulate your graph from the command line as you like. There are other event types for changing and removing nodes and edges; for more information, see the current status of the JSON Streaming Format, available at this page. Note that this format is subject to change, as the API was built to be very flexible and more requirements are being added to it.

But what about connecting two different Gephi instances together? One instance acts as the master and the other as the client. With the Graph Streaming API, a change to the graph in the master’s workspace should be propagated to the client’s workspace. Conversely, a change in the client’s workspace causes the client to send requests to the master so that it updates its graph accordingly. Both instances thus work in a distributed mode. In fact, several people could build a graph together this way: this is Collaborative Graph Construction.
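The master/client round trip can be modeled in memory to show the intended data flow. A client forwards its local changes to the master, and the master applies every change and broadcasts it to all connected clients. This is a conceptual sketch with hypothetical `Master` and `Client` classes and a made-up `(kind, node_id)` event shape. Real instances would exchange JSON events over HTTP, and conflict handling is ignored here.

```python
def apply_event(nodes, event):
    """Hypothetical event tuples: ("add", node_id) or ("remove", node_id)."""
    kind, node_id = event
    if kind == "add":
        nodes.add(node_id)
    elif kind == "remove":
        nodes.discard(node_id)
    else:
        raise ValueError(f"unknown event type: {kind}")


class Master:
    """Holds the reference graph and broadcasts every event to its clients."""

    def __init__(self):
        self.nodes = set()
        self.clients = []

    def connect(self, client):
        self.clients.append(client)
        client.master = self

    def push(self, event):
        # Apply the event locally, then stream it to every connected client.
        apply_event(self.nodes, event)
        for client in self.clients:
            client.receive(event)


class Client:
    """Mirrors the master; local edits go to the master instead of being applied directly."""

    def __init__(self):
        self.nodes = set()
        self.master = None

    def edit(self, event):
        self.master.push(event)

    def receive(self, event):
        apply_event(self.nodes, event)


if __name__ == "__main__":
    master, alice, bob = Master(), Client(), Client()
    master.connect(alice)
    master.connect(bob)
    alice.edit(("add", "X"))
    master.push(("add", "Y"))
    bob.edit(("remove", "X"))
    print(master.nodes, alice.nodes, bob.nodes)
```

Routing every client edit through the master gives all instances a single ordering of events. In this simple model, every instance therefore converges to the same graph.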

My personal impressions about it

As a researcher, I think Gephi has the potential to become a de facto standard for manipulating and visualizing large-scale graphs. I believe that the research community still lacks a high-quality, general-purpose, community-supported framework for exploratory analysis of large-scale dynamic graph data, and that Gephi can fill this gap. I am also working with the ISI Foundation on the SocioPatterns project, a research use case that currently relies on Gephi for exploratory data analysis and visualization. Several Gephi features suit the large-scale real-time data sources we deal with very well. These include support for dynamic networks, a data model ready for dynamic updates of graph topology and attributes and, in the near future, support for graph streaming. The ability to process live streams from our experiments is a unique feature that we are eager to see working.

André Panisson


Community ~

Today I am pleased to interview Jérémy Subtil, a Gephi student in the Google Summer of Code 2009.

Jérémy is a French postgraduate student in Computer Science at Compiègne University of Technology. A FLOSS enthusiast, he took part in the 2009 Google Summer of Code on a Gephi project.

Sebastien Heymann: Hi Jérémy, you contributed to Gephi during the Google Summer of Code 2009 (GSoC) by handling the vectorial preview module and implementing the SVG export. Could you explain why you chose this project? In particular, why did you get involved in Gephi when there are other great organizations such as Debian, WordPress or Mozilla?

Jérémy Subtil: I heard about Gephi in one of the courses I took in February last year. Starting from web crawling data, we visualized links between websites and identified the emerging clusters to get an idea of the shape of part of the World Wide Web. I discovered that Gephi was driven by a small but very active team, so I thought it was a very nice opportunity to join a FLOSS community. In addition, I was very interested in doing some graphics work on the vectorial preview, and I wanted to learn more about the SVG format.