Community Events ~ berlin meetup user group
People in Beta is a festival about startup culture, DIY and co-working hosted on the 1st of October at betahaus in Berlin.
As a part of this festival Nodus Labs will run a workshop on social network analysis from 13.30 to 14.30 (reserve your space on the People in Beta festival website).
Also, betahaus cafe will be a space where everyone can host their own session, so right after the workshop, at 14.30, we’ll host the second Gephi meetup in Berlin downstairs at one of the open tables. Together with the other guests we’ll talk about the different ways we use Gephi and Dmitry from Nodus Labs will show some practical applications of Gephi for text network analysis and social network analysis.
You are welcome to come and join in!
You can read a report on the previous meet-up on the blog of Nodus Labs.

Dmitry Paranyushkin is a professional amateur who’s had numerous affairs in the fields of arts, music, intersubjective relations, network research and internet business. He’s the founder of ThisIsLike.Com – an online mnemonic network and Nodus Labs – an exploratorium of ideas in the fields of network analysis. Having fled Russia for undefined reasons in 1976 he’s found a temporary refuge in Berlin where he lives in a castle on the Spree river and occasionally visits betahaus to steal rocket-fast broadband frequencies.
Community Events ~ meetup paris user group
This is an announcement for the first Gephi User Group drink in Paris! The area has many active Gephi users and supporters and we are looking forward to holding regular meetups, to create connections and discuss features and projects. The group is also open to students interested in open-source or data visualization.
The first event is planned for Wednesday September 28, 20.00 to 22.00 (free access) in Aux 2 Academies, 15 Rue Bonaparte, Paris Au Père Tranquille, 16 rue Pierre Lescot, Paris (map).
Gephi can be used in many domains and with different types of data. Whether you are a scientist, a student, an artist, a developer or a simple enthusiast, you are welcome to join the community and show up at our meetup. It’s a great opportunity to ask questions, discuss data, plugins, code, metrics or visualization.
The meetup will be organized by Sébastien Heymann, Gephi co-founder. To register, sign in on meetup.com and RSVP for the event.
Functionality ~ gsoc index performance
My name is Ernesto Aneiros and during this Google Summer of Code I am working on the Attributes Disk Store.
The problem
In Gephi, Attributes are the data associated with nodes and edges. As graphs grow larger and larger, attributes occupy more memory, even though they are often not essential to end-users who are only applying transformations or algorithms to the graphs. These attributes can be of different types, from simple Java types (byte, char, int, String, etc.) to Gephi’s internal data types (lists of primitives or versioned data). The idea for the project was to have a combined memory/disk cache system to partially off-load these attributes to disk. The system should include a well-designed cache to handle heavy read access on the most-accessed elements.
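To make the idea of a combined memory/disk cache more concrete, here is a minimal sketch in plain Java. It is not the code of the project: the class name TwoTierAttributeStore, the one-file-per-element layout and the use of String values are assumptions made only for illustration. The most recently used values stay in an access-ordered LinkedHashMap, and the least recently used ones are written to files when the in-memory limit is exceeded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier store: hot attribute values in memory, the rest in files. */
final class TwoTierAttributeStore {
    private final Path dir;
    private final int maxInMemory;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<Long, String> hot = new LinkedHashMap<>(16, 0.75f, true);

    TwoTierAttributeStore(Path dir, int maxInMemory) {
        this.dir = dir;
        this.maxInMemory = maxInMemory;
    }

    /** Convenience factory that keeps spilled values in a fresh temporary directory. */
    static TwoTierAttributeStore inTempDir(int maxInMemory) {
        try {
            return new TwoTierAttributeStore(Files.createTempDirectory("attr-store"), maxInMemory);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(long elementId, String value) {
        hot.put(elementId, value);
        spillColdEntries();
    }

    /** Returns the value for the element, or null if it was never stored. */
    String get(long elementId) {
        String value = hot.get(elementId); // also marks the entry as recently used
        if (value != null) {
            return value;
        }
        Path file = fileFor(elementId);
        if (!Files.exists(file)) {
            return null;
        }
        try {
            value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        put(elementId, value); // promote back to memory; the memory copy takes precedence
        return value;
    }

    boolean isInMemory(long elementId) {
        return hot.containsKey(elementId);
    }

    /** Writes least recently used entries to disk until the memory limit is respected. */
    private void spillColdEntries() {
        while (hot.size() > maxInMemory) {
            Map.Entry<Long, String> eldest = hot.entrySet().iterator().next();
            try {
                Files.write(fileFor(eldest.getKey()),
                        eldest.getValue().getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            hot.remove(eldest.getKey());
        }
    }

    private Path fileFor(long elementId) {
        return dir.resolve(elementId + ".attr");
    }
}
```

With a limit of two in-memory values, storing three elements pushes the least recently used one to disk, and reading it again brings it back into memory at the expense of another cold entry. A real store would also need the reliability guarantees discussed in the rest of this post, which this sketch does not attempt.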
The Solution (1st iteration)
Lucene is one of the most popular text search engines and a flagship Java open source project. It is capable of handling and indexing millions of records while remaining performant, so when the idea for the Attributes API cache was born, Lucene was the first candidate considered for the role of the data store. As an added bonus, Gephi would get full-text search capabilities with almost no extra effort. When analyzing the problem, the following criteria were developed to judge a possible data store:
- Reliable (resistant to corrupted disk data, failed transactions, unexpected errors)
- Fast
- Transparent (minimum complexity exposed to the end-user)
While Lucene is fast and transparent (the second and third criteria), its approach to corrupted indices is to rebuild them from scratch, so it fails the first criterion, reliability. This doesn’t pose a problem to Lucene because in the context where it is supposed to be used (indexing of external information), input data is always available separately from the index and can be accessed if needed. In Gephi, however, this is not the case. Once Attributes are loaded from disk they remain in memory until saved back to file. If an error occurs during a disk store transaction the end-user can end up losing a day’s work, which is certainly not acceptable.
The Solution (2nd iteration)
After Lucene was ruled out as a contender for a data store, several options were considered, including embedded SQL databases and a combination of Ehcache plus BerkeleyDB. Both options bring a lot to the table, and embedded databases in Java have achieved impressive performance when compared to other mainstream database systems (see the H2 and HSQL projects, for example). Ehcache + BerkeleyDB, however, win when complexity is considered, since they introduce almost no translation layers between Gephi and the cache. Both solutions are good fits for the problem, but in the end the balance tilted in favor of Ehcache + BDB because of the complexity consideration.
Optimizing Ehcache and BerkeleyDB
Even though Ehcache provides a great deal of functionality and features, it was relatively easy to get up to speed with it. The online documentation was very complete, with code samples and detailed explanations. In almost no time an in-memory cache was up, running and being tested. Traditionally, cache sizes have been specified as the maximum number of elements they can hold. The 2.5 BETA of Ehcache introduced a new feature that allows sizing caches by memory consumed instead of elements held. For our project this is a killer feature, since we can now expose a single option to the user that specifies how much memory the cache should consume. Even though using the new feature proved a little more complicated than expected, we obtained great feedback from the Ehcache community, especially from alexsnaps and Mike Allen, which helped us solve the issues we were having.
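The difference between sizing a cache by element count and sizing it by memory can be illustrated with a small stand-alone sketch. This is not Ehcache’s API or its size-of machinery: ByteBoundedCache is a hypothetical class, and it only counts the payload bytes of each value, ignoring keys and object overhead, which a real implementation would have to estimate.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache bounded by total payload bytes rather than entry count. */
final class ByteBoundedCache {
    private final long maxBytes;
    private long usedBytes;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    ByteBoundedCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    void put(String key, byte[] value) {
        byte[] previous = entries.put(key, value);
        if (previous != null) {
            usedBytes -= previous.length; // the old value no longer counts
        }
        usedBytes += value.length;
        // Evict least recently used entries until the byte budget is respected.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    /** Returns the cached value or null, and marks the entry as recently used. */
    byte[] get(String key) {
        return entries.get(key);
    }

    long usedBytes() {
        return usedBytes;
    }

    int size() {
        return entries.size();
    }
}
```

The appeal for the user is the same as described above: a single number, the byte budget, replaces a guess about how many elements fit in memory. Note that in this sketch a value larger than the whole budget is evicted immediately.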
BerkeleyDB, on the other hand, is a very complex piece of software. With years of development under its belt, BDB has evolved into a very robust and flexible database. In fact, it is so flexible that it can be used as a full-blown database supporting queries, as a simple key/value datastore, or with a front-end that exposes a Java collections map, which greatly simplifies its use. All of this flexibility does not come free though: configuring and optimizing BerkeleyDB requires delving into details about transactions, buffers, log file sizes and BDB internals. However, the tools are there and the information provided is quite good, especially the FAQ and the optimization section.
Integration with Gephi
Since ease of use and transparency are important considerations for the end-user of Gephi, only the minimal configuration options are exposed in the preference panel of the disk cache, but an Advanced tab provides more control for those who want it.

The General settings tab, where cache can be enabled or disabled and the memory usage configured.

The advanced settings tab allows a more advanced user to configure several of BerkeleyDB’s options.
The Disk Store in Action

Memory consumption without the disk store. It reaches 400 MB.

Memory consumption with the store: after loading, it drops below 400 MB. Note how load time increased due to disk operations, a trade-off to consider when using the store.
Known issues
The project is still in development. Since saving memory is the main goal of the disk store project, the results are not yet good enough, for several reasons.
While BerkeleyDB provides a very convenient way of storing bytes on disk, it is still database-oriented software and therefore not the most suitable solution for our project: it uses a lot of memory for caching data and for building and maintaining its index (features that are desirable for databases but not for this project).
Trying to reduce BerkeleyDB’s memory usage through its settings produces quite different results on different systems, or even on the same system. The benchmark above shows reasonably good results, but that is not always the case. Maximum heap growth is better controlled, but memory usage peaks still prevent larger savings.
The conclusion is that it is a priority to replace BerkeleyDB with another disk persistence system, or to create one designed specifically for the Gephi disk store.
It is also known that graphs with more complex data like strings or lists will always benefit more from a disk storage system than graphs with simple data like integers or booleans. An idea is to always store simple data in memory, because indexing it on disk is going to need as much memory anyway, or even more.
On the other hand, Gephi works and was designed with in-memory data structures in mind. Adding a cache/disk store to the system is bound to create integration issues with other parts of the codebase. For example, the GEXF file importer tends to load large portions of the graph file into memory while parsing it, which is not so good in memory-constrained environments, and using the cache here will not make a difference. Another of these issues concerns the handling of data in files with the .gephi format. Due to the way .gephi files are imported, some integration problems still need to be debugged before the disk store works properly with them.
Looking to the future
This GSOC project is only scratching the surface of what a memory + disk cache system can achieve. In the future BerkeleyDB could be replaced with another persistence provider, and it doesn’t necessarily have to persist locally to disk. For example, BerkeleyDB could be replaced with a datastore like Cassandra, or maybe some RDBMS.
Conclusion
While the Data Store API introduced by this project is still taking its first steps and can be significantly evolved, it has helped iron out many issues and has paved the way for bigger and better improvements. Working during this summer has been a great experience, and I have been able to work with great mentors like Eduardo Ramos, who knows the Gephi codebase inside and out. I hope the work of all of Gephi’s GSOC’ers becomes the starting point for many new features and enhancements that the community will surely appreciate. Happy coding and see you next summer!
Community Events ~ berlin meetup user group
This is an announcement for the first Gephi User Group meet-up in Berlin! The area has many active Gephi users and supporters and we are looking forward to holding regular meetups, to create connections and discuss features and projects. The group is also open to students interested in open-source or data visualization.
The first event is planned for Thursday September 8, 17.30 to 19.30 (free access) in betahaus, 4th floor (Arena), Prinzessinnenstr 19-20, 10999, Berlin (map).
In this workshop conducted by Dmitry Paranyushkin from Nodus Labs hosted at Berlin’s most important co-working hub betahaus, we will demonstrate how one can visualize and analyze a social network or a community (using an example from Facebook selected by the participants). We will find out how to identify the most influential nodes within a network, various subgroups within a community, and the most efficient communication strategies to spread information within a group.
We will also discuss what behavior within the network fosters stronger ties between the members and a more sustainable community.
This event will also be the first event of the Gephi Meetup Group in Berlin, and we can move to the betahaus cafe after the workshop to discuss further questions.
Dmitry Paranyushkin is a professional amateur who’s had numerous affairs in the fields of arts, music, intersubjective relations, network research and internet business. He’s the founder of ThisIsLike.Com – an online mnemonic network and Nodus Labs – an exploratorium of ideas in the fields of network analysis. Having fled Russia for undefined reasons in 1976 he’s found a temporary refuge in Berlin where he lives in a castle on the Spree river and occasionally visits betahaus to steal rocket-fast broadband frequencies.
Design Functionality ~ dynamics gsoc timeline
My name is Daniel Bernardes and during this Google Summer of Code I am working on the new Timeline interface.
Dynamic graphs have been the subject of increasing interest, given their potential as a theoretical model and their promising applications. Following this trend, Gephi has incorporated tools to study dynamic networks. From a visualization perspective, a critical tool is the Timeline component, which allows users to select pertinent time intervals and display and explore the corresponding graph. The challenge concerning the timeline was twofold: redesign the component to improve user experience and add extra features, and introduce an animation scheme with the possibility to export the resulting video.
Together with my mentors Cezary Bartosiak and Sébastien Heymann, we have proposed a new design for the timeline component featuring a sparkline chart in the background of the interval selection drawer (which is semi-transparent): this feature will help the user to focus on particular moments of the evolution of the dynamic graph, like bursts of connections or changes in graph density or other simple graph metrics. Current metrics are the evolution of the number of nodes, the number of edges and the graph density. The sparkline chart was preferred to other chart solutions because it does not add too much visual pollution to the component and adds to the qualitative analysis. The interaction with the drawer remains broadly the same as in the old timeline, to guarantee a smooth transition for the user.

To implement this feature we have used the chart library JFreeChart (a library already incorporated into Gephi), customizing its XYPlot into a sparkline chart by modifying its visual attributes. To display the sparkline, one needs to measure the properties of the graph at several instants of the global time frame in which the dynamic graph exists. This represented a major challenge, since the original architecture did not allow the timeline component to access (and measure) the graph at particular instants of time; the solution was to introduce a slight modification to the DynamicGraph API to provide an object which gave us snapshots of the graph at given instants. Other challenges we dealt with included the automatic selection/switching of real number/time units in the timeline (depending on the nature of the graph in question) and the sampling granularity of the timeline.
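As a rough illustration of how a sparkline series can be obtained by measuring the graph at sampled instants, here is a small stand-alone sketch. It does not use the DynamicGraph API or JFreeChart: TimelineMetrics and its Interval class are hypothetical stand-ins, each node and edge is reduced to a closed lifetime interval, and density is computed with the usual formula for an undirected simple graph, 2E / (N(N - 1)), which is an assumption about the kind of graph being measured.

```java
import java.util.List;

/** Hypothetical helper that samples simple graph metrics along a time frame. */
final class TimelineMetrics {

    /** Closed lifetime [start, end] of a node or an edge. */
    static final class Interval {
        final double start;
        final double end;

        Interval(double start, double end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(double t) {
            return start <= t && t <= end;
        }
    }

    /** Number of elements whose lifetime contains the instant t. */
    static int countAlive(List<Interval> lifetimes, double t) {
        int alive = 0;
        for (Interval lifetime : lifetimes) {
            if (lifetime.contains(t)) {
                alive++;
            }
        }
        return alive;
    }

    /** Density of an undirected simple graph; defined as 0 for fewer than two nodes. */
    static double density(int nodes, int edges) {
        if (nodes < 2) {
            return 0.0;
        }
        return 2.0 * edges / ((double) nodes * (nodes - 1));
    }

    /** Density measured at `samples` evenly spaced instants across [min, max]. */
    static double[] densitySeries(List<Interval> nodes, List<Interval> edges,
                                  double min, double max, int samples) {
        double[] series = new double[samples];
        for (int i = 0; i < samples; i++) {
            double t = samples == 1 ? min : min + (max - min) * i / (samples - 1);
            series[i] = density(countAlive(nodes, t), countAlive(edges, t));
        }
        return series;
    }
}
```

The number of samples plays the role of the sampling granularity mentioned above: more samples give a smoother sparkline at the cost of more snapshots to measure.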
Another breakthrough of this project was the introduction of the timeline animation. Once the user has selected a time frame with the drawer, they can make it slide while the corresponding graph is displayed on the screen. Besides the technical aspects of the interaction between the timeline and the animation controller, there was also an effort to calibrate the animation (i.e., in terms of speed and frames) so it would be comfortable and meaningful for the user.
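The sliding of the selected time frame can be sketched as a sequence of intervals, one per animation frame. This is only an illustration under assumptions, not the animation controller itself: TimelineAnimation, the fixed step per frame and the frames-per-second parameter are all hypothetical names and choices.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that computes the intervals shown by a sliding-window animation. */
final class TimelineAnimation {

    /**
     * Slides the selected window [from, to] forward by `step` per frame until its
     * right edge would pass `max`. Each frame is returned as {start, end}.
     */
    static List<double[]> frames(double from, double to, double max, double step) {
        if (!(step > 0) || to < from) {
            throw new IllegalArgumentException("step must be positive and to >= from");
        }
        double width = to - from;
        List<double[]> frames = new ArrayList<>();
        for (int i = 0; ; i++) {
            double start = from + i * step; // multiply instead of accumulating to limit drift
            if (start + width > max) {
                break;
            }
            frames.add(new double[] {start, start + width});
        }
        return frames;
    }

    /** Delay between two frames, in milliseconds, for a given frame rate. */
    static long frameDelayMillis(int framesPerSecond) {
        return Math.round(1000.0 / framesPerSecond);
    }
}
```

Calibrating the animation then comes down to choosing the step and the frame rate: a smaller step with a higher frame rate gives a smoother but longer animation.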
As far as the UI is concerned, the component has gained a new “Reset” button next to the play button, which activates the timeline drawer and displays the chart. It also serves to reset the drawer selection to the full interval when the timeline is active. The play button now has its originally intended function, that is, controlling the animation of the timeline instead of activating the selection.

Finally, exporting the animation to a video format turned out to be trickier than expected and couldn’t be finished as planned. There were several setbacks to this feature, beginning with the selection of a convenient library to write the movie container: it turns out that the de facto options available are not fully Java-based and need an encoder working in the background. The best alternative I found was Xuggler, which is based on ffmpeg. Also, obtaining screen captures of the graph was a little tricky, so I exported SVG images of the graph corresponding to each frame, converted them to JPEG and then encoded them through Xuggler into a video format. As one might expect, this solution is not very efficient in terms of time, so Mathieu Bastian and my mentors suggested that I wait for the new features of the new Visualization API, which would make this process simpler.
In addition to current bugfixes and minor improvements concerning the timeline and the animation, the movie export remains the next big step to close this project. If you have questions or suggestions, please do not hesitate! The new timeline will be available in the next release of Gephi.
DB