1 February 2010
Today I have the honor of interviewing a special member of the Gephi team: Mathieu Jacomy.
Mathieu is an engineer, a founder of the WebAtlas NGO, a teacher at Sciences Po Paris, and leads R&D in the TIC Migrations program at the Fondation Maison des Sciences de l’Homme and the Telecom ParisTech school.
He is the main developer of the “Navicrawler” software. He also created the first Gephi prototype.
At that time I was analyzing a lot of graphs and I wasn’t satisfied with the existing free tools. That’s why I started to build my own tools.
I had no money for professional tools, and I needed to understand precisely what the software was doing: free, open source software fit these constraints perfectly.
I was using the amazing software Guess by Eytan Adar, who built it for his own needs. I was doing much the same thing as him, and I couldn’t have started exploring graphs without this tool.
But I wasn’t satisfied, because the software didn’t allow much manipulation. I couldn’t look at the substructures as easily as I wanted, and it was difficult to make nice cartographies.
I was dreaming of a “graph-dedicated Photoshop”, a visualization-oriented software rather than a script-oriented tool.
A good way to see what I mean is to look at the spatialization process. In well-known software such as Pajek or Guess, you have algorithms called “layout”, “force-vector” or “energy model”. These algorithms give the graph its shape, and this is probably the most critical part of building a clear visualization, because the substructures or “patterns” that one may see in the image strongly depend on the algorithm and the settings chosen. But at the same time, most users also want to look quickly at the global shape of the graph, and may not be aware that it’s important to search for the best algorithm depending on the time you have, the quality you want, the size of the graph, its degree distribution, the substructures that you expect to recognize… I was careful with these algorithms, but even though I understood their principles and specificities, I couldn’t figure out how they were transforming the graph, and I couldn’t evaluate their differences.
Why? Because in these programs you can’t:
- Manipulate the graph while the algorithm is running
- Modify the settings while the algorithm is running
- And sometimes, you can’t even see the graph while the algorithm is running
How can you understand what’s happening there? Of course I started to work on software that allowed this. But the same kind of problem appears again in other parts of the process, like filtering, image exporting… Pajek is clearly built from a mathematical perspective. Guess is more user-friendly, but not enough. I didn’t want a tool for mathematics experts, but a tool for people who actually have to explore and understand graphs. A professional tool for a job that didn’t exist at the time.
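As an illustration of a layout that can be watched, tuned and manipulated while it runs, here is a minimal sketch in Python. It is not the Graphiltre or Gephi code: the class `SteppableLayout`, its parameters (`repulsion`, `attraction`, `max_move`) and the toy force model are all assumptions made for the example. The point is only the design: the algorithm advances one iteration per call, so a drawing loop can redraw, change a setting or move a node between two iterations.

```python
import math
import random


class SteppableLayout:
    """Toy force-directed layout that advances one iteration per step() call.

    Hypothetical illustration only; names and force model are assumptions.
    """

    def __init__(self, nodes, edges, repulsion=1.0, attraction=0.1, seed=0):
        rng = random.Random(seed)
        self.pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
        self.edges = list(edges)
        # Settings are plain attributes so they can be changed between steps.
        self.repulsion = repulsion
        self.attraction = attraction
        self.max_move = 0.1  # cap on how far a node moves in one iteration

    def step(self):
        """Run one iteration and return the total distance moved by all nodes."""
        force = {n: [0.0, 0.0] for n in self.pos}
        nodes = list(self.pos)
        # Every pair of nodes repels, with strength repulsion / distance^2.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = self.pos[a][0] - self.pos[b][0]
                dy = self.pos[a][1] - self.pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = self.repulsion / (dist * dist)
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        # Connected nodes attract each other like a linear spring.
        for a, b in self.edges:
            dx = self.pos[b][0] - self.pos[a][0]
            dy = self.pos[b][1] - self.pos[a][1]
            force[a][0] += self.attraction * dx
            force[a][1] += self.attraction * dy
            force[b][0] -= self.attraction * dx
            force[b][1] -= self.attraction * dy
        moved = 0.0
        for n, (fx, fy) in force.items():
            mag = math.hypot(fx, fy)
            scale = min(1.0, self.max_move / mag) if mag > 0 else 0.0
            self.pos[n][0] += fx * scale
            self.pos[n][1] += fy * scale
            moved += mag * scale
        return moved

    def move_node(self, node, x, y):
        """Simulate the user dragging a node while the layout is running."""
        self.pos[node] = [x, y]


# Usage: the caller owns the loop, so it can intervene at any iteration.
demo = SteppableLayout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
for i in range(200):
    demo.step()                       # a GUI would redraw the graph here
    if i == 50:
        demo.repulsion = 2.0          # modify a setting while it runs
    if i == 100:
        demo.move_node("a", 5.0, 5.0)  # manipulate the graph while it runs
```

Because the loop belongs to the caller rather than to the algorithm, the three limitations listed above disappear by construction, at the cost of having to keep the layout state in an object.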
This was the starting point of “Graphiltre”: building a graph exploration system in which you can understand what you are doing by looking at what happens on the screen, and do anything (including filtering) without typing a single line of script.
Then if you ask me “Why didn’t you include your work in Guess?”, of course you’ll be right. I thought about it seriously, because creating a new tool means developing many basic features all over again, more or less for nothing.
To be honest, I just couldn’t do that: I wasn’t good enough to understand the source code of Guess, and I did try! But behind this lies another important issue: the inner structure of Guess (including a live script editing feature, several graphic engines, the JUNG graph core library, SQL bindings…) was too timid in my opinion. This software didn’t make strong choices. Some very different options remain undecided in Guess. I’ll give you an example. It is based on the “Piccolo” graphics library, which is good even if not optimized for graphs; but if you look closely you’ll see that you can actually switch from this module to another one such as TouchGraph. Even though most users keep using Piccolo, the default option, Guess is constrained by the need to adapt its features to each competing library. And this kind of problem is everywhere in Guess, because it’s a composite piece of software, a puzzle built from various sources.
I’m criticizing Guess, but I must say that I have a lot of respect for this software, and for Eytan Adar and his team. They opened my mind, and in many ways Guess had the taste of the future; it was a more decisive improvement for the world of graph workers than Pajek, because for the first time you could use your body to interact with graphs, not only your brain. You could actually “handle things”. The problem with Guess is just that it was conceived as a “research tool” and not as “general public software”, probably for obvious reasons (time, research priorities…). As an engineer, I started Graphiltre from another perspective.
Fortunately, with another design you see things differently and you can sometimes make improvements. There is a performance issue in Guess. I had big graphs to study, and Guess was way too slow for me. I started to measure the time it took to spatialize a graph, and I understood that it was difficult for Piccolo to display a large graph. And by large, I mean more than 100 nodes / 1000 edges, which is actually quite small. Of course a complex graph is a lot of information, but it’s only simple shapes: one-color lines, circles, squares and sometimes letters. You know that in a computer the CPU can do many different things, but it isn’t so powerful at simple and repetitive tasks. I thought about the millions of polygons displayed in video games: the idea was to benefit from the power of GPUs. So I needed to rethink the display engine, as opposed to the multiple solutions of Guess. I wanted first to improve the display performance, and then to try to use GPU power to improve spatialization algorithms.
I was amazed to be using video game technologies, and it turned out to be a good idea. Graphiltre has used an OpenGL binding since the beginning, and since the beginning it has been way faster than Guess for display. But I didn’t achieve the “GPGPU” perspective (general-purpose computing on graphics processing units), even if today it’s easier to do, and I think that “physics engines” like “Havok” already do what we need…
Mathieu Bastian did really impressive work on the software. One consequence is that it’s now too difficult for me to contribute to the core of the code. But there is a difference for me: Mathieu made Gephi a contributive tool, based on a modern open-source philosophy. That’s why I keep improving some aspects such as spatialization, user interface, and filters. I still share my vision with the Gephi team and I’m always here if someone needs an opinion or advice. I like discussing specifications and development perspectives a lot. But I think that my main role is to keep Gephi in touch with the right concepts about graphs.
Even if a consortium uses this format, that is not in itself a good reason to keep using it. First, they don’t necessarily need it, nor actually use it. They can support it for other reasons. Actually, the same reason I told you before: technological influence.
This is a good question for innovators, and it’s important to give a clear answer because it touches on what a technology is, and what a technology looks like. That’s why we need to separate two questions:
1) Why isn’t .gexf based on GraphML?
2) Why is there a new format associated with Gephi?
The answer to 1) is: vanity, laziness, and the possibility of taking another direction later. There is no strong technical reason.
The answer to 2) is: because we don’t claim to be compatible with anyone. Gephi already has some specificities that make it not fully compatible with other software, and this will be more and more the case. Because it has a strong identity, and I am speaking about features, it’s useful to make it clear that it’s not using “generic” graph files. That’s why there is a name and a format linked to Gephi: you have to know that if you open a “Gephi graph” in other software, you’ll probably lose some information.
These two separate answers nevertheless have something in common: freedom. The .gexf file format is something the community can control; it’s easy to implement a specific feature, because only a few people have to agree on the format. It represents the freedom of the community, the opportunity to design its own tools for its own needs. Its value is as a guarantee, as freedom of movement, not in itself. In my opinion, if we don’t implement strong specificities in the future, we may consider abandoning this format. But its existence is useful for the moment, and might be decisive some day.
So my goal is to keep Gephi simple and handy for users. It isn’t easy, for two reasons. First, graphs are complex, and the purpose of Gephi is to allow users to understand them and share this knowledge, with a cartography for example. Second, making graph handling easy requires high performance, which requires high structural complexity in the software. But we have made some improvements in this direction, and I think that version 0.7 is a milestone for that. Gephi is much more complicated than my original Graphiltre prototype, but it stays simple enough for users, and is even simpler in many respects. Many features have appeared and the user interface is richer, but the workflow is more fluid.
But the secret, if there is one, the ‘mantra’ behind Gephi’s development philosophy, isn’t the vision itself. The key to success in this kind of innovative process is to keep the right concepts at the center of the tool. I don’t have a “technology for technology’s sake” philosophy. Tools are made for users, and everything has to be about them. For example, there is something I forbid: implementing an algorithm with no access to it. It may sound strange, but in research people sometimes do this, because they think that if it’s in the code and they can use it (from a command line, for example) then it’s OK. But it’s not: sharing with others also means respecting them, and respecting their right not to be comfortable with “computer scientist style” interfaces. That’s an example of the wrong thing to do. But what are the right things to do? There is no definitive answer to this question, because you innovate by making something new, and that means you have to forget what was previously right or wrong. Nevertheless there is a vision behind an innovation: you have an intuition of what’s good or not. That’s why I say that the concepts are the key. Dedicate your tool to your concepts. This is, I think, my best advice.
You will have understood that I want user-centric tools. But the user isn’t actually at the center of the software, mainly because when you develop software, the user isn’t there. So what’s the link with the user? The user interface? No. It’s the concepts that make your software “work”. The user interface only stems from them. You have to think: “the user will use my concepts to do his work, and I have to guide him so that he understands and benefits from them”. It’s your responsibility as a software designer to acknowledge that there is a hidden power in any tool.
Think of a screwdriver: how do you design its handle? Long or short? Heavy or light? Thick or thin? Square or round in section? Your concept is that the user has to push on the back of the screwdriver to drive the screw properly, with the arm in line with the tool. You know that if the user holds the screwdriver in his hand like a spoon, he won’t do a good job. Your power is to force the user to push on the back of the tool. Your responsibility is to take on this power and make it serve the user. Your guideline for the interface will be to prevent the user from using your screwdriver like a spoon. That’s why you’ll choose a short handle, so that the user doesn’t want to grasp it. You’ll design a big flat back, so that it’s comfortable for the user to push on it. You’ll make the tool easy for the right use and difficult for wrong uses. If you do that, you don’t expect users to read the fucking manual. They’ll learn from using the tool.
Yes, users will learn from using the tools. And this is my point. The value of an innovation is the value of the concept behind the tool, plus the value of learning it through practice. This is the mantra of Gephi’s design. I was successful in my job because I knew that the graphical aspect of a graph is very important. I made high quality spatializations, and very nice pictures. You can read Jacques Bertin to learn more about that. I’ll just say that my concept was: semiotics matter. As soon as a graph is spatialized, it’s read. As soon as it’s read, the signs in it, the system of signs that it is, have to be carefully tuned. This was my secret, and the idea behind developing and sharing a tool was to help people achieve high quality work. I wanted them to understand that semiotics matter. But rather than writing a book, I wrote software. And now people make great cartographies with Gephi because they benefit from the concepts I put inside it. And I take responsibility for forcing them to use my concepts, even if they don’t realize it. And now they are their concepts as well as mine. Franck Ghitalla calls this principle “to embody Human Sciences concepts in tools”, and he is right. This is the key. And I’ll give you a clear example.
When you spatialize a graph, a mathematical algorithm makes nodes converge to a locally optimal position. This means that the nodes are mathematically well placed. But as a system of signs, the graph may not be satisfying. For example, two nodes may be very close to one another. If you show the names of the nodes, they hide each other or overlap so that you can’t read them. What’s the meaning of a mathematically perfect position if you can’t read the cartography? My idea was to implement an algorithm that shifts nodes just enough to make the names readable. The loss is mathematically insignificant, and the image improves a lot. You know, my concept, “Semiotics matter”… But to achieve it you have to make the size of your text (graph as a system of signs) accessible to the mathematical algorithms (graph as a theoretical object). This is mathematically weird, but we don’t care. Concepts at the center: we designed Gephi so that it’s possible, that’s all. This feature is a real innovation (it involves new design principles for graph software) and most users love it.
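The idea of shifting nodes just enough for their names to be readable can be sketched as follows. This is a hypothetical illustration, not the algorithm shipped in Gephi: the functions `label_box` and `adjust_labels`, the node dictionary keys and the constants `char_width` and `line_height` are all assumptions. What it does show is the design principle described above: the layout code needs access to the text size (`font_size` and the label length) to decide whether two nodes are too close.

```python
def label_box(node, char_width=0.6, line_height=1.0):
    """Return (half_width, half_height) of a label centered on the node.

    The label size is estimated from the text length and the font size,
    which is how the text size becomes visible to the layout step.
    """
    size = node["font_size"]
    half_width = len(node["label"]) * char_width * size / 2
    half_height = line_height * size / 2
    return half_width, half_height


def adjust_labels(nodes, step=0.1, max_iterations=1000):
    """Nudge nodes apart until no two label boxes overlap.

    Assumption for simplicity: overlapping pairs are pushed apart
    vertically only, because labels are usually wider than they are tall.
    Returns the number of iterations that moved at least one node.
    """
    for iteration in range(max_iterations):
        overlap_found = False
        for i, a in enumerate(nodes):
            a_w, a_h = label_box(a)
            for b in nodes[i + 1:]:
                b_w, b_h = label_box(b)
                dx = b["x"] - a["x"]
                dy = b["y"] - a["y"]
                # Two axis-aligned boxes overlap when they overlap on both axes.
                if abs(dx) < a_w + b_w and abs(dy) < a_h + b_h:
                    overlap_found = True
                    direction = 1.0 if dy >= 0 else -1.0
                    a["y"] -= direction * step
                    b["y"] += direction * step
        if not overlap_found:
            return iteration
    return max_iterations


# Usage: two nodes that a layout placed almost on top of each other.
placed = [
    {"label": "alpha", "x": 0.0, "y": 0.0, "font_size": 1.0},
    {"label": "beta", "x": 0.2, "y": 0.1, "font_size": 1.0},
]
adjust_labels(placed)
```

Each node moves by a small fixed step per iteration, so positions found by the layout are disturbed only as much as needed to separate the labels. With many crowded nodes this simple version may stop at `max_iterations` without fully separating them.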
I use Gephi to analyze graphs, and to make ever nicer cartographies. I also teach students how to make web cartographies with Gephi at Sciences Po in Paris. For more information about what I do, take a tour of WebAtlas.fr (in French only) or type my name into Google. Or…
…wait for it…
…just use Gephi!
- New Plugin-oriented architecture
- New User Interface
- New Cartography Creator module
- New Network Statistics
- and more…