25 July 2010
This summer, six students are working on Gephi as part of the Google Summer of Code (GSoC). They are developing new features that will be integrated into version 0.8, to be released later this year.
This summer, Yi Du is adding the Direct Social Networks Import module, which provides several kinds of importers, such as Email, Twitter and Facebook. This article briefly introduces some of these importers, along with several of the provided samples.
The ability to import any kind of structured data and build a network from it is essential for users. This step is often missing and requires time and scripting skills, even though tools and libraries that can read and parse all types of data already exist. Moreover, it has never been easier to quickly access meaningful datasets online.
Email importer
Email is a simple and widely used communication tool, yet many people know little about how it works. To some extent, analyzing emails can help people better understand their relationships with others. In our email importer module, each email address is represented as a node. If two email addresses share the same display name, an option lets the user decide whether to treat them as a single node or as two different nodes. Then, if there is an email from A to B, an edge from A to B is created; another option lets the user decide whether Cc and Bcc recipients also produce edges.
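The node and edge rules above can be sketched in Java, the language Gephi is written in. This is a minimal sketch under stated assumptions, not the importer's actual code: the `Email` record, the `nodeId` and `buildEdges` helpers and their boolean options are hypothetical names mirroring the options described. Addresses are assumed to look like `Name <user@host>` or a bare `user@host`, and each edge is assumed to be weighted by the number of emails sent along it.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the email node/edge rules; not Gephi's actual importer API. */
final class EmailGraphSketch {
    /** One parsed message: sender plus To, Cc and Bcc recipients (hypothetical type). */
    record Email(String from, List<String> to, List<String> cc, List<String> bcc) {}

    // Matches "Display Name <user@host>", with an optionally quoted display name.
    private static final Pattern NAMED =
            Pattern.compile("^\\s*\"?([^\"<]*?)\"?\\s*<([^>]+)>\\s*$");

    /** Node id: the display name when merging by name is enabled, otherwise the address. */
    static String nodeId(String raw, boolean mergeByDisplayName) {
        Matcher m = NAMED.matcher(raw);
        if (!m.matches()) {
            return raw.trim().toLowerCase(Locale.ROOT);   // bare address, no display name
        }
        String name = m.group(1).trim();
        String address = m.group(2).trim().toLowerCase(Locale.ROOT);
        return (mergeByDisplayName && !name.isEmpty()) ? name : address;
    }

    /** Directed edges "A -> B", weighted by how many emails A sent to B (assumed weighting). */
    static Map<String, Integer> buildEdges(List<Email> emails, boolean mergeByDisplayName,
                                           boolean includeCc, boolean includeBcc) {
        Map<String, Integer> edges = new TreeMap<>();
        for (Email e : emails) {
            String source = nodeId(e.from(), mergeByDisplayName);
            List<String> targets = new ArrayList<>(e.to());   // To recipients always count
            if (includeCc) targets.addAll(e.cc());
            if (includeBcc) targets.addAll(e.bcc());
            for (String t : targets) {
                edges.merge(source + " -> " + nodeId(t, mergeByDisplayName), 1, Integer::sum);
            }
        }
        return edges;
    }
}
```

Merging by display name trades precision for convenience: two different people who happen to share a display name would collapse into one node, which is a good reason to leave it as a user option.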
We provide two ways to import emails: they can be fetched one by one from an email server (POP3 or IMAP), or read from local files or folders. The second approach raises a problem: different email clients may store messages in different file formats. Fortunately, our importer has an easy-to-extend API, as well as a default implementation for EML files. EML is a standard format, and EML files can be obtained from Thunderbird, Outlook and Gmail (with tools such as Gmail Backup).
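Because each mail client may use its own file format, reading local files calls for a pluggable parser. The sketch below assumes a hypothetical `EmailFileParser` interface (not Gephi's real extension API) and a minimal `.eml` default that reads only the header block of an RFC 5322 message, unfolds continuation lines, and collects the From, To, Cc and Bcc fields. Fetching from POP3 or IMAP servers is out of scope here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

/** Hypothetical extension point: one implementation per mail-client file format. */
interface EmailFileParser {
    boolean accepts(Path file);
    Map<String, List<String>> parseHeaders(Path file) throws IOException;
}

/** Minimal default implementation for .eml files (RFC 5322 text); reads headers only. */
final class EmlParser implements EmailFileParser {
    private static final Set<String> WANTED = Set.of("from", "to", "cc", "bcc");

    @Override
    public boolean accepts(Path file) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".eml");
    }

    @Override
    public Map<String, List<String>> parseHeaders(Path file) throws IOException {
        // ISO-8859-1 maps every byte to a character, so decoding cannot fail here.
        return parseHeaderLines(Files.readAllLines(file, StandardCharsets.ISO_8859_1));
    }

    static Map<String, List<String>> parseHeaderLines(List<String> lines) {
        List<String> headers = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) break;                      // blank line ends the header block
            boolean continuation = line.startsWith(" ") || line.startsWith("\t");
            if (continuation && !headers.isEmpty()) {       // unfold a folded header line
                int last = headers.size() - 1;
                headers.set(last, headers.get(last) + " " + line.trim());
            } else {
                headers.add(line);
            }
        }
        Map<String, List<String>> result = new HashMap<>();
        for (String h : headers) {
            int colon = h.indexOf(':');
            if (colon <= 0) continue;
            String name = h.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (!WANTED.contains(name)) continue;
            List<String> addresses = result.computeIfAbsent(name, k -> new ArrayList<>());
            // Naive split: quoted display names that contain commas are not handled.
            for (String part : h.substring(colon + 1).split(",")) {
                if (!part.isBlank()) addresses.add(part.trim());
            }
        }
        return result;
    }
}
```

A parser for another client's format would implement the same interface, and the importer could then pick the first parser whose `accepts` returns true for a given file.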
Twitter importer
Twitter is a very popular social network on which people send and receive short messages, usually called tweets. Users can follow people they are interested in and topics they like. Twitter networks have been popularized by NodeXL, which has a similar feature. See this nice gallery.
We provide two kinds of networks: “Twitter Search Network” and “Twitter User Network”.
The “Twitter Search Network” lets users analyze people whose tweets contain or mention similar keywords. Each Twitter user is represented as a node, and three kinds of edges can be constructed:
- Replies-to relationship: if A replies to B in a tweet returned by the search, an edge from A to B is added.
- Mentions relationship: if A mentions B in a tweet returned by the search, an edge from A to B is added.
- Followers relationship: if A follows B and both are in the constructed graph, an edge from A to B is added.
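The three edge rules can be expressed as a small Java sketch. It illustrates the idea under assumptions and is not the importer's code: the `Tweet`, `Edge` and `EdgeKind` types are hypothetical, tweets are assumed to be already fetched, mentions are detected with a simple `@name` regular expression, and the follow relation is passed in as a map from each user to the accounts they follow.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the three edge kinds of the search network. */
final class TwitterSearchNetworkSketch {
    /** A tweet returned by the search; inReplyTo is null when it is not a reply. */
    record Tweet(String author, String inReplyTo, String text) {}
    enum EdgeKind { REPLIES_TO, MENTIONS, FOLLOWS }
    record Edge(String from, String to, EdgeKind kind) {}

    // "@name", assuming a screen name is 1-15 letters, digits or underscores.
    private static final Pattern MENTION = Pattern.compile("@(\\w{1,15})");

    static Set<Edge> build(List<Tweet> tweets, Map<String, Set<String>> follows,
                           Set<EdgeKind> kinds) {
        Set<Edge> edges = new LinkedHashSet<>();
        Set<String> users = new TreeSet<>();
        for (Tweet t : tweets) {
            users.add(t.author());
            if (kinds.contains(EdgeKind.REPLIES_TO) && t.inReplyTo() != null) {
                users.add(t.inReplyTo());
                edges.add(new Edge(t.author(), t.inReplyTo(), EdgeKind.REPLIES_TO));
            }
            if (kinds.contains(EdgeKind.MENTIONS)) {
                Matcher m = MENTION.matcher(t.text());
                while (m.find()) {
                    users.add(m.group(1));
                    edges.add(new Edge(t.author(), m.group(1), EdgeKind.MENTIONS));
                }
            }
        }
        // Follower edges only between users already present in the constructed graph.
        if (kinds.contains(EdgeKind.FOLLOWS)) {
            for (String a : users) {
                for (String b : follows.getOrDefault(a, Set.of())) {
                    if (users.contains(b)) edges.add(new Edge(a, b, EdgeKind.FOLLOWS));
                }
            }
        }
        return edges;
    }
}
```

Computing follower edges last, after all users from the search results are known, is what restricts them to the constructed graph.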
The second network we provide is the “Twitter User Network”, which shows the relationships between Twitter users based on who follows whom. By default, an edge from A to B is added whenever A follows B and both are in the graph. Three options are provided for vertex construction:
- Person followed by the user: if the searched user A follows B, B is added as a vertex.
- Person following the user: if A follows the searched user B, A is added as a vertex.
- Both: combines the two options above.
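The vertex options can be sketched in the same way. The names below are hypothetical, the searched user is assumed to always be a vertex, and the followee and follower lists are assumed to be fetched beforehand; the default edge rule then links A to B whenever A follows B and both are vertices.

```java
import java.util.*;

/** Hypothetical sketch of vertex construction for the user network. */
final class TwitterUserNetworkSketch {
    enum VertexOption { FOLLOWED_BY_USER, FOLLOWING_USER, BOTH }

    /** The searched user plus followees, followers, or both, depending on the option. */
    static Set<String> vertices(String user, Set<String> followedByUser,
                                Set<String> followingUser, VertexOption option) {
        Set<String> v = new TreeSet<>();
        v.add(user);                                        // assumed: searched user is a vertex
        if (option != VertexOption.FOLLOWING_USER) v.addAll(followedByUser);
        if (option != VertexOption.FOLLOWED_BY_USER) v.addAll(followingUser);
        return v;
    }

    /** Default edge rule: A -> B whenever A follows B and both are vertices. */
    static Set<List<String>> edges(Set<String> vertices, Map<String, Set<String>> follows) {
        Set<List<String>> e = new HashSet<>();
        for (String a : vertices) {
            for (String b : follows.getOrDefault(a, Set.of())) {
                if (vertices.contains(b)) e.add(List.of(a, b));
            }
        }
        return e;
    }
}
```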
New York Times importer
The New York Times (NYT) is an American daily newspaper founded and continuously published in New York City. It offers a series of developer APIs covering news and social networks, such as the Article Search API and the Best Seller API.
We provide two kinds of social network importers in Gephi: “Article Network” and “TimesPeople Network”. The article network analyzes articles selected with specific filters (date, facets, etc.). Users can choose which attribute is used to build edges. For example, if the user chooses the date, an edge is built between any two articles that have the same date. TimesPeople is a Facebook-like social network for Times readers, and the TimesPeople network lets users analyze the relationships between these readers.
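Choosing an attribute as the edge amounts to grouping articles by that attribute's value and linking every pair within a group. The sketch below assumes a hypothetical `Article` record with an id, a date and a set of facets; it does not reflect the NYT API's response format or Gephi's importer code.

```java
import java.util.*;
import java.util.function.Function;

/** Hypothetical sketch of the article network's "shared attribute" edge rule. */
final class ArticleNetworkSketch {
    record Article(String id, String date, Set<String> facets) {}

    /** Undirected edge {a, b} for every pair of articles sharing at least one key value. */
    static Set<List<String>> sharedAttributeEdges(List<Article> articles,
                                                  Function<Article, Collection<String>> keys) {
        Map<String, List<String>> groups = new HashMap<>();   // key value -> article ids
        for (Article a : articles) {
            for (String k : keys.apply(a)) {
                groups.computeIfAbsent(k, x -> new ArrayList<>()).add(a.id());
            }
        }
        Set<List<String>> edges = new HashSet<>();
        for (List<String> ids : groups.values()) {
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    String a = ids.get(i), b = ids.get(j);
                    if (a.equals(b)) continue;                 // skip self-loops
                    // Store each pair in sorted order so {a, b} and {b, a} collapse.
                    edges.add(a.compareTo(b) < 0 ? List.of(a, b) : List.of(b, a));
                }
            }
        }
        return edges;
    }
}
```

For example, `sharedAttributeEdges(articles, a -> List.of(a.date()))` links articles by date, while `sharedAttributeEdges(articles, Article::facets)` links them by shared facets. Because every pair in a group is linked, a value shared by many articles produces a dense clique, so narrower filters help keep the graph readable.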
Conclusion and future work
In this article, we introduced several importers: Email, Twitter and NYT. With these importers, users can import the data they want and analyze it, for example to find the hottest group, the relationships among their friends, the author most related to a facet, and other important information.
By the end of GSoC, we will have four major importers: Email, Twitter, NYT and Facebook. Twitter will provide “Twitter User Network” and “Twitter Search Network”; NYT will provide “NYT Article Search Network” and “NYT TimesPeople Network”; Facebook will provide “Facebook Friends Network” and “Facebook Group Network”. Besides adding the Facebook importer, we will also optimize the importers' UI and make them more user-friendly.