A new plug-in is available for Gephi that uses natural language processing (NLP) software to analyze text documents and visualize their contents. The plug-in was created by AlchemyAPI (alchemyapi.com) and uses the AlchemyAPI REST service to semantically process a web page or text file and show all the subjects of the text (people, places, and things, known collectively as named entities) as nodes in Gephi.

 

Graph of the American Revolution Wikipedia entry.

The plug-in is a powerful tool for distilling dense, unstructured textual data into easy-to-understand graphs. Each extracted entity has a relevance attribute, which measures how pertinent the subject is to the source text, and a count attribute, which gives the number of times the subject is named in the source text. Both attributes can be used to shape the visualization.
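As a rough illustration of how these two attributes might drive a visualization, the sketch below maps count to node size and relevance to opacity. The field names (`name`, `relevance`, `count`), the 0 to 1 relevance range, and the sample values are assumptions made for the example; they are not taken from the plug-in's actual output.

```python
def scale_nodes(entities, min_size=10.0, max_size=60.0):
    """Return {name: (size, alpha)} for a list of entity dicts.

    Size grows linearly with count; alpha is relevance clamped to [0, 1].
    Field names and ranges are assumptions for this sketch.
    """
    if not entities:
        return {}
    counts = [e["count"] for e in entities]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    styled = {}
    for e in entities:
        size = min_size + (e["count"] - lo) / span * (max_size - min_size)
        alpha = max(0.0, min(1.0, e["relevance"]))
        styled[e["name"]] = (size, alpha)
    return styled


# Hypothetical sample values, not real output for any article.
entities = [
    {"name": "George Washington", "relevance": 0.9, "count": 40},
    {"name": "Boston", "relevance": 0.5, "count": 10},
    {"name": "Thomas Paine", "relevance": 0.3, "count": 25},
]
styled = scale_nodes(entities)
```

Inside Gephi itself, the same effect is normally achieved by ranking nodes on an attribute rather than by writing code.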

Once installed, the plug-in can be accessed through the File->Generate->Semantic Analysis menu. As an example, we’ll examine the Wikipedia entry for the American Revolution. To make a graph from this article, enter the article’s URL into the Semantic Analysis dialog box. The plug-in will extract over 350 people, places, and things from the Wikipedia page. You can use this data to create a word-cloud-style visualization of the article, like the one above.

If subtype analysis is enabled, you can also visualize the types and subtypes of named entities. For example, the nodes in the image below were extracted from a recent news article. They represent Dmitry Medvedev and his ontological classifications. The edges from Medvedev’s node identify him as a Person, Politician, and President (classifications he shares with Mahmoud Ahmadinejad). A complete list of the subtypes AlchemyAPI returns can be found at http://www.alchemyapi.com/api/entity/types.html.

Detail of named entity subtypes
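The type and subtype edges described above can be sketched as a small edge list. The dictionary layout (`name`, `type`, `subtypes`) is an assumption for illustration and does not reflect the plug-in's internal data model; the classifications themselves are the ones named in the text.

```python
def type_edges(entities):
    """Return sorted (entity, classification) edges for types and subtypes."""
    edges = set()
    for e in entities:
        for label in [e["type"], *e.get("subtypes", [])]:
            edges.add((e["name"], label))
    return sorted(edges)


def shared_labels(edges, a, b):
    """Classifications that both entity a and entity b point to."""
    labels_a = {label for name, label in edges if name == a}
    labels_b = {label for name, label in edges if name == b}
    return sorted(labels_a & labels_b)


# Assumed record layout; classifications taken from the example in the text.
entities = [
    {"name": "Dmitry Medvedev", "type": "Person",
     "subtypes": ["Politician", "President"]},
    {"name": "Mahmoud Ahmadinejad", "type": "Person",
     "subtypes": ["Politician", "President"]},
]
edges = type_edges(entities)
common = shared_labels(edges, "Dmitry Medvedev", "Mahmoud Ahmadinejad")
```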

The plug-in can also be used to visualize the connections between multiple text documents. Edges are drawn between each document node and the entities it mentions, so entities that appear in more than one text link those documents together, creating a powerful way of discovering recurring themes within an archive. As an example, see the connections shared between the Wikipedia pages for the American Revolution and the French Revolution in the picture below. Common entities like ‘France’, ‘Britain’, and ‘Thomas Paine’ are linked to both the French Revolution and American Revolution articles.

Graph of connections between American and French Revolution Wikipedia entries.

As more documents are added to the graph, a web of entities forms. The relevance and count of connected entities increase with the number of documents that mention them.
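A minimal sketch of this multi-document behavior follows. It assumes each document yields a list of (entity, relevance, count) tuples and that relevance and count are simply summed across documents; the plug-in's real aggregation rule is not stated here, and the sample numbers are made up.

```python
from collections import defaultdict


def merge_documents(docs):
    """Build document-to-entity edges and per-entity running totals.

    docs maps a document name to a list of (entity, relevance, count).
    Summing relevance and count is an assumption for this sketch.
    """
    edges = []
    totals = defaultdict(
        lambda: {"relevance": 0.0, "count": 0, "documents": set()}
    )
    for doc, found in docs.items():
        for name, relevance, count in found:
            edges.append((doc, name))
            entry = totals[name]
            entry["relevance"] += relevance
            entry["count"] += count
            entry["documents"].add(doc)
    return edges, dict(totals)


def shared_entities(totals):
    """Entities mentioned by more than one document."""
    return sorted(n for n, t in totals.items() if len(t["documents"]) > 1)


# Hypothetical values; only the entity names come from the example above.
docs = {
    "American Revolution": [
        ("France", 0.5, 12), ("Thomas Paine", 0.25, 4),
        ("George Washington", 0.75, 30),
    ],
    "French Revolution": [
        ("France", 0.75, 50), ("Thomas Paine", 0.25, 3),
        ("Robespierre", 0.5, 20),
    ],
}
edges, totals = merge_documents(docs)
shared = shared_entities(totals)
```

Entities returned by `shared_entities` are the ones that would sit between the two document nodes in the graph.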

We hope you use this plug-in to make the data in your text more accessible. If you have any questions or suggestions for the makers of this plug-in, please leave them in the comments section.

Our thanks to the Gephi team for their remarkable visualization program, and all the documentation and help that made this plug-in possible.

Graph of espn.com front page and linked articles.

Shaun Roach

Download the Gephi plugin for AlchemyAPI here, or find it in your Gephi plug-in center.



3 comments

  1. Clement

    Hi,

    I am experimenting with this plug-in. At some point I got a warning that the text I submitted was over the size limit. What is this size exactly?

    Thx!

    Clement

  2. Kevin

    Hi!

    This tool is very helpful! How exactly do you use the multiple text document function through this plugin?

    Kevin

  3. Huy Nguyen

    Where can I download this plug-in? I really need it for my work. Thanks for your help.
