10 August 2010
A new plug-in is available for Gephi that uses natural language processing (NLP) software to analyze text documents and visualize their contents. Created by AlchemyAPI (alchemyapi.com), the plug-in calls the AlchemyAPI REST service to semantically process a web page or text file, then displays the subjects of the text (people, places and things, known collectively as named entities) as nodes in Gephi.
The plug-in is a powerful tool for distilling dense, unstructured text into easy-to-understand graphs. Each extracted entity has a relevance attribute, which measures how pertinent the subject is to the source text, and a count attribute, which records how many times the subject is named in the source text. Both attributes can be used to drive the visualization, for example to set node size or color.
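To make the two attributes concrete, here is a minimal sketch of one way they could be turned into visual properties. It assumes relevance is a score between 0 and 1 and count is a positive integer; the `Entity` class, `node_size`, `label_scale`, and the size range are hypothetical illustrations, not part of the plug-in or of Gephi's API.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A named entity with the two attributes described above (hypothetical model)."""
    name: str
    relevance: float  # assumed to lie in [0.0, 1.0]
    count: int        # number of times the entity is named in the source text


def node_size(entity: Entity, min_size: float = 5.0, max_size: float = 50.0) -> float:
    """Linearly map relevance onto a node-size range (an illustrative styling rule)."""
    r = max(0.0, min(1.0, entity.relevance))  # clamp out-of-range scores
    return min_size + (max_size - min_size) * r


def label_scale(entity: Entity, max_count: int) -> float:
    """Scale a label by mention count relative to the most-mentioned entity."""
    return entity.count / max_count if max_count > 0 else 0.0


if __name__ == "__main__":
    entities = [Entity("France", 1.0, 12), Entity("Thomas Paine", 0.5, 3)]
    top = max(e.count for e in entities)
    for e in entities:
        print(e.name, node_size(e), label_scale(e, top))
```

A linear mapping keeps the sketch simple; in practice a square-root or logarithmic scale is often used so that a few very relevant entities do not dwarf the rest.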
Once installed, the plug-in is available under the File->Generate->Semantic Analysis menu. As an example of what it can do, we’ll examine the Wikipedia entry for the American Revolution. To build a graph from this article, enter its URL in the Semantic Analysis dialog box. The plug-in extracts more than 350 people, places, and things from the page, which you can use to create a word-cloud-style visualization of the article, like the one above.
If subtype analysis is enabled, you can also visualize the types and subtypes of named entities. For example, the nodes in the image below were extracted from a recent news article. They represent Dmitry Medvedev and his ontological classifications. The edges from Medvedev’s node identify him as a Person, Politician, and President (classifications he shares with Mahmoud Ahmadinejad). A complete list of the subtypes AlchemyAPI returns can be found at http://www.alchemyapi.com/api/entity/types.html.
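The type-and-subtype graph can be thought of as an edge list from each entity to its classifications, so that entities sharing a classification meet at the same node. The sketch below assumes each entity arrives as a dictionary with hypothetical `text`, `type`, and `subtypes` keys; the field names and records are illustrative, not the actual AlchemyAPI response format.

```python
def classification_edges(entities):
    """Return (entity, classification) edges for each entity's type and subtypes."""
    edges = set()
    for e in entities:
        # The main type and every subtype become target nodes.
        for label in [e["type"], *e.get("subtypes", [])]:
            edges.add((e["text"], label))
    return edges


def shared_classifications(edges, a, b):
    """Classifications that entities a and b both point to."""
    labels_a = {label for source, label in edges if source == a}
    labels_b = {label for source, label in edges if source == b}
    return labels_a & labels_b


if __name__ == "__main__":
    # Hypothetical records mirroring the example in the text.
    records = [
        {"text": "Dmitry Medvedev", "type": "Person", "subtypes": ["Politician", "President"]},
        {"text": "Mahmoud Ahmadinejad", "type": "Person", "subtypes": ["Politician", "President"]},
    ]
    edges = classification_edges(records)
    print(sorted(shared_classifications(edges, "Dmitry Medvedev", "Mahmoud Ahmadinejad")))
```

Because classification labels are ordinary nodes, a shared label such as "President" naturally becomes a hub connecting every entity that carries it.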
The plug-in can also visualize the connections between multiple text documents. Each document is drawn as a node linked to the entities it mentions, so entities that several texts share connect those documents, a powerful way of discovering recurring themes within an archive. As an example, the picture below shows the connections shared by the Wikipedia pages for the American Revolution and the French Revolution. Common entities such as ‘France’, ‘Britain’, and ‘Thomas Paine’ are linked to both articles.
As more documents are added to the graph, a web of entities forms. The relevance and count of connected entities increase with the number of documents that mention them.
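The multi-document behavior can be sketched as a bipartite graph of documents and entities whose per-entity totals grow as documents are added. The post does not say exactly how the plug-in combines values across documents, so the summing rule below is an assumption, and `merge_documents` and its input shape (document name mapped to `(entity, relevance, count)` tuples) are hypothetical.

```python
from collections import defaultdict


def merge_documents(docs):
    """Build document->entity edges and accumulate per-entity totals.

    docs maps a document name to a list of (entity, relevance, count) tuples.
    Totals are summed across documents (an assumed aggregation rule).
    """
    edges = set()
    totals = defaultdict(lambda: {"relevance": 0.0, "count": 0, "documents": 0})
    for doc, entities in docs.items():
        for name, relevance, count in entities:
            t = totals[name]
            if (doc, name) not in edges:
                t["documents"] += 1  # count each document only once per entity
                edges.add((doc, name))
            t["relevance"] += relevance
            t["count"] += count
    return edges, dict(totals)


def shared_entities(totals, min_documents=2):
    """Entities mentioned by at least min_documents documents."""
    return {name for name, t in totals.items() if t["documents"] >= min_documents}
```

For example, `merge_documents({"American Revolution": [...], "French Revolution": [...]})` would return one edge per document/entity pair, and `shared_entities` on the resulting totals would pick out entities like those both revolution articles mention.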
We hope you use this plug-in to make the data in your text more accessible. If you have any questions or suggestions for the makers of this plug-in, please leave them in the comments section.
Our thanks go to the Gephi team for their remarkable visualization program and for all the documentation and help that made this plug-in possible.
Graph of espn.com front page and linked articles.
Download the AlchemyAPI plug-in for Gephi here, or find it in the Gephi plug-in center.