eTSNAP: Electronic Text Similarity Network Analysis Program
The Electronic Text Similarity Network Analysis Program (eTSNAP) uses natural language processing algorithms and software developed for measuring the similarity between samples of ordinary text (eTBLAST), such as gene or drug annotations, department descriptions, or abstracts gathered from academic journals or conferences. For each "abstract" submitted as part of a query, a quantitative similarity score is computed against every other abstract, and these pairwise scores make up a similarity matrix. The matrix can then be visualized as a table or as an interactive map.
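eTSNAP's actual scoring comes from eTBLAST, whose details are not described here. As a rough illustration of how pairwise scores fill a similarity matrix, the sketch below substitutes plain cosine similarity over word counts. The function names (`tokenize`, `cosine`, `similarity_matrix`) and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    # A Counter returns 0 for missing words, so only a's words need visiting.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(abstracts):
    """Score every pair of abstracts; return a symmetric n x n matrix.

    Assumption: cosine similarity stands in for eTBLAST's own scoring.
    """
    vectors = [Counter(tokenize(text)) for text in abstracts]
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # each pair is scored once, then mirrored
            matrix[i][j] = matrix[j][i] = cosine(vectors[i], vectors[j])
    return matrix


# Hypothetical sample "abstracts".
abstracts = [
    "Sonic hedgehog signaling activates smoothened",
    "Patched represses smoothened in hedgehog signaling",
    "God does not play dice with the universe",
]
matrix = similarity_matrix(abstracts)
for row in matrix:
    print([round(score, 2) for score in row])
```

Scoring each pair once and mirroring it keeps the matrix symmetric, which is what a table or map view of pairwise similarity expects.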
eTSNAP can be used to identify clusters of information buried within the body of text analyzed. For example, it can identify groups of individuals, each defined by their "abstract", who may be potential collaborators or competitors, which makes it a tool for enhancing communication. At meetings, attendees could use it to generate a suggested set of talks and posters in related areas of interest. In data analysis, it could be used to uncover hidden knowledge and relationships within a body of text.
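One simple way to pull clusters out of a similarity matrix, assuming a user-chosen cutoff score, is to link every pair scoring at or above the cutoff and take the connected components (equivalent to single-linkage clustering cut at that threshold). This is an illustrative stand-in, not necessarily how eTSNAP groups entries; `find_clusters`, the threshold, and the matrix values are hypothetical.

```python
def find_clusters(matrix, threshold):
    """Group items linked, directly or through a chain of other items, by
    similarity >= threshold (connected components of the thresholded graph)."""
    n = len(matrix)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search from `start`, following only strong-enough links.
        stack = [start]
        seen[start] = True
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j != i and not seen[j] and matrix[i][j] >= threshold:
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


# Hypothetical 5 x 5 similarity matrix (symmetric, 1.0 on the diagonal).
matrix = [
    [1.0, 0.8, 0.6, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.0, 0.1],
    [0.6, 0.2, 1.0, 0.1, 0.0],
    [0.1, 0.0, 0.1, 1.0, 0.7],
    [0.0, 0.1, 0.0, 0.7, 1.0],
]
clusters = find_clusters(matrix, threshold=0.5)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

Raising the threshold splits groups apart and lowering it merges them, so in practice the cutoff would be tuned to the body of text being analyzed.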
Here is a link to an example job containing the results of an OMIM query for the gli, smoothened, or patched genes. Note the central node that emerges in the map view.
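A "central node" in a similarity map is one that is strongly similar to many other entries. A minimal way to locate such a hub, assuming links are drawn only for scores at or above a cutoff, is degree centrality: count each node's qualifying neighbors and pick the largest. The labels, matrix, threshold, and the `most_central` helper are hypothetical and do not reproduce the OMIM example job.

```python
def most_central(matrix, labels, threshold):
    """Return (label, degree) for the node with the most neighbors whose
    similarity is >= threshold (degree centrality; ties go to the first)."""
    degrees = []
    for i, row in enumerate(matrix):
        degree = sum(1 for j, score in enumerate(row) if j != i and score >= threshold)
        degrees.append(degree)
    best = max(range(len(labels)), key=lambda i: degrees[i])
    return labels[best], degrees[best]


# Hypothetical entries: "A" is moderately similar to all of the others.
labels = ["A", "B", "C", "D"]
matrix = [
    [1.0, 0.7, 0.6, 0.5],
    [0.7, 1.0, 0.2, 0.1],
    [0.6, 0.2, 1.0, 0.3],
    [0.5, 0.1, 0.3, 1.0],
]
hub = most_central(matrix, labels, threshold=0.4)
print(hub)  # ('A', 3)
```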
The eTSNAP input file uses a tag-delimited format inspired by the HTML and FASTA standards. The first section contains your tag names. Tags are the labels for the different types of data associated with each of your text entities. For example, if you wish to compare famous quotes, you might want a title, the author of the quote, and the quote itself. eTSNAP requires at least a title and the text itself, and it also makes special use of an author tag. Here is what you might enter in your tag section:
<TITLE>Title
<AUTHORS>Author
<TEXT>Quote
<END TAGS>
'<END TAGS>' is the most important part of the tag section: it tells eTSNAP that you have finished naming your tags and that the text entities follow. To start, let's add an Einstein quote:
<TITLE>Einstein Dice Quote
<AUTHORS>Albert Einstein
<TEXT>"God does not play dice with the universe."
<END TEXT>
Notice that an entity has the same form as the tag section, except that it ends with '<END TEXT>' instead of '<END TAGS>'. For eTSNAP to process your input properly, you must place the '<END TEXT>' tag directly after your '<TEXT>' tag and its text. After the '<END TEXT>' tag, you may enter an unlimited number of additional text entities. Here is the full example file of quotes that you may examine and test in eTSNAP.
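To make the file rules concrete, here is a minimal parser sketch written from the description of the input format. It assumes each tag begins its own line, treats untagged lines as continuations of the previous field, and requires TITLE and TEXT tags. These are assumptions, not eTSNAP's documented behavior, and `parse_etsnap` is a hypothetical name.

```python
import re

# Matches a line that starts with a tag such as "<TITLE>" or "<END TEXT>".
TAG_LINE = re.compile(r"^<([A-Z0-9_ ]+)>(.*)$")


def parse_etsnap(content):
    """Parse eTSNAP-style input into (tag_labels, entities).

    tag_labels maps each declared tag to its label, e.g. {"TITLE": "Title"};
    entities is a list of dicts mapping tag names to values.
    """
    tag_labels, entities, current = {}, [], {}
    last_tag, in_tags = None, True
    for raw in content.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_LINE.match(line)
        if match is None:
            # Assumption: an untagged line continues the previous field.
            if last_tag is None:
                raise ValueError(f"text outside any tag: {line!r}")
            target = tag_labels if in_tags else current
            target[last_tag] += " " + line
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "END TAGS":
            in_tags, last_tag = False, None
        elif tag == "END TEXT":
            # <END TEXT> must come directly after the <TEXT> field.
            if last_tag != "TEXT":
                raise ValueError("<END TEXT> must directly follow a <TEXT> field")
            entities.append(current)
            current, last_tag = {}, None
        elif in_tags:
            tag_labels[tag] = value
            last_tag = tag
        else:
            if tag not in tag_labels:
                raise ValueError(f"tag <{tag}> was not declared in the tag section")
            current[tag] = value
            last_tag = tag
    if in_tags:
        raise ValueError("missing <END TAGS>")
    if current:
        raise ValueError("last entity is missing <END TEXT>")
    for required in ("TITLE", "TEXT"):  # assumption: both are mandatory
        if required not in tag_labels:
            raise ValueError(f"tag section must declare <{required}>")
    return tag_labels, entities


sample = """<TITLE>Title
<AUTHORS>Author
<TEXT>Quote
<END TAGS>
<TITLE>Einstein Dice Quote
<AUTHORS>Albert Einstein
<TEXT>"God does not play dice with the universe."
<END TEXT>
"""
tags, entities = parse_etsnap(sample)
print(tags)  # {'TITLE': 'Title', 'AUTHORS': 'Author', 'TEXT': 'Quote'}
print(entities[0]["AUTHORS"])  # Albert Einstein
```

Checking that each entity is closed right after its TEXT field mirrors the ordering rule stated for '<END TEXT>', and rejecting undeclared tags catches typos before any similarity scoring is done.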