eTBLAST: Unique search engine


  eTBLAST: A text similarity-based engine for searching literature collections

eTBLAST is best described as a text similarity engine rather than a keyword search engine. With most search engines, such as Google and PubMed, users must distill their ideas down to a few keywords and then try various combinations of them to find the most relevant documents. eTBLAST instead takes a whole paragraph as its query, such as a scientific abstract or an invention description, which usually contains hundreds of keywords. The user simply pastes the paragraph into the text box and submits it to the engine with the "Search" button.

eTBLAST first strips this natural-language paragraph of simple words such as “the”, “a”, “of”, and “and”. It then searches its database (Medline, Institute of Physics, US Patent database, etc.) for the entries that match the greatest number of the remaining keywords, weighted by the frequency of each keyword in all the literature being searched. This is a compute-intensive process; when it finishes, eTBLAST keeps the top 400 ‘hits’ (e.g., Medline abstracts) and starts the second phase of the computation. In that phase it performs a sentence-by-sentence alignment, which accounts for the proximity and order of the words in the query compared with those in each ‘hit’. A final similarity score is computed, and the ‘hits’ are then ranked and presented to the user. Each ‘hit’ can be viewed in your browser by following its link.
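The two phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not eTBLAST's actual implementation: the stop list, the inverse-document-frequency weighting, the use of difflib's SequenceMatcher as a stand-in for sentence alignment, and all function names are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Assumed stop list; the list eTBLAST really uses is not given in the text.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for"}


def keywords(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def keyword_weights(docs):
    """Assumed weighting: keywords found in fewer documents count for more."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(keywords(doc)))
    n = len(docs)
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in doc_freq.items()}


def phase_one(query, docs, weights, keep=400):
    """Phase 1: keep the documents sharing the most weighted keywords."""
    query_words = set(keywords(query))
    scored = []
    for index, doc in enumerate(docs):
        shared = query_words & set(keywords(doc))
        scored.append((sum(weights.get(w, 0.0) for w in shared), index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored[:keep] if score > 0]


def phase_two(query, doc):
    """Phase 2: order-aware score, averaged over the query's sentences."""
    doc_sents = [keywords(s) for s in sentences(doc)]
    query_sents = [keywords(s) for s in sentences(query)]
    if not doc_sents or not query_sents:
        return 0.0
    total = 0.0
    for q in query_sents:
        # Best-matching document sentence; ratio() is sensitive to word order.
        total += max(SequenceMatcher(None, q, d).ratio() for d in doc_sents)
    return total / len(query_sents)


def search(query, docs, keep=400):
    """Return indices of matching documents, most similar first."""
    weights = keyword_weights(docs)
    candidates = phase_one(query, docs, weights, keep)
    return sorted(candidates, key=lambda i: (-phase_two(query, docs[i]), i))
```

Calling search(query, docs) with a pasted paragraph and a list of abstracts returns the indices of the abstracts that share at least one keyword with the query, ordered by the second-phase score. The cheap keyword pass limits how many documents reach the more expensive alignment step, which is the reason for the two-phase design.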

Also unique to eTBLAST is its set of post-processors (Find an Expert, Find a Journal, and View History), which inspect the ‘hits’ to see which authors appear most frequently, in which journals the papers were mostly published, and in which years. This lets the user not only capture value from the individual ‘hits’ but also use them as a whole: to identify experts in the field as those who have published the most, to find the most appropriate journal for, say, submitting their own work, and to view the history to see whether the topic defined by the query is becoming more or less ‘popular’.
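The counting behind these post-processors can be illustrated with a short Python sketch. The record layout (the "authors", "journal", and "year" fields) and the function name are assumptions made for this example; the text does not describe how eTBLAST stores its hits.

```python
from collections import Counter


def summarize_hits(hits, top=3):
    """Tally authors, journals, and years across a list of hit records.

    Each hit is assumed to be a dict with "authors" (list of names),
    "journal" (string), and "year" (int); this layout is hypothetical.
    """
    authors = Counter(name for hit in hits for name in hit["authors"])
    journals = Counter(hit["journal"] for hit in hits)
    years = Counter(hit["year"] for hit in hits)
    return {
        "experts": authors.most_common(top),    # most frequent authors
        "journals": journals.most_common(top),  # most frequent journals
        "history": sorted(years.items()),       # hits per year, oldest first
    }
```

The "history" list gives the number of hits per publication year, which is the kind of trend the View History post-processor is described as showing.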

eTBLAST is continuously being upgraded to improve the speed and quality of the similarity comparisons, to expand the number and types of databases we search, and to add new post-processors so that users can get the most out of their experience.

Last modified: Thursday, 07-Feb-2008 09:05:12 CST