eTBLAST is best described as a text similarity engine rather than a
keyword search engine. With most search engines, such as Google and
PubMed, users must distill their ideas down to a few keywords and then
try various combinations of them to find the most relevant documents.
eTBLAST instead takes a whole paragraph as its query, such as a
scientific abstract or an invention description, which usually contains
hundreds of keywords. The user simply pastes the paragraph into the text
box and submits it by clicking the "Search" button.
eTBLAST first strips this natural-language paragraph of common words
such as “the”, “a”, “of”, and “and”. It then searches its database
(Medline, Institute of Physics, US Patent database, etc.) for the
entries that match the largest number of the remaining keywords, with
each keyword weighted according to how frequently it occurs across all
the literature being searched. This step is compute-intensive; once it
finishes, eTBLAST keeps the top 400 ‘hits’ (e.g., Medline abstracts) and
begins the second phase of the computation: a sentence-by-sentence
alignment that accounts for the proximity and order of the query's words
within each hit. A final similarity score is then computed, and the hits
are ranked and presented to the user, each as a link that can be opened
in the browser.
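The two phases described above can be illustrated with a small Python
sketch. It is a toy model under stated assumptions, not eTBLAST's
implementation: the stop-word list, the inverse-document-frequency
weighting (the text says only that keywords are weighted by their
frequency in the literature), and the use of a weighted longest common
subsequence as the sentence alignment are all stand-ins chosen for
illustration, and every function name is hypothetical. Only the overall
shape follows the description: a cheap keyword pass over the whole
collection that keeps the top 400 candidates, then a costlier,
order-aware rescoring of just those candidates.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption: eTBLAST's real list is not given here).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to",
              "is", "are", "for", "with", "by"}


def tokenize(text):
    """Lower-case the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def keyword_weights(docs):
    """Assumed scheme: smoothed inverse document frequency, so words found
    in fewer documents receive larger weights."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}


def keyword_score(query_words, doc, weights):
    """Phase 1: sum the weights of the distinct query keywords found in the document."""
    doc_words = set(tokenize(doc))
    return sum(weights.get(w, 0.0) for w in set(query_words) if w in doc_words)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def weighted_lcs(a, b, weights):
    """Weighted longest common subsequence of two word lists: shared words
    only count when they occur in the same order in both sentences."""
    dp = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            match = dp[i - 1][j - 1] + weights.get(a[i - 1], 0.0) if a[i - 1] == b[j - 1] else 0.0
            dp[i][j] = max(match, dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def alignment_score(query, doc, weights):
    """Phase 2: align each query sentence with its best-matching document
    sentence and add up the alignment scores."""
    doc_sents = [tokenize(s) for s in split_sentences(doc)]
    total = 0.0
    for sentence in split_sentences(query):
        q = tokenize(sentence)
        total += max((weighted_lcs(q, d, weights) for d in doc_sents), default=0.0)
    return total


def search(query, docs, top_n=400):
    """Return (doc_index, score) pairs, best first."""
    weights = keyword_weights(docs)
    q_words = tokenize(query)
    # Phase 1: cheap keyword scoring over the whole collection; keep the top_n matches.
    scored = [(i, keyword_score(q_words, d, weights)) for i, d in enumerate(docs)]
    scored = [p for p in scored if p[1] > 0]
    scored.sort(key=lambda p: p[1], reverse=True)
    candidates = [i for i, _ in scored[:top_n]]
    # Phase 2: re-rank only the candidates with the order-aware alignment.
    final = [(i, alignment_score(query, docs[i], weights)) for i in candidates]
    final.sort(key=lambda p: p[1], reverse=True)
    return final
```

For example, with the three documents "Protein folding is studied by
molecular dynamics.", "The stock market fell today." and "Molecular
dynamics simulations of protein folding pathways.", the query "We
simulate protein folding with molecular dynamics." drops the
stock-market text in phase 1 and then ranks the first document above the
third, because its shared keywords appear in the same order as in the
query. Splitting the work this way confines the expensive alignment to a
few hundred candidates instead of the whole database.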
Also unique to eTBLAST is its set of post-processors (Find an Expert,
Find a Journal, and View History), which examine the hits to determine
which authors appear most frequently, in which journals the papers were
most often published, and in which years. Users can therefore draw value
not only from the individual hits but also from the hits as a whole:
identifying experts in the field as the authors who have published the
most on the topic, finding the most appropriate journal (for example, to
submit their own work), and viewing the publication history to see
whether the topic defined by the query is becoming more or less
‘popular’.
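At their core, the three post-processors are frequency counts over the
ranked hits. A minimal Python sketch, assuming each hit carries author,
journal and year fields; the `Hit` record, its field names and the three
function names are hypothetical and are not eTBLAST's actual data model:

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical record for one ranked hit (field names are assumptions)."""
    title: str
    authors: List[str]
    journal: str
    year: int


def find_experts(hits, k=5):
    """Find an Expert: the k authors who appear most often among the hits."""
    return Counter(a for h in hits for a in h.authors).most_common(k)


def find_journals(hits, k=5):
    """Find a Journal: the k journals that published the most hits."""
    return Counter(h.journal for h in hits).most_common(k)


def view_history(hits):
    """View History: number of hits per publication year, oldest first."""
    return sorted(Counter(h.year for h in hits).items())
```

With three made-up hits in which "Author A" is on every paper and two of
the papers appeared in "Journal X" (one from 2005, two from 2006),
`find_experts(hits, 1)` returns `[('Author A', 3)]`,
`find_journals(hits, 1)` returns `[('Journal X', 2)]`, and
`view_history(hits)` returns `[(2005, 1), (2006, 2)]`; a rising count per
year is the kind of signal View History uses to suggest that a topic is
becoming more popular.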
eTBLAST is continuously being upgraded: we are improving the speed and
quality of the similarity comparisons, expanding the number and types of
databases we search, and adding new post-processors so that users can
get the most out of their experience.
Last modified: Thursday, 07-Feb-2008 09:05:12 CST