RIC: Research Interest Comparator

Lewis, James A, Researcher
Affiliation: not provided
Email: not provided
Home Page: not provided
Results - NEW THIS YEAR:

No matching results
Abstract:
Background: The most widely used literature search techniques, such as those offered by NCBI’s PubMed system, require significant effort on the part of the searcher, and less experienced searchers are less effective. Improved literature search engines can save researchers time and effort.
Results: Several text similarity search algorithms, both standard and novel, were implemented and tested in order to determine which obtained the best results in information retrieval exercises. In a preliminary study to determine the efficacy of text similarity searching (TSS) versus Boolean searching, novice searchers (first year medical students, n = 10) attained 76.8% ± 18.3% precision using various TSS algorithms vs. 53.3% ± 33.3% using PubMed’s Boolean interface. Various TSS algorithms were then tested against one another in a batch retrieval task. Our novel sentence alignment algorithm attained higher precision than other strategies tested. A second novel algorithm combining Boolean with similarity searching attained high precision with significantly reduced run-time.
Conclusions: Text similarity searching outperforms Boolean searching for novice users, and our novel sentence alignment algorithm is an improvement over standard approaches. Literature searching algorithms are implemented in a system called eTBLAST, freely accessible over the web at http://invention.swmed.edu. A utility (RIC) for continuously monitoring a user’s topic against MEDLINE is also provided.
Keywords extracted from the abstract:
Count Word
1.041 abstract
2.713 accessible
6.441 algorithm
7.457 algorithms
4.499 alignment
1.964 approaches
7.427 attained
1.470 background
2.906 batch
1.848 best
13.881 boolean
2.253 called
2.513 combining
1.383 conclusions
2.431 continuously
2.764 determine
1.396 effective
1.603 efficacy
4.781 effort
3.946 engines
6.000 etblast
2.860 exercises
2.046 experienced
1.109 first
2.925 freely
0.994 high
1.114 higher
3.158 http
5.245 implemented
1.613 improved
1.713 improvement
1.363 information
2.240 interface
4.012 invention
2.233 less
4.314 literature
0.878 medical
2.730 medline
1.642 monitoring
1.061 most
6.000 ncbi’s
5.011 novel
7.722 novice
1.215 obtained
2.454 offered
1.503 order
0.926 other
4.337 outperforms
1.478 part
6.664 precision
1.989 preliminary
1.745 provided
3.870 pubmed
6.000 pubmed’s
1.234 reduced
2.003 require
2.543 researchers
1.506 results
5.324 retrieval
3.291 ric
4.981 run-time
3.178 save
5.825 search
5.091 searcher
9.800 searchers
11.972 searching
1.404 second
6.634 sentence
0.990 significant
0.979 significantly
7.152 similarity
3.190 standard
1.874 strategies
1.816 students
6.000 swmed
1.937 system
1.976 task
1.343 techniques
4.007 tested
7.214 text
0.915 time
2.907 topic
9.891 tss
0.946 used
2.261 users
6.000 user’s
1.801 using
2.333 utility
2.732 various
1.616 versus
1.435 vs
2.790 web
2.045 widely
1.576 year
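The fractional values in the Count column look like weighted counts rather than raw counts, which matches the "Term Frequency * Inverse Document Frequency" weighting named under RIC Statistics. The Python sketch below shows one common way to produce such weights. It is an illustration only, not RIC's actual code: the stop list, the tokenizer pattern, the log(N / df) form of the inverse document frequency, and the document-frequency numbers are all assumptions, and the "Lexical Variants Added" step is not modeled.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; the real "MedlinePlus List" is not reproduced here.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on", "by"}


def extract_keywords(text, stop_words=STOP_WORDS):
    """Lowercase and tokenize the text, drop eliminated words, count the rest."""
    # Assumed tokenizer: runs of letters/digits, allowing inner hyphens
    # and apostrophes so that "run-time" and "pubmed's" stay whole.
    tokens = re.findall(r"[a-z0-9][a-z0-9'’-]*", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def tfidf_weights(term_counts, doc_freq, n_docs):
    """Weight each term as count * log(n_docs / document frequency).

    Terms missing from doc_freq are treated as appearing in one
    document (an assumption made to avoid division by zero).
    """
    return {
        term: count * math.log(n_docs / doc_freq.get(term, 1))
        for term, count in term_counts.items()
    }


# Made-up document frequencies for a 1000-document collection.
counts = extract_keywords("Boolean searching and text similarity searching")
doc_freq = {"boolean": 10, "searching": 100, "text": 50, "similarity": 20}
weights = tfidf_weights(counts, doc_freq, n_docs=1000)
```

In this toy input, "searching" occurs twice but is common in the assumed collection, while "boolean" occurs once but is rare, so "boolean" ends up with the same weight as "searching".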
RIC Statistics:
Extraction Method: Keyword Count with Lexical Variants Added
Eliminated words list: MedlinePlus List
Similarity Method: Weighted keyword count
Weighting Method: Term Frequency * Inverse Document Frequency
Database: Medline Updates from current year
Publication Type: All
Score Calculation Method: Cosine Similarity Method
Sort by: Score
Submission date and time: 9-1-2005, 19:23:34
Computation time: 00:00:04
Last updated: Thursday, 01-Sep-2005 19:23:38 CDT
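
The settings above name a cosine similarity score computed over weighted keyword counts, with results sorted by score. The Python sketch below shows how such a comparison is typically done on sparse term-weight vectors. The function names, the document identifiers, and the document weights are hypothetical; only the three query weights are taken from the keyword table, and this is not RIC's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_score(query, documents):
    """Return (doc_id, score) pairs sorted by descending cosine score."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Query weights copied from the keyword table; documents are invented.
query = {"boolean": 13.881, "searching": 11.972, "tss": 9.891}
documents = {
    "doc_a": {"boolean": 5.0, "searching": 4.0},
    "doc_b": {"alignment": 3.0, "sentence": 2.0},
}
ranking = rank_by_score(query, documents)
```

Here doc_a shares two terms with the query and ranks first, while doc_b shares none and scores zero, which corresponds to the "No matching results" case when no document in the current update scores above zero.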