Science Surveyor uses the citations of the new study to generate a special database of related studies, called the context. A series of algorithms analyzes that context for various characteristics, which are displayed in the five panels below. Science Surveyor draws on abstracts and citations from the Web of Science database from 1989 to 2012: about 27 million articles and 581 million citations. It takes the studies cited by the new study and collects all of the studies they cite. Then it does the same for all of those studies. In other words, it “hops out” three citation levels. In addition, Science Surveyor collects another 1,000 articles whose language is most similar to that of the new study. In this way, Science Surveyor curates a context that is unique to each study, and that is more nuanced than the list of studies generated by searching for keywords in databases such as Web of Science, Scopus, or PubMed.
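The three-level expansion described above behaves like a breadth-first search capped at three hops. The Python sketch below assumes the citation graph is a plain dictionary mapping each paper ID to the IDs it cites; the function name `k_hop_neighborhood` and the toy graph are hypothetical. It follows outgoing references only, as in the plain-language description; the source does not say whether incoming citations are also followed.

```python
from collections import deque


def k_hop_neighborhood(cites, seed, max_hops=3):
    """Return every paper reachable from `seed` by following at most
    `max_hops` "cites" edges. The seed itself is excluded."""
    hops = {seed: 0}          # paper ID -> number of hops from the seed
    queue = deque([seed])
    while queue:
        paper = queue.popleft()
        if hops[paper] == max_hops:
            continue          # hop limit reached; do not expand this paper
        for cited in cites.get(paper, ()):
            if cited not in hops:  # in BFS the first visit is the shortest path
                hops[cited] = hops[paper] + 1
                queue.append(cited)
    return {p for p in hops if p != seed}


# Hypothetical toy graph: each key cites the papers in its list.
cites = {
    "new": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "e": ["f"],  # "f" is four hops from "new", so it is left out
}
print(sorted(k_hop_neighborhood(cites, "new")))  # ['a', 'b', 'c', 'd', 'e']
```

Tracking the hop count per paper, rather than running three separate passes, means each paper is visited once even when several cited studies cite it, as "c" is cited by both "a" and "b" here.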
In more technical detail: the context, or “field,” of the target study is the union of all studies within three hops of the target study on the full ISI citation graph and the top 1,000 papers ranked by tf-idf similarity to the target study. The tf-idf scores are computed on each study's title concatenated with its abstract. Standard text preprocessing (lowercasing, lemmatization, and stop-word removal) is applied to this text before the scores are computed.
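The text-similarity half of the context can be sketched with the standard library alone. This is an illustration under stated assumptions, not Science Surveyor's implementation: the source names tf-idf but not the similarity measure, so cosine similarity is assumed; the idf formula is the smoothed variant that scikit-learn's `TfidfVectorizer` uses by default; the stop-word list is a tiny placeholder; and lemmatization is omitted (a real pipeline might use a lemmatizer such as NLTK's `WordNetLemmatizer`). All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lowercase, tokenize on runs of letters, and drop stop words.
    Lemmatization is omitted in this sketch."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """docs: paper ID -> token list. Returns paper ID -> {term: weight},
    using raw term counts times the smoothed idf ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per paper
    return {
        pid: {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
              for t, c in Counter(tokens).items()}
        for pid, tokens in docs.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_similar(target_id, texts, k=1000):
    """texts: paper ID -> title + " " + abstract. Returns the k IDs most similar to the target."""
    vectors = tfidf_vectors({pid: preprocess(t) for pid, t in texts.items()})
    scores = {pid: cosine(vectors[target_id], vec)
              for pid, vec in vectors.items() if pid != target_id}
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:k]


def build_context(hop_ids, similar_ids):
    """The context is the union of the citation neighborhood and the text-similar papers."""
    return set(hop_ids) | set(similar_ids)


# Hypothetical toy corpus (title + abstract already concatenated).
texts = {
    "target": "Citation networks of climate studies",
    "p1": "Climate studies and citation analysis",
    "p2": "Protein folding in yeast",
    "p3": "Networks of citation in climate science",
}
similar = top_k_similar("target", texts, k=2)  # 'p1' and 'p3'; 'p2' shares no terms
context = build_context({"a", "b", "p3"}, similar)
```

Because the context is a set union, a paper that is both within three citation hops and among the most text-similar papers (like "p3" in the usage lines) appears only once.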