Special Preface: Words from Price Medal Laureates to Young Scholars
When carrying out scientometric (including webometric
and altmetric) analyses, the gathering of data is crucial to the
analytic outcome. Since the 1960s, manual as well as (semi-)automatic
information retrieval (IR) has therefore been applied to online
databases of various kinds and, in recent decades, also to the
WWW. Aside from citation databases like Scopus and Web
of Science, field-dependent and institutional repositories as
well as Web crawlers are used to gather data, depending on
the purpose of the scientometric analysis. In such cases IR
requires a high degree of knowledge of field structures and
retrieval possibilities in the databases used. Often, however, the
IR options are very limited owing to commercial secrecy,
e.g., in search engines like Google and Bing; one hardly knows
why the retrieval outcome looks the way it does. One should
remember that in 1998 Google originally (and falsely) assumed
web inlinks to be like citations: the more inlinks an object
receives, the more recognized it is and thus, it was assumed (again
falsely), the more relevant. But it worked. Later, Google began to
mix in many other parameters to produce its ranked lists of objects.
Bing’s retrieval algorithms were originally based on text-based
IR research and relevance as understood in IR; other parameters
are supposedly also included. In addition, the two search
engines’ crawlers retrieve different data when crawling
the Web. Hence, it is always a good idea to run the same query
in both engines, download the data, and identify both the overlap and
the information unique to each engine. Another important limitation is the
question of coverage of the data source used. In the case of the
large and recognized field-dependent and citation databases one
may argue that the latter cover almost all the central journals in
the academic landscape, although minor differences between Scopus
and Web of Science have been detected in some academic fields.
Hence, one can argue that the data extracted from those
sources constitute not a sample but the actual population.
This contrasts with most webometric and altmetric
measures, for which analyses commonly operate with samples
of unknown populations. Thus, coverage, sampling and the way
data are combined in the scientometric analysis influence how
the statistics should be carried out.
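
The advice to search two engines with the same query and then separate the overlap from the engine-specific results can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the result lists are hand-made placeholders rather than downloaded data, the function names `normalize` and `compare_results` are hypothetical, and the URL normalization (dropping the scheme, a leading "www.", trailing slashes and any query string) is one crude choice among many.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a crude comparison key.

    Assumption: scheme, a leading "www.", trailing slashes and the
    query string are ignored. Real studies may need stricter rules.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def compare_results(results_a, results_b):
    """Split two result lists into shared and engine-specific keys."""
    keys_a = {normalize(u) for u in results_a}
    keys_b = {normalize(u) for u in results_b}
    overlap = keys_a & keys_b
    union = keys_a | keys_b
    return {
        "overlap": overlap,
        "only_a": keys_a - keys_b,
        "only_b": keys_b - keys_a,
        # Jaccard coefficient: share of all distinct keys found by both.
        "jaccard": len(overlap) / len(union) if union else 0.0,
    }


# Hypothetical result lists standing in for data downloaded from two engines.
engine_a = [
    "https://www.example.org/paper/",
    "http://example.com/a",
    "https://example.net/x",
]
engine_b = [
    "http://example.org/paper",
    "https://example.com/b",
    "https://example.net/x",
]

summary = compare_results(engine_a, engine_b)
print("Found by both:", sorted(summary["overlap"]))
print("Only engine A:", sorted(summary["only_a"]))
print("Only engine B:", sorted(summary["only_b"]))
print("Jaccard overlap:", summary["jaccard"])
```

Sets are used because the rank order of results is irrelevant to the question of overlap; if ranking matters for the analysis, the lists would have to be compared position by position instead.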
Peter Ingwersen
(1947 —)
(Denmark)