Functioning
sosia (Italian for doppelgänger) is intended to create control groups for difference-in-differences (Diff-in-Diff) analyses of scientists: some treatment happens to a scientist, and you need “similar” scientists to whom nothing happened. Here, a similar scientist is one who:
Publishes in sources (journals, book series, etc.) in which the scientist also publishes
Publishes in sources associated with the scientist’s main field
Publishes in the year of treatment
Is not a co-author of the scientist in the pre-treatment phase
In the year of treatment, has about the same number of publications
Started publishing around the same year as the scientist
In the year of treatment, has about the same number of co-authors
In the year of treatment, has about the same number of citations (excluding self-citations)
Optional: is affiliated with a similar institution (from a user-provided list of affiliations)
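The criteria above can be expressed as a single predicate over summary statistics of each candidate. The following is a minimal sketch in plain Python, not sosia's implementation: the Profile record, the helper names, the ±20 % relative tolerance used for "about the same", and the ±1-year window for the first publication year are all hypothetical choices for illustration. The optional affiliation criterion is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical summary of an author's record up to the treatment year."""
    author_id: str
    sources: set[str]               # IDs of sources the author has published in
    first_year: int                 # year of the author's first publication
    n_pubs: int                     # number of publications
    n_coauthors: int                # number of distinct co-authors
    n_cites: int                    # citations received, self-citations excluded
    active_in_treatment_year: bool  # published at least once in the treatment year
    coauthors: set[str] = field(default_factory=set)  # pre-treatment co-author IDs


def external_citations(author_id: str, citing_docs: list[set[str]]) -> int:
    """Count citing documents whose author list does not include author_id."""
    return sum(1 for authors in citing_docs if author_id not in authors)


def within(value: int, target: int, margin: float) -> bool:
    """True if value lies within a relative margin of target (0.2 = +/-20%)."""
    return abs(value - target) <= margin * target


def is_similar(original: Profile, cand: Profile, field_sources: set[str],
               margin: float = 0.2, year_window: int = 1) -> bool:
    """Apply the listed criteria to one candidate (illustrative thresholds)."""
    return (
        bool(cand.sources & original.sources)         # shares sources with the scientist
        and bool(cand.sources & field_sources)        # publishes in the main field
        and cand.active_in_treatment_year             # active in the treatment year
        and cand.author_id not in original.coauthors  # not a pre-treatment co-author
        and within(cand.n_pubs, original.n_pubs, margin)
        and abs(cand.first_year - original.first_year) <= year_window
        and within(cand.n_coauthors, original.n_coauthors, margin)
        and within(cand.n_cites, original.n_cites, margin)
    )
```

Using a relative tolerance keeps the check meaningful for both junior and senior scientists; an absolute tolerance would be an equally plausible alternative.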
You obtain results after only four steps:
Initialize the class
Define search sources
Define a first search group
Filter the search group to obtain a matching group
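To see how the four steps narrow down the candidates, here is a toy version that runs on an in-memory list of publications instead of Scopus. The class ToyMatcher, its method names and the Pub record are hypothetical and do not reflect sosia's API. As simplifying assumptions, step 2 takes the union of the scientist's own sources and a given set of field sources, and step 4 checks only the publication count.

```python
from collections import defaultdict
from typing import NamedTuple


class Pub(NamedTuple):
    """One (author, source, year) record from a toy publication database."""
    author: str
    source: str
    year: int


class ToyMatcher:
    """Hypothetical, in-memory illustration of the four steps (not sosia's API)."""

    def __init__(self, scientist, treatment_year, pubs, field_sources):
        # Step 1: initialize with the scientist, the treatment year and the data.
        self.scientist = scientist
        self.year = treatment_year
        self.pubs = list(pubs)
        self.field_sources = set(field_sources)

    def search_sources(self):
        # Step 2: the scientist's own sources (up to the treatment year)
        # plus the sources associated with the main field.
        own = {p.source for p in self.pubs
               if p.author == self.scientist and p.year <= self.year}
        return own | self.field_sources

    def search_group(self):
        # Step 3: all other authors with a publication in a search source
        # in the treatment year.
        sources = self.search_sources()
        return {p.author for p in self.pubs
                if p.source in sources and p.year == self.year
                and p.author != self.scientist}

    def matches(self, margin=0.2):
        # Step 4: keep candidates whose publication count up to the treatment
        # year is within a relative margin of the scientist's count.
        counts = defaultdict(int)
        for p in self.pubs:
            if p.year <= self.year:
                counts[p.author] += 1
        target = counts[self.scientist]
        return {a for a in self.search_group()
                if abs(counts[a] - target) <= margin * target}
```

Each step only shrinks the set produced by the previous one, which is why the expensive part is building the first search group, not filtering it.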
Depending on the number of search sources and the size of the first search group, one query may take up to 6 hours. Each query to the Scopus database uses your API key, which allows 5000 requests per week. sosia and pybliometrics cache all retrieved information, so subsequent queries take less than a minute. The main classes and all methods have a boolean refresh parameter that controls whether cached queries are refreshed (default is False).
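The caching behaviour described above follows a common pattern: look the query up in a local cache, and contact the API only on a cache miss or when refresh=True. The sketch below illustrates that pattern with a hypothetical cached_query function that stores JSON files named by a SHA-256 hash of the query. This file layout is an assumption for illustration, not how pybliometrics actually organizes its cache, and the fetch argument stands in for the real API call.

```python
import hashlib
import json
from pathlib import Path


def cached_query(query, fetch, cache_dir, refresh=False):
    """Return the result for query, calling fetch(query) only on a cache
    miss or when refresh is True (hypothetical helper)."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One file per query, named by a hash of the query string.
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists() and not refresh:
        # Cache hit: no request is sent, so no quota is used.
        return json.loads(path.read_text(encoding="utf-8"))
    # Cache miss or forced refresh: the only place an API request would be made.
    result = fetch(query)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Since only cache misses reach the API, repeated runs over the same search sources do not consume further requests from the weekly quota.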