sosia.Original

class sosia.Original(scientist, year, year_margin=1, pub_margin=0.1, coauth_margin=0.1, refresh=False)[source]

Class to represent a scientist for which we want to find a control group.

Parameters:
  • scientist (str or int) – Scopus Author ID of the scientist you want to find control groups for.
  • year (str or numeric) – Year of the event. Control groups will be matched on trends and characteristics of the scientist up to this year.
  • year_margin (numeric (optional, default=1)) – Number of years by which the search for authors publishing around the focal scientist’s year of first publication should be extended in both directions.
  • pub_margin (numeric (optional, default=0.1)) – The left and right margin for the number of publications on which to match candidates and the scientist. If the value is a float, it is interpreted as a percentage of the scientist’s number of publications and the resulting value is rounded up. If the value is an integer, it is interpreted as a fixed number of publications.
  • coauth_margin (numeric (optional, default=0.1)) – The left and right margin for the number of coauthors on which to match candidates and the scientist. If the value is a float, it is interpreted as a percentage of the scientist’s number of coauthors and the resulting value is rounded up. If the value is an integer, it is interpreted as a fixed number of coauthors.
  • refresh (boolean (optional, default=False)) – Whether to refresh all cached files or not.
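
The float-versus-integer rule described for pub_margin and coauth_margin can be illustrated with a minimal sketch. The helper names margin_width and margin_bounds are hypothetical and not part of sosia, and stopping the lower bound at zero is an assumption made for the example.

```python
from math import ceil


def margin_width(base, margin):
    """Translate a margin into an absolute count (hypothetical helper)."""
    if isinstance(margin, float):
        # Float: share of the focal scientist's count, rounded up.
        return ceil(base * margin)
    # Integer: fixed number of publications or coauthors.
    return int(margin)


def margin_bounds(base, margin):
    """Return the (lower, upper) range of counts accepted as a match."""
    width = margin_width(base, margin)
    # Assumption: counts cannot be negative, so the lower bound stops at 0.
    return max(base - width, 0), base + width


print(margin_bounds(23, 0.1))  # float margin: 10% of 23, rounded up
print(margin_bounds(23, 5))    # integer margin: exactly 5 on either side
```

For a scientist with 23 publications, a margin of 0.1 yields a width of 3 (2.3 rounded up), while a margin of 5 yields a width of exactly 5.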
define_search_group(stacked=False, verbose=False, refresh=False)[source]

Define search_group.

Parameters:
  • stacked (bool (optional, default=False)) – Whether to combine searches into a few queries. The resulting cached files will most likely not be reusable. Set to True if you query in distinct fields or you want to minimize API key usage.
  • verbose (bool (optional, default=False)) – Whether to report on the progress of the process.
  • refresh (bool (optional, default=False)) – Whether to refresh cached search files.
define_search_sources(verbose=False)[source]

Define .search_sources.

Parameters:
  • verbose (bool (optional, default=False)) – Whether to report on the progress of the process.
find_matches(stacked=False, verbose=False, refresh=False)[source]

Find matches within search_group based on four criteria:

  1. Started publishing in about the same year
  2. Has about the same number of publications in the year of treatment
  3. Has about the same number of coauthors in the year of treatment
  4. Affiliation was in the same country in the year of treatment

Parameters:
  • stacked (bool (optional, default=False)) – Whether to combine searches into a few queries. The resulting cached files will most likely not be reusable. Set to True if you query in distinct fields or you want to minimize API key usage.
  • verbose (bool (optional, default=False)) – Whether to report on the progress of the process.
  • refresh (bool (optional, default=False)) – Whether to refresh cached search files.
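
The matching criteria listed for find_matches can be read as a simple filter over candidate records. The following is a sketch under stated assumptions, not sosia's implementation: the dictionary layout, the key names, and the functions within and is_match are all hypothetical.

```python
from math import ceil


def within(value, target, margin):
    """Check whether value lies inside the margin around target."""
    # Float margins are shares of the target (rounded up); integers are fixed.
    width = ceil(target * margin) if isinstance(margin, float) else margin
    return abs(value - target) <= width


def is_match(candidate, focal, year_margin=1, pub_margin=0.1,
             coauth_margin=0.1):
    """Apply the documented criteria to one candidate (hypothetical layout)."""
    return (
        abs(candidate["first_year"] - focal["first_year"]) <= year_margin
        and within(candidate["n_pubs"], focal["n_pubs"], pub_margin)
        and within(candidate["n_coauth"], focal["n_coauth"], coauth_margin)
        and candidate["country"] == focal["country"]
    )


# Made-up records, for illustration only.
focal = {"first_year": 2005, "n_pubs": 23, "n_coauth": 40,
         "country": "Germany"}
close = {"first_year": 2006, "n_pubs": 25, "n_coauth": 43,
         "country": "Germany"}
abroad = dict(close, country="France")
late = dict(close, first_year=2008)

print(is_match(close, focal), is_match(abroad, focal), is_match(late, focal))
```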
coauthors

Set of coauthors of the scientist on all publications until the given year.

country

Country of the scientist’s most frequent affiliation in the most recent year (before the given year) that the scientist published.
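
The selection rule for country can be sketched as follows. The input layout (a list of (publication year, country) pairs) and the function name most_frequent_country are hypothetical, and including publications from the given year itself is an assumption.

```python
from collections import Counter


def most_frequent_country(affiliations, year):
    """Pick the most frequent country in the latest eligible year.

    affiliations: list of (publication_year, country) pairs
    (hypothetical layout, not sosia's internal data structure).
    """
    # Assumption: publications from the given year itself are included.
    eligible = [(y, c) for y, c in affiliations if y <= year]
    if not eligible:
        return None
    # Restrict to the most recent year with any publication.
    latest = max(y for y, _ in eligible)
    counts = Counter(c for y, c in eligible if y == latest)
    return counts.most_common(1)[0][0]


# Made-up records, for illustration only.
records = [(2008, "France"), (2008, "France"), (2010, "Germany"),
           (2010, "Germany"), (2010, "Italy"), (2012, "Spain")]
print(most_frequent_country(records, 2011))
```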

fields

The fields of the scientist until the given year, estimated from the sources (journals, books, etc.) she published in.

first_year

The scientist’s year of first publication, as integer.

main_field

The scientist’s main field of research, as tuple in the form (ASJC code, general category).

publications

The publications of the scientist published until the given year.

search_group

The set of authors that might be matches to the scientist. The set is the intersection of the authors publishing in the given year and the authors publishing around the scientist’s year of first publication. Authors with too many publications in the given year and authors who started publishing too early are removed.

Notes

Property is initiated via .define_search_group().
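
In terms of plain Python sets, the construction of search_group can be sketched as below. The function build_search_group and its argument names are hypothetical, and the author IDs are made up; how sosia decides which authors are too prolific or started too early is not shown here.

```python
def build_search_group(active_in_year, started_around_first_year,
                       too_prolific=(), started_too_early=()):
    """Intersect two author sets, then drop excluded authors (sketch)."""
    # Authors must appear in both searches.
    candidates = set(active_in_year) & set(started_around_first_year)
    # Remove authors flagged by either exclusion rule.
    return candidates - set(too_prolific) - set(started_too_early)


# Made-up author IDs, for illustration only.
group = build_search_group(
    active_in_year={"A1", "A2", "A3", "A4"},
    started_around_first_year={"A2", "A3", "A4", "A5"},
    too_prolific={"A3"},
    started_too_early={"A4"},
)
print(group)
```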

search_sources

The set of sources (journals, books) comparable to the sources the scientist published in until the given year. A source is comparable if it belongs to the scientist’s main field but not to fields alien to the scientist, and if its type is the same as the type of one of the sources in the scientist’s main field in which she published.

Notes

Property is initiated via .define_search_sources().
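
One possible reading of the comparability rule for search_sources is sketched below. This is an interpretation under stated assumptions, not sosia's implementation: the function comparable_sources, the dictionary layout, and the reading of "alien" as "any field outside the scientist's own fields" are all hypothetical.

```python
def comparable_sources(all_sources, main_field, scientist_fields,
                       main_field_types):
    """Return IDs of sources that satisfy all three conditions (sketch).

    all_sources: {source_id: {"fields": set of field codes, "type": str}}
    """
    result = set()
    for source_id, info in all_sources.items():
        in_main_field = main_field in info["fields"]
        # Assumption: a field is "alien" if the scientist never published in it.
        no_alien_field = info["fields"] <= scientist_fields
        same_type = info["type"] in main_field_types
        if in_main_field and no_alien_field and same_type:
            result.add(source_id)
    return result


# Made-up source IDs and field codes, for illustration only.
catalogue = {
    "S1": {"fields": {1000, 1100}, "type": "journal"},
    "S2": {"fields": {1000, 9999}, "type": "journal"},  # has an alien field
    "S3": {"fields": {1000}, "type": "book"},           # type not matched
    "S4": {"fields": {1100}, "type": "journal"},        # not in main field
}
found = comparable_sources(catalogue, main_field=1000,
                           scientist_fields={1000, 1100},
                           main_field_types={"journal"})
print(found)
```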

sources

The Scopus IDs of sources (journals, books) in which the scientist published until the given year.