Match Authors in Scopus automatically with sosia

sosia (Italian for doppelgänger) finds researchers who are similar to a given researcher. Use the matched researcher as a control in difference-in-differences (Diff-in-Diff) analyses. sosia is developed and described by econometricians for scientists of science.

sosia does not pre-compute annual characteristics to find controls. Instead, it searches the entire Scopus database via pybliometrics. Configure both packages and let sosia find a match for you.

Example

Install sosia from PyPI using the console or command line interpreter:

$ pip install sosia

In Python, set up sosia (and, if necessary, pybliometrics) and search for similar scientists using their Scopus Author Profile IDs.

>>> import sosia
>>>
>>> sosia.get_field_source_information()  # Necessary only once
>>> sosia.make_database()  # Necessary only once
>>>
>>> stefano = sosia.Original(55208373700, 2019)  # Scopus ID and year
>>> stefano.define_search_sources()  # Sources similar to scientist
>>> stefano.define_search_group()  # Authors publishing in similar sources
>>> stefano.find_matches()  # Find matches satisfying all criteria
>>> print(stefano.matches)
[55320703900, 55817553500, 56113324000, 56276429200]
>>> stefano.inform_matches()  # Optional step to provide additional information
>>> print(stefano.matches[0])
Match(ID=55320703900, name='Arts, Sam', first_name='Sam', surname='Arts',
      first_year=2012, num_coauthors=9, num_publications=8, num_citations=74,
      num_coauthors_period=None, num_publications_period=None,
      num_citations_period=None, subjects=['BUSI', 'ECON', 'DECI'],
      affiliation_country='Belgium', affiliation_id='60025063',
      affiliation_name='KU Leuven', affiliation_type='univ',
      language='eng', num_cited_refs=28)
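
Once the matches carry additional information, you may want to narrow them down further before choosing a control. The following is a minimal sketch of that idea, not part of sosia itself. It assumes each match exposes its fields as attributes, as the printed representation above suggests. The `Match` stand-in, the `filter_matches` helper and the second record are hypothetical and exist only so that the snippet runs without Scopus access.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in `stefano.matches`; only a subset
# of the fields shown in the output above is reproduced here.
Match = namedtuple(
    "Match", "ID name first_year num_publications affiliation_country"
)

# Illustrative records: the first mirrors the output above,
# the second is invented for demonstration purposes.
matches = [
    Match(55320703900, "Arts, Sam", 2012, 8, "Belgium"),
    Match(99999999999, "Doe, Jane", 2011, 10, "Italy"),  # made-up record
]


def filter_matches(matches, country=None, max_publications=None):
    """Keep matches that satisfy the optional country and publication limits."""
    kept = []
    for m in matches:
        if country is not None and m.affiliation_country != country:
            continue
        if max_publications is not None and m.num_publications > max_publications:
            continue
        kept.append(m)
    return kept


belgian = filter_matches(matches, country="Belgium")
print([m.ID for m in belgian])
```

With real data, you would pass `stefano.matches` (after calling `stefano.inform_matches()`) in place of the illustrative list.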

Full reference:

Original(scientist, treatment_year[, ...])

Representation of a scientist for whom to find a control scientist.

Citation

If sosia helped you get data for your research, please cite our corresponding paper:

Citing the paper supports the development of sosia because it justifies devoting resources to it. It also signals that you created your control group in a transparent and replicable way.
