Match Authors in Scopus automatically with sosia
sosia (Italian for doppelgänger) finds researchers who are similar to a given researcher. Use a matching researcher as a control in Diff-in-Diff analyses. sosia is developed and described by econometricians for scientists of science.
sosia does not pre-compute annual characteristics to find controls. Instead, it queries the entire Scopus database via pybliometrics. Configure both packages, then let sosia find a match for you.
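To make the role of a matched control concrete, here is a minimal sketch of the basic two-period Diff-in-Diff calculation. It does not use sosia or Scopus. The function name `diff_in_diff` and all numbers are made up for illustration, and a real analysis would normally use a regression with many scientists and periods.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated outcome minus change in the control outcome."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Made-up yearly publication counts, for illustration only
treated = {"pre": 3.0, "post": 5.0}  # scientist who experienced the treatment
control = {"pre": 3.0, "post": 4.0}  # matched scientist without the treatment

effect = diff_in_diff(
    treated["pre"], treated["post"], control["pre"], control["post"]
)
print(effect)
```

The closer the control resembles the treated scientist before the treatment year, the more plausible it is that the control's change reflects what would have happened to the treated scientist anyway.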
Example
Install sosia from PyPI using the console or command line interpreter:
$ pip install sosia
In Python, set up sosia (and, if necessary, pybliometrics), then search for similar scientists using their Scopus Author Profile IDs.
>>> import sosia
>>>
>>> sosia.create_fields_sources_list() # Necessary only once
>>> sosia.make_database() # Necessary only once
>>>
>>> stefano = sosia.Original(55208373700, 2019) # Scopus ID and year
>>> stefano.define_search_sources() # Sources similar to scientist
>>> stefano.define_search_group() # Authors publishing in similar sources
>>> stefano.find_matches() # Find matches satisfying all criteria
>>> print(stefano.matches)
['55022752500', '55810688700', '55824607400']
>>> stefano.inform_matches() # Optional step to provide additional information
>>> print(stefano.matches[0])
Match(ID='55022752500', name='Van der Borgh, Michel', first_name='Michel',
surname='Van der Borgh', first_year=2012, num_coauthors=6, num_publications=5,
num_citations=33, num_coauthors_period=6, num_publications_period=5,
num_citations_period=33, subjects=['BUSI', 'COMP', 'SOCI'], country='Netherlands',
affiliation_id='60032882', affiliation='Eindhoven University of Technology,
Department of Industrial Engineering & Innovation Sciences', language='eng',
reference_sim=0.0, abstract_sim=0.1217)
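The record printed above has named fields, which makes it easy to compare candidates once you have several. The sketch below does not call sosia or Scopus. It assumes the matches expose their fields as attributes, like a `collections.namedtuple`, and rebuilds a stand-in `Match` with only a subset of the fields. The first record reuses values from the output above, while the two `placeholder` records and the helper `rank_by_abstract_similarity` are hypothetical.

```python
from collections import namedtuple

# Stand-in for the records shown above (assumption: fields are readable
# as attributes). Only a subset of the printed fields is reproduced.
Match = namedtuple("Match", "ID name first_year num_publications abstract_sim")

matches = [
    # Values copied from the example output
    Match("55022752500", "Van der Borgh, Michel", 2012, 5, 0.1217),
    # Hypothetical placeholder records, invented for illustration
    Match("placeholder-1", "Doe, Jane", 2011, 7, 0.0450),
    Match("placeholder-2", "Roe, Richard", 2013, 4, 0.2031),
]


def rank_by_abstract_similarity(records):
    """Sort records from highest to lowest abstract similarity.

    Assumes abstract_sim is a number for every record.
    """
    return sorted(records, key=lambda m: m.abstract_sim, reverse=True)


ranked = rank_by_abstract_similarity(matches)
best = ranked[0]
print(best.ID, best.name, best.abstract_sim)
```

Whether abstract similarity, reference similarity, or another field is the right ranking criterion depends on the research design. The sort key is the only line that needs to change.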
Full reference:
Original(scientist, treatment_year[, …]): Representation of a scientist for whom to find a control scientist.
Citation
If sosia helped you obtain data for your research, please cite our corresponding paper:
- Rose, Michael E. and Stefano H. Baruffaldi: “Finding Doppelgängers in Scopus: How to Build Scientists Control Groups Using Sosia”, Max Planck Institute for Innovation & Competition Research Paper No. 20-20.
Citing the paper helps the development of sosia because it justifies devoting resources to it. It also signals that you created your control group in a transparent and replicable way.