PART I: Creating the Study's Datasets

0. Setup

Before we begin, make sure you have installed all the required Python packages. (The instructions below use pip. You can use easy_install, too.) Also, consider using virtualenv instead of sudo for a cleaner installation experience. I also recommend running the code via IPython Notebook.

sudo pip install --upgrade turicreate
sudo pip install --upgrade repoze.lru
sudo pip install --upgrade networkx
sudo pip install --upgrade pymongo
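
After installation, a quick check that the packages can be found may save a failed run later. The following is a minimal sketch, assuming the import names match the pip package names above; missing_packages is a hypothetical helper and not part of the project files.

```python
import importlib.util

# Assumption: import names are identical to the pip package names.
REQUIRED = ["turicreate", "repoze.lru", "networkx", "pymongo"]


def missing_packages(names):
    """Return the names that cannot be located on the current Python path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises for dotted names whose parent package is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_packages(REQUIRED))
```

An empty list means every package was found; anything listed still needs to be installed.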

Please download the KDD Cup 2016 data, and also download the project files from our GitHub repository. Throughout this research, we use the constants defined in consts.py. Please change DATASETS_AMINER_DIR, DATASETS_BASE_DIR, and SFRAMES_BASE_DIR to local directories where you will store the downloaded datasets and save the project's SFrames.
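
As an illustration, the three directory constants could be derived from a single root folder. This is only a sketch: the folder layout and the helpers build_dirs and ensure_dirs are hypothetical, and the real consts.py may define these values differently.

```python
from pathlib import Path


def build_dirs(root):
    """Derive the three directories from one root folder (hypothetical layout)."""
    root = Path(root).expanduser()
    return {
        "DATASETS_BASE_DIR": root / "datasets",
        "DATASETS_AMINER_DIR": root / "datasets" / "aminer",
        "SFRAMES_BASE_DIR": root / "sframes",
    }


def ensure_dirs(dirs):
    """Create every directory if it does not exist yet and return the mapping."""
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

The resulting paths can then be copied into consts.py, for example as str(build_dirs("~/science_dynamics")["SFRAMES_BASE_DIR"]).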

Note: Creating the following SFrames requires considerable computation power over long periods.

1. Creating the SFrames

In this study, we used the following datasets:

The Microsoft Academic KDD Cup 2016 dataset - The Microsoft Academic KDD Cup Graph dataset (referred to as the MAG 2016 dataset) contains data on over 126 million papers. The main advantage of this dataset is that it has undergone several preprocessing iterations of author entity matching (every author is identified by a unique ID) and paper deduplication. Additionally, the dataset matches papers to their fields of study and includes the hierarchical structure and connections between the various fields of study.
AMiner dataset - The AMiner dataset contains information on over 154 million papers collected by the AMiner team. The dataset contains papers' abstracts, ISSNs, ISBNs, and details on each paper.
SJR dataset - The SCImago Journal Rank open dataset (referred to as the SJR dataset) contains journal and country-specific metric data starting from 1999. In this study, we used the SJR dataset to better understand how various journal metrics have changed over time.

1.1 The Microsoft Academic KDD Cup Dataset

The first step is to convert the dataset text files into SFrame objects. The code for this is located under the SFrames creator directory and can be run as follows:

from create_mag_sframes import *
from configs import *
create_all_sframes() # running this can take considerable time

The code above will create a set of SFrames with all the dataset data. The SFrames will include data on authors’ papers, keywords, fields of study, and more. Moreover, the code will construct the Extended Papers SFrame, which contains various metadata on each paper in the dataset.

import turicreate as tc
mag_sf = tc.load_sframe(EXTENDED_PAPERS_SFRAME)
mag_sf

In our study, we also analyzed how various authors' attributes, such as the number of published papers and the number of coauthors, have changed over time. To achieve this, we created an authors features SFrame using the following code:

from create_mag_authors_sframe import *
a = AuthorsFeaturesExtractor()

# This needs to run on a strong server and can take considerable time
a_sf = a.get_authors_all_features_sframe()
a_sf # the SFrame can later be loaded using tc.load_sframe(AUTHROS_FEATURES_SFRAME)

The above SFrame contains various features of each author, constructed by analyzing the author’s papers that have at least 5 references. Note that the authors SFrame also contains a gender prediction for each author. This column was created by obtaining first-name gender statistics from the SSA Baby Names and WikiTree datasets, which include over 115 thousand unique first names (see details in geneder_classifier.py).
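
To make the idea of a first-name gender prediction concrete, here is a minimal sketch that labels a name from male and female counts. The function name, the 0.9 threshold, and the "Unknown" label are assumptions made for illustration; the project's actual rule lives in geneder_classifier.py and may differ.

```python
def classify_first_name(total_males, total_females, threshold=0.9):
    """Label a first name from how often it was recorded for each gender.

    threshold is a hypothetical cut-off: a name is treated as gendered only
    if at least this share of its records belongs to one gender.
    """
    total = total_males + total_females
    if total == 0:
        return "Unknown"  # the name does not appear in the statistics
    male_share = total_males / total
    if male_share >= threshold:
        return "Male"
    if male_share <= 1 - threshold:
        return "Female"
    return "Unisex"


print(classify_first_name(950, 50))
print(classify_first_name(30, 970))
print(classify_first_name(500, 500))
```

A stricter threshold moves more names into the Unisex group, which trades coverage for fewer wrong labels.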

1.2 The AMiner Dataset

After downloading the AMiner dataset from the AMiner website, simply load it into an SFrame using the following code:

aminer_sf = tc.SFrame.read_json('%s/*.txt' % AMINER_DATA_DIR,  orient='lines')
aminer_sf # the SFrame can also be accessed using tc.load_sframe(AMINER_PAPERS_SFRAME)

1.3 The SJR Dataset

First, we download all the journal ranking files from the SJR website. Next, we use the following code to create a single SFrame with all the journal data:

from create_sjr_sframe import *
sjr_sf = create_sjr_sframe(SJR_FILES_DIR)
sjr_sf # the SFrame can also be accessed using tc.load_sframe(SJR_SFRAME)
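
The sample output of sjr_sf lists each journal once per ISSN, together with a Year column. The sketch below shows one way such rows could be produced from a single yearly file. It assumes a semicolon-delimited export with an "Issn" column holding comma-separated values; sjr_rows is a hypothetical helper, and create_sjr_sframe may work differently.

```python
import csv
import io


def sjr_rows(csv_text, year):
    """Yield one dict per (journal, ISSN) pair from one yearly SJR export."""
    # Assumption: the export is semicolon-delimited and has an "Issn" column.
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    for row in reader:
        for issn in row["Issn"].split(","):
            issn = issn.strip()
            if not issn:
                continue
            out = {k: v for k, v in row.items() if k != "Issn"}
            out["ISSN"] = issn  # one output row per ISSN
            out["Year"] = year  # the year is not stored inside the file itself
            yield out


sample = "Rank;Title;Issn\n3;Cell;00928674, 10974172\n"
for r in sjr_rows(sample, 1999):
    print(r)
```

Rows from all yearly files can then be concatenated into one table before building the SFrame.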

1.4 Joint Datasets

The MAG and AMiner datasets have slightly different sets of features. While the MAG dataset contains data on each author with a unique author ID, the AMiner dataset contains additional data on each paper, including the paper's abstract and the paper's ISSN or ISBN. Additionally, the SJR dataset contains data about each journal's ranking.

To combine the data from the authors' publication records and the journals' rankings, we join the datasets. First, we join the MAG and AMiner datasets by matching DOI values, using the following code (see also create_mag_aminer_sframe.py):

import turicreate as tc
import turicreate.aggregate as agg

sf = tc.load_sframe(EXTENDED_PAPERS_SFRAME)
g1 = sf.groupby('Paper Document Object Identifier (DOI)', {'Count': agg.COUNT()})
s1 = set(g1[g1['Count'] > 1]['Paper Document Object Identifier (DOI)'])
sf = sf[sf['Paper Document Object Identifier (DOI)'].apply(lambda doi: doi not in s1 )]
sf.materialize()

sf2 = tc.load_sframe(AMINER_PAPERS_SFRAME)
g2 = sf2.groupby('doi', {'Count': agg.COUNT()})
s2 = set(g2[g2['Count'] > 1]['doi'])
sf2 = sf2[sf2['doi'].apply(lambda doi: doi not in s2 )]
sf2.materialize()

aminer_mag_sf = sf.join(sf2, {'Paper Document Object Identifier (DOI)': 'doi'})
aminer_mag_sf['title_len'] = aminer_mag_sf['title'].apply(lambda t: len(t))
aminer_mag_sf = aminer_mag_sf[aminer_mag_sf['title_len'] > 0]
aminer_mag_sf = aminer_mag_sf.rename({"Paper ID": "MAG Paper ID", "id": "Aminer Paper ID"})
aminer_mag_sf = aminer_mag_sf.remove_column('title_len')  # remove_column returns a new SFrame by default
aminer_mag_sf # this SFrame can be accessed using tc.load_sframe(AMINER_MAG_JOIN_SFRAME)
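
The logic of this step is easier to see on a small example: DOIs that occur more than once on either side are dropped, and the remaining records are matched one to one. The following plain-Python sketch illustrates that idea only. The helper names are hypothetical, and unlike the SFrame code it drops missing DOIs explicitly.

```python
from collections import Counter

MAG_DOI = "Paper Document Object Identifier (DOI)"


def unique_key_rows(rows, key):
    """Index rows by key, keeping only keys that occur exactly once."""
    counts = Counter(r[key] for r in rows if r.get(key))
    return {r[key]: r for r in rows if r.get(key) and counts[r[key]] == 1}


def join_on_doi(mag_rows, aminer_rows):
    """Inner-join the two record lists on DOI after removing ambiguous DOIs."""
    mag = unique_key_rows(mag_rows, MAG_DOI)
    aminer = unique_key_rows(aminer_rows, "doi")
    shared = sorted(mag.keys() & aminer.keys())
    return [{**mag[doi], **aminer[doi]} for doi in shared]


mag_rows = [
    {"Paper ID": "A", MAG_DOI: "10.1/x"},
    {"Paper ID": "B", MAG_DOI: "10.1/dup"},
    {"Paper ID": "C", MAG_DOI: "10.1/dup"},  # duplicated DOI, dropped
]
aminer_rows = [
    {"id": "1", "doi": "10.1/x", "title": "T"},
    {"id": "2", "doi": "10.1/dup", "title": "U"},
]
print(join_on_doi(mag_rows, aminer_rows))
```

Dropping ambiguous DOIs costs some coverage, but it guarantees that every joined row pairs exactly one MAG record with one AMiner record.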

Using the joined dataset, we obtained an SFrame with the joint metadata of 28.9 million papers. We can take this SFrame and join it with the SJR dataset.

import re
def create_aminer_mag_sjr_sframe(year):
    """
    Creates a unified SFrame of AMiner, MAG, and the SJR datasets
    :param year: year to use for SJR data
    :return: SFrame with AMiner, MAG, and SJR data
    :rtype: tc.SFrame
    """
    sf = tc.load_sframe(AMINER_MAG_JOIN_SFRAME)
    sf = sf[sf['issn'] != None]
    sf = sf[sf['issn'] != 'null']
    sf.materialize()
    r = re.compile(r"(\d+)-(\d+)")  # raw string avoids invalid-escape warnings
    sf['issn_str'] = sf['issn'].apply(lambda i: "".join(r.findall(i)[0]) if len(r.findall(i))> 0 else None)
    sf = sf[sf['issn_str'] != None]
    sjr_sf = tc.load_sframe(SJR_SFRAME)
    sjr_sf = sjr_sf[sjr_sf['Year'] == year]
    return sf.join(sjr_sf, on={'issn_str': "ISSN"})
create_aminer_mag_sjr_sframe(2015)
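
The join key is built by stripping the hyphen from the AMiner ISSN so that it matches the hyphen-free ISSN values of the SJR dataset. The standalone sketch below applies the same regular expression outside of an SFrame; normalize_issn is a hypothetical name used only for illustration.

```python
import re

ISSN_PATTERN = re.compile(r"(\d+)-(\d+)")


def normalize_issn(issn):
    """Return the digits around the first hyphen, joined, or None if absent."""
    if not issn or issn == "null":
        return None
    match = ISSN_PATTERN.search(issn)
    return "".join(match.groups()) if match else None


print(normalize_issn("1313-2970"))  # 13132970
print(normalize_issn("null"))       # None
print(normalize_issn("1234-567X"))  # 1234567
```

Note that the pattern only captures digits, so an ISSN ending with the check character X loses its last character and will probably not match any SJR row.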

2. Loading the Dataset to MongoDB

Turicreate and SFrame objects help us get general data on how academic publication dynamics have changed over time, but it would be challenging to use them for more complicated insights, such as the trends of a specific journal. To reveal such insights, we need to load the dataset into a different framework. In this study, we chose MongoDB as our framework for more complicated queries. We installed MongoDB on Ubuntu 17.10 using the instructions in the following link. After MongoDB is installed and running, please remember to set the user and password, and update the MONGO_HOST and MONGO_PORT variables in consts.py (one can also adjust the connection to include user/password authentication). The next step is to load the SFrames created above into MongoDB collections using mongo_connector.py:

from mongo_connector import *
load_sframe() #this will load the SFrame to a local

At the end of the loading process, six collections will have been loaded into the journals database.

MD.client.journals.collection_names()

[u'authoros_features',
 u'sjr_journals',
 u'aminer_mag_papers',
 u'fields_of_study_papers',
 u'papers_features',
 u'authors_features']
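
As an example of the kind of query this enables, the sketch below builds the arguments for retrieving one journal's SJR value per year from the sjr_journals collection. It assumes that the documents keep the SFrame column names Title, Year, and SJR, and that MongoDB listens on localhost:27017; both helper functions are hypothetical and not part of mongo_connector.py.

```python
def journal_trend_query(title):
    """Build find() arguments for one journal's yearly SJR values."""
    # Assumption: documents keep the SFrame column names Title, Year, and SJR.
    query = {"Title": title}
    projection = {"_id": 0, "Year": 1, "SJR": 1}
    sort = [("Year", 1)]  # 1 means ascending order in pymongo
    return query, projection, sort


def fetch_journal_trend(title, host="localhost", port=27017):
    """Run the query against the journals database (needs a running MongoDB)."""
    from pymongo import MongoClient  # imported lazily so the builder has no dependency

    query, projection, sort = journal_trend_query(title)
    client = MongoClient(host, port)
    cursor = client.journals.sjr_journals.find(query, projection).sort(sort)
    return list(cursor)


print(journal_trend_query("Cell"))
```

With PyMongo 4.0 or later, collection_names() is no longer available; list_collection_names() returns the same list of names.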

In the second part of the tutorial, we will demonstrate how the above created MongoDB collections can be utilized to calculate various statistics on paper collections, authors, journals, and research domains.

Paper ID	Original paper title	Normalized paper title	Paper publish year	Paper publish date
01B27BE8	Evaluating Polarity for Verbal Phraseological ...	evaluating polarity for verbal phraseological ...	2014	2014/11/16
027D0030	Automatic Monitoring the Content of Audio ...	automatic monitoring the content of audio ...	2012	2012/10/27
7CFE299E	Towards a set of Measures for Evaluating Software ...	towards a set of measures for evaluating software ...	2009	2009/11
59BEBE1C	Learning Probability Densities of Optimiza ...	learning probability densities of optimiza ...	2008	2008/10/27
5873C011	Towards a Model for an Immune System ...	towards a model for an immune system ...	2002	2002/04/22
7A1109E4	Approach Towards a Natural Language Anal ...	approach towards a natural language anal ...	2013	2013/11
0B00AFD8	Towards the creation of semantic models based on ...	towards the creation of semantic models based on ...	2012	2012/10/27
5C66D743	Comparison of Neural Networks and Support ...	comparison of neural networks and support ...	2009	2009/11/01
040121AE	Multiple Kernel Support Vector Machine Proble ...	multiple kernel support vector machine proble ...	2014	2014/11/16
7DEADC9A	A Set of Test Cases for Performance Measures in ...	a set of test cases for performance measures in ...	2008

Paper Document Object Identifier (DOI) ...	Original venue name	Normalized venue name	Conference ID mapped to venue name ...
10.1007/978-3-319-13647-9 _19 ...	mexican international conference on artificial ...	micai	42D7146F
10.1007/978-3-642-37807-2 _11 ...	mexican international conference on artificial ...	micai	42D7146F
10.1109/MICAI.2009.15	mexican international conference on artificial ...	micai	42D7146F
10.1007/978-3-540-88636-5 _25 ...	mexican international conference on artificial ...	micai	42D7146F
10.1007/3-540-46016-0_42	mexican international conference on artificial ...	micai	42D7146F
	mexican international conference on artificial ...	micai	42D7146F
10.1007/978-3-642-37807-2 _26 ...	mexican international conference on artificial ...	micai	42D7146F
10.1007/978-3-642-05258-3 _42 ...	mexican international conference on artificial ...	micai	42D7146F
10.1007/978-3-319-13650-9 _14 ...	mexican international conference on artificial ...	micai	42D7146F
	mexican international conference on artificial ...	micai	42D7146F

Paper rank	Ref Number	Total Citations by Year	Total Citations by Year without Self Citations ...	Authors List Sorted
19517	21	None	None	[834A11E2, 7E8BA14F, 852B2668] ...
19444	7	None	None	[7DB8825E, 6936139F]
18870	14	{'2015': 10.0, '2014': 9.0, '2011': 3.0, '20 ...	{'2015': 8.0, '2014': 7.0, '2011': 1.0, '20 ...	[81867464, 8106CCE6, 7D20CE86, 7C6C6BB9] ...
19444	7	None	None	[807DCA23, 811B0352, 2779E3F4] ...
19177	8	{'2003': 1.0, '2006': 3.0, '2007': 3.0, '20 ...	{'2003': 1.0, '2006': 2.0, '2007': 2.0, '20 ...	[7F553272, 7F830ACE, 7E7F1E07] ...
19555	0	None	None	[7EE331AF]
19476	10	None	None	[80CF45DD, 814339C1, 7ED13F21, 7F45D6E4] ...
19428	9	{'2015': 1.0, '2014': 1.0} ...	{'2015': 1.0, '2014': 1.0} ...	[7E2F72E3, 45F06265]
19468	9	None	None	[7677E6C4, 7EBBEA7F, 7F312412, 776D4ECC, ...
19394	9	{'2015': 3.0, '2014': 2.0, '2013': 2.0, '20 ...	{'2015': 3.0, '2014': 2.0, '2013': 2.0, '20 ...	[7E792787, 7EC1BF2D, 7897839F] ...

Keywords List	Field of study list	Field of study list names	Fields of study parent list (L0) ...
None	None	None	None
None	None	None	None
[measures, software measurement, autonomy, ...	[0A9CB5A9, 0556B228, 03E623B0, 0ABCEA76, ...	[Measure, Software measurement, Autonomy, ...	[0271BC14, 0895A350, 0205A1DB, 07982D63] ...
[optimization problem, probability density] ...	[083736DA, 0BBED543]	[None, Probability density function] ...	[0205A1DB]
[process algebra, process calculi, multi agent ...	[09A47029, 09A47029, 027A0232, 027A0232, ...	[Process calculus, Process calculus, Multi- ...	[0271BC14]
[cognition, computational linguistics, grammars, ...	[0A2079AC, 093E8748, 03365AB6, 044294F0, ...	[Cognition, Computational linguistics, Rule-based ...	[0271BC14, 00F03FC7]
[computer aided design, cad, ontologies] ...	[07245C42, 0B9C400C, 09F001E0] ...	[Computer Aided Design, None, Ontology] ...	[0271BC14]
[dynamic system, dynamic systems, neural network ...	[0AA68668, 0304C748, 0304C748, 0AA68668, ...	[Dynamical system, Artificial neural ...	[0271BC14, 0B0FEB68]
None	None	None	None
[multiobjective optimization] ...	[04198571]	[Multi-objective optimization] ...	[]

Fields of study parent list names (L0) ...	Fields of study parent list (L1) ...	Fields of study parent list names (L1) ...	Fields of study parent list (L2) ...
None	None	None	None
None	None	None	None
[Computer Science, Sociology, Mathematics, ...	[0BE4BA29, 0765A2E4, 093C4716, 06E88D7C] ...	[Law, Data mining, Artificial intelligence, ...	[00F36ADC, 05A3DFDE]
[Mathematics]	[064E5072]	[Statistics]	[007E3B49]
[Computer Science]	[0C19BFCD, 0BE20181, 093C4716] ...	[Immunology, Programming language, Artificial ...	[027A0232]
[Computer Science, Psychology] ...	[0BE20181, 0C2DB2A7]	[Programming language, Natural language ...	[00E4DDF6, 0C199D1F, 093E8748, 044294F0] ...
[Computer Science]	[093C4716]	[Artificial intelligence]	[07245C42]
[Computer Science, Chemistry] ...	[0724DFBA]	[Machine learning]	[0304C748, 097464D7]
None	None	None	None
[]	[07868074]	[Mathematical optimization] ...	[02724C38]

Fields of study parent list names (L2) ...	Authors Number	Urls	Fields of study parent list (L3) ...
None	3	[http://link.springer.com /content/pdf/10.1007% ...	None
None	2	[http://dl.acm.org/citati on.cfm?id=2481834, ht ...	None
[Project management, Politics] ...	4	[http://ieeexplore.ieee.o rg/xpl/abstractAuthor ...	[0059F32E, 0556B228, 03E623B0, 0ABCEA76, ...
[Stochastic process]	3	[http://dx.doi.org/10.100 7/978-3-540-88636-5_2 ...	[0BBED543]
[Multi-agent system]	3	[http://dl.acm.org/citati on.cfm?id=691909, htt ...	[09A47029, 0087AC0D]
[Speech synthesis, Machine translation, ...	1	[http://ieeexplore.ieee.o rg/lpdocs/epic03/wrap ...	[0322F49A, 0A2079AC, 03365AB6, 041AB807] ...
[Computer Aided Design]	4	[http://dl.acm.org/citati on.cfm?id=2481852, ht ...	[09F001E0]
[Artificial neural network, Nonlinear ...	2	[http://adsabs.harvard.ed u/abs/2009LNCS.5845.. ...	[00BB2E8D, 0AA68668, 2078A8D7] ...
None	5	[http://link.springer.com /content/pdf/10.1007% ...	None
[Linear programming]	3	[http://dx.doi.org/10.100 7/978-3-540-88636-5_4 ...	[04198571]

Author ID	Papers by Years Dict	Coauthors by Years Dict	Affilation by Year Dict
00001F05	{2010: ['5DA0F250'], 2013: ['7AF8ABFE']} ...	{2013: ['77CE16EC', '17B20BAE']} ...	{2010: [''], 2013: ['']}
00002AD3	{2009: ['7A0B348F'], 2010: ['795F56C6'], 2 ...	{2009: ['7FD1B86A', '7921EA7D', '05390F01', ...	{2009: [''], 2010: [''], 2012: [''], 2006: [''], ...
00006A31	{2009: ['7CFAEB15']}	{2009: ['7E24B147', '7C3ED158', '79A6FF42', ...	{2009: ['']}
0000B5FA	{2008: ['7714EB4E'], 2009: ['78B7C257'], 2 ...	{2008: ['54648B9B'], 2009: ['7ADCCDB0', ...	{2008: [''], 2009: [''], 2010: ['', '', ''], 2 ...
0001CF9B	{2013: ['7C9BBC3A']}	{2013: ['852809D8', '77FE1F64', '80B5223D', ...	{2013: ['']}
00040294	{2009: ['7892886A', '81263516', '81424AA7'], ...	{2009: ['75D367F6', '82C1C4DE', '80824D21', ...	{2009: ['', '', ''], 2011: ['', ''], 2004: ...
00045553	{1987: ['77F10A3D'], 1988: ['7836E5B8'], 1 ...	{1987: ['85CAEB12'], 1988: ['819D7046', ...	{1987: ['new york medical college'], 1988: [''], ...
0004B8AF	{2010: ['77AFDEB4']}	{2010: ['77227437', '77CB65A7', '5EBF97A1', ...	{2010: ['']}
000510E2	{2011: ['80790612'], 2012: ['76E5D7F2', ...	{2011: ['82D84635', '11F1B283'], 2012: ...	{2011: ['university of queensland'], 2012: ['', ...
00063841	{2014: ['7790AFD4']}	{2014: ['853EBBF2', '7901305E', '0F71473E', ...	{2014: ['']}

Sequence Number by Year Dict ...	Author name	First name	Last name	Conference ID by Year Dict ...
{2010.0: array('d', [1.0]), 2013.0: ...	nancy praill	nancy	praill	{2010: [''], 2013: ['']}
{2009.0: array('d', [1.0]), 2010.0: ...	david s rebergen	david	rebergen	{2009: [''], 2010: [''], 2012: [''], 2006: [''], ...
{2009.0: array('d', [6.0])} ...	b zelazowska	b	zelazowska	{2009: ['']}
{2008.0: array('d', [1.0]), 2009.0: ...	lars goerigk	lars	goerigk	{2008: [''], 2009: [''], 2010: ['', '', ''], 2 ...
{2013.0: array('d', [5.0])} ...	orlando lastres danguillecourt ...	orlando	danguillecourt	{2013: ['']}
{2009.0: array('d', [4.0, 7.0, 6.0]), 2011.0: ...	ivani bisordi	ivani	bisordi	{2009: ['', '', ''], 2011: ['', ''], 2004: ...
{1987.0: array('d', [1.0]), 1988.0: ...	miguel a pappolla	miguel	pappolla	{1987: [''], 1988: [''], 1989: [''], 1990: ['', ...
{2010.0: array('d', [7.0])} ...	dong zaijie	dong	zaijie	{2010: ['']}
{2011.0: array('d', [1.0]), 2012.0: ...	fairlie mcilwraith	fairlie	mcilwraith	{2011: [''], 2012: ['', '', ''], 2013: [''], ...
{2014.0: array('d', [8.0])} ...	tiziano ponsetti	tiziano	ponsetti	{2014: ['']}

Journal ID by Year Dict	Venue by Year Dict	Gender Dict
{2010: [''], 2013: ['']}	{2010: [''], 2013: ['']}	{'Gender': 'Female', 'Total Males': 2999, ...
{2009: ['0959867B'], 2010: ['0069C535'], 2 ...	{2009: ['Journal of Occupational and ...	{'Gender': 'Male', 'Total Males': 3700247, 'Total ...
{2009: ['036625C9']}	{2009: ['Advances in Medical Sciences']} ...	{'Gender': 'Unisex', 'Total Males': 536, ...
{2008: ['0A1986D0'], 2009: ['0ACEE946'], 2 ...	{2008: ['ChemPhysChem'], 2009: ['Physical ...	{'Gender': 'Male', 'Total Males': 12459, 'Total ...
{2013: ['01F41F83']}	{2013: ['International Journal of Energy ...	{'Gender': 'Male', 'Total Males': 47535, 'Total ...
{2009: ['08826C6E', '0B483532', '05F694A1'], ...	{2009: ['Infection, Genetics and Evolution', ...	{'Gender': 'Female', 'Total Males': 0, 'Total ...
{1987: ['096E1E70'], 1988: ['03C89659'], 1 ...	{1987: ['Synapse'], 1988: ['Human Pathology'], ...	{'Gender': 'Male', 'Total Males': 173865, 'Total ...
{2010: ['0B0C2E2F']}	{2010: ['Aquaculture Research']} ...	{'Gender': 'Male', 'Total Males': 317, 'Total ...
{2011: ['080AF648'], 2012: ['068E6FF5', '', ...	{2011: ['Drug and Alcohol Review'], 2012: ['Drug ...	{'Gender': 'Unisex', 'Total Males': 7, 'Total ...
{2014: ['06FD8B4A']}	{2014: ['Catalysis Today']} ...	{'Gender': 'Male', 'Total Males': 37, 'Total ...

abstract	authors	doi	id
None	[{'name': 'G. Adam'}, {'name': 'K. Schreibe ...	10.1002/ange.19650770204	53e99784b7602d9701f3e130
None	[{'name': 'R. Farahbod'}, {'name': 'V. Gervasi'}, ...	None	53e99784b7602d9701f3e131
The method to making technology roadmap is ...	[{'name': 'MO Chou'}, {'name': 'CHEN Jiqing'}, ...	None	53e99784b7602d9701f3e132
Drought is the first place in all the natural ...	[{'name': 'Peijuan Wang'}, {'name': 'Jiahua ...	10.1109/IGARSS.2011.60495 03 ...	53e99784b7602d9701f3e133
Determination of total sugar can serve to ...	[{'org': 'Yantai Institute of Coastal ...	None	53e99784b7602d9701f3e135
Resumen: Uno de los problemas que debemos ...	[{'name': 'CELSO VARGAS'}] ...	None	53e99784b7602d9701f3e136
None	[{'name': 'D J Lum'}, {'name': 'V Upadhyay'}, ...	10.1111/j.1365-2559.2007. 02817.x ...	53e99784b7602d9701f3e137
This paper discussed the planning and design ...	[{'org': 'School of Resource and ...	None	53e99784b7602d9701f3e139
Rough set is a mathematical tool to ...	None	None	53e99784b7602d9701f3e13a
None	[{'name': 'F THOUVENYPAISANT'}, ...	10.1016/S0221-0363(05)762 74-0 ...	53e99784b7602d9701f3e13b

references	title	url	venue
[53e9a6e6b7602d970301a47d , ...	1.4-N→N′-Acylwanderun g bei einem ...	[http://dx.doi.org/10.100 2/ange.19650770204] ...	Angewandte Chemie
[53e9a1d0b7602d9702ac8f1b , ...	Design and Specification of the CoreASM Execution ...	None	None
None	Practice Research on Technology Roadmap for ...	None	Science and Technology Management Research ...
[53e999c3b7602d970220b9b7 , ...	The relationship between canopy parameters and ...	[http://dx.doi.org/10.110 9/IGARSS.2011.6049503] ...	IGARSS
None	The effect of metabolites on the determination of ...	None	Food Science and Technology ...
None	El Humanista y la Energía Nuclear ...	None	None
[53e9b395b7602d9703e78794 , ...	Botryoid fibroepithelial polyp of the urinary ...	[http://dx.doi.org/10.111 1/j.1365-2559.2007.02 ...	Histopathology
None	Planning and Design Method of Land ...	None	Journal of Anhui Agricultural Sciences ...
None	A Data Mining Based on Rough Set Theory ...	None	Software Guide
None	RI1 Embolisation des varices stomiales par ...	[http://dx.doi.org/10.101 6/S0221-0363(05)76274 ...	Journal De Radiologie

volume	year
77	1965
None	None
None	2013
null	2011
None	2012
None	2013
51	2007
None	2012
None	2012
86	2005

Rank	Title	Type	SJR	SJR Best Quartile	H index	Total Docs.	Total Docs. (3years)
1	Astrophysical Journal Letters ...	journal	61.473	Q1	82	5	7
1	Astrophysical Journal Letters ...	journal	61.473	Q1	82	5	7
2	Annual Review of Biochemistry ...	journal	49.476	Q1	248	30	81
2	Annual Review of Biochemistry ...	journal	49.476	Q1	248	30	81
3	Cell	journal	41.978	Q1	616	354	1359
3	Cell	journal	41.978	Q1	616	354	1359
4	Annual Review of Immunology ...	journal	40.906	Q1	254	29	81
4	Annual Review of Immunology ...	journal	40.906	Q1	254	29	81
5	Annual Review of Cell and Developmental Biology ...	book serie	33.882	Q1	182	25	61
5	Annual Review of Cell and Developmental Biology ...	book serie	33.882	Q1	182	25	61

Total Refs.	Total Cites (3years)	Citable Docs. (3years)	Cites / Doc. (2years)	Ref. / Doc.	Country
350	493	7	72.75	70.0	United Kingdom
350	493	7	72.75	70.0	United Kingdom
5913	3445	80	35.38	197.1	United States
5913	3445	80	35.38	197.1	United States
15870	47390	1328	34.36	44.83	United States
15870	47390	1328	34.36	44.83	United States
5236	4030	81	46.69	180.55	United States
5236	4030	81	46.69	180.55	United States
4134	1770	60	26.55	165.36	United States
4134	1770	60	26.55	165.36	United States

Year	Categories	ISSN
1999		20418213
1999		20418205
1999		15454509
1999		00664154
1999		00928674
1999		10974172
1999		07320582
1999		15453278
1999		15308995
1999		10810706

MAG Paper ID	Original paper title	Normalized paper title	Paper publish year	Paper publish date
7C15F682	Ptychoptera deleta Novak, 1877 from the Early ...	ptychoptera deleta novak 1877 from the early ...	2011	2011
84A37D36	Sherborn’s foraminiferal studies ...	sherborn s foraminiferal studies and their ...	2016	2016/07/01
773B216E	A new species of hydrobiid snails ...	a new species of hydrobiid snails moll ...	2011	2011/10/19
77C44F83	Revision of the planthopper genus ...	revision of the planthopper genus ...	2014	2014/10/12
75233F3E	Female genitalia of Seasogonia Young from ...	female genitalia of seasogonia young from ...	2012	2012/11/01
7B5321C5	A taxonomic study on the genus Ettchellsia ...	a taxonomic study on the genus ettchellsia cam ...	2012	2012
3C77A5B8	An Asiatic Chironomid in Brazil: morphology, DNA ...	an asiatic chironomid in brazil morphology dna ...	2015	2015/07/27
8051111A	A new species of Smicromorpha ...	a new species of smicromorpha hymenoptera ...	2009	2009/09/14
79BD8F37	Open exchange of scientific knowledge and ...	open exchange of scientific knowledge and ...	2014	2014/06/06
240B7EFF	Four new species of Epicephala Meyrick, 1880 ...	four new species of epicephala meyrick 1880 ...	2015	2015/06/15

Paper Document Object Identifier (DOI) ...	Original venue name	Normalized venue name	Journal ID mapped to venue name ...
10.3897/zookeys.130.1401	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.550.9863	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.138.1927	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.462.6657	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.164.2132	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.254.4182	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.514.9925	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.20.195	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.414.7717	ZooKeys	zookeys	0BDFC074
10.3897/zookeys.508.9479	ZooKeys	zookeys	0BDFC074

Keywords List	Field of study list	Field of study list names	Fields of study parent list (L0) ...
[tertiary, biomedical research, neogene, ...	[009377C6, 0660586C, 01A380F9, 039D5C06] ...	[Tertiary, None, Neogene, Bioinformatics] ...	[]
None	None	None	None
[biomedical research, bioinformatics] ...	[0660586C, 039D5C06]	[None, Bioinformatics]	[]
None	None	None	None
[morphology, taxonomy]	[06A2C3F5, 037ECF39]	[Morphology, Taxonomy]	[052C8328]
[taxonomy, bioinformatics, ...	[039D5C06, 037ECF39, 0660586C] ...	[Bioinformatics, Taxonomy, None] ...	[052C8328]
None	None	None	None
[morphology, taxonomy]	[06A2C3F5, 037ECF39]	[Morphology, Taxonomy]	[052C8328]
[taxonomy, intellectual property rights, ...	[037ECF39, 0215A9CE, 039D5C06, 0660586C] ...	[Taxonomy, Intellectual property, Bioinformat ...	[0895A350, 052C8328]
None	None	None	None

Fields of study parent list names (L2) ...	Authors Number	Urls	abstract
[Stratigraphy]	2	[/pmc/articles/instance/3 260767/?report=abstra ...	The first fossil that was described in ...
None	1	[http://zookeys.pensoft.n et/lib/ajax_srv/artic ...	None
[]	1	[http://bionames.org/refe rences/b38c8055b3453e ...	Anew minute valvatiform species belonging to the ...
None	2	[http://www.researchgate. net/publication/27090 ...	Chinese species in the genus Nycheuma Fennah, ...
[Taxonomy, Morphology]	2	[/pmc/articles/instance/3 272620/?report=abstra ...	Seasogonia Young, 1986 is a sharpshooter genus ...
[Taxonomy]	2	[http://bionames.org/refe rences/4078cc33be92d7 ...	Three new species of Ettchellsia Cameron, ...
None	4	[http://www.researchgate. net/publication/28071 ...	None
[Taxonomy, Morphology]	2	[http://www.cabdirect.org /abstracts/2009331573 ...	None
[Taxonomy]	4	[http://advocomplex.ch/fi les/Open%20exchange%2 ...	Background. The 7(th) Framework Programme for ...
None	3	[http://www.ncbi.nlm.nih. gov/pubmed/26167120, ...	None

Paper rank	Ref. Number	Total Citations by Year	Total Citations by Year without Self Citations ...	Authors List Sorted
19382	7	{'2015': 1.0}	{'2015': 1.0}	[855C02FD, 7B2C4199]
19555	None	None	None	[84CB5028]
19402	12	{'2015': 3.0, '2014': 3.0, '2011': 1.0, '20 ...	{'2015': 1.0, '2014': 1.0, '2011': 1.0, '20 ...	[8439D30B]
19555	None	None	None	[84C55AD3, 7E095DC6]
19427	6	None	None	[805137B8, 80CA5307]
19370	4	None	None	[80F9A983, 7FFB2555]
19555	None	None	None	[6118B891, 7FE3B9C3, 79CC73E7, 7C17044D] ...
19157	3	{'2015': 4.0, '2014': 3.0, '2013': 2.0, '20 ...	{'2015': 4.0, '2014': 3.0, '2013': 2.0, '20 ...	[85B81F06, 7E5ABA3D]
19321	5	{'2015': 3.0, '2014': 1.0} ...	{'2015': 2.0}	[7237B1F9, 7DAD3B1C, 78F96D88, 78A6ED0B] ...
19555	None	None	None	[7CFFBD65, 862455E0, 7D640C0F] ...

Fields of study parent list names (L0) ...	Fields of study parent list (L1) ...	Fields of study parent list names (L1) ...	Fields of study parent list (L2) ...
[]	[090B39EA, 039D5C06]	[Paleontology, Bioinformatics] ...	[0683829C]
None	None	None	None
[]	[039D5C06]	[Bioinformatics]	[]
None	None	None	None
[Biology]	[027F4522]	[Linguistics]	[037ECF39, 06A2C3F5]
[Biology]	[039D5C06]	[Bioinformatics]	[037ECF39]
None	None	None	None
[Biology]	[027F4522]	[Linguistics]	[037ECF39, 06A2C3F5]
[Sociology, Biology]	[0BE4BA29, 039D5C06]	[Law, Bioinformatics]	[037ECF39]
None	None	None	None

authors	Aminer Paper ID	isbn	issn	issue	keywords	lang
[{'org': 'Institute of Biology, Pedagogical ...	55a4881d65ce31bc877df20d	None	1313-2970	130	[cheb basin, cypris formation, czech ...	en
[{'name': 'giles miller'}] ...	56d81fffdabfae2eeeb5906d	None	None	None	None	en
[{'org': 'Department of Ecology and Systematics, ...	55a47a6865ce31bc877c5f09	None	1313-2970	138	[caenogastropoda, daphniola eptalophos sp. ...	en
[{'org': 'The Special Key Laboratory for ...	55a6b03265ce054aad712ec1	None	1313-2989	462	[delphacini, fulgoroidea, hemiptera, nycheuma, new ...	en
[{'org': 'Institute of Entomology, Guizhou ...	55a4953165ceb7cb02d2d096	None	1313-2970	164	[auchenorrhyncha, cicadellinae, ...	en
[{'org': 'Laboratory of Entomology, Faculty of ...	55a51ace65ceb7cb02e11911	None	1313-2989	254	[south east asia, taxonomy, parasitic ...	en
[{'name': 'gizelle amora'}, {'name': 'neusa ...	56d81fffdabfae2eeeb59087	None	None	None	None	en
[{'name': 'd c darling'}, {'name': 'norman f ...	53e9ac5cb7602d970362f86d	None	None	0	[morphology, taxonomy]	en
[{'org': 'Plazi, Zinggstrasse 16, 3007 ...	55a676ab65ce054aad68df46	None	1313-2989	414	[biodiversity knowledge, european copyright, ...	en
[{'name': 'houhun li'}, {'name': 'zhibo wang'}, ...	56d82000dabfae2eeeb59093	None	None	None	None	en

n_citation	page_end	page_start	pdf	references	title	...
3	305	299	None	[53e99c60b7602d9702511119 , ...	Ptychoptera deleta Novák, 1877 from the ...	...
None	None	None	None	None	Sherborn’s foraminiferal studies ...	...
None	64	53	None	[56d8e4efdabfae2eee2affe9 , ...	A new species of hydrobiid snails ...	...
None	57	47	None	None	Revision of the planthopper genus ...	...
None	40	24	None	[56d92143dabfae2eee9f1671 , ...	Female genitalia of Seasogonia Young from ...	...
None	108	99	None	[53e99ddab7602d970269b655 , ...	A taxonomic study on the genus Ettchellsia ...	...
None	None	None	None	None	An Asiatic Chironomid in Brazil: morphology, DNA ...	...
None	None	None	None	[56d85c7cdabfae2eee5a5fe6 , ...	A new species of Smicromorpha ...	...
9	135	109	None	[55a4825f65ce31bc877d426d , ...	Open exchange of scientific knowledge and ...	...
None	None	None	None	None	Four new species of Epicephala Meyrick, 1880 ...	...