{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PART I: Creating the Study's Datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 0. Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we begin, make sure you have installed all the required Python packages. (The instructions below use pip. You can use easy_install, too.) Also, consider using virtualenv instead of sudo for a cleaner installation experience. I also recommend running the code via IPython Notebook.\n", "* sudo pip install --upgrade turicreate\n", "* sudo pip install --upgrade repoze.lru\n", "* sudo pip install --upgrade networkx\n", "* sudo pip install --upgrade pymongo\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please download the KDD Cup 2016 data, and please also download the project files from our GitHub repository. Throughout this research, we use the constants defined in consts.py. Please set DATASETS_AMINER_DIR, DATASETS_BASE_DIR, and SFRAMES_BASE_DIR to the local directories where you will download the datasets and save the project's SFrames.\n", "\n", "**Note: Creating the following SFrames requires considerable computational power and long running times.** " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Creating the SFrames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this study, we used the following datasets:\n", "* [The Microsoft Academic KDD Cup 2016 dataset](https://kddcup2016.azurewebsites.net/Data) - The Microsoft Academic KDD Cup Graph dataset (referred to as the MAG 2016 dataset) contains data on over 126 million papers. The main advantage of this dataset is that it has undergone several preprocessing iterations of author entity matching (each author is identified by a unique ID) and paper deduplication.
Additionally, the dataset matches papers to their fields of study and includes the hierarchical structure of, and connections between, the various fields of study.\n", "\n", "* [AMiner dataset](https://aminer.org/open-academic-graph) - The AMiner dataset contains information on over 154 million papers collected by the AMiner team. The dataset contains papers' abstracts, ISSNs, ISBNs, and other details on each paper.\n", "\n", "* [SJR dataset](http://www.scimagojr.com/journalrank.php) - The SCImago Journal Rank open dataset (referred to as the SJR dataset) contains journal- and country-specific metric data starting from 1999. In this study, we used the SJR dataset to better understand how various journal metrics have changed over time.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.1 The Microsoft Academic KDD Cup Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step is to convert the dataset text files into SFrame objects with the code located under the SFrames creator directory, as follows:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import turicreate as tc # explicit import; tc is used in the cells below\n", "from create_mag_sframes import *\n", "from configs import *\n", "create_all_sframes() # running this can take considerable time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above code will create a set of SFrames with all the dataset data. The SFrames will include data on authors’ papers, keywords, fields of study, and more. Moreover, the code will construct the Extended Papers SFrame, which contains various metadata on each paper in the dataset." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Paper IDOriginal paper titleNormalized paper titlePaper publish yearPaper publish date
01B27BE8Evaluating Polarity for
Verbal Phraseological ...
evaluating polarity for
verbal phraseological ...
20142014/11/16
027D0030Automatic Monitoring the
Content of Audio ...
automatic monitoring the
content of audio ...
20122012/10/27
7CFE299ETowards a set of Measures
for Evaluating Software ...
towards a set of measures
for evaluating software ...
20092009/11
59BEBE1CLearning Probability
Densities of Optimiza ...
learning probability
densities of optimiza ...
20082008/10/27
5873C011Towards a Model for an
Immune System ...
towards a model for an
immune system ...
20022002/04/22
7A1109E4Approach Towards a
Natural Language Anal ...
approach towards a
natural language anal ...
20132013/11
0B00AFD8Towards the creation of
semantic models based on ...
towards the creation of
semantic models based on ...
20122012/10/27
5C66D743Comparison of Neural
Networks and Support ...
comparison of neural
networks and support ...
20092009/11/01
040121AEMultiple Kernel Support
Vector Machine Proble ...
multiple kernel support
vector machine proble ...
20142014/11/16
7DEADC9AA Set of Test Cases for
Performance Measures in ...
a set of test cases for
performance measures in ...
2008
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Paper Document Object
Identifier (DOI) ...
Original venue nameNormalized venue nameJournal ID mapped to
venue name ...
Conference ID mapped to
venue name ...
10.1007/978-3-319-13647-9
_19 ...
mexican international
conference on artificial ...
micai42D7146F
10.1007/978-3-642-37807-2
_11 ...
mexican international
conference on artificial ...
micai42D7146F
10.1109/MICAI.2009.15mexican international
conference on artificial ...
micai42D7146F
10.1007/978-3-540-88636-5
_25 ...
mexican international
conference on artificial ...
micai42D7146F
10.1007/3-540-46016-0_42mexican international
conference on artificial ...
micai42D7146F
mexican international
conference on artificial ...
micai42D7146F
10.1007/978-3-642-37807-2
_26 ...
mexican international
conference on artificial ...
micai42D7146F
10.1007/978-3-642-05258-3
_42 ...
mexican international
conference on artificial ...
micai42D7146F
10.1007/978-3-319-13650-9
_14 ...
mexican international
conference on artificial ...
micai42D7146F
mexican international
conference on artificial ...
micai42D7146F
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Paper rankRef NumberTotal Citations by YearTotal Citations by Year
without Self Citations ...
Authors List Sorted
1951721NoneNone[834A11E2, 7E8BA14F,
852B2668] ...
194447NoneNone[7DB8825E, 6936139F]
1887014{'2015': 10.0, '2014':
9.0, '2011': 3.0, '20 ...
{'2015': 8.0, '2014':
7.0, '2011': 1.0, '20 ...
[81867464, 8106CCE6,
7D20CE86, 7C6C6BB9] ...
194447NoneNone[807DCA23, 811B0352,
2779E3F4] ...
191778{'2003': 1.0, '2006':
3.0, '2007': 3.0, '20 ...
{'2003': 1.0, '2006':
2.0, '2007': 2.0, '20 ...
[7F553272, 7F830ACE,
7E7F1E07] ...
195550NoneNone[7EE331AF]
1947610NoneNone[80CF45DD, 814339C1,
7ED13F21, 7F45D6E4] ...
194289{'2015': 1.0, '2014':
1.0} ...
{'2015': 1.0, '2014':
1.0} ...
[7E2F72E3, 45F06265]
194689NoneNone[7677E6C4, 7EBBEA7F,
7F312412, 776D4ECC, ...
193949{'2015': 3.0, '2014':
2.0, '2013': 2.0, '20 ...
{'2015': 3.0, '2014':
2.0, '2013': 2.0, '20 ...
[7E792787, 7EC1BF2D,
7897839F] ...
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Keywords ListField of study listField of study list namesFields of study parent
list (L0) ...
NoneNoneNoneNone
NoneNoneNoneNone
[measures, software
measurement, autonomy, ...
[0A9CB5A9, 0556B228,
03E623B0, 0ABCEA76, ...
[Measure, Software
measurement, Autonomy, ...
[0271BC14, 0895A350,
0205A1DB, 07982D63] ...
[optimization problem,
probability density] ...
[083736DA, 0BBED543][None, Probability
density function] ...
[0205A1DB]
[process algebra, process
calculi, multi agent ...
[09A47029, 09A47029,
027A0232, 027A0232, ...
[Process calculus,
Process calculus, Multi- ...
[0271BC14]
[cognition, computational
linguistics, grammars, ...
[0A2079AC, 093E8748,
03365AB6, 044294F0, ...
[Cognition, Computational
linguistics, Rule-based ...
[0271BC14, 00F03FC7]
[computer aided design,
cad, ontologies] ...
[07245C42, 0B9C400C,
09F001E0] ...
[Computer Aided Design,
None, Ontology] ...
[0271BC14]
[dynamic system, dynamic
systems, neural network ...
[0AA68668, 0304C748,
0304C748, 0AA68668, ...
[Dynamical system,
Artificial neural ...
[0271BC14, 0B0FEB68]
NoneNoneNoneNone
[multiobjective
optimization] ...
[04198571][Multi-objective
optimization] ...
[]
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fields of study parent
list names (L0) ...
Fields of study parent
list (L1) ...
Fields of study parent
list names (L1) ...
Fields of study parent
list (L2) ...
NoneNoneNoneNone
NoneNoneNoneNone
[Computer Science,
Sociology, Mathematics, ...
[0BE4BA29, 0765A2E4,
093C4716, 06E88D7C] ...
[Law, Data mining,
Artificial intelligence, ...
[00F36ADC, 05A3DFDE]
[Mathematics][064E5072][Statistics][007E3B49]
[Computer Science][0C19BFCD, 0BE20181,
093C4716] ...
[Immunology, Programming
language, Artificial ...
[027A0232]
[Computer Science,
Psychology] ...
[0BE20181, 0C2DB2A7][Programming language,
Natural language ...
[00E4DDF6, 0C199D1F,
093E8748, 044294F0] ...
[Computer Science][093C4716][Artificial intelligence][07245C42]
[Computer Science,
Chemistry] ...
[0724DFBA][Machine learning][0304C748, 097464D7]
NoneNoneNoneNone
[][07868074][Mathematical
optimization] ...
[02724C38]
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fields of study parent
list names (L2) ...
Authors NumberUrlsFields of study parent
list (L3) ...
None3[http://link.springer.com
/content/pdf/10.1007% ...
None
None2[http://dl.acm.org/citati
on.cfm?id=2481834, ht ...
None
[Project management,
Politics] ...
4[http://ieeexplore.ieee.o
rg/xpl/abstractAuthor ...
[0059F32E, 0556B228,
03E623B0, 0ABCEA76, ...
[Stochastic process]3[http://dx.doi.org/10.100
7/978-3-540-88636-5_2 ...
[0BBED543]
[Multi-agent system]3[http://dl.acm.org/citati
on.cfm?id=691909, htt ...
[09A47029, 0087AC0D]
[Speech synthesis,
Machine translation, ...
1[http://ieeexplore.ieee.o
rg/lpdocs/epic03/wrap ...
[0322F49A, 0A2079AC,
03365AB6, 041AB807] ...
[Computer Aided Design]4[http://dl.acm.org/citati
on.cfm?id=2481852, ht ...
[09F001E0]
[Artificial neural
network, Nonlinear ...
2[http://adsabs.harvard.ed
u/abs/2009LNCS.5845.. ...
[00BB2E8D, 0AA68668,
2078A8D7] ...
None5[http://link.springer.com
/content/pdf/10.1007% ...
None
[Linear programming]3[http://dx.doi.org/10.100
7/978-3-540-88636-5_4 ...
[04198571]
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fields of study parent
list names (L3) ...
None
None
[Software agent, Software
measurement, Autonomy, ...
[Probability density
function] ...
[Process calculus, Immune
system] ...
[Feature extraction,
Cognition, Rule-based ...
[Ontology]
[Support vector machine,
Dynamical system, Cop ...
None
[Multi-objective
optimization] ...
\n", "[126903970 rows x 28 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", "
" ] }, "output_type": "execute_result", "metadata": {} } ], "source": [ "mag_sf = tc.load_sframe(EXTENDED_PAPERS_SFRAME)\n", "mag_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our study, we also analyzed how various author attributes, such as the number of published papers and the number of coauthors, have changed over time. To achieve this, we created an authors features SFrame using the following code:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Author IDPapers by Years DictCoauthors by Years DictAffilation by Year Dict
00001F05{2010: ['5DA0F250'],
2013: ['7AF8ABFE']} ...
{2013: ['77CE16EC',
'17B20BAE']} ...
{2010: [''], 2013: ['']}
00002AD3{2009: ['7A0B348F'],
2010: ['795F56C6'], 2 ...
{2009: ['7FD1B86A',
'7921EA7D', '05390F01', ...
{2009: [''], 2010: [''],
2012: [''], 2006: [''], ...
00006A31{2009: ['7CFAEB15']}{2009: ['7E24B147',
'7C3ED158', '79A6FF42', ...
{2009: ['']}
0000B5FA{2008: ['7714EB4E'],
2009: ['78B7C257'], 2 ...
{2008: ['54648B9B'],
2009: ['7ADCCDB0', ...
{2008: [''], 2009: [''],
2010: ['', '', ''], 2 ...
0001CF9B{2013: ['7C9BBC3A']}{2013: ['852809D8',
'77FE1F64', '80B5223D', ...
{2013: ['']}
00040294{2009: ['7892886A',
'81263516', '81424AA7'], ...
{2009: ['75D367F6',
'82C1C4DE', '80824D21', ...
{2009: ['', '', ''],
2011: ['', ''], 2004: ...
00045553{1987: ['77F10A3D'],
1988: ['7836E5B8'], 1 ...
{1987: ['85CAEB12'],
1988: ['819D7046', ...
{1987: ['new york medical
college'], 1988: [''], ...
0004B8AF{2010: ['77AFDEB4']}{2010: ['77227437',
'77CB65A7', '5EBF97A1', ...
{2010: ['']}
000510E2{2011: ['80790612'],
2012: ['76E5D7F2', ...
{2011: ['82D84635',
'11F1B283'], 2012: ...
{2011: ['university of
queensland'], 2012: ['', ...
00063841{2014: ['7790AFD4']}{2014: ['853EBBF2',
'7901305E', '0F71473E', ...
{2014: ['']}
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sequence Number by Year
Dict ...
Author nameFirst nameLast nameConference ID by Year
Dict ...
{2010.0: array('d',
[1.0]), 2013.0: ...
nancy praillnancypraill{2010: [''], 2013: ['']}
{2009.0: array('d',
[1.0]), 2010.0: ...
david s rebergendavidrebergen{2009: [''], 2010: [''],
2012: [''], 2006: [''], ...
{2009.0: array('d',
[6.0])} ...
b zelazowskabzelazowska{2009: ['']}
{2008.0: array('d',
[1.0]), 2009.0: ...
lars goerigklarsgoerigk{2008: [''], 2009: [''],
2010: ['', '', ''], 2 ...
{2013.0: array('d',
[5.0])} ...
orlando lastres
danguillecourt ...
orlandodanguillecourt{2013: ['']}
{2009.0: array('d', [4.0,
7.0, 6.0]), 2011.0: ...
ivani bisordiivanibisordi{2009: ['', '', ''],
2011: ['', ''], 2004: ...
{1987.0: array('d',
[1.0]), 1988.0: ...
miguel a pappollamiguelpappolla{1987: [''], 1988: [''],
1989: [''], 1990: ['', ...
{2010.0: array('d',
[7.0])} ...
dong zaijiedongzaijie{2010: ['']}
{2011.0: array('d',
[1.0]), 2012.0: ...
fairlie mcilwraithfairliemcilwraith{2011: [''], 2012: ['',
'', ''], 2013: [''], ...
{2014.0: array('d',
[8.0])} ...
tiziano ponsettitizianoponsetti{2014: ['']}
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Journal ID by Year DictVenue by Year DictGender Dict
{2010: [''], 2013: ['']}{2010: [''], 2013: ['']}{'Gender': 'Female',
'Total Males': 2999, ...
{2009: ['0959867B'],
2010: ['0069C535'], 2 ...
{2009: ['Journal of
Occupational and ...
{'Gender': 'Male', 'Total
Males': 3700247, 'Total ...
{2009: ['036625C9']}{2009: ['Advances in
Medical Sciences']} ...
{'Gender': 'Unisex',
'Total Males': 536, ...
{2008: ['0A1986D0'],
2009: ['0ACEE946'], 2 ...
{2008: ['ChemPhysChem'],
2009: ['Physical ...
{'Gender': 'Male', 'Total
Males': 12459, 'Total ...
{2013: ['01F41F83']}{2013: ['International
Journal of Energy ...
{'Gender': 'Male', 'Total
Males': 47535, 'Total ...
{2009: ['08826C6E',
'0B483532', '05F694A1'], ...
{2009: ['Infection,
Genetics and Evolution', ...
{'Gender': 'Female',
'Total Males': 0, 'Total ...
{1987: ['096E1E70'],
1988: ['03C89659'], 1 ...
{1987: ['Synapse'], 1988:
['Human Pathology'], ...
{'Gender': 'Male', 'Total
Males': 173865, 'Total ...
{2010: ['0B0C2E2F']}{2010: ['Aquaculture
Research']} ...
{'Gender': 'Male', 'Total
Males': 317, 'Total ...
{2011: ['080AF648'],
2012: ['068E6FF5', '', ...
{2011: ['Drug and Alcohol
Review'], 2012: ['Drug ...
{'Gender': 'Unisex',
'Total Males': 7, 'Total ...
{2014: ['06FD8B4A']}{2014: ['Catalysis
Today']} ...
{'Gender': 'Male', 'Total
Males': 37, 'Total ...
\n", "[22443094 rows x 12 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", "
" ] }, "output_type": "execute_result", "metadata": {} } ], "source": [ "from create_mag_authors_sframe import *\n", "a = AuthorsFeaturesExtractor()\n", "\n", "# This needs to run on a powerful server and can take considerable time\n", "a_sf = a.get_authors_all_features_sframe()\n", "a_sf # the SFrame can later be loaded using tc.load_sframe(AUTHROS_FEATURES_SFRAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above SFrame contains various features of each author, constructed by analyzing the author’s papers that have at least 5 references. Note that the authors SFrame also contains a gender prediction for each author. This column was created by obtaining first-name gender statistics from the [SSA Baby Names](http://www.ssa.gov/oact/babynames/names.zip) and [WikiTree](https://www.wikitree.com/wiki/Help:Database_Dumps) datasets, which include over 115 thousand unique first names (see details in geneder_classifier.py). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.2 The AMiner Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After downloading the dataset from the [AMiner website](https://aminer.org/open-academic-graph), simply load it into an SFrame using the following code:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abstractauthorsdoiid
None[{'name': 'G. Adam'},
{'name': 'K. Schreibe ...
10.1002/ange.1965077020453e99784b7602d9701f3e130
None[{'name': 'R. Farahbod'},
{'name': 'V. Gervasi'}, ...
None53e99784b7602d9701f3e131
The method to making
technology roadmap is ...
[{'name': 'MO Chou'},
{'name': 'CHEN Jiqing'}, ...
None53e99784b7602d9701f3e132
Drought is the first
place in all the natural ...
[{'name': 'Peijuan
Wang'}, {'name': 'Jiahua ...
10.1109/IGARSS.2011.60495
03 ...
53e99784b7602d9701f3e133
Determination of total
sugar can serve to ...
[{'org': 'Yantai
Institute of Coastal ...
None53e99784b7602d9701f3e135
Resumen: Uno de los
problemas que debemos ...
[{'name': 'CELSO
VARGAS'}] ...
None53e99784b7602d9701f3e136
None[{'name': 'D J Lum'},
{'name': 'V Upadhyay'}, ...
10.1111/j.1365-2559.2007.
02817.x ...
53e99784b7602d9701f3e137
This paper discussed the
planning and design ...
[{'org': 'School of
Resource and ...
None53e99784b7602d9701f3e139
Rough set is a
mathematical tool to ...
NoneNone53e99784b7602d9701f3e13a
None[{'name': 'F
THOUVENYPAISANT'}, ...
10.1016/S0221-0363(05)762
74-0 ...
53e99784b7602d9701f3e13b
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
isbnissnissuekeywordslangn_citationpage_endpage_startpdf
NoneNone2NoneenNone9594None
NoneNoneNoneNoneenNoneNoneNoneNone
NoneNone19[science and technology
production, technology ...
zhNone9590None
NoneNonenull[canopy parameters,
canopy spectrum, ...
enNone19331930None
NoneNone07[metabolites, Jerusalem
artichoke, total sugar, ...
zh193+9790None
NoneNoneNoneNoneenNoneNoneNoneNone
NoneNone5NoneenNone707704None
NoneNone28[Planning and design
method, Mountainous ...
zh1364362None
NoneNone11[Data Mining, Rough Set,
Algorithm, Rules ...
zh3106104None
NoneNone10NoneenNone15551555None
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
referencestitleurlvenue
[53e9a6e6b7602d970301a47d
, ...
1.4-N→N′-Acylwanderun
g bei einem ...
[http://dx.doi.org/10.100
2/ange.19650770204] ...
Angewandte Chemie
[53e9a1d0b7602d9702ac8f1b
, ...
Design and Specification
of the CoreASM Execution ...
NoneNone
NonePractice Research on
Technology Roadmap for ...
NoneScience and Technology
Management Research ...
[53e999c3b7602d970220b9b7
, ...
The relationship between
canopy parameters and ...
[http://dx.doi.org/10.110
9/IGARSS.2011.6049503] ...
IGARSS
NoneThe effect of metabolites
on the determination of ...
NoneFood Science and
Technology ...
NoneEl Humanista y la
Energía Nuclear ...
NoneNone
[53e9b395b7602d9703e78794
, ...
Botryoid fibroepithelial
polyp of the urinary ...
[http://dx.doi.org/10.111
1/j.1365-2559.2007.02 ...
Histopathology
NonePlanning and Design
Method of Land ...
NoneJournal of Anhui
Agricultural Sciences ...
NoneA Data Mining Based on
Rough Set Theory ...
NoneSoftware Guide
NoneRI1 Embolisation des
varices stomiales par ...
[http://dx.doi.org/10.101
6/S0221-0363(05)76274 ...
Journal De Radiologie
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
volumeyear
771965
NoneNone
None2013
null2011
None2012
None2013
512007
None2012
None2012
862005
\n", "[154771161 rows x 19 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", "
" ] }, "output_type": "execute_result", "metadata": {} } ], "source": [ "aminer_sf = tc.SFrame.read_json('%s/*.txt' % AMINER_DATA_DIR, orient='lines')\n", "aminer_sf # the SFrame can also be loaded using tc.load_sframe(AMINER_PAPERS_SFRAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.3 The SJR Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we downloaded all the journal ranking files from [the SJR website](http://www.scimagojr.com/journalrank.php).\n", "Next, we used the following code to create a single SFrame with all the journal data:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RankTitleTypeSJRSJR Best QuartileH indexTotal Docs.Total Docs. (3years)
1Astrophysical Journal
Letters ...
journal61.473Q18257
1Astrophysical Journal
Letters ...
journal61.473Q18257
2Annual Review of
Biochemistry ...
journal49.476Q12483081
2Annual Review of
Biochemistry ...
journal49.476Q12483081
3Celljournal41.978Q16163541359
3Celljournal41.978Q16163541359
4Annual Review of
Immunology ...
journal40.906Q12542981
4Annual Review of
Immunology ...
journal40.906Q12542981
5Annual Review of Cell and
Developmental Biology ...
book serie33.882Q11822561
5Annual Review of Cell and
Developmental Biology ...
book serie33.882Q11822561
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Total Refs.Total Cites (3years)Citable Docs. (3years)Cites / Doc. (2years)Ref. / Doc.Country
350493772.7570.0United Kingdom
350493772.7570.0United Kingdom
591334458035.38197.1United States
591334458035.38197.1United States
1587047390132834.3644.83United States
1587047390132834.3644.83United States
523640308146.69180.55United States
523640308146.69180.55United States
413417706026.55165.36United States
413417706026.55165.36United States
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCategoriesISSN
199920418213
199920418205
199915454509
199900664154
199900928674
199910974172
199907320582
199915453278
199915308995
199910810706
\n", "[502524 rows x 17 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", "
" ] }, "output_type": "execute_result", "metadata": {} } ], "source": [ "from create_sjr_sframe import *\n", "sjr_sf = create_sjr_sframe(SJR_FILES_DIR)\n", "sjr_sf # the SFrame can also be loaded using tc.load_sframe(SJR_SFRAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.4 Joint Datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The MAG and AMiner datasets have slightly different sets of features. While the MAG dataset identifies each author by a unique author ID, the AMiner dataset contains additional data on each paper, including the paper's abstract and the paper's ISSN or ISBN. Additionally, the SJR dataset contains data about each journal's ranking.\n", "\n", "To combine the data from the authors' publication records and the journals' rankings, we joined the datasets. First, we joined the MAG and AMiner datasets by matching DOI values, using the following code (see also create_mag_aminer_sframe.py):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MAG Paper IDOriginal paper titleNormalized paper titlePaper publish yearPaper publish date
7C15F682Ptychoptera deleta Novak,
1877 from the Early ...
ptychoptera deleta novak
1877 from the early ...
20112011
84A37D36Sherborn’s
foraminiferal studies ...
sherborn s foraminiferal
studies and their ...
20162016/07/01
773B216EA new species of
hydrobiid snails ...
a new species of
hydrobiid snails moll ...
20112011/10/19
77C44F83Revision of the
planthopper genus ...
revision of the
planthopper genus ...
20142014/10/12
75233F3EFemale genitalia of
Seasogonia Young from ...
female genitalia of
seasogonia young from ...
20122012/11/01
7B5321C5A taxonomic study on the
genus Ettchellsia ...
a taxonomic study on the
genus ettchellsia cam ...
20122012
3C77A5B8An Asiatic Chironomid in
Brazil: morphology, DNA ...
an asiatic chironomid in
brazil morphology dna ...
20152015/07/27
8051111AA new species of
Smicromorpha ...
a new species of
smicromorpha hymenoptera ...
20092009/09/14
79BD8F37Open exchange of
scientific knowledge and ...
open exchange of
scientific knowledge and ...
20142014/06/06
240B7EFFFour new species of
Epicephala Meyrick, 1880 ...
four new species of
epicephala meyrick 1880 ...
20152015/06/15
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Paper Document Object
Identifier (DOI) ...
Original venue nameNormalized venue nameJournal ID mapped to
venue name ...
Conference ID mapped to
venue name ...
10.3897/zookeys.130.1401ZooKeyszookeys0BDFC074
10.3897/zookeys.550.9863ZooKeyszookeys0BDFC074
10.3897/zookeys.138.1927ZooKeyszookeys0BDFC074
10.3897/zookeys.462.6657ZooKeyszookeys0BDFC074
10.3897/zookeys.164.2132ZooKeyszookeys0BDFC074
10.3897/zookeys.254.4182ZooKeyszookeys0BDFC074
10.3897/zookeys.514.9925ZooKeyszookeys0BDFC074
10.3897/zookeys.20.195ZooKeyszookeys0BDFC074
10.3897/zookeys.414.7717ZooKeyszookeys0BDFC074
10.3897/zookeys.508.9479ZooKeyszookeys0BDFC074
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Paper rankRef. NumberTotal Citations by YearTotal Citations by Year
without Self Citations ...
Authors List Sorted
193827{'2015': 1.0}{'2015': 1.0}[855C02FD, 7B2C4199]
19555NoneNoneNone[84CB5028]
1940212{'2015': 3.0, '2014':
3.0, '2011': 1.0, '20 ...
{'2015': 1.0, '2014':
1.0, '2011': 1.0, '20 ...
[8439D30B]
19555NoneNoneNone[84C55AD3, 7E095DC6]
194276NoneNone[805137B8, 80CA5307]
193704NoneNone[80F9A983, 7FFB2555]
19555NoneNoneNone[6118B891, 7FE3B9C3,
79CC73E7, 7C17044D] ...
191573{'2015': 4.0, '2014':
3.0, '2013': 2.0, '20 ...
{'2015': 4.0, '2014':
3.0, '2013': 2.0, '20 ...
[85B81F06, 7E5ABA3D]
193215{'2015': 3.0, '2014':
1.0} ...
{'2015': 2.0}[7237B1F9, 7DAD3B1C,
78F96D88, 78A6ED0B] ...
19555NoneNoneNone[7CFFBD65, 862455E0,
7D640C0F] ...
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Keywords ListField of study listField of study list namesFields of study parent
list (L0) ...
[tertiary, biomedical
research, neogene, ...
[009377C6, 0660586C,
01A380F9, 039D5C06] ...
[Tertiary, None, Neogene,
Bioinformatics] ...
[]
NoneNoneNoneNone
[biomedical research,
bioinformatics] ...
[0660586C, 039D5C06][None, Bioinformatics][]
NoneNoneNoneNone
[morphology, taxonomy][06A2C3F5, 037ECF39][Morphology, Taxonomy][052C8328]
[taxonomy,
bioinformatics, ...
[039D5C06, 037ECF39,
0660586C] ...
[Bioinformatics,
Taxonomy, None] ...
[052C8328]
NoneNoneNoneNone
[morphology, taxonomy][06A2C3F5, 037ECF39][Morphology, Taxonomy][052C8328]
[taxonomy, intellectual
property rights, ...
[037ECF39, 0215A9CE,
039D5C06, 0660586C] ...
[Taxonomy, Intellectual
property, Bioinformat ...
[0895A350, 052C8328]
NoneNoneNoneNone
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fields of study parent
list names (L0) ...
Fields of study parent
list (L1) ...
Fields of study parent
list names (L1) ...
Fields of study parent
list (L2) ...
[][090B39EA, 039D5C06][Paleontology,
Bioinformatics] ...
[0683829C]
NoneNoneNoneNone
[][039D5C06][Bioinformatics][]
NoneNoneNoneNone
[Biology][027F4522][Linguistics][037ECF39, 06A2C3F5]
[Biology][039D5C06][Bioinformatics][037ECF39]
NoneNoneNoneNone
[Biology][027F4522][Linguistics][037ECF39, 06A2C3F5]
[Sociology, Biology][0BE4BA29, 039D5C06][Law, Bioinformatics][037ECF39]
NoneNoneNoneNone
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fields of study parent
list names (L2) ...
Authors NumberUrlsabstract
[Stratigraphy]2[/pmc/articles/instance/3
260767/?report=abstra ...
The first fossil that was
described in ...
None1[http://zookeys.pensoft.n
et/lib/ajax_srv/artic ...
None
[]1[http://bionames.org/refe
rences/b38c8055b3453e ...
Anew minute valvatiform
species belonging to the ...
None2[http://www.researchgate.
net/publication/27090 ...
Chinese species in the
genus Nycheuma Fennah, ...
[Taxonomy, Morphology]2[/pmc/articles/instance/3
272620/?report=abstra ...
Seasogonia Young, 1986 is
a sharpshooter genus ...
[Taxonomy]2[http://bionames.org/refe
rences/4078cc33be92d7 ...
Three new species of
Ettchellsia Cameron, ...
None4[http://www.researchgate.
net/publication/28071 ...
None
[Taxonomy, Morphology]2[http://www.cabdirect.org
/abstracts/2009331573 ...
None
[Taxonomy]4[http://advocomplex.ch/fi
les/Open%20exchange%2 ...
Background. The 7(th)
Framework Programme for ...
None3[http://www.ncbi.nlm.nih.
gov/pubmed/26167120, ...
None
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
authorsAminer Paper IDisbnissnissuekeywordslang
[{'org': 'Institute of
Biology, Pedagogical ...
55a4881d65ce31bc877df20dNone1313-2970130[cheb basin, cypris
formation, czech ...
en
[{'name': 'giles
miller'}] ...
56d81fffdabfae2eeeb5906dNoneNoneNoneNoneen
[{'org': 'Department of
Ecology and Systematics, ...
55a47a6865ce31bc877c5f09None1313-2970138[caenogastropoda,
daphniola eptalophos sp. ...
en
[{'org': 'The Special Key
Laboratory for ...
55a6b03265ce054aad712ec1None1313-2989462[delphacini, fulgoroidea,
hemiptera, nycheuma, new ...
en
[{'org': 'Institute of
Entomology, Guizhou ...
55a4953165ceb7cb02d2d096None1313-2970164[auchenorrhyncha,
cicadellinae, ...
en
[{'org': 'Laboratory of
Entomology, Faculty of ...
55a51ace65ceb7cb02e11911None1313-2989254[south east asia,
taxonomy, parasitic ...
en
[{'name': 'gizelle
amora'}, {'name': 'neusa ...
56d81fffdabfae2eeeb59087NoneNoneNoneNoneen
[{'name': 'd c darling'},
{'name': 'norman f ...
53e9ac5cb7602d970362f86dNoneNone0[morphology, taxonomy]en
[{'org': 'Plazi,
Zinggstrasse 16, 3007 ...
55a676ab65ce054aad68df46None1313-2989414[biodiversity knowledge,
european copyright, ...
en
[{'name': 'houhun li'},
{'name': 'zhibo wang'}, ...
56d82000dabfae2eeeb59093NoneNoneNoneNoneen
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
n_citationpage_endpage_startpdfreferencestitle...
3305299None[53e99c60b7602d9702511119
, ...
Ptychoptera deleta
Novák, 1877 from the ...
...
NoneNoneNoneNoneNoneSherborn’s
foraminiferal studies ...
...
None6453None[56d8e4efdabfae2eee2affe9
, ...
A new species of
hydrobiid snails ...
...
None5747NoneNoneRevision of the
planthopper genus ...
...
None4024None[56d92143dabfae2eee9f1671
, ...
Female genitalia of
Seasogonia Young from ...
...
None10899None[53e99ddab7602d970269b655
, ...
A taxonomic study on the
genus Ettchellsia ...
...
NoneNoneNoneNoneNoneAn Asiatic Chironomid in
Brazil: morphology, DNA ...
...
NoneNoneNoneNone[56d85c7cdabfae2eee5a5fe6
, ...
A new species of
Smicromorpha ...
...
9135109None[55a4825f65ce31bc877d426d
, ...
Open exchange of
scientific knowledge and ...
...
NoneNoneNoneNoneNoneFour new species of
Epicephala Meyrick, 1880 ...
...
\n", "[28945815 rows x 44 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", "
" ] }, "output_type": "execute_result", "metadata": {} } ], "source": [ "sf = tc.load_sframe(EXTENDED_PAPERS_SFRAME)\n", "g1 = sf.groupby('Paper Document Object Identifier (DOI)', {'Count': agg.COUNT()})\n", "s1 = set(g1[g1['Count'] > 1]['Paper Document Object Identifier (DOI)'])\n", "sf = sf[sf['Paper Document Object Identifier (DOI)'].apply(lambda doi: doi not in s1 )]\n", "sf.materialize()\n", "\n", "sf2 = tc.load_sframe(AMINER_PAPERS_SFRAME)\n", "g2 = sf2.groupby('doi', {'Count': agg.COUNT()})\n", "s2 = set(g2[g2['Count'] > 1]['doi'])\n", "sf2 = sf2[sf2['doi'].apply(lambda doi: doi not in s2 )]\n", "sf2.materialize()\n", "\n", "aminer_mag_sf = sf.join(sf2, {'Paper Document Object Identifier (DOI)': 'doi'})\n", "aminer_mag_sf['title_len'] = aminer_mag_sf['title'].apply(lambda t: len(t))\n", "aminer_mag_sf = aminer_mag_sf[aminer_mag_sf['title_len'] > 0]\n", "aminer_mag_sf = aminer_mag_sf.rename({\"Paper ID\": \"MAG Paper ID\", \"id\": \"Aminer Paper ID\"})\n", "aminer_mag_sf.remove_column('title_len')\n", "aminer_mag_sf # this SFrame can be accessed using tc.load_Sframe(AMINER_MAG_JOIN_SFRAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the joined dataset, we obtained an SFrame with the joint meta data of 28.9 million papers. We can take this SFrame and join it with the SJR dataset." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MAG Paper IDOriginal paper titleNormalized paper titlePaper publish yearPaper publish date
7C15F682Ptychoptera deleta Novak,
1877 from the Early ...
ptychoptera deleta novak
1877 from the early ...
20112011
773B216EA new species of
hydrobiid snails ...
a new species of
hydrobiid snails moll ...
20112011/10/19
77C44F83Revision of the
planthopper genus ...
revision of the
planthopper genus ...
20142014/10/12
75233F3EFemale genitalia of
Seasogonia Young from ...
female genitalia of
seasogonia young from ...
20122012/11/01
7B5321C5A taxonomic study on the
genus Ettchellsia ...
a taxonomic study on the
genus ettchellsia cam ...
20122012
79BD8F37Open exchange of
scientific knowledge and ...
open exchange of
scientific knowledge and ...
20142014/06/06
778DE072First report on
C-banding, fluorochrome ...
first report on c banding
fluorochrome staining ...
20132013/07/30
76034B62Checklist of the families
Scathophagidae, Fanni ...
checklist of the families
scathophagidae fanniidae ...
20142014/09/19
781099DEChecklist of the family
Culicidae (Diptera) in ...
checklist of the family
culicidae diptera in ...
20142014
770A539FTwo new species of
harvestmen (Opiliones, ...
two new species of
harvestmen opiliones ...
20142014/08/14
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Paper Document Object
Identifier (DOI) ...
Original venue nameNormalized venue nameJournal ID mapped to
venue name ...
Conference ID mapped to
venue name ...
10.3897/zookeys.130.1401ZooKeyszookeys0BDFC074
10.3897/zookeys.138.1927ZooKeyszookeys0BDFC074
10.3897/zookeys.462.6657ZooKeyszookeys0BDFC074
10.3897/zookeys.164.2132ZooKeyszookeys0BDFC074
10.3897/zookeys.254.4182ZooKeyszookeys0BDFC074
10.3897/zookeys.414.7717ZooKeyszookeys0BDFC074
10.3897/zookeys.319.4265ZooKeyszookeys0BDFC074
10.3897/zookeys.441.7142ZooKeyszookeys0BDFC074
10.3897/zookeys.441.7743ZooKeyszookeys0BDFC074
10.3897/zookeys.434.7486ZooKeyszookeys0BDFC074
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Paper rankRef. NumberTotal Citations by YearTotal Citations by Year
without Self Citations ...
Authors List Sorted
193827{'2015': 1.0}{'2015': 1.0}[855C02FD, 7B2C4199]
1940212{'2015': 3.0, '2014':
3.0, '2011': 1.0, '20 ...
{'2015': 1.0, '2014':
1.0, '2011': 1.0, '20 ...
[8439D30B]
19555NoneNoneNone[84C55AD3, 7E095DC6]
194276NoneNone[805137B8, 80CA5307]
193704NoneNone[80F9A983, 7FFB2555]
193215{'2015': 3.0, '2014':
1.0} ...
{'2015': 2.0}[7237B1F9, 7DAD3B1C,
78F96D88, 78A6ED0B] ...
1948515{'2015': 1.0, '2014':
1.0} ...
{'2015': 1.0, '2014':
1.0} ...
[7FDB7566, 80418DAE]
194045NoneNone[78FFCF1E, 7CD385BA]
194246{'2015': 1.0}{'2015': 1.0}[7D2454E0]
194045NoneNone[850AEC5F, 8306860B]
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Keywords ListField of study listField of study list namesFields of study parent
list (L0) ...
[tertiary, biomedical
research, neogene, ...
[009377C6, 0660586C,
01A380F9, 039D5C06] ...
[Tertiary, None, Neogene,
Bioinformatics] ...
[]
[biomedical research,
bioinformatics] ...
[0660586C, 039D5C06][None, Bioinformatics][]
NoneNoneNoneNone
[morphology, taxonomy][06A2C3F5, 037ECF39][Morphology, Taxonomy][052C8328]
[taxonomy,
bioinformatics, ...
[039D5C06, 037ECF39,
0660586C] ...
[Bioinformatics,
Taxonomy, None] ...
[052C8328]
[taxonomy, intellectual
property rights, ...
[037ECF39, 0215A9CE,
039D5C06, 0660586C] ...
[Taxonomy, Intellectual
property, Bioinformat ...
[0895A350, 052C8328]
[biomedical research,
bioinformatics] ...
[0660586C, 039D5C06][None, Bioinformatics][]
[bioinformatics,
biomedical research] ...
[039D5C06, 0660586C][Bioinformatics, None][]
[bioinformatics,
biomedical research] ...
[039D5C06, 0660586C][Bioinformatics, None][]
[taxonomy][037ECF39][Taxonomy][052C8328]
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fields of study parent
list names (L0) ...
Fields of study parent
list (L1) ...
Fields of study parent
list names (L1) ...
Fields of study parent
list (L2) ...
[][090B39EA, 039D5C06][Paleontology,
Bioinformatics] ...
[0683829C]
[][039D5C06][Bioinformatics][]
NoneNoneNoneNone
[Biology][027F4522][Linguistics][037ECF39, 06A2C3F5]
[Biology][039D5C06][Bioinformatics][037ECF39]
[Sociology, Biology][0BE4BA29, 039D5C06][Law, Bioinformatics][037ECF39]
[][039D5C06][Bioinformatics][]
[][039D5C06][Bioinformatics][]
[][039D5C06][Bioinformatics][]
[Biology][][][037ECF39]
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fields of study parent
list names (L2) ...
Authors NumberUrlsabstract
[Stratigraphy]2[/pmc/articles/instance/3
260767/?report=abstra ...
The first fossil that was
described in ...
[]1[http://bionames.org/refe
rences/b38c8055b3453e ...
Anew minute valvatiform
species belonging to the ...
None2[http://www.researchgate.
net/publication/27090 ...
Chinese species in the
genus Nycheuma Fennah, ...
[Taxonomy, Morphology]2[/pmc/articles/instance/3
272620/?report=abstra ...
Seasogonia Young, 1986 is
a sharpshooter genus ...
[Taxonomy]2[http://bionames.org/refe
rences/4078cc33be92d7 ...
Three new species of
Ettchellsia Cameron, ...
[Taxonomy]4[http://advocomplex.ch/fi
les/Open%20exchange%2 ...
Background. The 7(th)
Framework Programme for ...
[]2[/pmc/articles/PMC3764527
/?report=abstract, ht ...
In spite of various
cytogenetic works on ...
[]2[http://europepmc.org/abs
tract/MED/25337032, h ...
A revised checklist of
the Scathophagidae, ...
[]1[http://europepmc.org/art
icles/PMC4200447?pdf= ...
A checklist of the
Culicidae (Diptera) ...
[Taxonomy]2[http://espace.library.cu
rtin.edu.au/R?func=dbin- ...
Neopilionidae:
Enantiobuninae) are ...
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
authorsAminer Paper IDisbnissnissuekeywordslang
[{'org': 'Institute of
Biology, Pedagogical ...
55a4881d65ce31bc877df20dNone1313-2970130[cheb basin, cypris
formation, czech ...
en
[{'org': 'Department of
Ecology and Systematics, ...
55a47a6865ce31bc877c5f09None1313-2970138[caenogastropoda,
daphniola eptalophos sp. ...
en
[{'org': 'The Special Key
Laboratory for ...
55a6b03265ce054aad712ec1None1313-2989462[delphacini, fulgoroidea,
hemiptera, nycheuma, new ...
en
[{'org': 'Institute of
Entomology, Guizhou ...
55a4953165ceb7cb02d2d096None1313-2970164[auchenorrhyncha,
cicadellinae, ...
en
[{'org': 'Laboratory of
Entomology, Faculty of ...
55a51ace65ceb7cb02e11911None1313-2989254[south east asia,
taxonomy, parasitic ...
en
[{'org': 'Plazi,
Zinggstrasse 16, 3007 ...
55a676ab65ce054aad68df46None1313-2989414[biodiversity knowledge,
european copyright, ...
en
[{'org': 'Department of
Entomology, Y.S. Parmar ...
55a5ce0d65ce60f99bf5c02dNone1313-2989319[c-banding, cma3, dapi,
nor location, ...
en
[{'org': 'Finnish Museum
of Natural History, ...
55a69aa665ce054aad6d871fNone1313-2989441[diptera, finland,
species list, ...
en
[{'org': 'Finnish Museum
of Natural History, ...
55a69aa665ce054aad6d8706None1313-2989441[checklist, culicidae,
diptera, finland, ...
en
[{'org': 'Dept of
Environment and ...
55a688db65ce054aad6ae23aNone1313-2989434[taxonomy, arachnids,
cave biota] ...
en
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
n_citationpage_endpage_startpdfreferencestitle...
3305299None[53e99c60b7602d9702511119
, ...
Ptychoptera deleta
Novák, 1877 from the ...
...
None6453None[56d8e4efdabfae2eee2affe9
, ...
A new species of
hydrobiid snails ...
...
None5747NoneNoneRevision of the
planthopper genus ...
...
None4024None[56d92143dabfae2eee9f1671
, ...
Female genitalia of
Seasogonia Young from ...
...
None10899None[53e99ddab7602d970269b655
, ...
A taxonomic study on the
genus Ettchellsia ...
...
9135109None[55a4825f65ce31bc877d426d
, ...
Open exchange of
scientific knowledge and ...
...
None291283None[56d8b9c0dabfae2eee2562bf
, ...
First report on
C-banding, fluorochrome ...
...
None367347None[56d90738dabfae2eeefe3d12
, ...
Checklist of the families
Scathophagidae, Fanni ...
...
None5147None[56d8d4a0dabfae2eeec120ac
, ...
Checklist of the family
Culicidae (Diptera) in ...
...
None4537None[56d88f6bdabfae2eeedadea4
, ...
Two new species of
harvestmen (Opiliones, ...
...
\n", "[4498015 rows x 61 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", "
" ] }, "output_type": "execute_result", "metadata": {} } ], "source": [ "import re\n", "def create_aminer_mag_sjr_sframe(year):\n", " \"\"\"\n", " Creates a unified SFrame of AMiner, MAG, and the SJR datasets\n", " :param year: year to use for SJR data\n", " :return: SFrame with AMiner, MAG, and SJR data\n", " :rtype: tc.SFrame\n", " \"\"\"\n", " sf = tc.load_sframe(AMINER_MAG_JOIN_SFRAME)\n", " sf = sf[sf['issn'] != None]\n", " sf = sf[sf['issn'] != 'null']\n", " sf.materialize()\n", " r = re.compile(\"(\\d+)-(\\d+)\")\n", " sf['issn_str'] = sf['issn'].apply(lambda i: \"\".join(r.findall(i)[0]) if len(r.findall(i))> 0 else None)\n", " sf = sf[sf['issn_str'] != None]\n", " sjr_sf = tc.load_sframe(SJR_SFRAME)\n", " sjr_sf = sjr_sf[sjr_sf['Year'] == year]\n", " return sf.join(sjr_sf, on={'issn_str': \"ISSN\"})\n", "create_aminer_mag_sjr_sframe(2015)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Loading the Dataset to MongoDB" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using Turicreate and SFrame objects can help us get general data on how academic publication dynamics have changed over time, but it would be challenging to use this data to create more complicated insights, such as the trends of a specific journal. To reveal more complicated insights using the data, we would need to load the dataset to a different framework. In this study, we chose to use MongoDB as our framework for more complicated queries.\n", "We installed MongoDB on Ubuntu 17.10 using the instructions in the following [link](https://medium.com/gatemill/how-to-install-mongodb-3-6-on-ubuntu-17-10-ac0bc225e648). 
After MongoDB is installed and running, please remember to set the user and password, and update the MONGO_HOST and MONGO_PORT variables in consts.py (the connection can also be adjusted to use username and password authentication).\n", "The next step is to load the SFrames created above into MongoDB collections using mongo_connector.py:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from mongo_connector import *\n", "load_sframes() # this will load the SFrames into MongoDB collections" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At the end of the loading process, the journals database will contain six collections." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[u'authoros_features',\n", " u'sjr_journals',\n", " u'aminer_mag_papers',\n", " u'fields_of_study_papers',\n", " u'papers_features',\n", " u'authors_features']" ] }, "execution_count": 10, "output_type": "execute_result", "metadata": {} } ], "source": [ "MD.client.journals.collection_names()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the second part of the tutorial, we will demonstrate how the MongoDB collections created above can be used to calculate various statistics on paper collections, authors, journals, and research domains." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2.0 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.14" } }, "nbformat": 4, "nbformat_minor": 2 }