{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Part III - D: Analyzing Changing Trends in Academia - Research Fields" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Research Field Dynamics\n", "The MAG dataset links keywords to their corresponding fields of study. It also provides hierarchical data that links each research field to its parent fields across up to four levels (L0-L3). In this notebook, we use the research field data to better understand how various fields of study change over time. First, as in previous notebooks, let's load the required Python packages." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from configs import *\n", "import pandas as pd\n", "import numpy as np\n", "import altair as alt\n", "alt.renderers.enable('notebook')\n", "from visualization.visual_utils import *\n", "import turicreate as tc  # explicit import; tc may also be provided by configs\n", "import turicreate.aggregate as agg\n", "FIELD_OF_STUDY_HIERARCHY = \"%s/FieldOfStudyHierarchy.sframe\" % SFRAMES_BASE_DIR\n", "\n", "def normalize_features_dict(feature_dict, start_year):\n", "    # Re-key the dict from calendar years to years elapsed since start_year\n", "    return {(y - start_year): v for y, v in feature_dict.iteritems()}\n", "\n", "def get_values_sum_by_year_dict(d, max_keys):\n", "    # For each offset i in [0, max_keys), sum all values whose key is <= i\n", "    d2 = {}\n", "    for i in range(max_keys):\n", "        d2[i] = sum([v for k, v in d.iteritems() if k <= i])\n", "    return d2\n", "\n", "fields_sf = tc.load_sframe(EXTENDED_PAPERS_SFRAME)[\"Paper ID\", \"Paper publish year\", \"Fields of study parent list names (L0)\",\n", "                                                   \"Fields of study parent list (L1)\", \"Fields of study parent list names (L1)\",\n", "                                                   \"Fields of study parent list (L2)\", \"Fields of study parent list names (L2)\",\n", "                                                   \"Ref Number\", \"Authors Number\", \"Total Citations by Year\"]\n", "fields_sf = fields_sf.rename({\"Paper publish year\": \"Year\"})\n", "fields_sf = filter_sframe_by_years(fields_sf, 1850, 2010)\n", "fields_sf = fields_sf[fields_sf[\"Fields of study parent list names (L0)\"] != None]\n", "fields_sf = fields_sf[fields_sf[\"Fields of study parent list names (L0)\"].apply(lambda l: len(l) <= 10)]  # remove papers that belong to more than 10 fields\n", "fields_sf = fields_sf[fields_sf[\"Ref Number\"] >= 5]  # keep papers with at least 5 references\n", "fields_sf.materialize()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's create an SFrame with information on all L0 fields of study." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| Paper ID | Year | Authors Number | Ref Number | Total Citations by Year | L0 Field |
|---|---|---|---|---|---|
| 7CFE299E | 2009 | 4 | 14 | {'2015': 10.0, '2014': 9.0, '2011': 3.0, '20 ... | Computer Science |
| 7CFE299E | 2009 | 4 | 14 | {'2015': 10.0, '2014': 9.0, '2011': 3.0, '20 ... | Sociology |
| 7CFE299E | 2009 | 4 | 14 | {'2015': 10.0, '2014': 9.0, '2011': 3.0, '20 ... | Mathematics |
| 7CFE299E | 2009 | 4 | 14 | {'2015': 10.0, '2014': 9.0, '2011': 3.0, '20 ... | Engineering |
| 59BEBE1C | 2008 | 3 | 7 | None | Mathematics |
| 5873C011 | 2002 | 3 | 8 | {'2003': 1.0, '2006': 3.0, '2007': 3.0, '20 ... | Computer Science |
| 5C66D743 | 2009 | 2 | 9 | {'2015': 1.0, '2014': 1.0} ... | Computer Science |
| 5C66D743 | 2009 | 2 | 9 | {'2015': 1.0, '2014': 1.0} ... | Chemistry |
| 584D8787 | 2009 | 2 | 9 | None | Computer Science |
| 584D8787 | 2009 | 2 | 9 | None | Sociology |
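
The table above holds one row per (paper, L0 field) pair, so a paper tagged with four fields appears four times. The following is a minimal sketch of that flattening step. It swaps in pandas (`DataFrame.explode`, pandas 0.25 or later, Python 3) for the Turi Create SFrame the notebook works with, and the toy data and the helper name `flatten_fields` are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical, not taken from MAG): one row per paper,
# with the L0 field names stored as a list.
papers = pd.DataFrame({
    "Paper ID": ["P1", "P2"],
    "Year": [2009, 2008],
    "Fields of study parent list names (L0)": [
        ["Computer Science", "Sociology"],
        ["Mathematics"],
    ],
})


def flatten_fields(df, list_col, new_col):
    """Return one row per (paper, field) pair by exploding a list column."""
    flat = df.explode(list_col).rename(columns={list_col: new_col})
    return flat.reset_index(drop=True)


l0_flat = flatten_fields(
    papers, "Fields of study parent list names (L0)", "L0 Field"
)
print(l0_flat)
```

Because every other column is repeated for each field, per-field statistics computed on the flattened table count a multi-field paper once in each of its fields.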
| Paper ID | Year | Fields of study parent list names (L0) ... | Authors Number | Ref Number | Total Citations by Year | L1 Field ID | L1 Field |
|---|---|---|---|---|---|---|---|
| 5873C011 | 2002 | [Computer Science] | 3 | 8 | {'2003': 1.0, '2006': 3.0, '2007': 3.0, '20 ... | 0C19BFCD | Immunology |
| 59803B73 | 2006 | [Computer Science, Chemistry, Biology, ... | 4 | 6 | None | 033D6521 | Genetics |
| 5ED460A9 | 2009 | [Computer Science, Mathematics, Biology] ... | 1 | 24 | None | 033D6521 | Genetics |
| 7D75F10B | 2008 | [Computer Science, Biology, Psychology] ... | 2 | 10 | None | 0B170D53 | Biological system |
| 8050742A | 2008 | [Geology, Mathematics, Chemistry, Computer ... | 2 | 20 | None | 033D6521 | Genetics |
| 7F0016FA | 2008 | [Computer Science, Medicine, Engineering, ... | 3 | 5 | {'2015': 10.0, '2014': 8.0, '2009': 1.0, '20 ... | 033D6521 | Genetics |
| 66CAB1C0 | 2002 | [Computer Science, Biology] ... | 5 | 12 | None | 01207101 | Ecology |
| 5F656C25 | 2009 | [Computer Science, Mathematics, Biology] ... | 3 | 14 | None | 00640F05 | Agronomy |
| 5F656C25 | 2009 | [Computer Science, Mathematics, Biology] ... | 3 | 14 | None | 033D6521 | Genetics |
| 7D34E03C | 2007 | [Computer Science, Physics, Mathematics, ... | 3 | 5 | {'2015': 1.0, '2014': 1.0, '2013': 1.0, '20 ... | 0390D066 | Botany |
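
Each row above carries both an L1 field ID and its readable name, which suggests a lookup from ID to name. A minimal sketch of such a lookup as a left join is shown below. It uses pandas rather than the notebook's SFrame, and the lookup table, its column names (`Field ID`, `Field name`), the placeholder IDs, and the helper `attach_field_names` are all hypothetical.

```python
import pandas as pd

# Toy data (hypothetical): papers already flattened to one L1 field ID per row.
papers_l1 = pd.DataFrame({
    "Paper ID": ["P1", "P2", "P3"],
    "L1 Field ID": ["F1", "F2", "F1"],
})

# Hypothetical ID-to-name lookup table.
field_names = pd.DataFrame({
    "Field ID": ["F1", "F2"],
    "Field name": ["Genetics", "Ecology"],
})


def attach_field_names(df, lookup, id_col, name_col):
    """Left-join readable field names onto the rows of df via id_col."""
    merged = df.merge(lookup, how="left", left_on=id_col, right_on="Field ID")
    return merged.drop(columns=["Field ID"]).rename(
        columns={"Field name": name_col}
    )


named = attach_field_names(papers_l1, field_names, "L1 Field ID", "L1 Field")
print(named)
```

A left join keeps every paper row even when an ID is missing from the lookup table; such rows get a null name instead of being dropped.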
| Paper ID | Year | Authors Number | Ref Number | Total Citations by Year | L2 Field ID | L2 Field |
|---|---|---|---|---|---|---|
| 5ED460A9 | 2009 | 1 | 24 | None | 027301DC | Epigenetics |
| 8050742A | 2008 | 2 | 20 | None | 0B470EAF | Genomics |
| 5C698B6E | 2005 | 2 | 8 | None | 027301DC | Epigenetics |
| 8178E2AA | 2006 | 5 | 7 | None | 093D58AF | Plant breeding |
| 7C55455E | 2006 | 6 | 6 | {'2007': 1.0, '2015': 7.0, '2014': 7.0, '20 ... | 0AE98092 | Developmental biology |
| 0856E642 | 2007 | 2 | 5 | None | 0B470EAF | Genomics |
| 6FE194EC | 2010 | 2 | 10 | None | 0B470EAF | Genomics |
| 7FBD2C55 | 2008 | 3 | 19 | None | 027301DC | Epigenetics |
| 81112497 | 2008 | 3 | 9 | None | 0B470EAF | Genomics |
| 5AE8B595 | 2006 | 3 | 17 | None | 027301DC | Epigenetics |
| Parent Field of Study | Field of study name | Median Citations After 5-years ... | MAX Citations After 5-years ... | Papers Number | Average Author Number |
|---|---|---|---|---|---|
| Art | Publication | 4.0 | 100 | 164 | 3.38 |
| Art | Etching | 4.0 | 124 | 285 | 5.35 |
| Art | Video | 4.5 | 72 | 170 | 3.48 |
| Art | Music | 5.0 | 204 | 479 | 2.83 |
| Art | Clothing | 5.0 | 72 | 123 | 3.86 |
| Art | Film | 5.0 | 252 | 857 | 4.11 |
| Art | Physical model | 6.0 | 147 | 452 | 3.77 |
| Art | Conceptual design | 6.0 | 51 | 184 | 4.39 |
| Art | Photography | 8.0 | 158 | 201 | 3.7 |
| Art | Performance | 9.0 | 473 | 1467 | 4.08 |
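
The columns of the table above (median and maximum citations after five years, paper count, average author count) are per-field aggregates. A minimal sketch of that kind of summary follows. It uses a pandas `groupby` with named aggregation (pandas 0.25 or later) in place of the notebook's SFrame and `turicreate.aggregate`, and the toy data, the input column `Citations After 5-years`, and the output column names are assumptions for illustration.

```python
import pandas as pd

# Toy data (hypothetical): one row per (paper, field) pair with a
# precomputed five-year citation count.
papers = pd.DataFrame({
    "Field of study name": ["A", "A", "A", "B", "B"],
    "Citations After 5-years": [1, 4, 10, 2, 8],
    "Authors Number": [2, 3, 4, 1, 3],
})


def summarize_fields(df):
    """Aggregate citation and author statistics per field of study."""
    summary = df.groupby("Field of study name").agg(
        median_citations=("Citations After 5-years", "median"),
        max_citations=("Citations After 5-years", "max"),
        papers_number=("Citations After 5-years", "count"),
        average_author_number=("Authors Number", "mean"),
    )
    # Sort ascending by median, matching the ordering seen in the table.
    return summary.sort_values("median_citations").reset_index()


summary = summarize_fields(papers)
print(summary)
```

The median is less sensitive than the mean to a handful of highly cited papers, which is why a field can combine a low median with a large maximum.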
| Parent Field of Study | Field of study name | Median Citations After 5-years ... | MAX Citations After 5-years ... | Papers Number | Average Author Number |
|---|---|---|---|---|---|
| Mathematics | Finite impulse response | 2.0 | 167 | 337 | 3.0 |
| Computer Science | Pixel | 2.0 | 380 | 2484 | 3.27 |
| Computer Science | Ontology | 2.0 | 616 | 733 | 3.35 |
| Computer Science | Mesh networking | 2.0 | 62 | 274 | 3.43 |
| Computer Science | Camera resectioning | 2.0 | 43 | 114 | 3.13 |
| Computer Science | Session Initiation Protocol ... | 2.0 | 116 | 100 | 3.6 |
| Chemistry | Gallium | 2.0 | 73 | 484 | 3.43 |
| Mathematics | Presentation of a group | 2.0 | 91 | 706 | 3.22 |
| Mathematics | Spiral | 2.0 | 80 | 122 | 3.65 |
| Mathematics | Block code | 2.0 | 54 | 281 | 2.83 |
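
A "citations after N years" figure needs values keyed by years since publication rather than by calendar year. The two helpers defined in the first code cell, `normalize_features_dict` and `get_values_sum_by_year_dict`, perform that re-keying and a running sum. Below is a Python 3 port of both as a sketch. The `int(...)` conversion of string keys, the toy input, and the reading of the input as per-year counts (not running totals) are assumptions, and whether the notebook derives the table columns exactly this way is not confirmed by the source.

```python
def normalize_features_dict(feature_dict, start_year):
    """Re-key a {year: value} dict to {years since start_year: value}."""
    # int() is an added assumption: it lets string keys such as '2003' work.
    return {int(year) - start_year: value for year, value in feature_dict.items()}


def get_values_sum_by_year_dict(d, max_keys):
    """For each offset i in [0, max_keys), sum all values with key <= i."""
    return {i: sum(v for k, v in d.items() if k <= i) for i in range(max_keys)}


# Toy input (hypothetical): citations received in each calendar year
# by a paper published in 2002.
citations_per_year = {"2003": 1.0, "2006": 3.0, "2007": 3.0}

offsets = normalize_features_dict(citations_per_year, 2002)
cumulative = get_values_sum_by_year_dict(offsets, 6)
print(cumulative[5])  # cumulative citations five years after publication
```

With this toy input, the paper accumulates 1.0 citation by year 1, 4.0 by year 4, and 7.0 by year 5.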