Part III - B: Analyzing Changing Trends in Academia - Author Trends

2. Author Dynamics

In this notebook, we are going to explore how authors' behaviors and characteristics have changed over the last two centuries. First, let's load all the required packages and add some helpful drawing functions.

In [1]:
from configs import *
import pandas as pd
import numpy as np
import altair as alt
alt.renderers.enable('notebook')
from visualization.visual_utils import *
import turicreate as tc  # explicit import; `tc` is used below to load the SFrames
import turicreate.aggregate as agg

def normalize_features_dict(feature_dict, start_year):
    # Re-key a {year: value} dict by the number of years since start_year
    return {(y - start_year): v for y, v in feature_dict.items()}

def get_values_sum_by_year_dict(d, max_keys):
    # Running total: d2[i] is the sum of all values whose key is <= i
    d2 = {}
    for i in range(max_keys):
        d2[i] = sum([v for k, v in d.items() if k <= i])
    return d2

# Helper function that draws the per-decade average of a column by years since first publication
def draw_decades_avg_chart(sf, col_name, avg_col_name, max_year=2014, title=None):
    sf = sf[sf["Years since First Publication"] != None]
    sf = sf[sf[col_name] != None]
    sf = sf[sf.apply(lambda r: (r["Academic BirthYear"] + r["Years since First Publication"])<= max_year)]
    g = sf.groupby(["Academic Birth Decade",  "Years since First Publication"], {avg_col_name: agg.AVG(col_name)})
    df =g.to_dataframe()
    if title is not None:
        chart = alt.Chart(df, title=title)
    else:
        chart = alt.Chart(df)
    chart = chart.mark_line().encode(
        alt.X('Years since First Publication:Q', axis=alt.Axis(format='d'), scale=alt.Scale(zero=False)),
        alt.Y("%s:Q" % avg_col_name, scale=alt.Scale(zero=False)),
        color="Academic Birth Decade"   )
    return chart

# Helper function that draws the per-decade median of a column by years since first publication
def draw_decades_med_chart(sf, col_name, med_col_name, max_year=2014, title=None):
    sf = sf[sf["Years since First Publication"] != None]
    sf = sf[sf[col_name] != None]
    sf = sf[sf.apply(lambda r: (r["Academic BirthYear"] + r["Years since First Publication"])<= max_year)]
    g = sf.groupby(["Academic Birth Decade",  "Years since First Publication"], {"Med List": agg.CONCAT(col_name)})
    g[med_col_name] = g["Med List"].apply(lambda l: np.median(l))
    g = g.remove_column('Med List')
    df = g.to_dataframe()
    
    if title is not None:
        chart = alt.Chart(df, title=title)
    else:
        chart = alt.Chart(df)
    chart = chart.mark_line().encode(
        alt.X('Years since First Publication:Q', axis=alt.Axis(format='d'), scale=alt.Scale(zero=False)),
        alt.Y("%s:Q" % med_col_name, scale=alt.Scale(zero=False)),
        color="Academic Birth Decade"   )
    return chart

2.1 Number of New Authors over Time

First, we will start by observing how the number of new authors has changed over the years.
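
Before running this on the full dataset, here is a minimal sketch of the same two-step logic on toy data in plain Python. The `author_papers` list and its author IDs are hypothetical stand-ins for the joined SFrame; the two steps play the role of `agg.MIN("Year")` and `agg.COUNT()` in the cell below.

```python
from collections import Counter

# Hypothetical (author_id, publication_year) pairs standing in for the joined SFrame
author_papers = [
    ("a1", 1999), ("a1", 2003),
    ("a2", 2003), ("a2", 2001),
    ("a3", 2003),
    ("a4", 2001),
]

# Step 1: earliest publication year per author (the role of agg.MIN("Year"))
start_year = {}
for author_id, year in author_papers:
    if author_id not in start_year or year < start_year[author_id]:
        start_year[author_id] = year

# Step 2: number of authors whose first paper appeared in each year (the role of agg.COUNT())
new_authors_per_year = Counter(start_year.values())
print(sorted(new_authors_per_year.items()))
```

An author is counted only once, in the year of his or her earliest paper, regardless of how many papers follow.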

In [2]:
p_sf = tc.load_sframe(EXTENDED_PAPERS_SFRAME)["Paper ID", "Paper publish year"]
a_sf = tc.load_sframe(PAPER_AUTHOR_AFFILIATIONS_SFRAME)["Paper ID", "Author ID"]
sf = a_sf.join(p_sf)["Author ID",  "Paper publish year"]
sf = sf.rename({"Paper publish year": "Year"})
g = sf.groupby("Author ID", {"Start Year": agg.MIN("Year")}) # Compute the first publication year of each of the 114,692,920 authors
g2 = g.groupby("Start Year", {"New Authors Number": agg.COUNT()})
g2 = g2.rename({"Start Year": "Year"})
draw_features_yearly_chart(g2, "New Authors Number", 1800, 2014, title="Number of New Authors over Time")
Out[2]:

The chart shows a surge in the number of new authors over the years. Next, let us examine how various author features have changed over time. For the rest of this notebook we will mainly use the AUTHORS_FEATURES_SFRAME, which contains various features of over 20 million authors who published at least one paper with 5 references (see notebook). Let's load the AUTHORS_FEATURES_SFRAME and use the genders predicted from the authors' first names to observe trends in new authors' genders.

In [3]:
af_sf = tc.load_sframe(AUTHROS_FEATURES_SFRAME)
af_sf["Academic BirthYear"] =  af_sf["Papers by Years Dict"].apply(lambda d: min(d.keys()))
af_sf["Predicted Gender"] = af_sf["Gender Dict"].apply(lambda d: d["Gender"])
af_sf["Academic Birth Decade"] = af_sf["Academic BirthYear"].apply(lambda y: y - y%10)
sf = af_sf["Academic BirthYear", "Predicted Gender"]
g = sf.groupby(["Academic BirthYear", "Predicted Gender"], {"New Authors Number": agg.COUNT()}) 
g = g.rename({"Academic BirthYear": "Year"})
g = filter_sframe_by_years(g, 1800, 2014)
g = g[g["Predicted Gender"].apply(lambda n: n in {"Male", "Female"})]
In [4]:
chart = alt.Chart(g.to_dataframe(),title="New Authors by Gender over Time").mark_line().encode(
    alt.X('Year:Q', axis=alt.Axis(format='d'), scale=alt.Scale(zero=False)),
    alt.Y('New Authors Number:Q' , scale=alt.Scale(zero=False)),
    color="Predicted Gender"    )
chart
Out[4]:

From the above chart we can see that there is an increase in both male and female new authors.

2.2 Authors' Number of Papers over Time

Let's use the MAG dataset to observe how the average number of papers per author has changed across academic birth decades.

In [5]:
selected_decades = {1950,1960,1970,1980, 1990, 2000, 2010}


sf = af_sf["Papers by Years Dict", "Academic BirthYear", "Academic Birth Decade"]    
sf = sf[sf["Academic Birth Decade"].apply(lambda decade: decade in selected_decades)]

# First, for each author, we create a dict in which each key is the number of years since the author published his/her first paper,
# and each value is the number of papers the author wrote in that year
sf["Papers Number by Years Dict"] = sf["Papers by Years Dict"].apply(lambda d:{k:len(v) for k,v in d.items()})
sf["Papers Number by Years Dict"] = sf.apply(lambda r: normalize_features_dict(r["Papers Number by Years Dict"], r["Academic BirthYear"]))

# Second, for each author, we create a dict with the total number of papers the author has written 'n' years
# after he/she published his/her first paper
sf["Total Papers Dict"] = sf.apply(lambda r: get_values_sum_by_year_dict(r["Papers Number by Years Dict"], 
                                                                         min(30, 2014 - r["Academic BirthYear"])))

sf = sf["Academic Birth Decade","Academic BirthYear", "Total Papers Dict"]
t_sf = sf.stack("Total Papers Dict", new_column_name=["Years since First Publication", "Total Papers Number"])
draw_decades_avg_chart(t_sf, "Total Papers Number","Average Number of Published Papers", title="Average Number of Papers by Academic Birth Decade" )
Out[5]:

Let's redraw the chart, this time using only authors who published at least 5 papers over their entire career.

In [6]:
sf["Total Papers"] = sf["Total Papers Dict"].apply(lambda d: d[max(d.keys())] if d is not None and len(d.keys()) > 0 else None)
sf = sf[sf["Total Papers"] >= 5]            
t_sf = sf.stack("Total Papers Dict", new_column_name=["Years since First Publication", "Total Papers Number"])
draw_decades_avg_chart(t_sf, "Total Papers Number","Average Number of Published Papers", title="Average Number of Papers by Academic Birth Decade (min 5 papers)" )                                       
Out[6]:

As can be observed from the above graph, with each successive decade the publication rate of new researchers accelerated considerably. For example, researchers who started their academic careers in the 1960s had published, on average, a little less than 2 papers after a decade. However, researchers who started their careers in the 2000s had published, on average, about 2 papers after less than 4 years, and about 12 papers after a decade.
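
To make the per-author curves behind these charts concrete, here is a small worked example of the two dict transformations used above. It is a sketch under assumptions: the two helpers are restated with `.items()` so that the block also runs on Python 3, and the `papers_by_year` dict with its paper IDs is hypothetical.

```python
def normalize_features_dict(feature_dict, start_year):
    # Re-key a {year: value} dict by the number of years since start_year
    return {(y - start_year): v for y, v in feature_dict.items()}

def get_values_sum_by_year_dict(d, max_keys):
    # Running total: entry i sums all values whose key is <= i
    return {i: sum(v for k, v in d.items() if k <= i) for i in range(max_keys)}

# Hypothetical author: papers grouped by publication year
papers_by_year = {1995: ["p1", "p2"], 1997: ["p3"], 2000: ["p4", "p5", "p6"]}

academic_birth_year = min(papers_by_year)  # 1995
papers_per_year = {year: len(papers) for year, papers in papers_by_year.items()}
per_career_year = normalize_features_dict(papers_per_year, academic_birth_year)
cumulative = get_values_sum_by_year_dict(per_career_year, 6)

print(per_career_year)  # {0: 2, 2: 1, 5: 3}
print(cumulative)       # {0: 2, 1: 2, 2: 3, 3: 3, 4: 3, 5: 6}
```

Years without publications keep the previous total, so every author contributes a value at every career year up to `max_keys - 1`. This is what allows the chart to average across authors at each point on the x-axis.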

2.3 Authors’ Publications in Venues over Time

Let's use the data to observe authors' trends in publishing in conferences and journals.

In [7]:
af_sf["Academic BirthYear"] =  af_sf["Papers by Years Dict"].apply(lambda d: min(d.keys()))
af_sf["Predicted Gender"] = af_sf["Gender Dict"].apply(lambda d: d["Gender"])
af_sf["Academic Birth Decade"] = af_sf["Academic BirthYear"].apply(lambda y: y - y%10)


def is_empty_venue_list(l):
    # True if the list holds no venue IDs other than empty strings
    l = [i for i in l if i != '']
    return len(l) == 0
selected_decades = {1950,1970, 1990, 2000, 2010}        
sf = af_sf["Academic BirthYear", "Academic Birth Decade", 'Conference ID by Year Dict', 'Journal ID by Year Dict']
sf['Number of Conference by Year Dict'] = sf['Conference ID by Year Dict'].apply(lambda d:{k:len(v) for k,v in d.items() if not is_empty_venue_list(v) })
sf['Number of Journals ID by Year Dict'] = sf['Journal ID by Year Dict'].apply(lambda d:{k:len(v) for k,v in d.items() if not is_empty_venue_list(v)})
sf['Number of Conference by Year Dict'] = sf.apply(lambda r: normalize_features_dict(r['Number of Conference by Year Dict'], r['Academic BirthYear']))
sf['Number of Journals by Year Dict'] = sf.apply(lambda r: normalize_features_dict(r['Number of Journals ID by Year Dict'], r['Academic BirthYear']))
sf = sf[sf["Academic Birth Decade"].apply(lambda decade: decade in selected_decades)]
sf.materialize()
In [8]:
c_sf= sf["Academic BirthYear","Academic Birth Decade",'Number of Conference by Year Dict']
c_sf = c_sf[c_sf['Number of Conference by Year Dict'] != {}]
c_sf = c_sf.stack("Number of Conference by Year Dict", new_column_name=["Years since First Publication", "Total Number of Conferences"])
draw_decades_avg_chart(c_sf, 'Total Number of Conferences',"Average Number of Conference Papers", title="Number of Conference Papers over Time" )
Out[8]:

In [9]:
j_sf= sf["Academic BirthYear","Academic Birth Decade",'Number of Journals by Year Dict']
j_sf = j_sf[j_sf['Number of Journals by Year Dict'] != {}] 
j_sf = j_sf.stack("Number of Journals by Year Dict", new_column_name=["Years since First Publication", "Total Number of Journals"])
draw_decades_avg_chart(j_sf, 'Total Number of Journals',"Average Number of Journal Papers", title="Number of Journal Papers over Time"  )
Out[9]:

We can observe that with each decade, researchers publish more and more journal and conference papers. Moreover, we can observe that since the 1990s there has been a trend to publish more and more papers in conferences. For example, while researchers who started their careers in the 1970s published on average about 2 conference papers and 1.65 journal papers after 10 years, researchers who started their careers in the 2000s published about 4 conference papers and 2.59 journal papers.
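
The venue counts above come from a filter-then-count step on each author's per-year list of venue IDs. Here is a minimal sketch of that step in plain Python; the `conference_ids_by_year` dict and the IDs `c1` to `c4` are hypothetical, and `is_empty_venue_list` is restated in a shorter form that behaves the same way.

```python
def is_empty_venue_list(venue_ids):
    # True when the list holds nothing but empty strings (or nothing at all)
    return all(venue_id == "" for venue_id in venue_ids)

# Hypothetical author: conference IDs per publication year ('' means no conference)
conference_ids_by_year = {
    2001: ["c1", ""],
    2002: ["", ""],
    2004: ["c2", "c3", "c4"],
}
academic_birth_year = 2001

# Drop years with no real venue, count the rest, and re-key by career year
conference_counts = {
    year - academic_birth_year: len(ids)
    for year, ids in conference_ids_by_year.items()
    if not is_empty_venue_list(ids)
}
print(conference_counts)  # {0: 2, 3: 3}
```

Two details are worth noticing. A year whose list mixes real IDs and empty strings is counted with `len`, so the empty entries are included in the count (2 for the first year here). Also, unlike the papers chart in section 2.2, these dicts hold per-year counts and are not passed through `get_values_sum_by_year_dict`.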

2.4 Number of Coauthors over Time

Now let's calculate how the average number of coauthors changes over time for groups of authors who started their careers in the same decade.

In [10]:
from author import Author

def get_total_coauthors_number_by_year_dict(author_id, year_span=25, max_year=2014):
    a = Author(author_id)    
    start_year = a.first_publication_year
    last_year = min(start_year + year_span, max_year)
    try:
        # Probe the lookup once so that authors with missing data are skipped
        a.get_coauthors_list(start_year, last_year)
    except Exception:
        return None

    return {(i - start_year):len(set(a.get_coauthors_list(start_year,i))) for i in range(start_year, last_year + 1)}

selected_decades = {1950,1970, 1990, 2000, 2010}
af_sf = tc.load_sframe(AUTHROS_FEATURES_SFRAME)
af_sf["Academic BirthYear"] =  af_sf["Papers by Years Dict"].apply(lambda d: min(d.keys()))
af_sf["Predicted Gender"] = af_sf["Gender Dict"].apply(lambda d: d["Gender"])
af_sf["Academic Birth Decade"] = af_sf["Academic BirthYear"].apply(lambda y: y - y%10)
sf = af_sf["Author ID", "Coauthors by Years Dict", "Academic BirthYear", "Academic Birth Decade", "Predicted Gender"]  
sf = sf[sf["Academic Birth Decade"].apply(lambda decade: decade in selected_decades)]
sf['Total Coauthors Dict'] = sf["Author ID"].apply(lambda i: get_total_coauthors_number_by_year_dict(i))
sf.materialize()
a_sf = sf.stack("Total Coauthors Dict", new_column_name=["Years since First Publication", "Total Coauthors Number"])
draw_decades_avg_chart(a_sf, "Total Coauthors Number","Average Number of Coauthors", title="Average Number of Coauthors by Academic Birth Decade" )
Out[10]:

We can observe that the average number of coauthors considerably increased over the decades. Moreover, we can notice that while authors that started their careers in the 1950s and 1970s had on average only a few coauthors over a period of 25 years, researchers who started their careers in the 1990s had over 60 coauthors in the same career length of 25 years.
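
The quantity plotted here is the number of distinct coauthors an author has accumulated by each career year. Here is a minimal sketch of that computation in plain Python, under stated assumptions: it takes a hypothetical `coauthors_by_year` dict instead of the project's `Author` class, it treats the first year in that dict as the first publication year, and it builds the set incrementally rather than calling `get_coauthors_list` once per year as the cell above does.

```python
def total_coauthors_by_career_year(coauthors_by_year, year_span=25, max_year=2014):
    # coauthors_by_year: hypothetical {year: [coauthor ids]} dict for one author
    start_year = min(coauthors_by_year)
    last_year = min(start_year + year_span, max_year)
    seen = set()
    totals = {}
    for year in range(start_year, last_year + 1):
        # Add this year's coauthors; repeat collaborators are counted only once
        seen.update(coauthors_by_year.get(year, []))
        totals[year - start_year] = len(seen)
    return totals

coauthors_by_year = {2010: ["b", "c"], 2011: ["c", "d"], 2013: ["e"]}
totals = total_coauthors_by_career_year(coauthors_by_year)
print(totals)  # {0: 2, 1: 3, 2: 3, 3: 4, 4: 4}
```

Because the series is capped at `max_year`, authors who started recently contribute fewer career years than the full `year_span`.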

2.5 Authors’ Place in Authors List over Time

In this section, let's check the average/median position of authors in their papers' author lists, based on the decade in which the authors published their first paper. Moreover, we will examine how often authors are the first author, depending on the decade in which they started their careers and on their gender.

In [11]:
selected_decades = {1950,1970, 1990, 2000, 2010} 
sf = af_sf['Academic Birth Decade', 'Academic BirthYear', 'Sequence Number by Year Dict', 'Predicted Gender']
sf = sf[sf["Academic Birth Decade"].apply(lambda decade: decade in selected_decades)]
sf["Papers Number by Years Dict"] = sf.apply(lambda r: normalize_features_dict(r['Sequence Number by Year Dict'], r["Academic BirthYear"]))
s_sf = sf.stack("Papers Number by Years Dict", new_column_name=["Years since First Publication", "Sequence Number List"])
s_sf = s_sf.stack("Sequence Number List", new_column_name="Sequence Number")
In [12]:
sf = s_sf[ 'Academic BirthYear', "Academic Birth Decade","Years since First Publication", "Sequence Number",'Predicted Gender']
sf = sf[sf["Years since First Publication"] != None]
sf = sf[sf["Sequence Number"] != None]
sf = sf[sf["Years since First Publication"] <= 30]
draw_decades_med_chart(sf,"Sequence Number", "Median Sequence Number", title="Authors' Median Sequence Number over Time" )
Out[12]:

In [13]:
sf["Is First Author"] = sf["Sequence Number"].apply(lambda i: 1 if i == 1 else 0)
draw_decades_avg_chart(sf,"Is First Author", "Percentage of Times as First Author", title="Percentage of Times Researcher Was First Author"  )
Out[13]:

In [14]:
sf = s_sf[ 'Academic BirthYear', "Academic Birth Decade","Years since First Publication", "Sequence Number",'Predicted Gender']
sf["Is First Author"] = sf["Sequence Number"].apply(lambda i: 1 if i == 1 else 0)
sf = sf[sf["Predicted Gender"] == "Female"]
draw_decades_avg_chart(sf,"Is First Author", "Percentage of Times as First Author",  title="Percentage of Times Female Researcher Was First Author" )
Out[14]:

In [15]:
sf = s_sf[ 'Academic BirthYear', "Academic Birth Decade","Years since First Publication", "Sequence Number",'Predicted Gender']
sf["Is First Author"] = sf["Sequence Number"].apply(lambda i: 1 if i == 1 else 0)
sf = sf[sf["Predicted Gender"] == "Male"]
draw_decades_avg_chart(sf,"Is First Author", "Percentage of Times as First Author", title="Percentage of Times Male Researcher Was First Author" )
Out[15]:

It can be observed that as time passes, the percentage of senior authors being a first author decreases.
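
The two statistics used in this section can be sketched in plain Python 3 as follows. The `rows` list is hypothetical toy data, with one row per (author, paper) pair after stacking. Note that the mean of the 0/1 "Is First Author" flag is a share between 0 and 1 rather than a percentage.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (academic birth decade, years since first publication, sequence number)
rows = [
    (1990, 0, 1), (1990, 0, 2), (1990, 0, 1),
    (1990, 10, 3), (1990, 10, 1), (1990, 10, 4), (1990, 10, 5),
]

# Collect the sequence numbers of every (decade, career year) group
groups = defaultdict(list)
for decade, years_since, sequence_number in rows:
    groups[(decade, years_since)].append(sequence_number)

# Median position in the author list, per group
median_position = {key: median(seqs) for key, seqs in groups.items()}

# Share of papers on which the author is listed first, per group
first_author_share = {
    key: sum(1 for s in seqs if s == 1) / len(seqs)
    for key, seqs in groups.items()
}

print(median_position)     # {(1990, 0): 1, (1990, 10): 3.5}
print(first_author_share)
```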

In [16]:
sf = s_sf[ 'Academic BirthYear', "Academic Birth Decade","Years since First Publication","Sequence Number",'Predicted Gender']
sf["Is First Author"] = sf["Sequence Number"].apply(lambda i: 1 if i == 1 else 0)
g = sf.groupby(["Academic Birth Decade","Years since First Publication",'Predicted Gender'], {"Percentage of Times as First Author":agg.AVG("Is First Author")})
g = g.rename({'Predicted Gender':"Gender"})
g = g[g["Gender"].apply(lambda gender: gender in {"Male", "Female"})]
g = g.sort(["Academic Birth Decade","Years since First Publication" ])
In [17]:
import seaborn as sns
import matplotlib.pyplot as plt

c = sns.FacetGrid(g.to_dataframe(), col="Academic Birth Decade", hue='Gender', sharex=False, sharey=True, col_wrap=2)
c.map(plt.plot, "Years since First Publication", "Percentage of Times as First Author", alpha=.7)
c.add_legend()
Out[17]:
<seaborn.axisgrid.FacetGrid at 0x7f5a5cdf4f90>

We notice that over the decades, the gap between how often male and female researchers are, on average, first authors decreases considerably.
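
One way to quantify this gap, rather than reading it off the facets, is to subtract the female first-author share from the male share for each decade. The sketch below is only an illustration: the `counts` dict holds made-up numbers and does not reproduce the notebook's results.

```python
# Hypothetical toy counts, NOT results from the dataset:
# (decade, gender) -> (papers as first author, total papers)
counts = {
    (1970, "Male"): (2, 4), (1970, "Female"): (1, 4),
    (2000, "Male"): (3, 8), (2000, "Female"): (2, 8),
}

# First-author share per (decade, gender)
share = {key: float(first) / total for key, (first, total) in counts.items()}

# Male minus female share, per decade
decades = sorted({decade for decade, _ in share})
gap = {d: share[(d, "Male")] - share[(d, "Female")] for d in decades}
print(gap)  # {1970: 0.25, 2000: 0.125}
```

Applied to the grouped SFrame `g`, the same subtraction could be done per decade and per career year to plot the gap as a single line.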