Technical 05: Free and Open Source Development

Imagine expert software developers taking their valuable technical knowledge (in social and market terms) and putting it on the internet for anyone to read, study, modify, and redistribute... what a pipe dream, right?

Well, we are sorry for the unimaginative naysayers among us, but this is exactly what has been happening for more than thirty years, in a socio-technical phenomenon that has captured the attention of economists, anthropologists, legal scholars, and, of course, companies and governments. This phenomenon consists of a new mode of collaborative production, improvement, and usage of Free and Open Source technologies that was quite innovative for the computing industry of the 1990s. This innovation in method was originally a product of small, arcane software development communities... but it has since become very much a part of today's computing industry (as we saw in class).

In this homework you will dive into the world of Free and Open Source development: you will enter a software code repository and examine its social and technical dynamics. After your experience you will return to report to your instructors on what you have found.

Before proceeding, please visit this link and copy the template document for your HW #5 report. This is the format you must use to report back your findings.

May the source be with you all!

Exercise #1: Finding and profiling a code repository

Let's visit the biggest host of Free and Open Source code repositories in existence today, GitHub, and pick one project to examine.

You may be interested, say, in Free Software for "Data Science." This is all good: there are plenty of examples out there... from statistical packages to libraries for machine learning, data visualization, parallel computing, and much, much more. You will click on the "search box" of GitHub and search for a type of program that interests you. You should continue to dig into the immense archive of GitHub until you find a project you would like to profile.

To illustrate the steps you need to take, we will pick a project we like called "Jupyter," which was briefly discussed in class. "Jupyter notebook," in particular, provides an interface for scientific computing tasks. It has become very popular among professionals and students in statistical and computational data analysis. The first thing we will do is take a good look at the repository:
 

Jupyter1

We will then respond to these basic questions about the project:

1. How many people have contributed to the code?

2. How many changes ("commits") have been contributed thus far?

3. How many times has it been "forked" (copied) by GitHub users?

4. How many "issues" have been reported that are still open?

5. What are the top-3 most prevalent types of issues ("labels") in the project?

There is a whole lot we can learn about a project just by looking at its descriptive statistics. A few things jump out at us when we look at the Jupyter notebook repository: it has had a large number of contributors over time and a substantial number of changes. We also see a huge uptake, judging by the number of times it has been copied ("forked") for the purposes of studying and/or contributing back to the project. We also learn from the reported "issues" that, despite their large number (over 2K), they are spread across many categories. The top ones are "enhancement" requests, "bug" reports, and accessibility features. We also see that quite a few are poorly reported bugs, which means they "need more info" to help developers identify what is wrong:

Jupyter2
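If you would rather count than eyeball, the label question (#5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the issue list below is made up by hand, shaped like the records the GitHub REST API returns for issues (a "labels" list of objects with a "name", and a "pull_request" key on entries that are really pull requests). The function name top_labels is ours, not part of any library.

```python
from collections import Counter

# Hypothetical, hand-written sample; titles and labels are invented
# for illustration and do not come from any real repository.
sample_issues = [
    {"title": "Crash on save", "labels": [{"name": "bug"}]},
    {"title": "Dark theme", "labels": [{"name": "enhancement"}]},
    {"title": "Keyboard navigation",
     "labels": [{"name": "enhancement"}, {"name": "accessibility"}]},
    {"title": "It does not work",
     "labels": [{"name": "bug"}, {"name": "needs more info"}]},
    {"title": "Export to PDF", "labels": [{"name": "enhancement"}]},
    {"title": "Screen reader support", "labels": [{"name": "accessibility"}]},
    {"title": "Kernel dies", "labels": [{"name": "bug"}]},
    {"title": "Plugin API", "labels": [{"name": "enhancement"}]},
    # Pull requests also show up in issue listings; we skip them below.
    {"title": "Fix crash on save", "labels": [{"name": "bug"}],
     "pull_request": {}},
]


def top_labels(issues, n=3):
    """Return the n most frequent label names among real issues."""
    counts = Counter(
        label["name"]
        for issue in issues
        if "pull_request" not in issue  # keep issues, drop pull requests
        for label in issue["labels"]
    )
    return counts.most_common(n)


print(top_labels(sample_issues))
```

For this homework you can read the same numbers straight off the "Issues" and "Labels" pages of the repository; the sketch only shows what the counting amounts to.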



Exercise #2: Exploring the socio-technical dynamics of a project

One of the most powerful social and technical affordances of Free and Open Source tech is the ability to "fork" a piece of code. From the exercise above, you can see that "forking" refers to the ability to make a copy of a program's code base and take it in a different direction. Sometimes the "different direction" becomes a new program (not compatible with the original); most often, however, forks are made so that bug fixes and new features can be created asynchronously and then merged back into the original project. Here is a graph with a simple representation of how this works:
 

Forking Github


You can see above that "upstream" refers to the main codebase where the most current software development is taking place, whereas a "fork" represents a parallel copy that any developer can make to introduce new features or bug fixes.

Let's see how this works in practice. We will click on "forks" in our repository and then sort them by "open pull request." Once we have the list we will pick the first one to inspect:

Jupyter3


"Open pull request" means in practice that a developer 1) made a fork, 2) made a change, and 3) returned the change with a request to the "upstream" code repository for the change to be included. The "pull request" in this case is "open" because it has not yet been included into the main repository (the "upstream"). The author of the pull request is patiently waiting, because patience is a virtue!
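The three steps above (fork, change, request) can be sketched as a toy model in Python. This is not how Git or GitHub are implemented; the class names Repo and PullRequest, the "merged" state, and the commit messages are all hypothetical and exist only to make the sequence of events concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    owner: str
    commits: list = field(default_factory=list)

    def fork(self, new_owner):
        # A fork starts as a full, independent copy of the history.
        return Repo(owner=new_owner, commits=list(self.commits))


@dataclass
class PullRequest:
    source: Repo   # the fork that carries the change
    target: Repo   # the "upstream" repository
    state: str = "open"

    def new_commits(self):
        # Commits present in the fork but not yet upstream.
        return [c for c in self.source.commits if c not in self.target.commits]

    def merge(self):
        # Upstream maintainers accept the change: it joins the main history.
        self.target.commits.extend(self.new_commits())
        self.state = "merged"


upstream = Repo("upstream", ["Initial commit", "Fix typo in README"])
fork = upstream.fork("contributor")          # 1) make a fork
fork.commits.append("Add a logo file")       # 2) make a change
pr = PullRequest(source=fork, target=upstream)  # 3) ask upstream to include it
print(pr.state, pr.new_commits())            # still waiting: the request is open

pr.merge()
print(pr.state, upstream.commits)
```

Until merge() is called, upstream is untouched and the request stays "open", which is exactly the waiting state described above.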

Now you know how the basic mechanisms of collaboration work: this form of coordination around "forks and merges" allows for a wide distribution of asynchronous work. That is fundamentally how the Linux project manages to get its contributions from people worldwide. Let's see the content of a contribution ("commit") in the "fork." We will click on the title of the "commit" as shown below ("Add square logo..."):

Jupyter4

 

You will report what kind of feature was added to the original program. Hint: you do not need to read and understand the code (at least, not now and not for this exercise); you just need to read the text of the "commit" message and tell us what that piece of code is doing.

Here is an example: You will see the information about the logo of the program being added to the desktop installer, so when people search on their desktop for the program, they can find Jupyter notebook by the logo. This is a simple addition, but an important one for Desktop users!

Jupyter5

 

Here are the questions you will respond to in your report for this exercise:

6. Are the top-3 forks with "open pull requests" coming from core developers (the ones with most commits / contributions) or not?

7. What is the content of the top fork with an "open pull request"? What change is being made / proposed for inclusion "upstream"?
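One way to reason about question 6 is sketched below in Python. The usernames and commit counts are invented, and treating the top three committers as the "core" is an assumption of this sketch (the cutoff is yours to justify in your report); on GitHub you would read the real numbers from the repository's contributors page.

```python
def core_developers(commit_counts, top_n=3):
    """Return the set of the top_n contributors ranked by commit count."""
    ranked = sorted(commit_counts, key=commit_counts.get, reverse=True)
    return set(ranked[:top_n])


# Hypothetical data: contributor -> number of commits.
commit_counts = {"alice": 950, "bob": 610, "carol": 420, "dave": 12, "erin": 3}

# Hypothetical authors of the top-3 forks with open pull requests.
pr_authors = ["dave", "alice", "frank"]

core = core_developers(commit_counts)
for author in pr_authors:
    status = "core developer" if author in core else "not a core developer"
    print(f"{author}: {status}")
```

Note that "frank" has no recorded commits at all, which is a common case: many open pull requests come from first-time contributors.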

Now you know how "forking" and "merging" contributions work! This is the "bread and butter" of collaborative software development.

Exercise #3: Licensing, authorship and contribution guidelines

Let's examine another aspect of collaborative software development: the social and legal aspects of Free and Open Source contributions.

For a project to be defined as Free and Open Source, as we saw in class, we need a license. The license needs to be compliant with the Free Software Definition or the Open Source Definition. "Choose a license" is a website that is often used by newcomers to GitHub to help them navigate the difficult choice of choosing a particular license for their projects.

If we navigate to the main page of the Jupyter notebook repository and look through its files, we will find an important one called "LICENSE." In the body of this (plain text) file, we will find the licensing information. This is a canonical file in Free and Open Source projects: it is included to indicate what license the project uses and who holds the copyright (remember the legal hack of copyleft as "copyright inverted"). Here is the content of the LICENSE file for Jupyter:

Jupyter6

Next, let's navigate to another key document that is often included in open projects: the "Contributors' guide." In this case, it is encoded as a "markdown" file, "Contributing.md." This is also a key document because it describes how contributions should be sent to the project. Oftentimes it includes information about coding style and formatting (so contributors know how to properly format their contributions). It is also normal to find in these guides information about "codes of conduct," which specify how one should behave in the context of the project: it is always a good reminder for folks that they should be excellent to each other (especially on the Internet, right?). Here is an example of the contributors' guide for Jupyter:

Jupyter7

Once you are done with your examination, you will report on the following questions:

8. What is the license of the project you selected?

9. What are the parameters for accepting new contributions in the "contributors guide"? What does the “code of conduct” require of newcomers to the project?

We have now covered key aspects of Free and Open Source development: you know how to read basic stats of a project, you learned about the contributors a bit and examined the key mechanism through which contributions are made (even by people across the planet whom we may never meet). You now know the legal details governing contributions, but also the guidelines that serve as an entry point for new contributors. You are now ready to start contributing to the projects that you like! Where there is a will (and source code), there is always a way...

Extra credit: Technical, Social, and Commercial Aspects of Licensing

Remember we covered the major categories that organize Free and Open Source software licensing in class? Now, we (the instructors) want you to take the next step to discuss why a certain licensing choice has been made for the project that you picked for this homework.

We want you to take the licensing information from Exercise #3 and report to us what types of "freedom" one can exercise under the license and, most importantly, why one would want to use this license instead of another one. What would be the reasoning behind it?

In order to complete this task for extra credit, you must discuss the pros and cons of the license adopted and explain why a development team would make that choice. As discussed in class, finding a suitable license is quite hard: it involves technical demands, social values, and economic interests. Sometimes your interests do not align with those of a project's contributors, sometimes they do! License choice is situated right at this intersection of conflicting demands... between reciprocity and individual benefit, between social obligation and personal freedom, between commercial and social benefit. Good luck in solving this socio-technical "puzzle"!
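As a starting point for that discussion, here is a small Python sketch that groups a few well-known licenses by how much reciprocity they ask for. The identifiers are SPDX short names; the three-way grouping is a coarse simplification chosen for this sketch, and the table is deliberately incomplete. The function name license_category is hypothetical.

```python
# Coarse grouping of a few common licenses by SPDX identifier.
# An illustrative simplification, not a complete or authoritative list.
LICENSE_CATEGORIES = {
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "MPL-2.0": "weak copyleft",
    "LGPL-3.0-only": "weak copyleft",
    "GPL-3.0-only": "strong copyleft",
    "AGPL-3.0-only": "strong copyleft",
}


def license_category(spdx_id):
    """Look up the category for an SPDX identifier, or 'unknown'."""
    return LICENSE_CATEGORIES.get(spdx_id, "unknown")


for name in ["BSD-3-Clause", "GPL-3.0-only", "Some-Custom-License"]:
    print(f"{name}: {license_category(name)}")
```

A category is only the first step: your answer should still explain what the chosen license lets others do with the code and why the team might have preferred it over the alternatives.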

Grading

You must submit the report document you prepared with your responses to the questions we provided above. The first question set (Exercise #1) is worth 30 points (total). The second and third question sets (Exercises #2 and #3) are worth 10 points each. The extra credit challenge is worth 10 points. You will be graded using the following criteria:

- Accuracy of your response

- Completeness of your response

- Demonstration of understanding of the concepts we covered in class (and in the readings) in your response


Submission Instructions

Place your report document in the Google drive folder you previously shared with Abby.

- Name the document using this convention: lastname_firstname_duedate.docx  Example: Swenor_Abby_032823.docx

Abby will collect each file directly from your drive after the due date.
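If you want to double-check your file name against the naming convention above, a minimal Python sketch follows. It assumes that "duedate" means month, day, and two-digit year (MMDDYY), which is inferred from the example 032823; the function name report_filename is hypothetical.

```python
from datetime import date


def report_filename(last_name, first_name, due):
    """Build lastname_firstname_duedate.docx, with the date as MMDDYY."""
    # Assumption: 032823 in the example stands for March 28, 2023.
    return f"{last_name}_{first_name}_{due:%m%d%y}.docx"


print(report_filename("Swenor", "Abby", date(2023, 3, 28)))
```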

Happy open sourcing!

(Please remember: sharing is caring!)