Afifon is planned to be a system for sharing software project information between development platform instances. Today, distributed search of projects is completely missing! The best way I know is to run a distributed web search using Yacy. But Yacy is heavy and does much more than needed, and I haven't examined how it works with semantic tagging. It's another thing to examine.

Working on short term goals will hopefully help solve the problem of project hosting centralization and proprietary tools (especially g1thu8 which sadly many people use, even software freedom advocates), and in the process give me the knowledge, understanding and experience needed for the long term goal of bringing the same tools to all kinds of software.

"Afifon" (עפיפון) means kite in Hebrew (the flying toy on a string).

Data Model

Before the federation part, there must be a regular data model for repo hosting. I'd like to start minimal, because having the full package with wikis and issue tracking and CI etc. makes the federation core work unnecessarily harder.

Therefore, the working assumption is that the system just stores repositories.

There are no groups and no organizations and no projects - just a flat namespace for repos, and a flat namespace for users.

Permissions, for now, will be simple too. Given a user and a repo, the permission system can determine whether the user can push changes to the repo, and whether the user can manage the permissions of the repo. That's all, just these two booleans. The permissions for a repo are therefore a pair of lists. One list contains users who may make changes to the repo, and the other list contains users who may add and remove users from these lists for this repo.
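
As a minimal sketch of this pair of lists, assuming users are identified by plain username strings: the names `RepoPermissions`, `can_push`, `can_manage` and `grant_push` are made up for illustration and aren't part of any existing code.

```python
from dataclasses import dataclass, field


@dataclass
class RepoPermissions:
    """The pair of lists for one repo, stored as sets of usernames."""
    pushers: set[str] = field(default_factory=set)   # may push changes
    managers: set[str] = field(default_factory=set)  # may edit both lists


def can_push(perms: RepoPermissions, user: str) -> bool:
    return user in perms.pushers


def can_manage(perms: RepoPermissions, user: str) -> bool:
    return user in perms.managers


def grant_push(perms: RepoPermissions, actor: str, user: str) -> None:
    """Add `user` to the push list, but only if `actor` is a manager."""
    if not can_manage(perms, actor):
        raise PermissionError(f"{actor} may not manage this repo")
    perms.pushers.add(user)


# Hypothetical usage: alice owns the repo and lets bob push.
perms = RepoPermissions(pushers={"alice"}, managers={"alice"})
grant_push(perms, "alice", "bob")
```

Sets are used instead of literal lists only because order doesn't matter here and membership checks are the main operation.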

From the permissions of the instance's repos, it's possible to compute and maintain, for each user, a list of repos she can push to. However, since merge requests are possible too, people can also participate in projects they can't push to.
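
Computing the per-user list is just an inversion of the per-repo push lists. A rough sketch, assuming the push lists are available as a mapping from repo name to a set of usernames (the function name `pushable_repos` is hypothetical):

```python
from collections import defaultdict


def pushable_repos(push_lists: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert repo -> users-who-may-push into user -> repos-she-can-push-to."""
    index: dict[str, set[str]] = defaultdict(set)
    for repo, users in push_lists.items():
        for user in users:
            index[user].add(repo)
    return dict(index)
```

In a real instance the index would probably be updated incrementally whenever a push list changes, rather than recomputed from scratch.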

A merge request is an ordered pair of branches from different repos. It means "please merge the first branch into the second branch". For each repo, a list of these should be maintained.
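
A possible shape for this, as a sketch only: `Branch`, `MergeRequest`, `incoming` and `open_merge_request` are illustrative names, and keeping the per-repo lists keyed by the target repo is an assumption of mine.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    repo: str
    name: str


@dataclass(frozen=True)
class MergeRequest:
    source: Branch  # "please merge this branch..."
    target: Branch  # "...into this one"

    def __post_init__(self) -> None:
        # The definition above says the branches come from different repos.
        if self.source.repo == self.target.repo:
            raise ValueError("branches must come from different repos")


# Per-repo list of merge requests, keyed by the target repo's name (assumption).
incoming: dict[str, list[MergeRequest]] = defaultdict(list)


def open_merge_request(source: Branch, target: Branch) -> MergeRequest:
    mr = MergeRequest(source, target)
    incoming[target.repo].append(mr)
    return mr
```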

model.dia

Federation

Minimal

What is the minimal feature set required for removing all the inconvenience of decentralized repo hosting?

  • Shared user accounts
  • Repos, projects, tickets, wikis - all support grouping and collaboration between users of different instances, transparently
  • Forks and merge requests across instances
  • Global search and access to all data: Can reach any person, project, repo, group, wiki, etc. by starting to search or browse from any instance - no need to know the home base instance in advance. HTTP redirection between instances is OK, just make it happen automatically during the user browsing workflow
  • Maybe share user, project, repo, etc. name space, so that they're all unique and can be moved between instances without name collisions. Another option is to assign unique IDs, e.g. using UUIDs, without requiring unique names
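
For the UUID option in the last item, a small sketch using Python's standard `uuid` module. The `Repo` class is hypothetical; the point is only that identity lives in the ID, so two instances can both have a repo with the same name.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Repo:
    name: str  # human-readable, not required to be unique across instances
    id: uuid.UUID = field(default_factory=uuid.uuid4)  # random, globally unique ID


# Two repos with the same name on different instances don't collide,
# because anything that refers to a repo uses its ID.
a = Repo("kite")
b = Repo("kite")
registry = {a.id: a, b.id: b}
```

The cost of this option is that names alone can no longer be used as references, so the UI has to resolve names to IDs somewhere.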

User Operations

  • vcs operations: as usual from command line vcs program
  • (TODO)

Federation Features

  • Search for users and repositories worldwide
  • Give permissions to users from other instances
  • Take merge requests from users from other instances
  • Display user and repo links, icons, etc. for remote ones in the same way as for local ones, making the federation transparent. The UI doesn't differentiate between local and remote objects.

Cases to Study

  • Gitolite
  • Darcsden
  • Gogs
  • Kallithea

Implementation Plan

  • Storage backend which stores all the meta info (users, permissions, etc.)
  • Repo/file for configuration by server admin
  • Shell for SSH access, allowing commands according to permissions
  • Library which abstracts VCS access - repository
  • Library which implements all the data manipulations
  • Web API which wraps it
  • Some kind of UI - either web-based or desktop GUI, not critical right now
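
The SSH shell item can be sketched for the Git case. Git's SSH transport asks the server to run `git-upload-pack` for clone/fetch and `git-receive-pack` for push, and OpenSSH passes the requested command to a forced command in the `SSH_ORIGINAL_COMMAND` environment variable. Everything else here is an assumption for illustration: the `authorize` function, the `pushers` mapping, repo names being used directly as paths, and all repos being publicly readable.

```python
import shlex

READ_COMMANDS = {"git-upload-pack", "git-upload-archive"}
WRITE_COMMANDS = {"git-receive-pack"}


def authorize(user: str, original_command: str,
              pushers: dict[str, set[str]]) -> tuple[str, str]:
    """Return (command, repo) if `user` may run it, else raise PermissionError.

    `pushers` maps each repo name to the users allowed to push to it.
    """
    try:
        parts = shlex.split(original_command)
    except ValueError:
        raise PermissionError("malformed command") from None
    if len(parts) != 2:
        raise PermissionError("unsupported command")
    command, repo = parts
    if repo not in pushers:
        raise PermissionError("no such repo")
    if command in READ_COMMANDS:
        return command, repo  # assumption: every repo is publicly readable
    if command in WRITE_COMMANDS and user in pushers[repo]:
        return command, repo
    raise PermissionError(f"{user} may not run {command} on {repo}")
```

A real shell would then execute the returned command against the repo's path on disk. Other version control systems would need their own command tables, which is where the VCS abstraction library comes in.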

User Stories

Storage Sharing

Using:

  • Alice has a repository R on instance S (denoted S.R)
  • Alice makes a local commit on her computer, using her local copy
  • Alice pushes the commit to the remote branch on instance S
  • Repository R has two read-only backup mirrors, on instances T and U
  • Either immediately after the push, or periodically, S pushes the changes in S.R to the backup repositories T.R and U.R, which aren't necessarily visible publicly (maybe through some admin UI or backup mirror UI, not the main UI anyway)

Choosing:

  • Alice creates a new repository R on instance S
  • S chooses, based on stats and the list of participating instances, two backup instances for R - instances T and U
  • Periodically, and/or when T or U has uptime or responsiveness problems or announces expected downtime, S may choose new backup instances, so that there are always two of them. When one or both are down, S keeps the period without backup short by detecting the outage and choosing new instances.
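
One way the choosing step could look, as a sketch. The stats are reduced to a single uptime fraction plus a health flag, which is an assumption of mine; the notes above don't say which stats S uses. `InstanceStats` and `choose_backups` are hypothetical names.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceStats:
    name: str
    uptime: float         # fraction of time reachable, 0.0 to 1.0
    healthy: bool = True  # False when down or announcing downtime


def choose_backups(home: str, instances: Sequence[InstanceStats],
                   current: Sequence[str] = (), count: int = 2) -> list[str]:
    """Keep current backups that are still healthy, then fill up to `count`
    with the healthiest remaining instances (never the home instance)."""
    by_name = {i.name: i for i in instances}
    kept = [n for n in current
            if n != home and n in by_name and by_name[n].healthy][:count]
    candidates = sorted(
        (i for i in instances
         if i.healthy and i.name != home and i.name not in kept),
        key=lambda i: (-i.uptime, i.name),  # best uptime first, name breaks ties
    )
    return kept + [i.name for i in candidates[:count - len(kept)]]
```

Keeping healthy backups in place avoids copying the whole repo to a new instance every time the stats shift a little.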

Distributed Search

Today, the ways to find projects and be found are:

  • Have your project hosted in a proprietary centralized system like g1thu8
  • Have your project hosted in a free software system like gitorious, savannah
  • Run your own hosting platform with gitorious, kallithea, gogs, gitolite, etc.
  • Use a proprietary user-tracking centralized search engine, like g00gle
  • Use a distributed free software search engine, like Yacy

Getting all the proprietary-ness, centralization and greed out of the picture, possible ideas are:

  • Make hosting platforms federate, i.e. each one has integrated project search and integration for merge requests, usernames, bugs, wiki pages etc. across instances over the network
  • Make distributed web search support semantic search and have instances communicate like social network nodes (e.g. Diaspora* pods), which probably doesn't need a DHT
  • Have a shared vocabulary and API for dev platforms to provide info, use info, declare features and subsystems, etc.

Storage Distribution

Should there be a clear concept of "hosting provider" in the traditional sense? In other words, should storage be completely distributed (i.e. a project doesn't belong to any specific hosting server), or should each project have a "home base"?

From the user's point of view, it doesn't matter much in the technical sense. If a user can see project details using any instance, the actual storage location doesn't matter. But e.g. unlike with Tahoe LAFS, the information is public, so the LAFS concept and its overhead aren't needed here. It may not be critical to have a single specific upstream instance visible to the user, but in any case there should be several backup instances for each project.

What about the regular version control system usage? Having no upstream instance means that things like git clone will be slower. The server will have to figure out the physical upstream, clone from it and stream the data to the requesting user client. Instead, if people get the URL of the physical upstream, they can clone it like always.

Problem: What happens if physical upstream is down? How can people still work with the repo? Should they be able to push too, or just pull and clone?

If a local GUI app handles detecting the functioning mirror, then no physical visible upstream is needed in the first place. Pulling only is trivial: Just detect the mirror and pull. Pushing means that once the upstream is down, one of the backups detects this and becomes the new upstream (there's a shared known protocol for choosing which backup it is). From that point on, users can work with it transparently.
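
The simplest "shared known protocol" I can think of is a fixed order: every participant knows the same ordered list of replicas (original upstream first, then the backups), and the acting upstream is the first one that is reachable. A sketch under that assumption, with `current_upstream` and the `is_up` probe as hypothetical names:

```python
from collections.abc import Callable, Sequence
from typing import Optional


def current_upstream(replicas: Sequence[str],
                     is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable replica, or None if all are down.

    `replicas` is the agreed order: replicas[0] is the original upstream,
    the rest are backups. `is_up` is whatever reachability check is used.
    """
    for instance in replicas:
        if is_up(instance):
            return instance
    return None


# Hypothetical usage: S is down, so T takes over for both pull and push.
down = {"S"}
acting = current_upstream(["S", "T", "U"], lambda name: name not in down)
```

This only works if everyone sees the same instances as down. When different clients disagree about reachability, two replicas could accept pushes at once, so a real protocol needs some agreement step on top of this.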

Question: Should I work on all these ideas, or should I focus on a minimal addition to the existing centralized model?

Answer: Start minimal. In particular, it means storage sharing isn't needed.

Tasks

  • Is Yacy good for this? Can it do semantic search? How good is it today at finding independently hosted software projects?
  • Understand DHTs, examine existing ones
  • Write some basic ontology for project info
  • How do GNU Social, Pump.io and Diaspora* instances federate?
  • Given semantic search, is DHT still a good option for collaboration features? If yes, it may be better to use it for search too than to rely on general-purpose search
  • Is a general-purpose quadstore DHT possible? Does it make sense? Maybe it can be a meta-store which just says "who knows what" or "who's online", and querying can be done using direct connections to the found nodes?
  • Maybe it's much easier and reusable to use an existing DHT based system! For example, GNUnet and I2P are general-purpose (e.g. see how file sharing and instant messaging work with them) and maybe Freenet is relevant too. Also cjdns. Make a list of candidates to examine.

Features and Ideas

  • Allow instances to communicate with other instances, for queries and commands
  • DHT for distributed access where needed
  • Fork projects from other instances
  • Send merge requests to projects in other instances
  • Report bugs to projects in other instances
  • Use GPG WoT for trust between users and/or between instances
  • In the future try to use distributed storage for repos, or at least for backup. For now, just let projects be hosted on specific hosts
  • Global semantic project search
  • Transparent federation: UI doesn't make you handle local/remote difference. It gets abstracted, like Diaspora and GNU Social transparently connect you to people from other instances
  • Global uniform username space. Maybe avoid using hosts in the name, so moving between hosts is easy.
  • Easy 1-click move of projects between instances, without breaking anything

Random

A development platform may consist of many components, each providing a solution for a particular need. Afifon should not rely on any specific combination, because the common choice of components is arbitrary. What Afifon can and should rely on is the theory behind the combination of components.

Under application projects and under the Kiwi ontologies, I have been working on general-purpose models for wikis, issue tracking, discussions and much more. There is still a lot of work to do on these models and on the deployment aspects of solutions based on them. At least for now, Afifon won't rely on these models, because it may take a lot of time for them to be ready.

The initial Afifon will just use existing common practices. Examples of common components are: version controlled repository, wiki, issue tracker, mailing list, forum, generated manual and API reference.

For the very beginning, only core features will be supported. No integrated issue tracking, no wiki, no discussions. Just the version control and code repository aspects.

This is a plan by Jessica Tallon, which at the time of writing is concerned with the federation messages. Afifon should also support things like DHT, distributed DNS, storage sharing, p2p, routing and more. GNUnet may be a good idea here.

Collaborating on that project may be an awesome idea, especially the information model. Also, the JSON snippets. While Idan may be more readable and more writable for humans, it's much too complicated for the basic simple needs of machine communication. Machines don't need convenience features like references and generators. So I suppose that with some inspiration from JSON-LD, I could define a mapping between JSON and Smaoin. Anyway, we'll see soon.