Re: Github repo for galaxy-nlp

Richard Eckart de Castilho
On 19.12.2016, at 16:37, Suderman Keith <[hidden email]> wrote:
>
>> I don't think it is necessary to even keep the main master branch
>> in our repo. How about just keeping feature branches with
>> specific changes against the canonical master branch?
>
> Is there a way to create a fork that contains just specific branches?  I created a branch the other night, but we can delete it and start again if needed.

In principle, a repo does not need a master branch. You can even change
the default branch used by GitHub in the repo settings.

I don't know which branches are copied by default when you create a
fork on GitHub.

I would assume that in the case one works with feature branches
in a fork, the master branch in the fork is not required and that
it should be possible to maintain the feature branches based directly
off the original repository's master branch. However, I don't know
how well the GitHub web UI supports this.

Anyway, the question is what the strategy for the changes in the NLP
fork should be:

a) maintain a true fork with a whole bunch of modifications that may
   in fact never make it upstream

b) maintain a set of separate feature branches (i.e. patches) with the
   plan of eventually merging them upstream

In the case of a) it is easier in the beginning because all changes end
up in a big bunch and there is no overhead in keeping track of how changes
relating to different features interact and how to keep them separate
from each other such that they may be merged upstream separately. As time
progresses, we may face increasing problems merging changes made
by upstream into our fork, and the maintenance overhead will increase. It
eventually means redundant work in the Galaxy and Galaxy-NLP communities.

In the case of b), we start with an increased overhead right away, as
changes for specific features need to be tracked in separate branches.
It requires deeper knowledge of how git works and it calls for setting up
a CI job that merges multiple branches for the integration builds to test
them together. However, the idea in this case is that feature branches
are not maintained for a long time, but only for as long as necessary
for them to become acceptable for merging by upstream. It means that
the Galaxy-NLP community contributes to the Galaxy community, tries
to avoid redundant work, and that the changes from the NLP community
also become more easily accessible to other Galaxy-using communities.
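
The integration build mentioned for b) could be approximated by a throwaway branch that starts at master and merges each feature branch in turn, so that conflicts between features show up before anything is submitted upstream. This is a sketch only: the branch names feature-a, feature-b and integration and the temporary repository are hypothetical, and a real CI job would fetch from the actual remotes instead of building a toy history.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch can run anywhere (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.org

# Baseline commit standing in for upstream's master.
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Two independent feature branches, each based on master.
for f in feature-a feature-b; do
    git checkout -q -b "$f" master
    echo "$f" > "$f.txt"
    git add "$f.txt"
    git commit -q -m "add $f"
done

# Integration branch: start from master and merge every feature branch.
# Because of "set -e", a merge conflict aborts the script and fails the build.
git checkout -q -b integration master
for f in feature-a feature-b; do
    git merge -q --no-ff -m "integrate $f" "$f"
done

# Show the files present after all features are merged.
ls
```

The integration branch is disposable: it can be deleted and rebuilt from scratch on every run, so the feature branches themselves stay clean for separate pull requests.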

Cheers,

-- Richard

P.S.: I have CCed the mailing list ;) Let's see who gets the mail
      via the list.
_______________________________________________
Galaxy-NLP mailing list
[hidden email]
https://lists.galaxyproject.org/listinfo/galaxy-nlp

Re: Github repo for galaxy-nlp

Suderman Keith
Hi Richard,

> On 19.12.2016, at 16:37, Suderman Keith <[hidden email]> wrote:
>>
>>> I don't think it is necessary to even keep the main master branch
>>> in our repo. How about just keeping feature branches with
>>> specific changes against the canonical master branch?
>>
>> Is there a way to create a fork that contains just specific branches?  I created a branch the other night, but we can delete it and start again if needed.
>
> In principle, a repo does not need a master branch. You can even change
> the default branch used by GitHub in the repo settings.

Yes, the names 'master', 'develop', 'origin', 'upstream', etc. are simply used by convention and have no real meaning to git.


> I don't know which branches are copied by default when you create a
> fork on GitHub.

I believe that when a repository is forked the entire repository is copied.  I just did a quick search and was unable to find any instructions on creating a fork with just selected branches.
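
One way to end up with a repository that holds only selected branches, as a sketch rather than a GitHub fork in the web-UI sense, is to create an empty repository and push just the wanted branches into it. Everything named here is a stand-in: the local directories replace the real GitHub URLs, and the remote name nlp is hypothetical.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for the canonical repository, with a master and a dev branch.
git init -q "$work/canonical"
cd "$work/canonical"
git symbolic-ref HEAD refs/heads/master
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"
git branch dev

# Stand-in for a new, empty repository (created empty, not forked).
git init -q --bare "$work/galaxy-nlp.git"

# Clone the canonical repo and push only the dev branch to the new repository.
git clone -q "$work/canonical" "$work/clone"
cd "$work/clone"
git remote add nlp "$work/galaxy-nlp.git"
git push -q nlp refs/remotes/origin/dev:refs/heads/dev

# List the branches that now exist in the new repository.
git ls-remote --heads nlp
```

A repository populated this way would not be listed by GitHub as a fork of the original, which may matter when it comes to opening pull requests against it.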

> I would assume that in the case one works with feature branches
> in a fork, the master branch in the fork is not required and that
> it should be possible to maintain the feature branches based directly
> off the original repository's master branch. However, I don't know
> how well the GitHub web UI supports this.

I could be wrong, but I don't think Git supports this.  As far as I am aware "feature branches" are checked out based on the local repository.  That is, I don't think there is a command to tell git "create a feature branch in my fork based on upstream/master".  We would first have to merge upstream/master into our fork and then create the branch from that.

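$> git fetch upstream              # get the latest commits from upstream
$> git checkout master             # switch to our fork's master
$> git merge upstream/master       # bring it up to date with upstream
$> git checkout -b feature-branch  # branch off the updated master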
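
For comparison, git checkout -b also accepts a start point, so a feature branch can be created straight from the fetched upstream/master without merging into the fork's master first. A minimal sketch, assuming the remote is named upstream and its main branch is master; the throwaway repositories and the branch name feature-branch are only for illustration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for the canonical repository (a real setup would use its URL).
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "upstream commit"

# Stand-in for our fork, with the canonical repo added as the "upstream" remote.
git init -q "$work/fork"
cd "$work/fork"
git remote add upstream "$work/upstream"
git fetch -q upstream

# Create the feature branch directly from upstream/master.
# --no-track keeps the new branch from being set up to pull from upstream.
git checkout -q -b feature-branch --no-track upstream/master

# Print the branch we are now on.
git rev-parse --abbrev-ref HEAD
```

With this variant the fork's own master never has to exist or be kept in sync; only the feature branches are pushed to the fork.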


> Anyway, the question is what the strategy for the changes in the NLP
> fork should be:
>
> a) maintain a true fork with a whole bunch of modifications that may
>    in fact never make it upstream
>
> b) maintain a set of separate feature branches (i.e. patches) with the
>    plan of eventually merging them upstream

> In the case of a) it is easier in the beginning because all changes end
> up in a big bunch and there is no overhead in keeping track of how changes
> relating to different features interact and how to keep them separate
> from each other such that they may be merged upstream separately. As time
> progresses, we may face increasing problems merging changes made
> by upstream into our fork, and the maintenance overhead will increase. It
> eventually means redundant work in the Galaxy and Galaxy-NLP communities.
>
> In the case of b), we start with an increased overhead right away, as
> changes for specific features need to be tracked in separate branches.
> It requires deeper knowledge of how git works and it calls for setting up
> a CI job that merges multiple branches for the integration builds to test
> them together. However, the idea in this case is that feature branches
> are not maintained for a long time, but only for as long as necessary
> for them to become acceptable for merging by upstream. It means that
> the Galaxy-NLP community contributes to the Galaxy community, tries
> to avoid redundant work, and that the changes from the NLP community
> also become more easily accessible to other Galaxy-using communities.

I think I favour a combination of mostly b) with a bit of a).  I can envisage having NLP-centric modifications that the Galaxy team would not be interested in (e.g. replacing mentions of genome with corpora) and I can also imagine modifications that would make sense to the entire Galaxy community (e.g. SSO with SAML/Shibboleth).  That is why I suggested leaving the 'master' and 'dev' branches as unmodified copies of the upstream branches, and using something like nlp-master and nlp-dev for the NLP-centric version of Galaxy.  Features and pull requests intended for the main Galaxy community would be based on the dev (or master) branch while NLP-centric features would be based on the nlp-dev branch.

So the workflow for a feature to be submitted back to the main Galaxy repository would go something like:
  1. git fetch upstream  # make sure we are up to date
  2. git checkout dev
  3. git merge upstream/dev
  4. git checkout -b feature-a  # feature from dev branch
  5. [work work work]
  6. git commit -am "Feature complete"
  7. git push origin feature-a
  8. send PR to merge galaxy-nlp/feature-a into galaxyproject/galaxy:dev
  9. git fetch upstream  # after PR accepted
  10. git checkout dev
  11. git merge upstream/dev # update with our PR now applied
  12. git checkout nlp-dev   # apply our PR to our nlp-dev branch
  13. git merge dev
  14. git push origin nlp-dev

I've attached an image of the information flow as I see it. Unfortunately, I have no idea how feasible or scalable a setup like this would be.

Cheers,
Keith