2018-05-16

Blue Sky Discussion: EPEL-next or EPIC

EPIC Planning Document

History / Background

Since 2007, Fedora's Extra Packages for Enterprise Linux (EPEL) has been rebuilding Fedora packages for Red Hat Enterprise Linux and its clones. Originally the goal was to compile packages that RHEL did not ship but which were useful in running Fedora Infrastructure and other sites. Packages would be forked from the nearest Fedora release (Fedora 3 for EPEL-4, Fedora 6 for EPEL-5) with little updating or moving of packages, in order to give them lifetimes similar to the EL packages. Emphasis was placed on back-porting fixes rather than upgrading, and on avoiding large feature changes which would cause confusion. If a package could no longer be supported, it would be removed from the repository to eliminate security concerns. At the time, RHEL lifetimes were thought to be only 5-6 years, so back-porting did not look like a large problem.

As RHEL and its clones became more popular, Red Hat extended the lifetime of Enterprise Linux releases from 6 years to 10 years of "active" support. This made back-porting fixes harder, and many packages in EPEL would be "aged out" and removed. That in turn caused problems for consumers who had tied kick-starts and other scripts to having access to those packages. Attempts to fix this by pushing for release-upgrade policies have run into resistance from packagers, who already find focusing on the main Fedora releases a full-time job and only build EPEL packages as one-offs. Other attempts to update policies have run into the need for major updates and changes to build tools and scripting, with no time to do so. Finally, because EPEL has not changed significantly in 10 years, conversations about changing it fall into "well, EPEL has always done it like this" from consumers, packagers, and engineering alike.

In order to get around many of these points of resistance to changing EPEL, I suggest that we frame the problems around a new project called Extra Packages for Inter Communities (EPIC). The goal of this project would be to build packages from Fedora Project releases for the various Enterprise Linux distributions, whether Red Hat Enterprise Linux, CentOS, Scientific Linux, or Oracle Enterprise Linux.

Problems and Proposals

Composer Limitations:

Problem
Currently EPEL uses the Fedora build system to compose a release of packages every couple of days. Because each compose replaces the previous one, the only channels are the various architectures plus a testing repository where future packages can be tried out. Updates are not in a separate repository because EPEL does not track releases.
EPEL packagers currently have to support a package for the 10-year lifetime of a RHEL release. If they update a package, all older versions become unavailable. If they no longer want to support a package, it is completely removed. While this sounds like it increases security for consumers, Fedora itself does not remove old packages from older releases.
Proposed Solution
EPIC will match the Enterprise Linux major/minor numbers for releases. This means that a set of packages will be built for, say, EL5 sub-release 11 (aka 5.11). Those packages would populate a release, updates, and updates-testing directory for each supported architecture. This allows a set of packages to be composed when the sub-release occurs and then remain until the release is ended.
/pub/epic/releases/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/epic/updates/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/epic/updates/testing/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/epic/development/5/CR/
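As a sanity check on the layout, a consumer-side repo definition pointed at these trees might look like the following. This is only a sketch: the repo ids, the dl.example.org host, and the enabled/gpgcheck choices are all hypothetical, and a real deployment would use mirrorlists and GPG keys.

```ini
# /etc/yum.repos.d/epic.repo -- hypothetical repo file for the proposed layout
[epic]
name=EPIC 5.11 - $basearch
baseurl=https://dl.example.org/pub/epic/releases/5/5.11/$basearch/
enabled=1
gpgcheck=1

[epic-updates]
name=EPIC 5.11 - $basearch - Updates
baseurl=https://dl.example.org/pub/epic/updates/5/5.11/$basearch/
enabled=1
gpgcheck=1

[epic-updates-testing]
name=EPIC 5.11 - $basearch - Updates Testing
baseurl=https://dl.example.org/pub/epic/updates/testing/5/5.11/$basearch/
enabled=0
gpgcheck=1
```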

Once a minor release is done, the old tree will be hard linked to an appropriate archive directory.

/pub/archives/epic/releases/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/archives/epic/updates/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/archives/epic/updates/testing/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/

A new tree will be built and placed in the appropriate subdirectories. Links to the latest release will point to the new tree, and after some time the old tree will be removed from the active directory tree.
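The rotation described above can be sketched in shell. This is a toy model under a scratch directory, not the real mirror scripts: the paths, the foo-1.0.rpm file, and the latest symlink are illustrative, and cp -al (GNU coreutils) stands in for whatever hard-linking tool releng would actually use.

```shell
# Toy model of the minor-release rotation in a scratch tree
root=$(mktemp -d)
mkdir -p "$root/epic/releases/5/5.11/x86_64"
mkdir -p "$root/archives/epic/releases/5"
touch "$root/epic/releases/5/5.11/x86_64/foo-1.0.rpm"   # stand-in package

# Hard-link the old tree into the archive (cp -al links files rather than copying)
cp -al "$root/epic/releases/5/5.11" "$root/archives/epic/releases/5/5.11"

# Build the new minor release tree and repoint "latest" at it
mkdir -p "$root/epic/releases/5/5.12/x86_64"
ln -sfn 5.12 "$root/epic/releases/5/latest"

# After some time, drop the old tree from the active hierarchy;
# the archived hard links keep the files alive
rm -rf "$root/epic/releases/5/5.11"
ls "$root/archives/epic/releases/5/5.11/x86_64/"
```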

Channel Limitations:

Problem
EPEL is built against a subset of the channels that Red Hat Enterprise Linux offers customers, namely Server, High Availability, Optional, and some sort of Extras. Effort is made to ensure that EPEL does not replace anything in those channels with newer packages. However, this does not extend to packages in the Workstation, Desktop, and similar channels, so EPEL's packages can end up replacing something in those channels.
Proposed Solution
EPIC will be built against the latest released CentOS minor release using the channels enabled by default in CentOS-Base.repo. These packages are built from the source code that Red Hat delivers via a git mechanism to the CentOS project, which rebuilds it for mass consumption. Packages will not be allowed to replace or update anything in those channels according to the standard RPM Name-Epoch-Version-Release (NEVR) comparison. This will allow EPIC to serve more clients.

Build System Limitations

Problem
EPEL is built against Red Hat Enterprise Linux. Because the RHEL packages are not meant for general consumption, the Fedora build system does not import them but instead builds against something like a hidden build-root. This causes multiple problems:
  • If EPEL has a package with the same name, it supersedes the RHEL one even if the RHEL NEVR is newer. This means packages may get built against stale versions, and constant pruning needs to be done.
  • If the EPEL package has a newer NEVR, it will replace the RHEL one, which may not be what the consumer intended. This may break other software's requirements.
  • Because parts of the build are hidden, the package build may not be as auditable as some consumers would like.
Proposed Solution
EPIC will import into the build system the CentOS release it builds against. With this, the build is not hidden from view. It also makes it easier to enforce rules that an EPIC package never replaces or removes a core package. Audits of how a build was done can be clearly shown.

Greater Frequency Rebasing

Problem
Red Hat Enterprise Linux has been split between competing customer needs. Customers wish to have some packages stay steady for 10 years with only minor updates, but they have also found that they need rapidly updated software. To bridge this, recent RHEL releases have rebased many software packages during a minor release. This has caused problems because EPEL packages were built against older software ABIs which no longer work with the latest RHEL, requiring the EPEL software to be rebased and rebuilt regularly. Conversely, because of how the Fedora build system sees RHEL packages, it only knows about the latest packages. In the 2-4 weeks it takes the various community rebuilds to produce their minor-release packages, EPEL packages may be built against APIs which are not yet available.

Proposed Solution
The main EPIC releases will be built against specific CentOS releases rather than the Continuous Release (CR) channel. When the next RHEL minor release is announced, EPIC releng will create a new git branch from the current minor version (e.g. 5.10 → 5.11). Packagers can then make major version updates or other needed changes. When the CentOS CR is populated with the new rpms, CR will be turned on in koji and packages will be built in the new tree against those packages. After 2 weeks, the EPIC minor release will be frozen and any new packages or fixes will go to the updates tree.

Guidelines

Packaging

EL-4

This release is no longer supported by CentOS and will not be supported by EPIC.

EL-5

This release is no longer supported by CentOS and will not be supported by EPIC.

EL-6

This release is supported until Nov 30 2020 (2020-11-30). The base packaging rules for any package would be those used by the Fedora Project during its 12 and 13 releases. Where possible, EPIC will make macros to keep packaging more in line with current packaging rules.

EL-7

This release is supported until Jun 30 2024 (2024-06-30). The base packaging rules for any package would be those used by the Fedora Project during its 18 and 19 releases. Because EL7 has seen major updates to certain core software, it may be possible to follow newer packaging rules from later releases.

EL-next

Red Hat has not publicly announced what its next release will be, when it will be released, or what its lifetime will be. When that occurs, it will be clearer which Fedora release the packaging will be based on.

GIT structure

Currently EPEL uses only one branch for every major RHEL release. To better match how current RHEL releases contain major differences between minor releases, EPIC will have a branch for every major.minor release. This allows people who need older versions to snapshot them and build their own software from them. There are several naming patterns which need to be researched:

/<package_name>/epic/6/10/
/<package_name>/epic/6/11/
/<package_name>/epic/7/6/
/<package_name>/epic/7/7/

/<package_name>/epic-6/6.10/
/<package_name>/epic-6/6.11/
/<package_name>/epic-7/7.6/
/<package_name>/epic-7/7.7/

/<package_name>/epic-6.10/
/<package_name>/epic-6.11/
/<package_name>/epic-7.6/
/<package_name>/epic-7.7/
Git module patterns will need to match what upstream delivers for any future EL.
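To compare the candidate patterns in practice, here is a minimal sketch of per-minor-release branching using the epic-X.Y style; the throwaway repository, identity, and branch names are only illustrative, not a real dist-git layout.

```shell
# Throwaway repo standing in for one package's dist-git repository
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=releng@example.org -c user.name="EPIC releng" \
    commit -q --allow-empty -m "initial import"

git -C "$repo" branch epic-7.6             # current minor release
git -C "$repo" branch epic-7.7 epic-7.6    # next minor forks from the current one
git -C "$repo" branch --list 'epic-*'
```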

Continuous Integration (CI) Gating

EPIC-6

The EL-6 life-cycle is reaching its final sub-releases, with more focus and growth in EL-7 and beyond. Because of this, gating will be turned off for EPIC-6. Testing of packages can be done at the packager's discretion but is not required.

EPIC-7

The EL-7 life-cycle is midstream, with 1-2 more minor releases containing major API changes. Due to this, it makes sense to research whether gating can be put in place for the next minor release. If retrofitting the tools to the older EL proves feasible, gating can be turned on.

EPIC-next

Because gating is built into current Fedora releases, there should be no problem with turning it on for a future release. Packages which do not pass testing will be blocked just as they will be in Fedora 29+ releases.

Modules

EPIC-6

Because EL-6’s tooling is locked at this point, it does not make sense to investigate modules.

EPIC-7

Currently EL-7 does not support Fedora modules and would require updates to yum, rpm and other tools in order to do so. If these show up in some form in a future minor release, then trees for modules can be created and builds done.

EPIC-next

The tooling for modules can match how Fedora approaches it. This means that rules for module inclusion will be similar to package inclusion. EPIC-next modules must not replace/conflict with CentOS modules. They may use their own name-space to offer newer versions than what is offered and those modules may be removed in the next minor release if CentOS offers them then.

Build/Update Policy

Major Release

In the past, Red Hat has released a public beta before finalizing its next major version. Where possible, the rebuilders have come out with their versions of this release in order to learn what gotchas they will face when the .0 release occurs. Once the packages for the beta are built, EPIC will make a public call for packages to be released to it. Because packagers may not want to support a beta, or may know that there will be other problems, these packages will NOT be auto-branched from Fedora.

Minor Release

The current method CentOS uses to build a minor release is to begin rebuilding packages, patching problems, and then, when ready, putting those packages in their /cr/ directory. These are then tested by users while updates are built and the ISOs for the final minor release are produced. The steps for EPIC release engineering will be the following:
  1. Branch all current packages from X.Y to X.Y+1.
  2. Make any Bugzilla updates needed.
  3. Rebuild all branched packages against CR.
  4. File FTBFS bugs against any packages which fail.
  5. Packagers will announce major updates to the mailing list.
  6. Packagers will build updates against CR.
  7. 2 weeks in, releng will cull any packages which are still FTBFS.
  8. 2 weeks in, releng will compose and lock the X.Y+1 release.
  9. Symlinks will point to the new minor release.
  10. 4 weeks in, releng will finish archiving the X.Y release.
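The rebuild-and-cull portion of the steps above amounts to a loop, sketched here as a toy shell model. The package names, the build_against_cr stub, and the flat files are all invented for illustration; real release engineering would submit koji builds and file FTBFS bugs in Bugzilla instead.

```shell
work=$(mktemp -d)
printf '%s\n' pkg-good pkg-also-good pkg-ftbfs > "$work/packages.txt"

# Stand-in for submitting a build against the CR channel; one package
# pretends to fail to build from source (FTBFS)
build_against_cr() {
    test "$1" != pkg-ftbfs
}

: > "$work/ftbfs.txt"
while read -r pkg; do
    build_against_cr "$pkg" || echo "$pkg" >> "$work/ftbfs.txt"   # would file a bug here
done < "$work/packages.txt"

# Two weeks in: cull anything still failing from the compose list
grep -v -F -x -f "$work/ftbfs.txt" "$work/packages.txt" > "$work/compose.txt"
cat "$work/compose.txt"
```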

Between Releases

Updates and new packages between releases will be pushed to the appropriate /updates/X.Y/ tree. Packagers will be encouraged to make only minor, non-API-breaking updates during this time. Major changes are possible but need to follow this workflow:
  1. Announce to the EPEL list that a change is required and why.
  2. Open a ticket with the EPIC steering committee on the change.
  3. The EPIC steering committee approves or rejects the change.
  4. If approved, the change happens, but the packages stay in the updates tree.
  5. If not approved, the change can be made in the next minor release.

Build System

Build in Fedora

Currently EPEL is built in Fedora using the Fedora build system, which integrates koji, bodhi, greenwave, and other tools. This could still be used for EPIC.

Build in CentOS

EPIC could be built in the CentOS BuildSystem (CBS) which also uses koji and has some integration to the CentOS Jenkins CI system.

Build in Cloud

Instead of using existing infrastructure, EPIC would be built with newly stood-up builders in Amazon or a similar cloud environment. The reasoning would be to see whether other build systems can eventually transition there.

Definitions

Blue Sky Project
A project with a different name to help eliminate preconceptions with the existing project.
Customer
A person who pays for a service either in money, time or goods.
Consumer
Sometimes called a user. A person who is consuming the service without work put into it.
EPEL
Extra Packages for Enterprise Linux. A product name which was to be replaced years ago, but no one came up with a better one.
EPIC
Extra Packages for Inter Communities.
RHEL
Red Hat Enterprise Linux

Last updated 2018-05-16 19:10:17 EDT. This document was imported from an adoc file.

2018-05-11

EPEL Outage Report 2018-05-11

Problem Description:

On 2018-05-11 04:00 UTC reports started coming into centos IRC channels about EPEL being corrupted and causing breakages. These were then reported to #fedora-admin and #epel-devel. The problem would show up as something like:

 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

The problem was examined and turned out to be an NFS problem on the backend systems, which caused the createrepo_c run that creates the repositories to produce a corrupted SQLite file. A safeguard which was supposed to catch this did not work, for reasons still being investigated, and the corrupted SQLite file was mirrored out.

Admins began filling the #epel and #centos channels asking why their systems were broken. I would like to thank avij, tmz, and others who answered as many of the people as possible. I would also like to thank Kevin Fenzi for figuring out the problem, regenerating the repositories, and unstopping the NFS blockage.

Solution:

Because of the way mirroring works, this problem may affect clients for hours after the fix has been made on the server. There are three things a client can do:
  1. If you have a dedicated mirror, have the mirror update itself with the upstream mirrors.
  2. On client systems you may need to run yum clean all to remove the bad SQLite data, in case yum thinks its cached copy is still good.
  3. You can skip EPEL during updates with:
    
    yum --disablerepo=epel update

Notes:

This will be filled out later as more information and future steps are taken.
  1. Mirrormanager did not have anything to do with this. Its job is to check that mirrors match the master site, and in this case the master site was borked, so it happily told people to go to mirrors which matched it.
  2. The problem showed up at 04:00 UTC because most servers are set up using GMT/UTC as their clock. At 04:00 cron.daily starts up, and many sites run a daily yum update, which broke and mailed them.

2018-05-09

Looking for old game source Conquer (FOUND)

Early this morning (or late last night..) while trying to rescue some computers which decided to die during reboot.. I got hit by a memory of computer labs in the late 1980's when I first went to college. While many of us would play Nethack and hang out on MUDs, the big draw was a turn-based game called Conquer. The premise was that you were the King of a country in some sort of fantasy world. Your job was to grow your country, be it vampires, orcs, elves, or humans, and destroy all competition. I believe it was based off the classic Empire games, but I am not sure. I expect it was not 'Free or open source', and I know it was full of really bad coding, as the main point of the game for the CS people was to find a new overflow to make your country win.

Years later I met someone who had helped write a very similar game called Dominion. The game has been kept up and is under a GPL license, which is probably why it is still findable.

And while waiting for ansible to rebuild various virtual machines which had existed on the now-kaput servers, I went diving to find source code. My 2 am searches didn't come up with any copies of the Conquer code, but I expect that is because various search engines assume I want clones of Command and Conquer rather than "Conquer". Looking for fantasy Empire-like games brings up tons of clones of Age of Empires. Even looking for Dominion brings me to many clones of the Dominion board game versus the actual source code. I did find that someone has made updated versions of Trade Wars and Taipan!, which made me happy, as those were ones I had played a lot in High School in the mid 1980's. I was even able to find some code for Xtank, which was another diversion on the poor Sun SPARC SLC systems which did not have enough RAM or CPU to do the game justice.

I expect that the game source code is probably sitting somewhere easily findable or that the game was called Conquest or something similar and I am not remembering correctly after 30 years. I also expect that the code has no usable lessons in it.. it just seemed important at 4 am and 6 am this morning when I couldn't get back to sleep. Hopefully a blog post will put that little worry to bed. It was like "Who wrote the game?", "Where did it go?", "Why did I always lose?" Ok the last one was easy.. I am not good at strategy and I was playing the wrong game (aka I was trying to play by the inside game rules versus the social "hey look at what we should join and do" and the "oh wow did you see what this does if you send 4 ^k to the game?")

One thing I do remember from these games was that there was no idea of client and server in them. Everything was written into one application (which was where most of the security problems came up). These days, the game would probably be written as a webserver application which would send HTML5 to the clients, which the players would manipulate to send back 'moves'. These would then be checked by the server to make sure they were legitimate and confirmed when the turn ran. Conflicts like army A moving into army B's space would then get dealt with at the turn cycle, and the next turn would begin.

[Quickly found by Alan Cox (thanks Alan) at https://ftp.gnome.org/mirror/archive/ftp.sunet.se/pub/usenet/ftp.uu.net/comp.sources.games/. It was originally called Conquest and then renamed Conquer. The game was written by Ed Barlow. Adding in Ed Barlow now gives the source code engines enough to find other versions. Looking at the source code license https://github.com/quixadhal/conquer/blob/master/header.h this is not open source in any way. There was a discussion on Debian Legal about the license being changed to GPL ?!? but without a formal release from the original author.. I am leery of saying it was done.]

2018-05-07

Computers and honesty

In today's Quote Investigator, they investigate a quote from Isaac Asimov which shows up from time to time.

Part of the inhumanity of the computer is that once it is competently programmed and working smoothly—it is completely honest.

I remembered this quote from my time in computer science (CS) courses in the late 1980's as something that non-CS people would bring up, and CS people would laugh and laugh about.

Isaac Asimov was a 20th-century author who wrote about almost everything at one point or another. While best known as a science fiction writer, he wrote many popular science books, which made up most of the literature I read from the local South Carolina library when I was in elementary school. Some of his most famous science fiction was about robots programmed to follow three laws, with the stories revolving around how the laws broke down in some way or another versus what people expected them to do.

Modern computers are incredibly complex systems, and the first thing you learn in analyzing any complex system is that it is never going to run smoothly enough for complete honesty. The system may think it is being honest, but at some point, somewhere, 1+1=1 happened (or 1+1=0, or 1+1=3). In fact, a large amount of the electrical engineering in chip design, BIOS writing, and other low-level sorcery is cleaning that up. Maybe the chip redoes the calculation a couple of times, maybe there are low bits you never use that absorb the electrical signal loss, or some other trick of the trade. At some point, though, those incantations will fail and a little bit of Maxwell's demon leaks out somewhere.

Even when you have a smoothly working system, the fact that the programmer is never competent enough is a completely different problem. We all have our off days, and we all fail to see all the ways a piece of code might get used where it does mostly what you want.. but not all. [Or we do see it, pop open the cheap Scotch, and try desperately to forget.]

Even not counting the complex system problems, there are many times when we find either our programming or our computer will prove both Asimov and Charles Babbage wrong:

On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

I expect that the programmers for Alexa, Cortana, Siri, and Gooda (google voice needs a name and I am horrible with names) have to deal with this daily. A person may ask a question which literally means one thing but has a different contextual common meaning. Giving the literal answer would not be lying, but the person asking feels the computer did. Giving the contextually correct answer has the computer lying, but the person getting the 'honest' answer they expected. [And somewhere in England, they have hooked up Babbage's spinning casket to an electric motor to produce free electricity.]

In the end, I wonder if all this means we need to re-evaluate the 'humanity' of modern computers (or at least the definition of 'humanity' as posited by Asimov nearly 40 years ago 😉.)

From the quote investigator page:

  1. 1981, Change! Seventy-One Glimpses of the Future by Isaac Asimov, Chapter 6: Who Needs Money?, Start Page 15, Quote Page 17, Houghton Mifflin Company, Boston, Massachusetts. (Verified with scans) 

2018-04-29

Cygwin: FAST_CWD problem

Cygwin is a useful set of tools which makes working on Windows systems closer to working on a UNIX/Linux system. These tools are often bundled with various other software, which may run into problems if the bundled copy is not updated to newer versions. What is normally seen is that a person will try to compile a program and get:

find_fast_cwd: WARNING: Couldn't compute FAST_CWD pointer. Please report
this problem to the public mailing list cygwin@cygwin.com

If you are seeing this error, you have a very, very old version of Cygwin and should contact the vendor you got the software from. They need to rebase their version of Cygwin to a more current one in order to get both the security updates and the other fixes needed for Cygwin to work with your version of Windows.

This error has been showing up a lot on the Cygwin mailing lists from software associated with:
  • Some particular Eclipse plugin
  • Some circuit diagram software that wasn't named.
Please see the Cygwin FAQ entry for more information.

2018-04-23

Fedora Infrastructure Meeting Change to Thursdays 1400 UTC

For several years, the Fedora Infrastructure meeting has been held every Thursday at 1800 UTC. This is lunchtime through mid-morning for the U.S. members, early evening for our European members, and late night for people in India. [I think it is a different day in China and Japan.] To see whether attendance was hampered by the meeting time, the Fedora Infrastructure leader Kevin Fenzi recently asked for a new one. The results came back, and the meetings will be moved to 1400 UTC on Thursdays. To see what that time is in your time zone, you can use the date command:


[smooge@smoogen-laptop ~]$ date -d "Apr 26 14:00:00 UTC 2018"
Thu Apr 26 10:00:00 EDT 2018
Fedora Infrastructure tries to set its meetings against UTC versus any local daylight savings/unsavings times since many regions do not have them or start/end them at different times.

2018-04-20

Fedora Infrastructure Hackathon (day 1-5)

From 2018-04-09 to 2018-04-13, most of the Fedora Infrastructure team was in Fredericksburg, Virginia, working face to face on various issues. I already covered my trip to Fredericksburg on the 8th, so this is a follow-up blog covering what happened. Each day had a pretty predictable cycle, starting with waking up around 06:30 and getting a shower and breakfast downstairs. The hotel was near Quantico, which is used by various government agencies for training, so I got to see a lot of people suiting up every morning. Around 07:30, various coworkers from different time zones would start stumbling in.. some because it was way too late in their day, and others because it was way too early. Everyone would get a cup or two of coffee in them, and Paul would show up to herd us towards the cars. [Sometimes it took two or three attempts, as someone would straggle away to try to get another 40 winks.] Then we would drive over to the University of Mary Washington extension campus.

I wanted to give an enormous shout-out to the staff there, people checked in on us every day to see if we had any problems, and worked around our weird schedules. They also helped get our firewall items fixed as the campus is fairly locked down for guests but made it so our area had an exception for the week so that ssh would work. 

Once we got situated in the room, we would work through the problems we were trying to tackle that day. Monday was documentation, Tuesday was reassigning tasks, Wednesday was working through AWX rollouts, and Thursday was trying to get bodhi working with OpenShift. Friday we headed home by our different methods. [I took a train, though not this one.. this was the CSX shipping train which came through before ours.]

Most of the work I did during this was working on tasks to get people enabled and working. I helped get Dusty and Sinny into a group which could log into various atomic staging systems to see what logs and builds were doing. I worked with Paul Frields on writing service level expectations that I will be putting into more detail in next weeks blogs. I talked with Brian Stinson and Jim Perrin on CentOS/EPEL build tools and plans.


Finally I worked with Matthew Miller on statistics needs and will be looking to work with CoreOS people someday in the future on how to update how we collect data. As with any face to face meetings, it was mostly about getting personal feedback on what is working and what isn't. I have a better idea on things needed in the future for the Fedora Apprentice group (my blogs for 2 weeks from now), Service Level Expectations, and EPEL (3 to 4 weeks from now).