etcd PR 11104 (https://github.com/etcd-io/etcd/pull/11104) implemented mutex
TryLock. Since it's only available in etcd master, just copy the relevant code
and add a TODO to remove it when updating the etcd client to a version
implementing TryLock.
Use TryLock wherever it's useful.
Rename errCh to doneCh (the error is not needed) and always send to it when one
of the HandleEvents functions exits (not only on error).
This ensures that all the goroutines are stopped even if one of them returns
without an error.
* objectstorage: remove `types` package and move `ErrNotExist` to the base package
* objectstorage: Implement .Is and add helper `IsErrNotExist` for `ErrNotExist`
* util: Rename `ErrNotFound` to `ErrNotExist`
* util: Add `IsErr*` helpers and use them in place of `errors.Is()`
* datamanager: add `ErrNoDataStatus` to report when there's no data status in ost
* runservice/common: remove `ErrNotExist` and use errors in util package
When more than one file is created during a checkpoint, the position of the
entries in the index is wrong since it's not reset at every new index.
Fix it and add related tests.
When creating a data file name, make it start with the current data sequence.
This will be useful in the future to know which data sequence created a new data
file.
Use a LimitReader only when the size is specified (greater than or equal to 0).
When the size is unknown (less than 0) a LimitReader will immediately return EOF
instead of writing the whole data.
Looks like GetRepoRef doesn't correctly handle the gitea repo refs response,
since it expects a single entry. Instead, at least with the latest gitea
version, the response is always an array of refs. So use GetRepoRefs.
We were passing the source branch name as the Branch value in the webhook data.
This patch just deletes this assignment. If it's needed in the future, let's add
it under a different name to avoid confusion.
Only match the current ref type, i.e. don't match a branch when the ref type is
a tag or pull request.
Ref is always matched because it's not related to a specific ref type.
* Add a generic container volume option that currently only supports tmpfs. In
the future it could be expanded to use host volumes or other kinds of volumes
(if supported by the underlying executor)
* Implement creation of tmpfs volumes in docker and k8s drivers.
Reorganize ExecutorTask to better distinguish between the task Spec and
the Status.
Split the task Spec into a sub part called ExecutorTaskSpecData that contains
the task data that doesn't have to be saved in etcd, because it can be very big
and can be generated starting from the run and the runconfig.
Currently, if no shell is defined in the task and in the step, the executor will
use a hardcoded default shell.
This would change the behavior of existing runs if we add an option to globally
set the agola default shell.
To avoid this, set the task shell to the default shell inside the runconfig if
it's empty, so future executions will always use this value.
Defining an option to override the user for a run step is too fine-grained and,
for consistency, would require doing the same for the other steps (clone,
*workspace etc...).
Remove it since it's probably enough to define it at the task level.
On a git process error, don't write the error message to the response body,
since it'll break the git protocol, and don't try to write the status header
(it's not possible, since it was automatically written by the Go http server
before writing the body).
Currently we are using different `When` types for every service and converting
between them. This is a good approach if we want to keep all the services
isolated (as if we were using different repos for every service instead of the
current monorepo).
But currently, since When is identical between all the services, simplify this by
using a common When type.
Currently `advanceRunTasks` isn't deterministic and doesn't calculate the final
state in one call. So it could happen that `getTasksToRun` selects a task to be
executed since its parents are finished (marked as skipped in advanceRunTasks)
but the task isn't marked to be skipped (because advanceRunTasks has processed
this task before its parents).
For now, fix this by applying the same task selection logic used in
`advanceRunTasks` and add a TODO to make `advanceRunTasks` deterministic by
processing tasks by their level (from level 0).
In c1ff28ef9f we exported various types. Unfortunately the types used by cmd
variable create/update are the wrong types and marshalling fails. Fix it by
using the right type. In the future these internal types should be exported.
Since the current logic is to use the first available private ip address as the
advertised address, we have to listen on the wildcard address, since a different
host provided in web.ListenAddress will make the executor unreachable.
In the future improve this to let the user manually define the bind and the
advertised address (perhaps using go-sockaddr templates like done by consul) to
also support nat between the schedulers and the executors.
Allow setting the destination branch/tag/ref so users can test the run
conditions based on the branch/tag/ref.
To simulate a pull request a user can define a ref that matches one of these
regular expressions: `refs/pull/(\d+)/head`, `refs/merge-requests/(\d+)/head`
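
A small sketch showing how the two expressions above classify a ref. The
`pullRequestID` helper is a hypothetical name; only the two patterns come from
the text.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two patterns quoted above, used unanchored as written.
var pullRequestRefs = []*regexp.Regexp{
	regexp.MustCompile(`refs/pull/(\d+)/head`),
	regexp.MustCompile(`refs/merge-requests/(\d+)/head`),
}

// pullRequestID (hypothetical) returns the captured pull request number and
// whether the ref looks like a pull request ref.
func pullRequestID(ref string) (string, bool) {
	for _, re := range pullRequestRefs {
		if m := re.FindStringSubmatch(ref); m != nil {
			return m[1], true
		}
	}
	return "", false
}

func main() {
	for _, ref := range []string{
		"refs/pull/42/head",
		"refs/merge-requests/7/head",
		"refs/heads/master",
	} {
		id, ok := pullRequestID(ref)
		fmt.Printf("%s -> pull request: %t, id: %q\n", ref, ok, id)
	}
}
```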
Older versions of docker don't support the exec api Env and WorkingDir options.
Support these versions by doing the same thing we already do with the k8s
driver: use the `toolbox exec` command, which will set the provided Env, change
the cwd to the WorkingDir and then exec the wanted command.
Export clients and related packages.
The main rule is to not import internal packages from exported packages.
The gateway client and related types are totally decoupled from the gateway
service (no types are shared between the client and the server).
The configstore and the runservice clients, instead, currently share many types
that are now exported (decoupling them would require duplicating a lot of types
and adding functions to convert between them; this will be done in the future
when the APIs are declared stable).
These types are not common to all the services but belong to the configstore.
Next step will be to make them local to the configstore and not directly used by
other services since these types are also stored.
* Don't fail tasks inside the delete executor action, just delete the executor
from etcd
* The scheduler, when detecting a task without a related executor, will mark the
task as failed and correctly set the end time of the task and its steps.
Implement runservice maintenance mode and export/import.
When runservice is set in maintenance mode it'll start only the maintenance and
export/import handlers.
Setting maintenance mode will set a key in etcd so all the runservice instances
will detect it and enter maintenance mode. This is done asynchronously so it
could take some time (future improvements will add an api to show the state of
all the runservice instances).
Export is always available and will export the datamanager contents. Currently
only datamanager contents are exported (no logs and workspace archives).
Import is available only during maintenance: given a datamanager export, it will
import it and reset etcd to the imported state.
There was a typo so we weren't setting the task endTime when the setup step
failed.
Also unify all logic to just use `et` (instead of a mix of `et` or `rt.et`)
Implement configstore maintenance mode and export/import.
When configstore is set in maintenance mode it'll start only the maintenance and
export/import handlers.
Setting maintenance mode will set a key in etcd so all the configstore instances
will detect it and enter maintenance mode. This is done asynchronously so it
could take some time (future improvements will add an api to show the state of
all the configstore instances).
Export is always available and will export the datamanager contents.
Import is available only during maintenance: given a datamanager export, it will
import it and reset etcd to the imported state.
Use the go sql context functions (ExecContext, QueryContext etc...)
The context is saved inside Tx so the library users should only pass it one time
to the db.Do function.
* export: exports the newest data checkpoint. It forces a checkpoint before
exporting (currently no wals are exported)
* import: cleans up etcd, creates a new data snapshot from the provided import
stream and then initializes etcd. Currently no old data is removed from the
object storage; it's just ignored.
Since we are using the shared cache with the unlock notify we won't receive
SQLITE_BUSY errors but we could receive SQLITE_LOCKED errors due to deadlocks or
locked tables on concurrent read and write transactions.
This patch catches these errors and retries the tx up to maxTxRetries times.
In the runservice readdb Run method we could end up in a deadlock if two of the
goroutines that call HandleEvents.* try to write to the errCh at the same
time before the errCh is read. If this happens one of the two will be blocked on
writing to the channel but the read won't happen since it'll be blocked by
wg.Wait().
Fix this by:
* using a buffered channel as large as the number of executed goroutines.
* creating a new errCh at every loop (so we'll ignore later errors after the first
one)
Note: we could also use a non blocking send to avoid this situation but we
should also start the wg.Wait before the goroutines or earlier errors could be
lost causing another kind of hang.
When doing an initEtcd (new instance or etcd reset) create a new wal (that will
have a new sequence epoch) and do a checkpoint.
In this way:
* readdb will detect an epoch change and do a full resync
* we always have a data file (even if empty) that provides the last checkpointed
wal. This information could be used by readdb to resync