Replace zap with zerolog.
zerolog has a cleaner interface and can be easily configured with custom
error chain printing using a new error handling library that will be
implemented in another PR.
* Create an APIError that should only be used for api returned errors.
It'll wrap an error and can have different Kinds and optional code and
message.
* The http handlers will use the first APIError available in the
error chain and generate a json response body containing the code and
the user message. The wrapped error is internal and is not sent in the
response.
If no api error is available in the chain a generic internal
server error will be returned.
* Add a RemoteError type that will be created from remote service calls
(runservice, configstore). It's similar to the APIError but is a
different type, so that it doesn't propagate to the caller response, and it
doesn't contain any wrapped error.
* Gateway: when we call a remote service, by default, we'll create an
APIError using the RemoteError Kind (omitting the code and the
message that usually must not be propagated).
This is done for all the remote service calls as a starting point. In the
future, if this default behavior is not the right one for a specific
remote service call, a new api error with a different kind and/or
augmented with the calling service error codes and user messages could
be created.
* datamanager: Use a dedicated ErrNotExist (and convert the objectstorage
ErrNotExist to it).
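A minimal sketch of the handler behavior described above, using only the
standard library. The kind names, field names, JSON keys, status mapping and
the `writeError`/`respond` helpers are assumptions made for illustration, not
the actual agola types.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// ErrorKind classifies an APIError. These kinds are illustrative only.
type ErrorKind int

const (
	KindInternal ErrorKind = iota
	KindBadRequest
	KindNotExist
)

// APIError carries what may be shown to the caller and wraps the internal cause.
type APIError struct {
	Kind    ErrorKind
	Code    string // optional, sent to the caller
	Message string // optional, sent to the caller
	err     error  // internal, never sent in the response
}

func (e *APIError) Error() string {
	if e.err != nil {
		return "api error: " + e.err.Error()
	}
	return "api error"
}

func (e *APIError) Unwrap() error { return e.err }

// errorBody is the (assumed) JSON payload returned to the caller.
type errorBody struct {
	Code    string `json:"code,omitempty"`
	Message string `json:"message,omitempty"`
}

// writeError uses the first APIError found in the chain, or falls back to a
// generic internal server error when there is none.
func writeError(w http.ResponseWriter, err error) {
	status := http.StatusInternalServerError
	var body errorBody

	var apiErr *APIError
	if errors.As(err, &apiErr) {
		switch apiErr.Kind {
		case KindBadRequest:
			status = http.StatusBadRequest
		case KindNotExist:
			status = http.StatusNotFound
		}
		body = errorBody{Code: apiErr.Code, Message: apiErr.Message}
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(body)
}

// respond runs writeError against an in-memory recorder.
func respond(err error) (int, string) {
	rec := httptest.NewRecorder()
	writeError(rec, err)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

// sampleChain builds: internal error -> APIError -> extra context.
func sampleChain() error {
	internal := errors.New("select failed: no rows")
	apiErr := &APIError{Kind: KindNotExist, Code: "projectNotFound", Message: "project not found", err: internal}
	return fmt.Errorf("get project: %w", apiErr)
}

// errPlain has no APIError in its chain.
var errPlain = errors.New("db connection lost")

func main() {
	status, body := respond(sampleChain())
	fmt.Println(status, body)
	status, body = respond(errPlain)
	fmt.Println(status, body)
}
```

The wrapped internal error is reachable through `Unwrap` for logging but is
never copied into the response body.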
Replace https://github.com/satori/go.uuid with maintained version at
https://github.com/gofrs/uuid
Since in the new version uuid.NewV4 returns an error when it fails to read
from the random source reader, we use uuid.Must to panic on error, since
it's considered an unrecoverable error.
In the future, if needed, we could handle the error instead of panicking.
Handle `.agola/config.star` files in starlark config format.
To provide a context, as done for jsonnet, we require that the starlark agola
config file contains a main function that will receive a config context as a
dict.
We also had to implement our own json conversion from a starlark dict since go
starlark removed its own function.
skip fetching of tasks with status skipped, not only tasks marked as skip.
This avoids many wrong and noisy logs of type "executor task with id taskid
doesn't exist. This shouldn't happen. Skipping fetching"
updateRunTaskStatus should also accept transitions from not started to a
finished state like "success", "failed", "stopped" since we could miss some
status updates from the executor for many reasons.
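A small sketch of the relaxed transition check. The status names and the
`canTransition` helper are hypothetical and only illustrate the rule; the real
updateRunTaskStatus logic may be structured differently.

```go
package main

import "fmt"

// TaskStatus is a hypothetical run task status.
type TaskStatus string

const (
	StatusNotStarted TaskStatus = "notstarted"
	StatusRunning    TaskStatus = "running"
	StatusSuccess    TaskStatus = "success"
	StatusFailed     TaskStatus = "failed"
	StatusStopped    TaskStatus = "stopped"
)

// isFinished reports whether s is a final state.
func isFinished(s TaskStatus) bool {
	return s == StatusSuccess || s == StatusFailed || s == StatusStopped
}

// canTransition accepts not started -> finished directly, since the
// intermediate "running" update from the executor may have been missed.
func canTransition(from, to TaskStatus) bool {
	switch from {
	case StatusNotStarted:
		return to == StatusRunning || isFinished(to)
	case StatusRunning:
		return isFinished(to)
	default:
		// Finished states are final in this sketch.
		return false
	}
}

func main() {
	fmt.Println(canTransition(StatusNotStarted, StatusFailed))
	fmt.Println(canTransition(StatusSuccess, StatusRunning))
}
```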
rename activeExecutorTasks to scheduledExecutorTasks and don't filter out
finished tasks.
In some logic we need all the scheduled tasks and not only the not finished
ones.
taskUpdater will be called serially and won't block. It'll start a goroutine
for executing the task and for sending the task state to the scheduler.
executeTask will just start task execution; all the logic for deciding whether
to start a task is moved inside taskUpdater.
In this way we avoid concurrency issues when handling the same executorTask
in parallel.
Since the executor only periodically updates its state we could end up
scheduling many more tasks than the executor ActiveTasksLimit. This will happen
in the case of many parallel tasks that can all start at the same time.
To avoid this, also consider the executor tasks saved in etcd, which represent
the real view of scheduled tasks.
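One possible way to express this check, as a sketch. The `ExecutorTask` fields
and the `freeSlots` helper are hypothetical; the idea is only that the stored
tasks are counted in addition to the (possibly stale) number reported by the
executor.

```go
package main

import "fmt"

// ExecutorTask is a hypothetical view of an executor task saved in etcd.
type ExecutorTask struct {
	ID         string
	ExecutorID string
}

// freeSlots returns how many more tasks may be scheduled on the executor.
// reportedActive is the value last reported by the executor and may be stale,
// so the tasks stored for that executor are counted too and the larger of the
// two numbers is used.
func freeSlots(executorID string, activeTasksLimit, reportedActive int, stored []ExecutorTask) int {
	scheduled := 0
	for _, t := range stored {
		if t.ExecutorID == executorID {
			scheduled++
		}
	}

	used := reportedActive
	if scheduled > used {
		used = scheduled
	}

	if free := activeTasksLimit - used; free > 0 {
		return free
	}
	return 0
}

// sampleStored returns two tasks already scheduled on executor "e1".
func sampleStored() []ExecutorTask {
	return []ExecutorTask{
		{ID: "t1", ExecutorID: "e1"},
		{ID: "t2", ExecutorID: "e1"},
		{ID: "t3", ExecutorID: "e2"},
	}
}

func main() {
	// The executor has not yet reported the two tasks just scheduled on it.
	fmt.Println(freeSlots("e1", 2, 0, sampleStored()))
}
```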
Currently, when a run is marked to stop, we stop the currently running
tasks and then their children are marked as skipped.
But tasks not depending on a stopped task (root tasks or children with a
finished parent) that are just waiting for an executor slot will be scheduled
as soon as a slot is free, even if the run is marked to stop (and then the
scheduler will stop them after some seconds).
This patch marks all not started tasks as skipped when the run is marked to
stop.
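A sketch of the new behavior under assumed types. `Run`, `RunTask`, the status
names and `skipNotStarted` are hypothetical names used only to illustrate the
rule.

```go
package main

import "fmt"

// TaskStatus is a hypothetical run task status.
type TaskStatus string

const (
	StatusNotStarted TaskStatus = "notstarted"
	StatusRunning    TaskStatus = "running"
	StatusSkipped    TaskStatus = "skipped"
)

// RunTask and Run are simplified, hypothetical stand-ins.
type RunTask struct {
	Name   string
	Status TaskStatus
}

type Run struct {
	Stop  bool
	Tasks []*RunTask
}

// skipNotStarted marks every not started task as skipped when the run is
// marked to stop, so it is never handed to an executor. It returns the number
// of tasks changed.
func skipNotStarted(run *Run) int {
	if !run.Stop {
		return 0
	}
	changed := 0
	for _, t := range run.Tasks {
		if t.Status == StatusNotStarted {
			t.Status = StatusSkipped
			changed++
		}
	}
	return changed
}

// sampleRun returns a stopping run with one running and two waiting tasks.
func sampleRun() *Run {
	return &Run{
		Stop: true,
		Tasks: []*RunTask{
			{Name: "build", Status: StatusRunning},
			{Name: "lint", Status: StatusNotStarted},
			{Name: "docs", Status: StatusNotStarted},
		},
	}
}

func main() {
	run := sampleRun()
	fmt.Println(skipNotStarted(run))
}
```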
During tests provide a zaptest Logger so that the output of all services is
redirected to the golang testing logger.
When multiple services of the same type are provided add a unique name field to
distinguish them.
Instead of the current hack of copying the agola toolbox inside the host
tmp dir (always done but only needed when running the executor inside a docker
container), which has various issues (like tmp file removal done by
tmpwatch/systemd-tmpfiles), use a solution similar to the k8s driver: for every
pod create a volume containing the agola-toolbox and remove it at pod removal.
We could also use a single "global" volume but we would have to handle cases
like volume removal (e.g. a docker volume prune command). So for now just
create a dedicated per pod volume.
Explicitly write and flush the headers in the various services LogHandlers.
Currently the 200 response and the other headers will be automatically written
by the golang http implementation only when we send something in the body. But if
there's nothing to send (no logs yet written) the client will never receive the
headers and cannot know if the request was successful.
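A minimal sketch of a log handler that writes and flushes the headers before
any log line is available. The `logHandler` and `probe` names and the channel
based log source are assumptions for illustration; only the
`WriteHeader`/`http.Flusher` calls are the point.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// logHandler streams lines from logs (a hypothetical log source).
func logHandler(logs <-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, canFlush := w.(http.Flusher)

		// Send the status line and headers right away, so the client learns
		// that the request succeeded even if no log has been written yet.
		w.Header().Set("Content-Type", "text/plain")
		w.WriteHeader(http.StatusOK)
		if canFlush {
			flusher.Flush()
		}

		for {
			select {
			case <-r.Context().Done():
				return
			case line, ok := <-logs:
				if !ok {
					return
				}
				_, _ = io.WriteString(w, line)
				if canFlush {
					flusher.Flush()
				}
			}
		}
	}
}

// probe calls the handler with an empty log source and reports what an
// in-memory recorder observed.
func probe() (code int, flushed bool, bodyLen int) {
	logs := make(chan string)
	close(logs) // no logs at all

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/logs", nil)
	logHandler(logs)(rec, req)
	return rec.Code, rec.Flushed, rec.Body.Len()
}

func main() {
	code, flushed, bodyLen := probe()
	fmt.Println(code, flushed, bodyLen)
}
```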
* Return errNotExist in readTaskLogs when the executor task doesn't exist, so
the client will receive a 404 instead of a 500 (since a generic error will be
mapped to a 500).
* Wrap the errNotExist returned by readTaskLogs with a new ErrNotExist reporting
"log doesn't exist"
etcd PR 11104 (https://github.com/etcd-io/etcd/pull/11104) implemented mutex
TryLock. Since it's only available in etcd master, just copy the relevant code
and add a TODO to remove it when updating the etcd client to a version
implementing TryLock.
Use TryLock wherever it is useful.
Rename errCh to doneCh (error is not needed) and always send to it when one of
the HandleEvents functions exits (not only on error).
This will ensure that all the goroutines will be stopped even if one of them
returns without an error.
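The pattern can be sketched as follows. `runHandlers`, the `handler` type and
`demo` are hypothetical; the real HandleEvents functions have their own
signatures.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// handler stands in for a HandleEvents style function.
type handler func(ctx context.Context) error

// runHandlers starts every handler in its own goroutine. Each goroutine always
// sends to doneCh when its handler exits, with or without an error. The first
// exit cancels the shared context so that all the others stop too.
func runHandlers(ctx context.Context, handlers ...handler) {
	if len(handlers) == 0 {
		return
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Buffered so that no goroutine blocks on send.
	doneCh := make(chan struct{}, len(handlers))
	for _, h := range handlers {
		go func(h handler) {
			_ = h(ctx) // a real implementation would log the error
			doneCh <- struct{}{}
		}(h)
	}

	<-doneCh // one handler exited, for whatever reason
	cancel() // ask the remaining ones to stop
	for i := 1; i < len(handlers); i++ {
		<-doneCh
	}
}

// demo runs one handler that returns immediately without an error and two
// that run until cancelled. It returns how many handlers have stopped.
func demo() int32 {
	var stopped int32
	quick := func(ctx context.Context) error {
		atomic.AddInt32(&stopped, 1)
		return nil
	}
	waiter := func(ctx context.Context) error {
		<-ctx.Done()
		atomic.AddInt32(&stopped, 1)
		return ctx.Err()
	}
	runHandlers(context.Background(), quick, waiter, waiter)
	return atomic.LoadInt32(&stopped)
}

func main() {
	fmt.Println(demo())
}
```

With an error-only channel, the clean return of the first handler would leave
the other two running.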
* objectstorage: remove `types` package and move `ErrNotExist` to the base package
* objectstorage: Implement .Is and add helper `IsErrNotExist` for `ErrNotExist`
* util: Rename `ErrNotFound` to `ErrNotExist`
* util: Add `IsErr*` helpers and use them in place of `errors.Is()`
* datamanager: add `ErrNoDataStatus` to report when there's no data status in ost
* runservice/common: remove `ErrNotExist` and use errors in util package
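A sketch of what an error type with an `Is` method and an `IsErr*` helper can
look like with the standard `errors` package. The struct layout, the
`NewErrNotExist` constructor and the sample errors are assumptions, not the
actual package code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotExist reports that something doesn't exist and may wrap a cause.
type ErrNotExist struct {
	err error
}

// NewErrNotExist is a hypothetical constructor.
func NewErrNotExist(err error) *ErrNotExist { return &ErrNotExist{err: err} }

func (e *ErrNotExist) Error() string {
	if e.err == nil {
		return "not exist"
	}
	return e.err.Error()
}

func (e *ErrNotExist) Unwrap() error { return e.err }

// Is makes errors.Is match any *ErrNotExist target, whatever it wraps.
func (*ErrNotExist) Is(target error) bool {
	_, ok := target.(*ErrNotExist)
	return ok
}

// IsErrNotExist reports whether an ErrNotExist is anywhere in err's chain.
func IsErrNotExist(err error) bool {
	return errors.Is(err, &ErrNotExist{})
}

// sampleWrapped returns an ErrNotExist wrapped with extra context.
func sampleWrapped() error {
	return fmt.Errorf("reading task logs: %w", NewErrNotExist(errors.New("log doesn't exist")))
}

// errOther has no ErrNotExist in its chain.
var errOther = errors.New("connection refused")

func main() {
	fmt.Println(IsErrNotExist(sampleWrapped()))
	fmt.Println(IsErrNotExist(errOther))
}
```

Callers then write `IsErrNotExist(err)` instead of building a target value for
`errors.Is()` at every call site.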