Commit Graph

143 Commits

Author SHA1 Message Date
Simone Gotti
d1b4ab4296 *: use zerolog for logging
Replace zap with zerolog.

zerolog has a cleaner interface and can be easily configured with custom
error chain printing using a new error handling library that will be
implemented in another PR.
2022-02-28 10:40:55 +01:00
Simone Gotti
c1da3ab566 *: Improve error handling
* Create an APIError that should only be used for api returned errors.
  It'll wrap an error and can have different Kinds and optional code and
  message.
* The http handlers will use the first APIError available in the
  error chain and generate a json response body containing the code and
  the user message. The wrapped error is internal and is not sent in the
  response.
  If no api error is available in the chain a generic internal
  server error will be returned.
* Add a RemoteError type that will be created from remote services calls
  (runservice, configstore). It's similar to the APIError but a
  different type to not propagate to the caller response and it'll not
  contain any wrapped error.
* Gateway: when we call a remote service, by default, we'll create a
  APIError using the RemoteError Kind (omitting the code and the
  message that usually must not be propagated).
  This is done for all the remote service calls as a starting point, in
  future, if this default behavior is not the right one for a specific
  remote service call, a new api error with a different kind and/or
  augmented with the calling service error codes and user messages could
  be created.
* datamanager: Use a dedicated ErrNotExist (and converting objectstorage
  ErrNotExist).
2022-02-25 16:11:19 +01:00
Simone Gotti
87f182a0c9 *: use errors.Is/errors.As to handle wrapped error checking
Enable golangci-lint errorlint linter to check proper use of errors.Is
and error.As instead of direct comparison or error type casting.
2022-02-24 17:07:29 +01:00
Simone Gotti
0544586ade *: call ListenAndServeTLS when tls is enabled in config 2021-03-19 10:53:16 +01:00
Carlo Mandelli
45eb092871 Remove WaitingApproval for stopped tasks 2021-02-03 19:45:27 +01:00
Simone Gotti
0e2b01a586 runservice: correctly handle skipped tasks in fetcher
skip fetching of tasks with status skipped, not only tasks marked as skip.
This avoid many wrong an noisy logs of type "executor task with id taskid
doesn't exist. This shouldn't happen. Skipping fetching"
2020-03-02 10:40:59 +01:00
Simone Gotti
eb180da914
Merge pull request #225 from sgotti/runservice_fix_handling_of_wrong_executortask_status
runservice: fix handling of wrong executortask status
2020-03-02 10:26:32 +01:00
Simone Gotti
19611c18e7 runservice: fix handling of wrong executortask status
updateRunTaskStatus should also accept transitions from not started to a
finished state like "success", "failed", "stopped" since we could miss some
status updates from the executor for many reasons.
2020-02-28 13:02:35 +01:00
Simone Gotti
3ac018e6e5 runservice: use all scheduled tasks in scheduleRun
rename activeExecutorTasks to scheduledExecutorTasks and don't filter out
finished tasks.
In some logic we need all the scheduled tasks and not only the not finished
ones.
2020-02-28 09:56:12 +01:00
Simone Gotti
145c87b4c0 runservice: minimize scheduling of tasks that will be queued by the executor
Since the executor only periodically updates its state we could end up
scheduling much more tasks than the executor ActiveTasksLimit. This will happen
in the case of many parallel tasks that can all start at the same time.

To avoid this also considere the executor tasks saved in etcd that represent
the real view of scheduled tasks.
2020-02-27 11:03:03 +01:00
Simone Gotti
5dd9e587fe runservice: mark not running tasks as skipped when run marked to stop
Currently when a run is marked to stop we are going to stop currently running
tasks and then their childs will be marked as skipped.

But tasks not depending on a stopped task (root task or childs with a finished
parent) that are just waiting for an executor slot, will be scheduled when
there will be a free slot also if the run is marked to stop (and then the
scheduler will stop them after some seconds).

This patch will mark all not started tasks as skipped when the run is marked to
stop.
2020-02-26 16:45:09 +01:00
Simone Gotti
2de91549a3 tests: improve services logging
During tests provide a zaptest Logger so all services output will be redirected
to golang testing logger.

When multiple services of the same type are provided add a unique name field to
distinguish them.
2020-01-15 12:30:34 +01:00
Carlo Mandelli
3e47bc601a gateway/runservice: add api to delete step logs 2019-11-18 10:34:56 +01:00
Simone Gotti
7e8f7155d7 runservice: improve errors in logsHandler
return errNotExist in readTaskLogs when the run,task or step doesn't exist.
2019-11-15 15:50:58 +01:00
Simone Gotti
f7d0950ca1 *: write and flush header on log handlers
Explicitly write and flush the headers in the various services LogHandlers.

Currently the 200 response and the other headers will be automatically written
by the golang http implementation only when we send something in the body. But if
there's nothing to send (no logs yet written) the client will never receive the
headers and cannot know if the request was successful.
2019-11-14 10:52:45 +01:00
Simone Gotti
32a08ec5c8
Merge pull request #179 from sgotti/runservice_logshandler_improve_errors
runservice: improve errors in logsHandler
2019-11-14 09:56:24 +01:00
Simone Gotti
66e182a55d runservice: improve errors in logsHandler
* return errNotExist in readTaskLogs when the executor task doesn't exist: so
the client will receive a 404 instead of a 500 (since a generic error will be
mapped to a 500).
* Wrap the errNotExist returned by readTaskLogs with a new ErrNotExits reporting
"log doesn't exist"
2019-11-13 15:50:20 +01:00
Simone Gotti
07cde065c8 runservice: use etcd mutex TryLock on fetching
When fetching avoid concurrent fetches from multiple runservices using an etcd
mutex TryLock.
2019-11-13 11:53:54 +01:00
Simone Gotti
5ab9f7c970 *: use etcd mutex TryLock
etcd PR 11104 (https://github.com/etcd-io/etcd/pull/11104) implemented mutex
TryLock. Since it's only available in etcd master just copy relevant code and
add a TODO to remove it when updating the etcd client to a version implementing
TryLock.

Use TryLock everywhere where it'll be useful.
2019-11-12 22:27:17 +01:00
Simone Gotti
d679254516 readdb: improve HandleEvents goroutine exiting
Rename errCh to doneCh (error is not needed) and always send to it when one of
the HandleEvents functions exits (not only on error).

This will ensure that all the goroutines will be stopped also if one of them
returns without an error.
2019-11-12 11:03:21 +01:00
Simone Gotti
72f279c4c3 *: improve error handling
* objectstorage: remove `types` package and move `ErrNotExist` in base package
* objectstorage: Implement .Is and add helper `IsErrNotExist` for `ErrNotExist`
* util: Rename `ErrNotFound` to `ErrNotExist`
* util: Add `IsErr*` helpers and use them in place of `errors.Is()`
* datamanager: add `ErrNoDataStatus` to report when there's not data status in ost
* runservice/common: remove `ErrNotExist` and use errors in util package
2019-11-11 12:17:35 +01:00
Simone Gotti
5af07d0852 objectstorage: use a single package
remove all the subpackages and just use a single package
2019-11-08 16:31:48 +01:00
Simone Gotti
9c0eb3d7ef datamanager: refactor ReadWal
make ReadWal directly return a *WalHeader
2019-11-08 13:24:43 +01:00
Simone Gotti
e18794764e go.mod: update dependencies
Update all the updatable dependencies
2019-10-29 09:31:38 +01:00
Simone Gotti
39829f1ec4 runservice: save step exitstatus in run.
For every step save also the command exit status.
2019-09-17 14:35:37 +02:00
Simone Gotti
12b02143b2 runservice: don't save executor task data in etcd
Reorganize ExecutorTask to better distinguish between the task Spec and
the Status.

Split the task Spec in a sub part called ExecutorTaskSpecData that contains
tasks data that don't have to be saved in etcd because it contains data that can
be very big and can be generated starting from the run and the runconfig.
2019-09-17 12:03:43 +02:00
Simone Gotti
7d375e4c4e runservice: add run workspace cleaner
Removes old workspace files (defaults to 7 days)
2019-09-17 09:40:23 +02:00
Simone Gotti
bfc42ef60e runservice: fix get tasks to run
Currently `advanceRunTasks` isn't deterministic and doesn't calculate the final
state in one call. So could happen that `getTasksToRun` will select a task to be
executed since its parent are finished (marked as skipped in advanceRunTasks)
but the task isn't marked to be skipped (because advanceRunTasks has calculated
this task before its parents).

Currently fix this doing the same task selection logic done in `advanceRunTasks`
and add a TODO to make `advanceRunTasks` be deterministic by processing tasks by
their level (from level 0).
2019-08-30 15:59:25 +02:00
Simone Gotti
c1ff28ef9f *: export clients and related types
Export clients and related packages.

The main rule is to not import internal packages from exported packages.

The gateway client and related types are totally decoupled from the gateway
service (not shared types between the client and the server).

Instead the configstore and the runservice client currently share many types
that are now exported (decoupling them will require that a lot of types must be
duplicated and the need of functions to convert between them, this will be done
in future when the APIs will be declared as stable).
2019-08-02 12:02:01 +02:00
Simone Gotti
d0c5621201 util: remove time.go
The same function is already provided by pointer.go
2019-08-01 14:14:56 +02:00
Simone Gotti
b81ad4cd8c runservice: fix/improve executor delete logic
* Don't fail tasks inside the delete executor action, just delete the executor
from etcd

* The scheduler, when detecting a task without a related executor will mark the
task as failed and correctly set end time of the task and its steps.
2019-07-29 12:06:15 +02:00
Simone Gotti
f812597410 runservice: maintenance/export/import
Implement runservice maintenance mode and export/import.

When runservice is set in maintenance mode it'll start only the maintenance and
export/import handlers.

Setting maintenance mode will set a key in etcd so all the runservice instances
will detect it and enter in maintenance mode. This is done asyncronously so it
could take some time (future improvements will add some api to show all the
runservice states)

Export is always available and will export the datamanager contents. Currently
only datamanager contents are exported (no logs and workspace archives).

Import is available only during maintenance, given a datamanager export will
import it and reset etcd to this import state.
2019-07-29 11:52:30 +02:00
Simone Gotti
ceafc2ef98 readdb: close and open readdb on Run 2019-07-25 17:59:54 +02:00
Simone Gotti
6f3798e8fe *: use sleep timer in loops
So we'll react instantly to a context cancel instead of waiting on time.Sleep
returning.
2019-07-25 16:22:54 +02:00
Simone Gotti
b8c2b4020a db: use context functions
Use the go sql context functions (ExecContext, QueryContext etc...)

The context is saved inside Tx so the library users should only pass it one time
to the db.Do function.
2019-07-25 14:49:53 +02:00
Simone Gotti
77ee8d9e7d
Merge pull request #58 from sgotti/readdb_fix_deadlock
readdb: fix deadlock in Run method
2019-07-23 15:47:05 +02:00
Simone Gotti
85876310af readdb: fix deadlock in Run method
In runservice readdb Run method we could end with a deadlock if two of the
goroutines that call HandleEvents.* try to write to the errCh at the same
time before the errCh is read. If this happens one of the two will be blocked on
writing to the channel but the read won't happen since it'll blocked by
wg.Wait().

Fix this doing:
* use a buffered channel large as the number of executed goroutines.
* create a new errCh at every loop (so we'll ignore later errors after the first
one)

Note: we could also use a non blocking send to avoid this situation but we
should also start the wg.Wait before the goroutines or earlier errors could be
lost causing another kind of hang.
2019-07-23 14:56:26 +02:00
Simone Gotti
75d68b2b52 runservice: stop run also if result is not set 2019-07-23 12:11:01 +02:00
Simone Gotti
3a963ef95f readdb: error if there's no wal in etcd 2019-07-18 16:44:28 +02:00
Simone Gotti
3f64bda0cc readdb: save walSequence provided by data file 2019-07-18 16:44:28 +02:00
Simone Gotti
16820e9033 readdb: insert current wal sequence after checking wal status 2019-07-18 16:44:28 +02:00
Simone Gotti
86d822a247 service: handle cors config and use it only on gateway
* Don't make cors enabled on all (*) by default.
* Handle related web.allowedOrigins options
* Only the gateway api should be called by a browser so setup the cors handler
only on it
2019-07-13 23:15:00 +02:00
Simone Gotti
940264e413 runservice: add lock around compatchangegroups
just to avoid concurrency errors when multiple instances are running
2019-07-10 10:20:35 +02:00
Simone Gotti
11a2ff48d6 runservice: delete executor task early
currently we are deleting the executor tasks only when all the run tasks
log/archives were fetched. But it'll better to remove a single executor task
when the task fetching is finished.

This could also fix possible issues on k8s since we are scheduling tasks but the
k8s scheduler may not schedule them if there aren't enough resources causing a
scheduling deadlock since we won't remove finished pods because their related
tasks are not removed and k8s cannot start new pods since it has no resources.
2019-07-08 16:03:14 +02:00
Simone Gotti
45a460ebc0 runservice: handle run not existing
Check if the response from the readdb is null and return an http not found error
2019-07-08 09:30:15 +02:00
Simone Gotti
04ef20464d runservice: add getruns filter by result 2019-07-05 10:32:51 +02:00
Simone Gotti
929a6fb654 service/*: log error only if nil 2019-07-04 15:50:37 +02:00
Simone Gotti
5db23410d0 runservice: set base path for datamanager
Don't put datamanager base dirs inside the root of the ost but use a base path.

Let's do it now before releasing since this is a breaking change that requires
moving the ost data to the new path
2019-07-03 17:18:21 +02:00
Simone Gotti
d989fe9639 datamanager: always handle basepath
Currently we aren't setting a basepath and it wasn't always correctly handled.
Fix missing basepath handling and improve tests to also use a non empty
basepath.
2019-07-03 17:03:37 +02:00
Simone Gotti
87a472aaaf runservice: add CacheGroup field to runconfig
The cache group fields defines under which cache group the run cache data will
belong. This is needed/useful for some next changes:

* Make cache correctly work for user direct runs. Since the user direct runs all
belong to the same run group (the user id) all the use direct runs will share the
same caches. To distinguish between the different caches we need to use something
in addition to the user id (the local repo uuid generated by the direct run
start command)
* Share the cache between multiple projects
2019-07-03 15:16:37 +02:00