agola

Author	SHA1	Message	Date
Simone Gotti	d2b09d854f	*: use new errors handling library Implement a new error handling library based on pkg/errors. It provides stack saving on wrapping and exports some functions to add stack saving also to external errors. It also implements custom zerolog error formatting without adding too much verbosity by just printing the chain error file:line without a full stack trace of every error. Add a --detailed-errors option to print errors with their full chain * Wrap all error returns. Use errors.WithStack to wrap without adding a new message and errors.Wrap[f] to add a message. * Add golangci-lint wrapcheck to check that external packages errors are wrapped. This won't check that internal packages error are wrapped. But we want also to ensure this case so we'll have to find something else to check also these.	2022-02-28 12:49:13 +01:00
Simone Gotti	d1b4ab4296	*: use zerolog for logging Replace zap with zerolog. zerolog has a cleaner interface and can be easily configured with custom error chain printing using a new error handling library that will be implemented in another PR.	2022-02-28 10:40:55 +01:00
Simone Gotti	c1da3ab566	*: Improve error handling Create an APIError that should only be used for api returned errors. It'll wrap an error and can have different Kinds and optional code and message. * The http handlers will use the first APIError available in the error chain and generate a json response body containing the code and the user message. The wrapped error is internal and is not sent in the response. If no api error is available in the chain a generic internal server error will be returned. * Add a RemoteError type that will be created from remote services calls (runservice, configstore). It's similar to the APIError but a different type to not propagate to the caller response and it'll not contain any wrapped error. * Gateway: when we call a remote service, by default, we'll create an APIError using the RemoteError Kind (omitting the code and the message that usually must not be propagated). This is done for all the remote service calls as a starting point, in future, if this default behavior is not the right one for a specific remote service call, a new api error with a different kind and/or augmented with the calling service error codes and user messages could be created. * datamanager: Use a dedicated ErrNotExist (and converting objectstorage ErrNotExist).	2022-02-25 16:11:19 +01:00
Simone Gotti	87f182a0c9	*: use errors.Is/errors.As to handle wrapped error checking Enable golangci-lint errorlint linter to check proper use of errors.Is and errors.As instead of direct comparison or error type casting.	2022-02-24 17:07:29 +01:00
Simone Gotti	0544586ade	*: call ListenAndServeTLS when tls is enabled in config	2021-03-19 10:53:16 +01:00
Carlo Mandelli	45eb092871	Remove WaitingApproval for stopped tasks	2021-02-03 19:45:27 +01:00
Simone Gotti	0e2b01a586	runservice: correctly handle skipped tasks in fetcher skip fetching of tasks with status skipped, not only tasks marked as skip. This avoids many wrong and noisy logs of type "executor task with id taskid doesn't exist. This shouldn't happen. Skipping fetching"	2020-03-02 10:40:59 +01:00
Simone Gotti	eb180da914	Merge pull request #225 from sgotti/runservice_fix_handling_of_wrong_executortask_status runservice: fix handling of wrong executortask status	2020-03-02 10:26:32 +01:00
Simone Gotti	19611c18e7	runservice: fix handling of wrong executortask status updateRunTaskStatus should also accept transitions from not started to a finished state like "success", "failed", "stopped" since we could miss some status updates from the executor for many reasons.	2020-02-28 13:02:35 +01:00
Simone Gotti	3ac018e6e5	runservice: use all scheduled tasks in scheduleRun rename activeExecutorTasks to scheduledExecutorTasks and don't filter out finished tasks. In some logic we need all the scheduled tasks and not only the not finished ones.	2020-02-28 09:56:12 +01:00
Simone Gotti	145c87b4c0	runservice: minimize scheduling of tasks that will be queued by the executor Since the executor only periodically updates its state we could end up scheduling many more tasks than the executor ActiveTasksLimit. This will happen in the case of many parallel tasks that can all start at the same time. To avoid this also consider the executor tasks saved in etcd that represent the real view of scheduled tasks.	2020-02-27 11:03:03 +01:00
Simone Gotti	5dd9e587fe	runservice: mark not running tasks as skipped when run marked to stop Currently when a run is marked to stop we are going to stop currently running tasks and then their children will be marked as skipped. But tasks not depending on a stopped task (root task or children with a finished parent) that are just waiting for an executor slot, will be scheduled when there will be a free slot also if the run is marked to stop (and then the scheduler will stop them after some seconds). This patch will mark all not started tasks as skipped when the run is marked to stop.	2020-02-26 16:45:09 +01:00
Simone Gotti	2de91549a3	tests: improve services logging During tests provide a zaptest Logger so all services output will be redirected to golang testing logger. When multiple services of the same type are provided add a unique name field to distinguish them.	2020-01-15 12:30:34 +01:00
Carlo Mandelli	3e47bc601a	gateway/runservice: add api to delete step logs	2019-11-18 10:34:56 +01:00
Simone Gotti	7e8f7155d7	runservice: improve errors in logsHandler return errNotExist in readTaskLogs when the run, task or step doesn't exist.	2019-11-15 15:50:58 +01:00
Simone Gotti	f7d0950ca1	*: write and flush header on log handlers Explicitly write and flush the headers in the various services LogHandlers. Currently the 200 response and the other headers will be automatically written by the golang http implementation only when we send something in the body. But if there's nothing to send (no logs yet written) the client will never receive the headers and cannot know if the request was successful.	2019-11-14 10:52:45 +01:00
Simone Gotti	32a08ec5c8	Merge pull request #179 from sgotti/runservice_logshandler_improve_errors runservice: improve errors in logsHandler	2019-11-14 09:56:24 +01:00
Simone Gotti	66e182a55d	runservice: improve errors in logsHandler * return errNotExist in readTaskLogs when the executor task doesn't exist: so the client will receive a 404 instead of a 500 (since a generic error will be mapped to a 500). * Wrap the errNotExist returned by readTaskLogs with a new ErrNotExist reporting "log doesn't exist"	2019-11-13 15:50:20 +01:00
Simone Gotti	07cde065c8	runservice: use etcd mutex TryLock on fetching When fetching avoid concurrent fetches from multiple runservices using an etcd mutex TryLock.	2019-11-13 11:53:54 +01:00
Simone Gotti	5ab9f7c970	*: use etcd mutex TryLock etcd PR 11104 (https://github.com/etcd-io/etcd/pull/11104) implemented mutex TryLock. Since it's only available in etcd master just copy relevant code and add a TODO to remove it when updating the etcd client to a version implementing TryLock. Use TryLock everywhere where it'll be useful.	2019-11-12 22:27:17 +01:00
Simone Gotti	d679254516	readdb: improve HandleEvents goroutine exiting Rename errCh to doneCh (error is not needed) and always send to it when one of the HandleEvents functions exits (not only on error). This will ensure that all the goroutines will be stopped also if one of them returns without an error.	2019-11-12 11:03:21 +01:00
Simone Gotti	72f279c4c3	*: improve error handling objectstorage: remove `types` package and move `ErrNotExist` in base package * objectstorage: Implement .Is and add helper `IsErrNotExist` for `ErrNotExist` * util: Rename `ErrNotFound` to `ErrNotExist` * util: Add `IsErr` helpers and use them in place of `errors.Is()` datamanager: add `ErrNoDataStatus` to report when there's no data status in ost * runservice/common: remove `ErrNotExist` and use errors in util package	2019-11-11 12:17:35 +01:00
Simone Gotti	5af07d0852	objectstorage: use a single package remove all the subpackages and just use a single package	2019-11-08 16:31:48 +01:00
Simone Gotti	9c0eb3d7ef	datamanager: refactor ReadWal make ReadWal directly return a *WalHeader	2019-11-08 13:24:43 +01:00
Simone Gotti	e18794764e	go.mod: update dependencies Update all the updatable dependencies	2019-10-29 09:31:38 +01:00
Simone Gotti	39829f1ec4	runservice: save step exitstatus in run. For every step save also the command exit status.	2019-09-17 14:35:37 +02:00
Simone Gotti	12b02143b2	runservice: don't save executor task data in etcd Reorganize ExecutorTask to better distinguish between the task Spec and the Status. Split the task Spec in a sub part called ExecutorTaskSpecData that contains tasks data that don't have to be saved in etcd because it contains data that can be very big and can be generated starting from the run and the runconfig.	2019-09-17 12:03:43 +02:00
Simone Gotti	7d375e4c4e	runservice: add run workspace cleaner Removes old workspace files (defaults to 7 days)	2019-09-17 09:40:23 +02:00
Simone Gotti	bfc42ef60e	runservice: fix get tasks to run Currently `advanceRunTasks` isn't deterministic and doesn't calculate the final state in one call. So it could happen that `getTasksToRun` will select a task to be executed since its parents are finished (marked as skipped in advanceRunTasks) but the task isn't marked to be skipped (because advanceRunTasks has calculated this task before its parents). Currently fix this doing the same task selection logic done in `advanceRunTasks` and add a TODO to make `advanceRunTasks` be deterministic by processing tasks by their level (from level 0).	2019-08-30 15:59:25 +02:00
Simone Gotti	c1ff28ef9f	*: export clients and related types Export clients and related packages. The main rule is to not import internal packages from exported packages. The gateway client and related types are totally decoupled from the gateway service (not shared types between the client and the server). Instead the configstore and the runservice client currently share many types that are now exported (decoupling them will require that a lot of types must be duplicated and the need of functions to convert between them, this will be done in future when the APIs will be declared as stable).	2019-08-02 12:02:01 +02:00
Simone Gotti	d0c5621201	util: remove time.go The same function is already provided by pointer.go	2019-08-01 14:14:56 +02:00
Simone Gotti	b81ad4cd8c	runservice: fix/improve executor delete logic * Don't fail tasks inside the delete executor action, just delete the executor from etcd * The scheduler, when detecting a task without a related executor will mark the task as failed and correctly set end time of the task and its steps.	2019-07-29 12:06:15 +02:00
Simone Gotti	f812597410	runservice: maintenance/export/import Implement runservice maintenance mode and export/import. When runservice is set in maintenance mode it'll start only the maintenance and export/import handlers. Setting maintenance mode will set a key in etcd so all the runservice instances will detect it and enter in maintenance mode. This is done asynchronously so it could take some time (future improvements will add some api to show all the runservice states). Export is always available and will export the datamanager contents. Currently only datamanager contents are exported (no logs and workspace archives). Import is available only during maintenance, given a datamanager export will import it and reset etcd to this import state.	2019-07-29 11:52:30 +02:00
Simone Gotti	ceafc2ef98	readdb: close and open readdb on Run	2019-07-25 17:59:54 +02:00
Simone Gotti	6f3798e8fe	*: use sleep timer in loops So we'll react instantly to a context cancel instead of waiting on time.Sleep returning.	2019-07-25 16:22:54 +02:00
Simone Gotti	b8c2b4020a	db: use context functions Use the go sql context functions (ExecContext, QueryContext etc...) The context is saved inside Tx so the library users should only pass it one time to the db.Do function.	2019-07-25 14:49:53 +02:00
Simone Gotti	77ee8d9e7d	Merge pull request #58 from sgotti/readdb_fix_deadlock readdb: fix deadlock in Run method	2019-07-23 15:47:05 +02:00
Simone Gotti	85876310af	readdb: fix deadlock in Run method In runservice readdb Run method we could end with a deadlock if two of the goroutines that call HandleEvents.* try to write to the errCh at the same time before the errCh is read. If this happens one of the two will be blocked on writing to the channel but the read won't happen since it'll be blocked by wg.Wait(). Fix this doing: * use a buffered channel as large as the number of executed goroutines. * create a new errCh at every loop (so we'll ignore later errors after the first one) Note: we could also use a non blocking send to avoid this situation but we should also start the wg.Wait before the goroutines or earlier errors could be lost causing another kind of hang.	2019-07-23 14:56:26 +02:00
Simone Gotti	75d68b2b52	runservice: stop run also if result is not set	2019-07-23 12:11:01 +02:00
Simone Gotti	3a963ef95f	readdb: error if there's no wal in etcd	2019-07-18 16:44:28 +02:00
Simone Gotti	3f64bda0cc	readdb: save walSequence provided by data file	2019-07-18 16:44:28 +02:00
Simone Gotti	16820e9033	readdb: insert current wal sequence after checking wal status	2019-07-18 16:44:28 +02:00
Simone Gotti	86d822a247	service: handle cors config and use it only on gateway * Don't make cors enabled on all (*) by default. Handle related web.allowedOrigins options * Only the gateway api should be called by a browser so setup the cors handler only on it	2019-07-13 23:15:00 +02:00
Simone Gotti	940264e413	runservice: add lock around compatchangegroups just to avoid concurrency errors when multiple instances are running	2019-07-10 10:20:35 +02:00
Simone Gotti	11a2ff48d6	runservice: delete executor task early currently we are deleting the executor tasks only when all the run tasks log/archives were fetched. But it'll be better to remove a single executor task when the task fetching is finished. This could also fix possible issues on k8s since we are scheduling tasks but the k8s scheduler may not schedule them if there aren't enough resources causing a scheduling deadlock since we won't remove finished pods because their related tasks are not removed and k8s cannot start new pods since it has no resources.	2019-07-08 16:03:14 +02:00
Simone Gotti	45a460ebc0	runservice: handle run not existing Check if the response from the readdb is null and return an http not found error	2019-07-08 09:30:15 +02:00
Simone Gotti	04ef20464d	runservice: add getruns filter by result	2019-07-05 10:32:51 +02:00
Simone Gotti	929a6fb654	service/*: log error only if nil	2019-07-04 15:50:37 +02:00
Simone Gotti	5db23410d0	runservice: set base path for datamanager Don't put datamanager base dirs inside the root of the ost but use a base path. Let's do it now before releasing since this is a breaking change that requires moving the ost data to the new path	2019-07-03 17:18:21 +02:00
Simone Gotti	d989fe9639	datamanager: always handle basepath Currently we aren't setting a basepath and it wasn't always correctly handled. Fix missing basepath handling and improve tests to also use a non empty basepath.	2019-07-03 17:03:37 +02:00

144 Commits