Export clients and related packages.
The main rule is to not import internal packages from exported packages.
The gateway client and related types are totally decoupled from the gateway
service (not shared types between the client and the server).
Instead the configstore and the runservice client currently share many types
that are now exported (decoupling them will require that a lot of types must be
duplicated and the need of functions to convert between them, this will be done
in future when the APIs will be declared as stable).
Since they're not types common to all the services but belongs to the
configstore.
Next step will be to make them local to the configstore and not directly used by
other services since these types are also stored.
* Don't fail tasks inside the delete executor action, just delete the executor
from etcd
* The scheduler, when detecting a task without a related executor will mark the
task as failed and correctly set end time of the task and its steps.
Implement runservice maintenance mode and export/import.
When runservice is set in maintenance mode it'll start only the maintenance and
export/import handlers.
Setting maintenance mode will set a key in etcd so all the runservice instances
will detect it and enter in maintenance mode. This is done asyncronously so it
could take some time (future improvements will add some api to show all the
runservice states)
Export is always available and will export the datamanager contents. Currently
only datamanager contents are exported (no logs and workspace archives).
Import is available only during maintenance, given a datamanager export will
import it and reset etcd to this import state.
There was a typo so we weren't setting the task endTime when the setup step
failed.
Also unify all logic to just use `et` (instead of a mix of `et` or `rt.et`)
Implement configstore maintenance mode and export/import.
When configstore is set in maintenance mode it'll start only the maintenance and
export/import handlers.
Setting maintenance mode will set a key in etcd so all the configstore instances
will detect it and enter in maintenance mode. This is done asyncronously so it
could take some time (future improvements will add some api to show all the
configstore states)
Export is always available and will export the datamanager contents.
Import is available only during maintenance, given a datamanager export will
import it and reset etcd to this import state.
Use the go sql context functions (ExecContext, QueryContext etc...)
The context is saved inside Tx so the library users should only pass it one time
to the db.Do function.
* export: exports the newest data checkpoint. It forces a checkpoint before
exporting (currently no wals are exported)
* import: cleans up etcd, creates a new datasnaphot from the provided import stream
and then initializes etcd. Currently no old data is removed from the object
storage but it's just ignored.
Since we are using the shared cache with the lock notify we won't receive
SQLITE_BUSY errors but we could receive SQLITE_LOCKED errors due to deadlocks or
locked tables on concurrent read and write transactions.
This patch catches this kind of errors and retries the tx until maxTxRetries.
In runservice readdb Run method we could end with a deadlock if two of the
goroutines that call HandleEvents.* try to write to the errCh at the same
time before the errCh is read. If this happens one of the two will be blocked on
writing to the channel but the read won't happen since it'll blocked by
wg.Wait().
Fix this doing:
* use a buffered channel large as the number of executed goroutines.
* create a new errCh at every loop (so we'll ignore later errors after the first
one)
Note: we could also use a non blocking send to avoid this situation but we
should also start the wg.Wait before the goroutines or earlier errors could be
lost causing another kind of hang.
When doing an initEtcd (new instance or etcd reset) create a new wal (that will
have a new sequence epoch) and do a checkpoint.
In this way:
* readdb will detect that an epoch change and do a full resync
* we always have a data file (also if empty) that provides the last checkpointed
wal. This information could be used by readdb to resync
* Don't make cors enabled on all (*) by default.
* Handle related web.allowedOrigins options
* Only the gateway api should be called by a browser so setup the cors handler
only on it
currently we are deleting the executor tasks only when all the run tasks
log/archives were fetched. But it'll better to remove a single executor task
when the task fetching is finished.
This could also fix possible issues on k8s since we are scheduling tasks but the
k8s scheduler may not schedule them if there aren't enough resources causing a
scheduling deadlock since we won't remove finished pods because their related
tasks are not removed and k8s cannot start new pods since it has no resources.
Before kubernetes 1.14 nodes were labeled with the "beta.kubernetes.io/arch"
label instead of the "kubernetes.io/arch".
Current k8s version (v1.15) labels nodes with both labels but it's
deprecated and will removed in future versions.
At driver start get the current k8s api version and choose the right label to
use as node selector based on it.