This was already defined in the config but not implemented in the executor and
drivers.
All the containers defined in the runtime after the first one will be "service"
containers. They will share the same network namespace with the other containers
in the "pod" so they can communicate between themself on loopback
Logs and archives can be shared by multiple runs. So removing a run doesn't
imply that we could also remote the logs and archives since they could be
"referenced" by another run.
Store also the runids as specific objects along with the logs and archives so,
we'll remove them only when no runids objects exist.
Also if they are logically part of the runservice the names runserviceExecutor
and runserviceScheduler are long and quite confusing for an external user
Simplify them separating both the code parts and updating the names:
runserviceScheduler -> runservice
runserviceExecutor -> executor
* runservice: use generic task annotations instead of approval annotations
* runservice: add method to set task annotations
* gateway: when an user call the run task approval action, it will set in the
task annotations the approval users ids. The task won't be approved.
* scheduler: when the number of approvers meets the required minimum number
(currently 1) call the runservice to approve the task
In this way we could easily implement some approval features like requiring a
minimum number of approvers (saved in the task annotations) before marking the
run as approved in the runservice.
All the validation must be done inside the configstore since it's the source of
truth.
The gateway could also do some validation to avoid bad requests to the
configstore when needed or when the logic resides outside the configstore (like
project setup or user registration)
RemoteRepositoryConfigType defines how a remote repository is configured and
managed. Currently only "remotesource" is supported.
In future other config types (like a fully manual config) could be supported.
In future we may support specifying a remote source for a project without a
linked account and thus use a user provided token (saved in the project) or
other ways to define a remote repo (like standard git repos over ssh).
On s3 limit the max object size to 1GiB when the size is not provided (-1) or
the minio client will calculate a big part size since it tries to use the
maximum object size (5TiB) and will allocate a very big buffer in ram. Also
leave as commented out the previous hack that was firstly creating the file
locally to calculate the size and then put it (for future reference).
This options is a noop on s3 but on the posix implementation it becomes useful
when there isn't the need to have a persistent file, thus avoiding some fsync
calls.
changegroup names are based on names that will contain slashes and could be very
long. So calculate the sha256 sum of the starting name and use it as the
changegroup name.
run changegroup names are based on the run path but it will contain slashes and
could be very long. So calculate the sha256 sum of the path and use it as the
changegroup name.
* Rename to datamanager since it handles a complete "database" backed by an
objectstorage and etcd
* Don't write every single entry as a single file but group them in a single
file. In future improve this to split the data in multiple files of a max size.
`lts` was choosen to reflect a "long term storage" but currently it's just an
object storage implementation. So use this term and "ost" as its abbreviation
(to not clash with "os").
An executor can handle multiple archs (an executor that talks with a k8s cluster
with multi arch nodes). Don't use a label for archs but a custom executor
field.
* Add the concept of executor groups and siblings executors
* Add the concept of dynamic executor: an executor in an executor group that
doesn't need to be manually deleted from the scheduler since the other sibling
executors will take care of cleaning up its pods.
* Remove external labels visibility from pod.
* Add functions to return the sibling executors and the executor group
* Delete pods of disappeared sibling executors
take and change a copy of the current run so we'll change newRun and use curRun
status for logic decision. In this way result are reproducible or they will be
affected by the random run.Tasks map iteration order.
Handle config files with name `config.jsonnet`, `config.json` and
`config.yml` and take the first from the repository in this order
For a jsonnet file execute it and use the generated output as the config
The current config format was thought for future extensions for reusing runtimes
and job definitions adding some parameters.
After a lot of thoughts this looks like a complex approach: the final result
will be a sort of templating without a lot of powers.
Other approach like external templating should be an alternative but I really
don't think templating yaml is the way to go.
A much better approach will to just use jsonnet when we need to create matrix
runs and a lot of other use cases.
So just make the config a simple yaml/json. User can generate their config using
any preferred tool and in future we'll leverage jsonnet automated parsing and
provide a lot of jsonnet based examples for most use cases.
Main changes:
* Runs are now an array and not a map. The run name is in the Name field
* Tasks are now an array and not a map. The task name is in the Name field
* Use https://github.com/ghodss/yaml so we'll use json struct tags and unmarshall functions
Handle the task dependencies conditions:
* on_success (default if no conditions are specified)
* on_failure
* on_skipped
Not the runservice won't stop run but continue executing tasks that depends on a
parent also if this is failed
Additionally don't save a CloneURL field inside the project type.
If in future some git source doesn't provide a clone url we could just calculate
it from project.RepoPath or call the remote api to retrieve it.
* split functions in sub parts to ease future testing
* save run fewer times
* rework events logic to considere both run phase and result changes (emit an
event on every phase or result change)
Add the ability to define a run with a setuperror phase.
When the run setup has errors client could submit a run with a list of setup
errors. In such case the run will be created in the setuperror phase.
Setup errors are currently generated by the webhook receiver and the run service
when it checks the run config for possible issues.
* Use just RunConfig
* Use StaticEnvironment vs Environment in RunConfig to distinguish between env
that won't change at run recreation from env that could change at every
recreation
* The RunCreate api will just receive the runtasks instead of a runconfig (more
right)
* client: always parse the json error message field and return its contents
* Use ErrBadRequest and ErrNotFound in every handler and command
* Gateway: by default pass underlying service error (configstore, runservice) to
client keeping the status code and message. In future, if some errors must be
masked, we should change the specific parts that need special handling.
* Command: use ErrBadRequest
* Always return a json message also on error. For internal errors return a
generic "internal server error" message to not leak the real internal error to
clients
* Return 201 Created on resource creation
* Return 204 No Content on resource deletion and other action with no json
output
* Always return a json message also on error. For internal errors return a
generic "internal server error" message to not leak the real internal error to
clients
* Return 201 Created on resource creation
* Return 204 No Content on resource deletion and other action with no json
output
* Always return a json message also on error. For internal errors return a
generic "internal server error" message to not leak the real internal error to
clients
* Return 201 Created on resource creation
* Return 204 No Content on resource deletion and other action with no json
output
* Remove all the small index files on the lts
* Keep on s3 only a full index of all runs containing the runid, grouppath and phase
million of runs can take only some hundred of megabytes
* Periodically create a new dump of the index