In runservice readdb Run method we could end with a deadlock if two of the
goroutines that call HandleEvents.* try to write to the errCh at the same
time before the errCh is read. If this happens one of the two will be blocked on
writing to the channel but the read won't happen since it'll blocked by
wg.Wait().
Fix this doing:
* use a buffered channel large as the number of executed goroutines.
* create a new errCh at every loop (so we'll ignore later errors after the first
one)
Note: we could also use a non blocking send to avoid this situation but we
should also start the wg.Wait before the goroutines or earlier errors could be
lost causing another kind of hang.
When doing an initEtcd (new instance or etcd reset) create a new wal (that will
have a new sequence epoch) and do a checkpoint.
In this way:
* readdb will detect that an epoch change and do a full resync
* we always have a data file (also if empty) that provides the last checkpointed
wal. This information could be used by readdb to resync
* Don't make cors enabled on all (*) by default.
* Handle related web.allowedOrigins options
* Only the gateway api should be called by a browser so setup the cors handler
only on it
currently we are deleting the executor tasks only when all the run tasks
log/archives were fetched. But it'll better to remove a single executor task
when the task fetching is finished.
This could also fix possible issues on k8s since we are scheduling tasks but the
k8s scheduler may not schedule them if there aren't enough resources causing a
scheduling deadlock since we won't remove finished pods because their related
tasks are not removed and k8s cannot start new pods since it has no resources.
Before kubernetes 1.14 nodes were labeled with the "beta.kubernetes.io/arch"
label instead of the "kubernetes.io/arch".
Current k8s version (v1.15) labels nodes with both labels but it's
deprecated and will removed in future versions.
At driver start get the current k8s api version and choose the right label to
use as node selector based on it.
* Override the provided remotesource id with the current one (it could not be
provided or provided with a different id but the remotesource ref is the way to
get the current remote source).
* When changing remotesource name check that a remote source with the new name
does not already exist.
* Make the new fields RegistrationEnabled/LoginEnabled in types.RemoteSource
bool pointers (since they are new fields that don't exist in previously saved
remote sources) and default them to true if null when unmarshaling (or existing
remotesources will have registration and login disabled)
* Add options to cmd remotesource create/update to set the registration/login
disabled.
Don't put datamanager base dirs inside the root of the ost but use a base path.
Let's do it now before releasing since this is a breaking change that requires
moving the ost data to the new path
Don't put datamanager base dirs inside the root of the ost but use a base path.
Let's do it now before releasing since this is a breaking change that requires
moving the ost data to the new path
Currently we aren't setting a basepath and it wasn't always correctly handled.
Fix missing basepath handling and improve tests to also use a non empty
basepath.
Since the user direct runs all belong to the same run group (the user id) all
the user direct runs will share the same caches. To distinguish between the
different caches we need to use something in addition to the user id. In this
case we are usin the local repo uuid generated by the direct run start command.
The cache group fields defines under which cache group the run cache data will
belong. This is needed/useful for some next changes:
* Make cache correctly work for user direct runs. Since the user direct runs all
belong to the same run group (the user id) all the use direct runs will share the
same caches. To distinguish between the different caches we need to use something
in addition to the user id (the local repo uuid generated by the direct run
start command)
* Share the cache between multiple projects
* add a config option allowPrivilegedContainers
* fail task setup if privileged containers are requested but they aren't
allowed.
* report if privileged containers are allowed to the runservice
* don't remove the runningTask when executeTask finishes but just mark the
runningTask a not executing
* add a loop to periodically update executorTask status and remove the
runningTask if not executing and status update was successful
* remove runningTask when it disappears from the runservice
* Delete the command and it's rule in the Makefile
* Don't use it inside gitserver and remove related config option (also from
examples)
* Remove webhook parsing from agolagit gitsource
Add an API and related action to manually create a run from a git branch/tag/ref
with optional commitSHA.
Currently only branches and tags are supported (no pull requests).
If not commitSHA is provided the commit sha referenced by the provided branch/tag/ref is
used.
ErrInternal is an internal error that should be provided to the user (http api
will return a 500 with the error message)
It'll be used for any kind of error that are not auth or bad requests (like
errors to communicate to another service)
Introduce a runRefType that represent the ref type of the Run (branch/tag/PR)
Convert the webhook event type to the runRefType and use it to generate the run
group.
* Don't use a complex UnmarshalJSON for RunConfigTask and ExecutorTask but
introduce a Steps type as a slice of Step (where Step is an empty interface)
and declare an UnmarshalJSON method on the Step type.
split data files in multiple files of a max size (default 10Mib)
In this way every data snapshot will change only the datafiles that have some
changes instead of the whole single file.
Don't create an ErrFromRemote wrapping the returned error but
wrap the ErrFromRemote
Also use xerrors Is/As to get the underlying error to return to api clients
while maintaining context for logging
Just a raw replace of "github.com/pkg/errors".
Next steps will improve errors (like remote errors, api errors, not exist errors
etc...) to leverage its functionalities
rename the previous posix storage to posixflat and make it currently not user
selectable (since I'm not sure it's really worth using it).
The new posix storage uses the filesystem without any escaping so it's not a
real flat namespace.
This isn't a real issue since also minio is not a flat namespace and we are so
forced to use it like a hierarchycal filesystem.
In this way, when bundling the web interface inside the agola binaries, oauth2
redirect to the web interfaces will be served by the webbundle handler and
return the web SPA and not resolve directly the /oauth2/callback api call.
If the remote source username/password based login fails return the right error
code: 401 (unauthorized) on wrong username/password or a 500 on other errors.
Since the get tokens gitea api is used to do auth by username password we need
to know the api status code to detect if it's an unauthorized error (wrong
username/password) or another error.
Since the gitea client doesn't return the http response to inspect the status
code we'll use our own api call.
This was already defined in the config but not implemented in the executor and
drivers.
All the containers defined in the runtime after the first one will be "service"
containers. They will share the same network namespace with the other containers
in the "pod" so they can communicate between themself on loopback
Logs and archives can be shared by multiple runs. So removing a run doesn't
imply that we could also remote the logs and archives since they could be
"referenced" by another run.
Store also the runids as specific objects along with the logs and archives so,
we'll remove them only when no runids objects exist.
Also if they are logically part of the runservice the names runserviceExecutor
and runserviceScheduler are long and quite confusing for an external user
Simplify them separating both the code parts and updating the names:
runserviceScheduler -> runservice
runserviceExecutor -> executor
* runservice: use generic task annotations instead of approval annotations
* runservice: add method to set task annotations
* gateway: when an user call the run task approval action, it will set in the
task annotations the approval users ids. The task won't be approved.
* scheduler: when the number of approvers meets the required minimum number
(currently 1) call the runservice to approve the task
In this way we could easily implement some approval features like requiring a
minimum number of approvers (saved in the task annotations) before marking the
run as approved in the runservice.
All the validation must be done inside the configstore since it's the source of
truth.
The gateway could also do some validation to avoid bad requests to the
configstore when needed or when the logic resides outside the configstore (like
project setup or user registration)
RemoteRepositoryConfigType defines how a remote repository is configured and
managed. Currently only "remotesource" is supported.
In future other config types (like a fully manual config) could be supported.
In future we may support specifying a remote source for a project without a
linked account and thus use a user provided token (saved in the project) or
other ways to define a remote repo (like standard git repos over ssh).
On s3 limit the max object size to 1GiB when the size is not provided (-1) or
the minio client will calculate a big part size since it tries to use the
maximum object size (5TiB) and will allocate a very big buffer in ram. Also
leave as commented out the previous hack that was firstly creating the file
locally to calculate the size and then put it (for future reference).