Implement a new error handling library based on pkg/errors. It provides
stack saving on wrapping and exports some function to add stack saving
also to external errors.
It also implements custom zerolog error formatting without adding too
much verbosity by just printing the chain error file:line without a full
stack trace of every error.
* Add a --detailed-errors options to print error with they full chain
* Wrap all error returns. Use errors.WithStack to wrap without adding a
new messsage and error.Wrap[f] to add a message.
* Add golangci-lint wrapcheck to check that external packages errors are
wrapped. This won't check that internal packages error are wrapped.
But we want also to ensure this case so we'll have to find something
else to check also these.
Replace zap with zerolog.
zerolog has a cleaner interface and can be easily configured with custom
error chain printing using a new error handling library that will be
implemented in another PR.
* Create an APIError that should only be used for api returned errors.
It'll wrap an error and can have different Kinds and optional code and
message.
* The http handlers will use the first APIError available in the
error chain and generate a json response body containing the code and
the user message. The wrapped error is internal and is not sent in the
response.
If no api error is available in the chain a generic internal
server error will be returned.
* Add a RemoteError type that will be created from remote services calls
(runservice, configstore). It's similar to the APIError but a
different type to not propagate to the caller response and it'll not
contain any wrapped error.
* Gateway: when we call a remote service, by default, we'll create a
APIError using the RemoteError Kind (omitting the code and the
message that usually must not be propagated).
This is done for all the remote service calls as a starting point, in
future, if this default behavior is not the right one for a specific
remote service call, a new api error with a different kind and/or
augmented with the calling service error codes and user messages could
be created.
* datamanager: Use a dedicated ErrNotExist (and converting objectstorage
ErrNotExist).
Replace https://github.com/satori/go.uuid with maintained version at
https://github.com/gofrs/uuid
Since the new version uuid.NewV4 returns an error when failing to read
from the random source reader we use uuid.Must to panic on error since
it's considered an unrecoverable error.
In future, if needed, we could handle the error instead of panicking.
During tests provide a zaptest Logger so all services output will be redirected
to golang testing logger.
When multiple services of the same type are provided add a unique name field to
distinguish them.
etcd PR 11104 (https://github.com/etcd-io/etcd/pull/11104) implemented mutex
TryLock. Since it's only available in etcd master just copy relevant code and
add a TODO to remove it when updating the etcd client to a version implementing
TryLock.
Use TryLock everywhere where it'll be useful.
* objectstorage: remove `types` package and move `ErrNotExist` in base package
* objectstorage: Implement .Is and add helper `IsErrNotExist` for `ErrNotExist`
* util: Rename `ErrNotFound` to `ErrNotExist`
* util: Add `IsErr*` helpers and use them in place of `errors.Is()`
* datamanager: add `ErrNoDataStatus` to report when there's not data status in ost
* runservice/common: remove `ErrNotExist` and use errors in util package
When during a checkpoint more than one file is created the entries position in
the index is not right since it's not reset at every new index.
Fix it and add related tests.
When creating a datafile name make it start with the current data sequence. This
is useful in future to know which data sequence created a new data file.
* export: exports the newest data checkpoint. It forces a checkpoint before
exporting (currently no wals are exported)
* import: cleans up etcd, creates a new datasnaphot from the provided import stream
and then initializes etcd. Currently no old data is removed from the object
storage but it's just ignored.
When doing an initEtcd (new instance or etcd reset) create a new wal (that will
have a new sequence epoch) and do a checkpoint.
In this way:
* readdb will detect that an epoch change and do a full resync
* we always have a data file (also if empty) that provides the last checkpointed
wal. This information could be used by readdb to resync
Currently we aren't setting a basepath and it wasn't always correctly handled.
Fix missing basepath handling and improve tests to also use a non empty
basepath.
split data files in multiple files of a max size (default 10Mib)
In this way every data snapshot will change only the datafiles that have some
changes instead of the whole single file.
Just a raw replace of "github.com/pkg/errors".
Next steps will improve errors (like remote errors, api errors, not exist errors
etc...) to leverage its functionalities
rename the previous posix storage to posixflat and make it currently not user
selectable (since I'm not sure it's really worth using it).
The new posix storage uses the filesystem without any escaping so it's not a
real flat namespace.
This isn't a real issue since also minio is not a flat namespace and we are so
forced to use it like a hierarchycal filesystem.
On s3 limit the max object size to 1GiB when the size is not provided (-1) or
the minio client will calculate a big part size since it tries to use the
maximum object size (5TiB) and will allocate a very big buffer in ram. Also
leave as commented out the previous hack that was firstly creating the file
locally to calculate the size and then put it (for future reference).
This options is a noop on s3 but on the posix implementation it becomes useful
when there isn't the need to have a persistent file, thus avoiding some fsync
calls.
* Rename to datamanager since it handles a complete "database" backed by an
objectstorage and etcd
* Don't write every single entry as a single file but group them in a single
file. In future improve this to split the data in multiple files of a max size.