Always walking the filesystem and searching for UUIDs slowed p2phttp
operations down significantly on a production server with more than a
handful of repositories. This caching strategy ensures that only the
first call is slow; subsequent calls should be much faster.
This would be better implemented as a background job, but for now this
is a simple solution to the problem.
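For illustration, a minimal Go sketch of the caching idea, assuming a hypothetical `annexcache` package with `walkFn` standing in for the expensive filesystem search (not the actual Forgejo code):

```
// Hypothetical sketch of the caching idea: map annex UUIDs to repository IDs
// so that only the first p2phttp request has to walk the filesystem.
package annexcache

import "sync"

var (
	mu    sync.RWMutex
	cache = map[string]int64{} // annex UUID -> repository ID
)

// Lookup returns the cached repository ID for a UUID, or falls back to the
// expensive filesystem walk (walkFn) and stores the result for later calls.
func Lookup(uuid string, walkFn func(string) (int64, error)) (int64, error) {
	mu.RLock()
	id, ok := cache[uuid]
	mu.RUnlock()
	if ok {
		return id, nil
	}
	id, err := walkFn(uuid) // slow: searches all repositories for the UUID
	if err != nil {
		return 0, err
	}
	mu.Lock()
	cache[uuid] = id
	mu.Unlock()
	return id, nil
}
```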
Fixes #53.
Reviewed-on: https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/pulls/54
Co-authored-by: Matthias Riße <m.risse@fz-juelich.de>
Co-committed-by: Matthias Riße <m.risse@fz-juelich.de>
This adds a new endpoint under `/git-annex-p2phttp` which acts as an
authenticating proxy to git-annex's p2phttp server. This makes it
possible to set `annex+<server-url>/git-annex-p2phttp` as
`remote.<name>.annexurl` and use git-annex fully over http(s) with the
normal credentials and access tokens provided by Forgejo.
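For illustration, a rough Go sketch of what such an authenticating proxy looks like; the upstream address, route, and `checkCredentials` helper are assumptions, not Forgejo's actual implementation:

```
// Rough sketch of the proxy idea, not the actual Forgejo handler: check the
// request's credentials first, then forward it to git-annex's p2phttp server.
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Assumed local p2phttp address; the real port lives in Forgejo's settings.
	upstream, _ := url.Parse("http://127.0.0.1:9417")
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	http.HandleFunc("/git-annex-p2phttp/", func(w http.ResponseWriter, r *http.Request) {
		user, pass, ok := r.BasicAuth()
		if !ok || !checkCredentials(user, pass) { // placeholder check
			w.Header().Set("WWW-Authenticate", `Basic realm="git-annex"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})
	http.ListenAndServe(":8080", nil)
}

func checkCredentials(user, pass string) bool {
	// Placeholder: Forgejo would validate normal credentials or access tokens here.
	return user != "" && pass != ""
}
```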
Fixes #25.
Reviewed-on: https://codeberg.org/matrss/forgejo-aneksajo/pulls/42
Co-authored-by: Matthias Riße <m.risse@fz-juelich.de>
Co-committed-by: Matthias Riße <m.risse@fz-juelich.de>
The previous implementation both uploaded to the annex and pushed to the
git repository. This meant that the tests checking that unauthorized
uploads fail could pass even when the git push failed but the
git-annex upload didn't. The tests didn't catch the situation where
unauthorized users could modify the annex.
Reviewed-on: https://codeberg.org/matrss/forgejo-aneksajo/pulls/46
Co-authored-by: Matthias Riße <m.risse@fz-juelich.de>
Co-committed-by: Matthias Riße <m.risse@fz-juelich.de>
Previously, an external renderer that matched on an annexed file would
only see its content streamed via `STDIN`, or a temporary file with a
copy of its content would be generated and passed by file path (with
`IS_INPUT_FILE=true`). Whether that happens is also subject to
`MAX_DISPLAY_FILE_SIZE` (which defaults to 8MB).
This was problematic because annexed files tend to be large. Moreover,
if present, they already exist as write-protected files on the
filesystem. Creating a copy is both expensive and serves no particular
purpose.
This commit changes how external renderers are called.
1) With `IS_INPUT_FILE=true`, the renderer is passed the true location
of an annex key, if present, and an empty path, if not.
2) The original, repository-relative path of the rendering target is
made available to the renderer via the `GITEA_RELATIVE_PATH`
environment variable.
To achieve a lean implementation, the `Blob` of the rendering target
is passed on to the `RenderContext`, because the implementation of
the annex-related functionality is centered on this type.
This change makes it less costly to increase `MAX_DISPLAY_FILE_SIZE`
in order to make large, annexed files eligible for markup rendering,
because content copies are no longer made.
External renderers can now use the original file path, with the full
original filename including extensions, for decision making: for
example, to detect particular compression formats based on a filename
extension, or to alter the rendering based on contextual information
encoded in the file path (e.g., a multi-file data structure with a
particular organization pattern).
Apart from the additional environment variable, there is no change to
the handling of renderers that take their input via `STDIN` (i.e.,
`IS_INPUT_FILE=false`).
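For illustration, a hypothetical external renderer in Go showing how the input argument and the new environment variable could be used; it assumes only the behaviour described above and is not part of Forgejo:

```
// Illustrative external renderer: with IS_INPUT_FILE=true it receives the
// on-disk location of the annex key as its argument (or an empty path if the
// content is absent), and the repository-relative path via GITEA_RELATIVE_PATH.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	relPath := os.Getenv("GITEA_RELATIVE_PATH") // original path inside the repository
	inputFile := ""
	if len(os.Args) > 1 {
		inputFile = os.Args[1] // true annex key location, or empty if content is absent
	}

	if inputFile == "" {
		fmt.Println("<p>content not present in this annex</p>")
		return
	}
	// Example decision based on the original filename rather than the key path.
	if strings.HasSuffix(relPath, ".csv.gz") {
		fmt.Printf("<p>compressed CSV at %s (key file %s)</p>\n", relPath, filepath.Base(inputFile))
		return
	}
	data, err := os.ReadFile(inputFile)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("<pre>%d bytes from %s</pre>\n", len(data), relPath)
}
```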
Fixes #35.
Reviewed-on: https://codeberg.org/matrss/forgejo-aneksajo/pulls/36
Reviewed-by: matrss <matrss@noreply.codeberg.org>
Co-authored-by: Michael Hanke <michael.hanke@gmail.com>
Co-committed-by: Michael Hanke <michael.hanke@gmail.com>
This implements support for uploading files into the annex using the web
interface.
If a repository is git-annex-enabled, all files will be added to it
using `git annex add`. This means that the repository's configuration
for what to put into the annex (`annex.largefiles` in `.gitattributes`)
will be respected.
Plain git repositories without git-annex will work as before, directly
uploading to git.
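For illustration, a minimal Go sketch of the `git annex add` step, assuming the uploaded file has already been written into a working copy; `addUploadedFile` is a hypothetical helper, not Forgejo's code:

```
// Simplified sketch of the upload flow: stage the new file with "git annex add"
// so that annex.largefiles in .gitattributes decides whether it goes to the
// annex or to plain git.
package main

import (
	"fmt"
	"os/exec"
)

// addUploadedFile runs "git annex add" on path inside the working copy at workDir.
func addUploadedFile(workDir, path string) error {
	cmd := exec.Command("git", "annex", "add", "--", path)
	cmd.Dir = workDir
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("git annex add failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	if err := addUploadedFile("/tmp/upload-work-dir", "data/large.bin"); err != nil {
		fmt.Println(err)
	}
}
```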
Fixes #5.
Reviewed-on: https://codeberg.org/matrss/forgejo-aneksajo/pulls/21
Co-authored-by: Matthias Riße <m.risse@fz-juelich.de>
Co-committed-by: Matthias Riße <m.risse@fz-juelich.de>
This updates the repo index/file view endpoints so annex files match the way
LFS files are rendered, making annexed files accessible via the web instead of
being black boxes only accessible by git clone.
This mostly just duplicates the existing LFS logic. It doesn't try to combine itself
with the existing logic, in order to make merging with upstream easier. If upstream
ever decides to accept this, I would like to try to merge the redundant logic.
The one bit that doesn't directly copy LFS is my choice to hide annex symlinks.
LFS files are always _pointer files_ and therefore always render with the "file"
icon and no special label, but annex files come in two flavours: symlinks or
pointer files. I've conflated both kinds to try to give a consistent experience.
The tests in here ensure the correct download link (/media, from the last PR)
renders in both the toolbar and, for binary files (like most annexed files will be),
in the main pane. They also add quite a bit of code to make sure text files
that happen to be annexed are dug out and rendered inline like LFS files are.
This moves the `annexObjectPath()` helper out of the tests and into a
dedicated sub-package as `annex.ContentLocation()`, and expands it with
`.Pointer()` (which validates using `git annex examinekey`),
`.IsAnnexed()` and `.Content()` to make it a more useful module.
The tests retain their own wrapper version of `ContentLocation()`
because I tried to stay close to the API modules/lfs uses, which works
in terms of abstract `git.Blob` and `git.TreeEntry` objects, not in
terms of `repoPath string`s, which are more convenient for the tests.
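For a rough idea of what these helpers do, a hedged Go sketch that shells out to `git annex examinekey` and `git annex contentlocation`; the real module works on `git.Blob` objects and its signatures differ:

```
// Conceptual sketch only: validate a key and resolve its on-disk location by
// calling git-annex plumbing commands, mirroring what the annex helpers expose.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// validKey reports whether key parses as a git-annex key, using examinekey.
func validKey(repoPath, key string) bool {
	cmd := exec.Command("git", "annex", "examinekey", key)
	cmd.Dir = repoPath
	return cmd.Run() == nil
}

// contentLocation asks git-annex where the key's content lives, relative to
// the repository; it errors if the content is not present.
func contentLocation(repoPath, key string) (string, error) {
	cmd := exec.Command("git", "annex", "contentlocation", key)
	cmd.Dir = repoPath
	out, err := cmd.Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	const repo = "/path/to/repo.git" // placeholder repository path
	// Placeholder SHA256E key; size and hash are illustrative only.
	const key = "SHA256E-s1048576--9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08.bin"
	if validKey(repo, key) {
		if loc, err := contentLocation(repo, key); err == nil {
			fmt.Println("content at:", loc)
		}
	}
}
```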
This makes HTTP symmetric with SSH clone URLs.
This gives us the fancy feature of _anonymous_ downloads,
so people can access datasets without having to set up an
account or manage ssh keys.
Previously, to access "open access" data shared this way,
users would need to:
1. Create an account on gitea.example.com
2. Create ssh keys
3. Upload ssh keys (and make sure to find and upload the correct file)
4. `git clone git@gitea.example.com:user/dataset.git`
5. `cd dataset`
6. `git annex get`
This cuts that down to just the last three steps:
1. `git clone https://gitea.example.com/user/dataset.git`
2. `cd dataset`
3. `git annex get`
This is significantly simpler for downstream users, especially for those
unfamiliar with the command line.
Unfortunately there's no uploading. While git-annex supports uploading
over HTTP to S3 and some other special remotes, it seems to fail on a
_plain_ HTTP remote. See https://github.com/neuropoly/gitea/issues/7
and https://git-annex.branchable.com/forum/HTTP_uploads/#comment-ce28adc128fdefe4c4c49628174d9b92.
This is not a major loss since no one wants uploading to be anonymous anyway.
To support private repos, I had to hunt down and patch a secret extra security
corner that Gitea only applies to HTTP for some reason (services/auth/basic.go).
This was guided by https://git-annex.branchable.com/tips/setup_a_public_repository_on_a_web_site/
Fixes https://github.com/neuropoly/gitea/issues/3
Co-authored-by: Mathieu Guay-Paquet <mathieu.guaypaquet@polymtl.ca>
Fixes https://github.com/neuropoly/gitea/issues/11
Tests:
* `git annex init`
* `git annex copy --from origin`
* `git annex copy --to origin`
over:
* ssh
for:
* the owner
* a collaborator
* a read-only collaborator
* a stranger
in a
* public repo
* private repo
And then confirms:
* Deletion of the remote repo (to ensure lockdown isn't messing with us: https://git-annex.branchable.com/internals/lockdown/#comment-0cc5225dc5abe8eddeb843bfd2fdc382)
------
To support all this:
* Add util.FileCmp()
* Patch withKeyFile() so it can be nested in other copies of itself
-------
Many thanks to Mathieu for giving style tips and catching several bugs,
including a subtle one in util.filecmp() which neutered it.
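For reference, a minimal sketch of what such a file-comparison helper can look like; the real `util.FileCmp()` lives in the test utilities and may stream the files in chunks rather than reading them whole:

```
// Hedged sketch of a file-comparison helper for tests: reads both files fully
// and compares their bytes. Fine for test-sized files; a production helper
// would likely compare in chunks.
package util

import (
	"bytes"
	"os"
)

// FileCmp reports whether the two files have identical contents.
func FileCmp(a, b string) (bool, error) {
	da, err := os.ReadFile(a)
	if err != nil {
		return false, err
	}
	db, err := os.ReadFile(b)
	if err != nil {
		return false, err
	}
	return bytes.Equal(da, db), nil
}
```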
Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>
[git-annex](https://git-annex.branchable.com/) is a more complicated cousin to
git-lfs, storing large files in optionally-downloaded side content. Unlike LFS,
it allows mixing and matching storage remotes, so the content remote(s) don't
need to be on the same server as the git remote, making it feasible to scatter
a collection across cloud storage, old harddrives, or anywhere else storage can
be scavenged. Since this can get complicated fast, it has a content-tracking
database (`git annex whereis`) to help find everything later.
The use-case we imagine for including it in Gitea is just the simple case, where
we're primarily emulating git-lfs: each repo has its large content at the same URL.
Our motivation is so we can self-host https://www.datalad.org/ datasets, which
currently are only hostable by fragilely scrounging together cloud storage --
and having to manage all the credentials associated with all the pieces -- or at
https://openneuro.org which is fragile in its own ways.
Supporting git-annex also allows multiple Gitea instances to be annex remotes for
each other, mirroring the content or otherwise collaborating to split up the
hosting costs.
Enabling
--------
TODO
HTTP
----
TODO
Permission Checking
-------------------
This tweaks the API in routers/private/serv.go to expose the calling user's
computed permission, instead of just returning HTTP 403.
This doesn't fit in super well. It's the opposite of how the git-lfs support is
done, where there's a complete list of possible subcommands and their matching
permission levels, and then the API compares the requested level with the actual
level and returns HTTP 403 if the check fails.
But it's necessary. The main git-annex verbs, 'git-annex-shell configlist' and
'git-annex-shell p2pstdio', are both either read-only or read-write operations,
depending on the state on disk on either end of the connection and what the user
asked for, with no way to know before git-annex examines the situation.
So we tell git-annex the permission level via GIT_ANNEX_READONLY and trust it to
enforce it.
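A rough Go sketch of that idea, building the environment before invoking git-annex-shell; git-annex-shell documents the variable as `GIT_ANNEX_SHELL_READONLY`, and the exact name and wiring used by this patch may differ:

```
// Illustrative only: export the computed permission to git-annex-shell through
// the environment instead of rejecting read-only users outright.
package main

import (
	"fmt"
	"os"
)

// annexShellEnv returns the environment for the git-annex-shell invocation,
// adding the read-only marker when the computed permission is below write.
func annexShellEnv(canWrite bool) []string {
	env := os.Environ()
	if !canWrite {
		// git-annex-shell refuses state-changing operations when this is set.
		env = append(env, "GIT_ANNEX_SHELL_READONLY=true")
	}
	return env
}

func main() {
	// One extra entry compared to the plain environment: the read-only flag.
	fmt.Println(len(annexShellEnv(false)) - len(os.Environ()))
}
```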
In the older Gogs version, the permission was directly read in cmd/serv.go:
```
mode, err = db.UserAccessMode(user.ID, repo)
```
- 966e925cf3/internal/cmd/serv.go (L334)
but in Gitea permission enforcement has been centralized in the API layer.
(perhaps so the cmd layer can avoid making direct DB connections?)
Deletion
--------
git-annex has this "lockdown" feature where it tries
really quite hard to prevent you from deleting its
data, to the point that even an rm -rf won't do it:
each file in annex/objects/ is nested inside a
folder with read-only permissions.
The recommended workaround is to run chmod -R +w when
you're sure you actually want to delete a repo. See
https://git-annex.branchable.com/internals/lockdown
So we edit util.RemoveAll() to do just that, so now
it's `chmod -R +w && rm -rf` instead of just `rm -rf`.
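A minimal Go sketch of that behaviour (illustrative only; the real change is in Forgejo's `util.RemoveAll()`):

```
// Sketch of the "chmod -R +w && rm -rf" behaviour described above.
package util

import (
	"os"
	"path/filepath"
)

// RemoveAll first makes everything under root writable, so git-annex's
// read-only object directories don't block the removal, then deletes it.
func RemoveAll(root string) error {
	_ = filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return nil // keep walking; os.RemoveAll reports real problems
		}
		// 0o200 adds the owner write bit, mirroring `chmod -R +w`.
		_ = os.Chmod(path, info.Mode().Perm()|0o200)
		return nil
	})
	return os.RemoveAll(root)
}
```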
- Add a `purpose` column; this allows the `forgejo_auth_token` table to
be used by other parts of Forgejo, while still enjoying the
no-compromise architecture.
- Remove the 'roll your own crypto' time-limited code functions and
migrate them to the `forgejo_auth_token` table. This migration ensures
generated codes can only be used for their purpose and ensures they are
invalidated after use by deleting them from the database. This should
also make auditing of the security code easier, as we're no longer
trying to stuff a lot of data into an HMAC construction.
- Helper functions are rewritten to ensure a safe-by-design approach to
these tokens.
- Add the `forgejo_auth_token` to dbconsistency doctor and add it to the
`deleteUser` function.
- TODO: Add cron job to delete expired authorization tokens.
- Unit and integration tests added.
(cherry picked from commit 1ce33aa38d)
v9: Removed migration - XORM can handle this case automatically without
migration. Add `DEFAULT 'long_term_authorization'`.
- _Simply_ add `^$` to regexps that didn't have them yet; this avoids
allowing any content that merely contains the allowed content as a
substring (see the sketch below).
- Fix the file-preview regex to have `$` instead of `*`.
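A tiny illustration of why the anchors matter, with made-up patterns rather than the actual ones from the fix:

```
// Without ^ and $, any string that merely contains an allowed value passes.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	loose := regexp.MustCompile(`issues|pulls`)      // substring match
	strict := regexp.MustCompile(`^(issues|pulls)$`) // whole-string match

	input := "issues/../../secret"
	fmt.Println(loose.MatchString(input))  // true  (undesired)
	fmt.Println(strict.MatchString(input)) // false (desired)
}
```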
(cherry picked from commit 7067cc7da4)
v9: added fix for ref-issue; this is already fixed in the forgejo branch but
was not backported as it was part of a feature.
- When a dependency is renamed, specified via `package="actual-name"` in
Cargo.toml, this should become the name of the dependency when the
package is retrieved from the registry by cargo, and the old name should
be available in the `package` field.
- The reference implementation also does this: 490e66a9d6/src/controllers/krate/publish.rs (L702-L705)
- Resolves #5936
- Unit test added.
(cherry picked from commit bb93d3e6c8)
When a workflow has

on:
  pull_request:
    types:
      - labeled
      - unlabeled

the payload is missing the `label` field describing the added or removed
label.
The unlabeled event type was also incorrectly mapped to the labeled
event type.
(cherry picked from commit 58e3c1fbdb)
- If `GetAffectedFiles` is called for a push with an empty oldCommitID,
then set the oldCommitID to the empty tree. This will effectively diff
all the changes included in the push, which is the expected behavior for
branches (see the sketch after this list).
- Integration test added.
- Resolves #5683
- Port of gitea#31778 but implemented differently.
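A minimal sketch of the described fallback, with illustrative names; git's empty tree object ID is a fixed constant:

```
// When a push has no old commit, diff against git's empty tree object so the
// diff covers everything in the push.
package main

import "fmt"

// The empty tree's SHA-1 object ID is a fixed constant in git.
const emptyTreeSHA = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

func normalizeOldCommitID(oldCommitID string) string {
	if oldCommitID == "" {
		// Diffing against the empty tree yields every file in the push,
		// which is what we want for newly created branches.
		return emptyTreeSHA
	}
	return oldCommitID
}

func main() {
	fmt.Println(normalizeOldCommitID(""))
	fmt.Println(normalizeOldCommitID("f5e025917f"))
}
```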
(cherry picked from commit f5e025917f)
- This is another regression from
5a0bc35799, where the default value was
changed to "alphabetically" because it relied on `ExploreDefaultSort`
providing a fallback value.
- Set the default value for `EXPLORE_DEFAULT_SORT` to `recentupdate`;
this was already the explicit behavior for existing users of this setting,
but with 5a0bc35799 there was no longer an
explicit fallback to `recentupdate`. So this opts for an 'easy' fix that
doesn't add boilerplate code to those instances.
(cherry picked from commit f4be4e733c)
- In the case that the language returned by
[go-enry](https://github.com/go-enry/go-enry/) doesn't match a lexer name
(either because it's not available or because it doesn't match Chroma's
naming), a last-effort attempt is made to use Chroma's own matching (a
sketch of this fallback follows below).
- go-enry already applies `strings.ToLower` to the filename to avoid
case sensitivity; the same is done for Chroma's matching. The code
being used doesn't rely on the filename being case-sensitive for correct
matching.
- Adds unit test.
- Resolves #752
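A hedged sketch of the fallback chain described above; the exact Forgejo code differs, but go-enry and Chroma are the real libraries involved:

```
// Illustrative fallback: ask go-enry for the language, and if Chroma has no
// lexer registered under that name, fall back to Chroma's filename matching.
package main

import (
	"fmt"
	"strings"

	"github.com/alecthomas/chroma/v2"
	"github.com/alecthomas/chroma/v2/lexers"
	"github.com/go-enry/go-enry/v2"
)

func pickLexer(filename string, content []byte) chroma.Lexer {
	if lang := enry.GetLanguage(filename, content); lang != "" {
		if lexer := lexers.Get(lang); lexer != nil {
			return lexer
		}
	}
	// Lower-case the filename so the match is not case-sensitive, mirroring
	// what go-enry already does internally.
	if lexer := lexers.Match(strings.ToLower(filename)); lexer != nil {
		return lexer
	}
	return lexers.Fallback
}

func main() {
	lexer := pickLexer("Main.PAS", []byte("program Hello; begin end."))
	fmt.Println(lexer.Config().Name)
}
```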
(cherry picked from commit dcc442351d)
Co-authored-by: Ehsan Shirvanian <ehsan@duck.com>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/5503
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: ehshi <ehshi@noreply.codeberg.org>
Co-committed-by: ehshi <ehshi@noreply.codeberg.org>
(cherry picked from commit 82b1ab56de)
- [x] add architecture-specific removal support
- [x] Fix races between concurrent uploads
- [x] Fix missing input validation when downloading
docs: https://codeberg.org/forgejo/docs/pulls/874
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/5351
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Exploding Dragon <explodingfkl@gmail.com>
Co-committed-by: Exploding Dragon <explodingfkl@gmail.com>
(cherry picked from commit 89742c4913)
This PR addresses the missing `bin` field in Composer metadata, which
currently causes vendor-provided binaries to not be symlinked to
`vendor/bin` during installation.
In the current implementation, running `composer install` does not
publish the binaries, leading to issues where expected binaries are not
available.
By properly declaring the `bin` field, this PR ensures that binaries are
correctly symlinked upon installation, as described in the [Composer
documentation](https://getcomposer.org/doc/articles/vendor-binaries.md).
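For illustration, a hedged Go sketch of the shape of the fix: make sure the `bin` entries survive (re)serialisation of the package metadata so composer can symlink them into `vendor/bin`. The `composerPackage` struct here is an assumption, not Forgejo's actual type:

```
// Illustrative only: carry the "bin" field through when emitting Composer
// package metadata.
package main

import (
	"encoding/json"
	"fmt"
)

// composerPackage mirrors only the fields relevant here; names are illustrative.
type composerPackage struct {
	Name    string   `json:"name"`
	Version string   `json:"version"`
	Bin     []string `json:"bin,omitempty"` // previously dropped, now passed through
}

func main() {
	pkg := composerPackage{
		Name:    "acme/tool",
		Version: "1.2.3",
		Bin:     []string{"bin/acme-tool"},
	}
	out, _ := json.MarshalIndent(pkg, "", "  ")
	fmt.Println(string(out))
}
```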
(cherry picked from commit d351a42494e71b5e2da63302c2f9b46c78e6dbde)
(cherry picked from commit 9d34731198)
Replace #32001.
To prevent the context cache from being misused for long-term work
(which would result in using invalid cache without awareness), the
context cache is designed to exist for a maximum of 10 seconds. This
leads to many false reports, especially in the case of slow SQL.
This PR increases it to 5 minutes to reduce false reports.
5 minutes is not a very safe value, as a lot of changes may have
occurred within that time frame. However, as far as I know, there has
not been a case of misuse of context cache discovered so far, so I think
5 minutes should be OK.
Please note that after this PR, if warning logs are found again, they
should get attention; at that point it is almost 100% certain that it
is a misuse.
(cherry picked from commit a323a82ec4bde6ae39b97200439829bf67c0d31e)
(cherry picked from commit a5818470fe62677d8859b590b2d80b98fe23d098)
Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
Conflicts:
- .github/ISSUE_TEMPLATE/bug-report.yaml
.github/ISSUE_TEMPLATE/config.yml
.github/ISSUE_TEMPLATE/feature-request.yaml
.github/ISSUE_TEMPLATE/ui.bug-report.yaml
templates/install.tmpl
All of these are Gitea-specific. Resolved the conflict by not
picking their change.
- Currently the `nosql` module (which, simply said, provides a manager
for redis clients) returns the
[`redis.UniversalClient`](https://pkg.go.dev/github.com/redis/go-redis/v9#UniversalClient)
interface. That interface exposes all available commands.
- In general, dead code elimination should be able to take care of not
generating the machine code for methods that aren't being used. However,
in this specific case, dead code elimination is either disabled or gives
up on trying because of the exhaustive call stack in which the client
returned by `GetRedisClient` is used.
- Help the Go compiler by explicitly specifying which methods we use
(see the sketch below). This reduces the binary size by ~400KB (397312
bytes), as Go no longer generates machine code for commands that aren't
being used.
- There's a **CAVEAT** with this: if a developer wants to use a new
method that isn't specified, they will have to know about this
hack (by following the definition of existing Redis methods) and add the
method definition from the Redis library to the `RedisClient` interface.
- This is in the spirit of #5090.
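A sketch of the narrow-interface trick; the method set below is an example, not the exact list Forgejo declares:

```
// Declare only the redis commands the code actually calls, so the linker can
// drop machine code for the rest of redis.UniversalClient.
package nosql

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// RedisClient is a subset of redis.UniversalClient containing only what we call.
type RedisClient interface {
	Get(ctx context.Context, key string) *redis.StringCmd
	Set(ctx context.Context, key string, value interface{}, expiration time.Duration) *redis.StatusCmd
	Del(ctx context.Context, keys ...string) *redis.IntCmd
	Close() error
}

// The concrete client satisfies the narrow interface, so GetRedisClient can
// return it as RedisClient.
var _ RedisClient = (*redis.Client)(nil)
```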
- Move to a fork of gitea.com/go-chi/cache,
code.forgejo.org/go-chi/cache. It removes unused code (a lot of
adapters that can't be used by Forgejo) and unused dependencies (see
go.sum). It also updates existing dependencies.
8c64f1a362..main