This updates the repo index/file view endpoints so annex files match the way
LFS files are rendered, making annexed files accessible via the web instead of
being black boxes only accessible by git clone.
This mostly just duplicates the existing LFS logic. It doesn't try to combine itself
with the existing logic, to make merging with upstream easier. If upstream ever
decides to accept, I would like to try to merge the redundant logic.
The one bit that doesn't directly copy LFS is my choice to hide annex-symlinks.
LFS files are always _pointer files_ and therefore always render with the "file"
icon and no special label, but annex files come in two flavours: symlinks or
pointer files. I've conflated both kinds to try to give a consistent experience.
The tests in here ensure the correct download link (/media, from the last PR)
renders in both the toolbar and, if a binary file (like most annexed files will be),
in the main pane, but it also adds quite a bit of code to make sure text files
that happen to be annexed are dug out and rendered inline like LFS files are.
Previously, Gitea's LFS support allowed direct-downloads of LFS content,
via http://$HOSTNAME:$PORT/$USER/$REPO/media/branch/$BRANCH/$FILE
Expand that grace to git-annex too. Now /media should provide the
relevant *content* from the .git/annex/objects/ folder.
This adds tests too. And expands the tests to try symlink-based annexing,
since /media implicitly supports both that and pointer-file-based annexing.
The git repository must be closed after using it. Without this change
some tests started to fail due to the lingering repository running into
a timeout.
This moves the `annexObjectPath()` helper out of the tests and into a
dedicated sub-package as `annex.ContentLocation()`, and expands it with
`.Pointer()` (which validates using `git annex examinekey`),
`.IsAnnexed()` and `.Content()` to make it a more useful module.
The tests retain their own wrapper version of `ContentLocation()`
because I tried to follow close to the API modules/lfs uses, which in
terms of abstract `git.Blob` and `git.TreeEntry` objects, not in terms
of `repoPath string`s which are more convenient for the tests.
Usage of `path` was replaced by `path/filepath` in upstream forgejo, and
it made sense to use that as well where `path` was previously used. The
`setHeaderCacheForever` function and the `sendFile` method had their
signature changed.
This makes HTTP symmetric with SSH clone URLs.
This gives us the fancy feature of _anonymous_ downloads,
so people can access datasets without having to set up an
account or manage ssh keys.
Previously, to access "open access" data shared this way,
users would need to:
1. Create an account on gitea.example.com
2. Create ssh keys
3. Upload ssh keys (and make sure to find and upload the correct file)
4. `git clone git@gitea.example.com:user/dataset.git`
5. `cd dataset`
6. `git annex get`
This cuts that down to just the last three steps:
1. `git clone https://gitea.example.com/user/dataset.git`
2. `cd dataset`
3. `git annex get`
This is significantly simpler for downstream users, especially for those
unfamiliar with the command line.
Unfortunately there's no uploading. While git-annex supports uploading
over HTTP to S3 and some other special remotes, it seems to fail on a
_plain_ HTTP remote. See https://github.com/neuropoly/gitea/issues/7
and https://git-annex.branchable.com/forum/HTTP_uploads/#comment-ce28adc128fdefe4c4c49628174d9b92.
This is not a major loss since no one wants uploading to be anonymous anyway.
To support private repos, I had to hunt down and patch a secret extra security
corner that Gitea only applies to HTTP for some reason (services/auth/basic.go).
This was guided by https://git-annex.branchable.com/tips/setup_a_public_repository_on_a_web_site/
Fixes https://github.com/neuropoly/gitea/issues/3
Co-authored-by: Mathieu Guay-Paquet <mathieu.guaypaquet@polymtl.ca>
Multiple tests that worked fine on v1.20.4-1 started to fail after the
rebase onto v1.20.5-1. These tests are:
- TestGitAnnexPermissions/Private/Owner/HTTP/Init
- TestGitAnnexPermissions/Private/Owner/HTTP/Download
- TestGitAnnexPermissions/Private/Writer/HTTP/Init
- TestGitAnnexPermissions/Private/Writer/HTTP/Download
- TestGitAnnexPermissions/Private/Reader/HTTP/Init
- TestGitAnnexPermissions/Private/Reader/HTTP/Download
What these tests have in common is that they all operate on a private
repository via http with authentication.
They broke at some point between v1.20.4-1 and v1.20.5-1, so I did a
bisect between these two points running the offending tests. This
brought me to the conclusion that
ee48c0d5ea introduced the issue.
The thing is, this commit does not change any code, it only changes the
test environment. Among other things that didn't look as suspicious, it
changes the container image from a bespoke test_env image based on
debian bullseye to a node image based on debian bookworm. Obviously,
this means that there are many version differences between the two.
The first one I looked at was git. The previous bullseye image used a
manually installed git version 2.40.0, while the bookworm image has
2.39.2 installed. Updating git in the new image did not fix the issue,
however.
The next thing I looked at was the git-annex version. Bullseye had
8.20210223 installed and worked, while bookworm used 10.20230126 when
the tests broke. So I tried my luck upgrading to a more recent version
via neurodebian (10.20240227-1~ndall+1). This still worked fine on
bullseye and now also works fine on bookworm.
I have no idea why this specific version of git-annex broke the tests,
but at least there was a commit to pinpoint this to, which isn't always
the case with docker images silently changing beneath you...
Below are the versions as they are reported by git and git-annex:
bullseye (works):
git version 2.30.2
git-annex version: 8.20210223
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
bullseye + git-annex from neurodebian (works):
git version 2.30.2
git-annex version: 10.20240227-1~ndall+1
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
bookworm (fails):
git version 2.39.2
git-annex version: 10.20230126
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
bookworm + git-annex from neurodebian (works):
git version 2.39.2
git-annex version: 10.20240227-1~ndall+1
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
Fixes https://github.com/neuropoly/gitea/issues/11
Tests:
* `git annex init`
* `git annex copy --from origin`
* `git annex copy --to origin`
over:
* ssh
for:
* the owner
* a collaborator
* a read-only collaborator
* a stranger
in a
* public repo
* private repo
And then confirms:
* Deletion of the remote repo (to ensure lockdown isn't messing with us: https://git-annex.branchable.com/internals/lockdown/#comment-0cc5225dc5abe8eddeb843bfd2fdc382)
------
To support all this:
* Add util.FileCmp()
* Patch withKeyFile() so it can be nested in other copies of itself
-------
Many thanks to Mathieu for giving style tips and catching several bugs,
including a subtle one in util.filecmp() which neutered it.
Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>
[git-annex](https://git-annex.branchable.com/) is a more complicated cousin to
git-lfs, storing large files in an optional-download side content. Unlike lfs,
it allows mixing and matching storage remotes, so the content remote(s) doesn't
need to be on the same server as the git remote, making it feasible to scatter
a collection across cloud storage, old harddrives, or anywhere else storage can
be scavenged. Since this can get complicated, fast, it has a content-tracking
database (`git annex whereis`) to help find everything later.
The use-case we imagine for including it in Gitea is just the simple case, where
we're primarily emulating git-lfs: each repo has its large content at the same URL.
Our motivation is so we can self-host https://www.datalad.org/ datasets, which
currently are only hostable by fragilely scrounging together cloud storage --
and having to manage all the credentials associated with all the pieces -- or at
https://openneuro.org which is fragile in its own ways.
Supporting git-annex also allows multiple Gitea instance to be annex remotes for
each other, mirroring the content or otherwise collaborating the split up the
hosting costs.
Enabling
--------
TODO
HTTP
----
TODO
Permission Checking
-------------------
This tweaks the API in routers/private/serv.go to expose the calling user's
computed permission, instead of just returning HTTP 403.
This doesn't fit in super well. It's the opposite from how the git-lfs support is
done, where there's a complete list of possible subcommands and their matching
permission levels, and then the API compares the requested with the actual level
and returns HTTP 403 if the check fails.
But it's necessary. The main git-annex verbs, 'git-annex-shell configlist' and
'git-annex-shell p2pstdio' are both either read-only or read-write operations,
depending on the state on disk on either end of the connection and what the user
asked it to ask for, with no way to know before git-annex examines the situation.
So tell the level via GIT_ANNEX_READONLY and trust it to handle itself.
In the older Gogs version, the permission was directly read in cmd/serv.go:
```
mode, err = db.UserAccessMode(user.ID, repo)
```
- 966e925cf3/internal/cmd/serv.go (L334)
but in Gitea permission enforcement has been centralized in the API layer.
(perhaps so the cmd layer can avoid making direct DB connections?)
Deletion
--------
git-annex has this "lockdown" feature where it tries
really quite very hard to prevent you deleting its
data, to the point that even an rm -rf won't do it:
each file in annex/objects/ is nested inside a
folder with read-only permissions.
The recommended workaround is to run chmod -R +w when
you're sure you actually want to delete a repo. See
https://git-annex.branchable.com/internals/lockdown
So we edit util.RemoveAll() to do just that, so now
it's `chmod -R +w && rm -rf` instead of just `rm -rf`.
- This unifies the security behavior of enrolling security keys with
enrolling TOTP as a 2FA method. When TOTP is enrolled, you cannot use
basic authorization (user:password) to make API request on behalf of the
user, this is now also the case when you enroll security keys.
- The usage of access tokens are the only method to make API requests on
behalf of the user when a 2FA method is enrolled for the user.
- Integration test added.
(cherry picked from commit e6bbecb02d)
- Add a `purpose` column, this allows the `forgejo_auth_token` table to
be used by other parts of Forgejo, while still enjoying the
no-compromise architecture.
- Remove the 'roll your own crypto' time limited code functions and
migrate them to the `forgejo_auth_token` table. This migration ensures
generated codes can only be used for their purpose and ensure they are
invalidated after their usage by deleting it from the database, this
also should help making auditing of the security code easier, as we're
no longer trying to stuff a lot of data into a HMAC construction.
-Helper functions are rewritten to ensure a safe-by-design approach to
these tokens.
- Add the `forgejo_auth_token` to dbconsistency doctor and add it to the
`deleteUser` function.
- TODO: Add cron job to delete expired authorization tokens.
- Unit and integration tests added.
(cherry picked from commit 1ce33aa38d)
v9: Removed migration - XORM can handle this case automatically without
migration. Add `DEFAULT 'long_term_authorization'`.
- If the incoming mail feature is enabled, tokens are being sent with
outgoing mails. These tokens contains information about what type of
action is allow with such token (such as replying to a certain issue
ID), to verify these tokens the code uses the HMAC-SHA256 construction.
- The output of the HMAC is truncated to 80 bits, because this is
recommended by RFC2104, but RFC2104 actually doesn't recommend this. It
recommends, if truncation should need to take place, it should use
max(80, hash_len/2) of the leftmost bits. For HMAC-SHA256 this works out
to 128 bits instead of the currently used 80 bits.
- Update to token version 2 and disallow any usage of token version 1,
token version 2 are generated with 128 bits of HMAC output.
- Add test to verify the deprecation of token version 1 and a general
MAC check test.
(cherry picked from commit 9508aa7713)
- _Simply_ add `^$` to regexp that didn't had it yet, this avoids any
content being allowed that simply had the allowed content as a
substring.
- Fix file-preview regex to have `$` instead of `*`.
(cherry picked from commit 7067cc7da4)
v9: added fix for ref-issue, this is already fixed in forgejo branch but
not backported as it was part of a feature.
- Consider private/limited users in the `AccessibleRepositoryCondition`
query, previously this only considered private/limited organization.
This limits the ability for anomynous users to do code search on
private/limited user's repository
- Unit test added.
(cherry picked from commit b70196653f)
- The RSS and atom feed for branches exposes details about the code, it
therefore should be guarded by the requirement that the doer has access
to the code of that repository.
- Added integration testing.
(cherry picked from commit 3e3ef76808)
- If a repository is forked to a private or limited user/organization,
the fork should not be visible in the list of forks depending on the
doer requesting the list of forks.
- Added integration testing for web and API route.
(cherry picked from commit 061abe6004)
- Ensure that the specified push mirror ID belongs to the requested
repository, otherwise it is possible to modify the intervals of the push
mirrors that do not belong to the requested repository.
- Integration test added.
(cherry picked from commit 786dfc7fb8)
When the CI vars.ROLE is forgejo-coding, it is assumed to be the
repository where collaborative coding happens,
i.e. https://codeberg.org/forgejo/forgejo
When the CI vars.ROLE is forgejo-testing, it is assumed that only codebase
testing is to be run and no other tests such as release build
integration, label constraints, backporting etc.
(cherry picked from commit 068558accd)
Conflicts:
.forgejo/workflows/testing.yml
was in .forgejo/workflows/e2e.yml
When the CI vars.ROLE is forgejo-coding, it is assumed to be the
repository where collaborative coding happens,
i.e. https://codeberg.org/forgejo/forgejo
When the CI vars.ROLE is forgejo-testing, it is assumed that only codebase
testing is to be run and no other tests such as release build
integration, label constraints, backporting etc.
(cherry picked from commit f82840f1ea)
Conflicts:
.forgejo/workflows/merge-requirements.yml
- When a dependency is renamed, specified via `package="actual-name"` in
Cargo.toml, this should become the name of the depedency when the
package is retrieved from the registery by cargo and the old name should
be available in the `package` field.
- The reference implementation also does this: 490e66a9d6/src/controllers/krate/publish.rs (L702-L705)
- Resolves#5936
- Unit test added.
(cherry picked from commit bb93d3e6c8)
Notify https://code.forgejo.org/forgejo/forgejo that a new release was
published by setting the trigger label to
https://code.forgejo.org/forgejo/forgejo/issues/5.
It is only ever useful when a stable release is published, the
experimental releases are not mirrored. But it is triggered in all
cases. This will waste a few mirror check daily, when experimental
releases are built. This is an improvement compared to the current
situation where mirrors are checked hourly:
* Instead of being checked 24 times per day it will be down to less
than 5
* The mirror happens immediately after the release is published
instead of waiting for the next run of the cron job.
If a mirror operation is in progress, as evidenced by the presence of
the trigger label on the issure, it means two releases are being
published. Wait up to 1h for the mirror to complete and remove the
trigger label.
(cherry picked from commit 7492330721)
When a new commit is pushed to an existing pull request, the update of
the commit status will happen asynchronously, via the git hook.
--- FAIL: TestPullRequestCommitStatus/synchronize (2.14s)
actions_trigger_test.go:331:
Error Trace: /workspace/forgejo/forgejo/tests/integration/actions_trigger_test.go:331
Error: Should be true
Test: TestPullRequestCommitStatus/synchronize
(cherry picked from commit 983aed4268)
This css class was used to display the "forgot password"-link right and above the password field.
cd75519a0b moves this link, so this class is now unused
Previously hitting tab in the username field set the focus to the "forgot password" link. Only on the next hit the password field was selected.
This is an issue for some password managers (keepassdx android keyboard) and not as nice for accessibility.
Now the forgot link is below the sign up link at the bottom of the page.
Using "tabindex" didn't work properly with the templating engine because many elements get assigned a tabindex of "0" by default disrupting the tab selection sequence.