This updates the repo index/file view endpoints so annex files match the way
LFS files are rendered, making annexed files accessible via the web instead of
being black boxes only accessible by git clone.
This mostly just duplicates the existing LFS logic. It doesn't try to combine itself
with the existing logic, to make merging with upstream easier. If upstream ever
decides to accept this, I would like to try to merge the redundant logic.
The one bit that doesn't directly copy LFS is my choice to hide annex-symlinks.
LFS files are always _pointer files_ and therefore always render with the "file"
icon and no special label, but annex files come in two flavours: symlinks or
pointer files. I've conflated both kinds to try to give a consistent experience.
The tests in here ensure the correct download link (/media, from the last PR)
renders both in the toolbar and, for binary files (as most annexed files will
be), in the main pane. They also add quite a bit of code to make sure text
files that happen to be annexed are dug out and rendered inline, like LFS
files are.
This moves the `annexObjectPath()` helper out of the tests and into a
dedicated sub-package as `annex.ContentLocation()`, and expands it with
`.Pointer()` (which validates using `git annex examinekey`),
`.IsAnnexed()` and `.Content()` to make it a more useful module.
The tests retain their own wrapper version of `ContentLocation()`
because I tried to stay close to the API modules/lfs uses, which works in
terms of abstract `git.Blob` and `git.TreeEntry` objects, not in terms
of `repoPath string`s, which are more convenient for the tests.
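For illustration only, the kind of check these helpers perform can be sketched
as below. The real `IsAnnexed()`/`Content()` work on `git.Blob` objects and
validate keys with `git annex examinekey`, so the signature and the shortcut
here are assumptions, not the actual implementation:
```go
package main

import (
	"fmt"
	"strings"
)

// isAnnexPointer is a simplified stand-in for annex.IsAnnexed(): both annex
// flavours (symlinks and pointer files) ultimately reference a path like
// .git/annex/objects/<hashdir>/<KEY>/<KEY>, so the cheap check is whether the
// blob content looks like such a path. The real helper also validates the key.
func isAnnexPointer(content string) bool {
	return strings.Contains(strings.TrimSpace(content), "annex/objects/")
}

func main() {
	symlinkTarget := "../../.git/annex/objects/pX/ZJ/SHA256E-s1048576--aaaa.bin/SHA256E-s1048576--aaaa.bin"
	fmt.Println(isAnnexPointer(symlinkTarget))              // true
	fmt.Println(isAnnexPointer("plain text file contents")) // false
}
```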
This makes HTTP symmetric with SSH clone URLs.
This gives us the fancy feature of _anonymous_ downloads,
so people can access datasets without having to set up an
account or manage ssh keys.
Previously, to access "open access" data shared this way,
users would need to:
1. Create an account on gitea.example.com
2. Create ssh keys
3. Upload ssh keys (and make sure to find and upload the correct file)
4. `git clone git@gitea.example.com:user/dataset.git`
5. `cd dataset`
6. `git annex get`
This cuts that down to just the last three steps:
1. `git clone https://gitea.example.com/user/dataset.git`
2. `cd dataset`
3. `git annex get`
This is significantly simpler for downstream users, especially for those
unfamiliar with the command line.
Unfortunately there's no uploading. While git-annex supports uploading
over HTTP to S3 and some other special remotes, it seems to fail on a
_plain_ HTTP remote. See https://github.com/neuropoly/gitea/issues/7
and https://git-annex.branchable.com/forum/HTTP_uploads/#comment-ce28adc128fdefe4c4c49628174d9b92.
This is not a major loss since no one wants uploading to be anonymous anyway.
To support private repos, I had to hunt down and patch a secret extra security
corner that Gitea only applies to HTTP for some reason (services/auth/basic.go).
This was guided by https://git-annex.branchable.com/tips/setup_a_public_repository_on_a_web_site/
Fixes https://github.com/neuropoly/gitea/issues/3
Co-authored-by: Mathieu Guay-Paquet <mathieu.guaypaquet@polymtl.ca>
Fixes https://github.com/neuropoly/gitea/issues/11
Tests:
* `git annex init`
* `git annex copy --from origin`
* `git annex copy --to origin`
over:
* ssh
for:
* the owner
* a collaborator
* a read-only collaborator
* a stranger
in a
* public repo
* private repo
And then confirms:
* Deletion of the remote repo (to ensure lockdown isn't messing with us: https://git-annex.branchable.com/internals/lockdown/#comment-0cc5225dc5abe8eddeb843bfd2fdc382)
------
To support all this:
* Add util.FileCmp() (sketched below)
* Patch withKeyFile() so it can be nested in other copies of itself
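A sketch of the comparison helper's idea (an illustrative stand-in; the real
util.FileCmp() may differ in signature and details):
```go
package util

import (
	"bytes"
	"crypto/sha256"
	"io"
	"os"
)

// FileCmp reports whether two files have identical contents by hashing them,
// so large annexed files never need to be held in memory at once.
// Illustrative stand-in, not the exact helper added by this change.
func FileCmp(pathA, pathB string) (bool, error) {
	hashOf := func(path string) ([]byte, error) {
		f, err := os.Open(path)
		if err != nil {
			return nil, err
		}
		defer f.Close()
		h := sha256.New()
		if _, err := io.Copy(h, f); err != nil {
			return nil, err
		}
		return h.Sum(nil), nil
	}
	a, err := hashOf(pathA)
	if err != nil {
		return false, err
	}
	b, err := hashOf(pathB)
	if err != nil {
		return false, err
	}
	return bytes.Equal(a, b), nil
}
```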
-------
Many thanks to Mathieu for giving style tips and catching several bugs,
including a subtle one in util.FileCmp() that neutered it.
Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>
[git-annex](https://git-annex.branchable.com/) is a more complicated cousin to
git-lfs, storing large files as optionally-downloaded side content. Unlike LFS,
it allows mixing and matching storage remotes, so the content remotes don't
need to be on the same server as the git remote, making it feasible to scatter
a collection across cloud storage, old harddrives, or anywhere else storage can
be scavenged. Since this can get complicated fast, it has a content-tracking
database (`git annex whereis`) to help find everything later.
The use-case we imagine for including it in Gitea is just the simple case, where
we're primarily emulating git-lfs: each repo has its large content at the same URL.
Our motivation is so we can self-host https://www.datalad.org/ datasets, which
currently are only hostable by fragilely scrounging together cloud storage --
and having to manage all the credentials associated with all the pieces -- or at
https://openneuro.org which is fragile in its own ways.
Supporting git-annex also allows multiple Gitea instances to be annex remotes
for each other, mirroring the content or otherwise collaborating to split up
the hosting costs.
Enabling
--------
TODO
HTTP
----
TODO
Permission Checking
-------------------
This tweaks the API in routers/private/serv.go to expose the calling user's
computed permission, instead of just returning HTTP 403.
This doesn't fit in super well. It's the opposite of how the git-lfs support is
done, where there's a complete list of possible subcommands and their matching
permission levels, and then the API compares the requested with the actual level
and returns HTTP 403 if the check fails.
But it's necessary. The main git-annex verbs, 'git-annex-shell configlist' and
'git-annex-shell p2pstdio', are both either read-only or read-write operations,
depending on the state on disk at either end of the connection and what the user
asked for, with no way to know before git-annex examines the situation.
So we tell it the permission level via GIT_ANNEX_READONLY and trust it to
handle itself.
In the older Gogs version, the permission was directly read in cmd/serv.go:
```
mode, err = db.UserAccessMode(user.ID, repo)
```
- 966e925cf3/internal/cmd/serv.go (L334)
but in Gitea permission enforcement has been centralized in the API layer.
(perhaps so the cmd layer can avoid making direct DB connections?)
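For illustration, the serv-side handoff looks roughly like this sketch. It is a
simplified stand-in (the invocation and paths are placeholders), but
GIT_ANNEX_READONLY and GIT_ANNEX_SHELL_DIRECTORY are real environment variables
that git-annex-shell honours:
```go
package main

import (
	"os"
	"os/exec"
)

// runAnnexShell spawns git-annex-shell for an incoming SSH command. Instead of
// enforcing the permission itself, it passes the computed level down via
// GIT_ANNEX_READONLY and lets git-annex decide what the operation needs.
func runAnnexShell(verb, repoAbsPath string, readOnly bool) error {
	cmd := exec.Command("git-annex-shell", verb, repoAbsPath)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.Env = append(os.Environ(),
		// limit git-annex-shell to this one repository
		"GIT_ANNEX_SHELL_DIRECTORY="+repoAbsPath,
	)
	if readOnly {
		cmd.Env = append(cmd.Env, "GIT_ANNEX_READONLY=1")
	}
	return cmd.Run()
}

func main() {
	// e.g. verb = "configlist" or "p2pstdio"; readOnly comes from the API's answer
	_ = runAnnexShell("configlist", "/data/gitea-repositories/user/dataset.git", true)
}
```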
Deletion
--------
git-annex has this "lockdown" feature where it tries
really quite hard to prevent you from deleting its
data, to the point that even an rm -rf won't do it:
each file in annex/objects/ is nested inside a
folder with read-only permissions.
The recommended workaround is to run chmod -R +w when
you're sure you actually want to delete a repo. See
https://git-annex.branchable.com/internals/lockdown
So we edit util.RemoveAll() to do just that, so now
it's `chmod -R +w && rm -rf` instead of just `rm -rf`.
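For illustration, the new behaviour boils down to something like the sketch
below (the shape of the idea, not the literal util.RemoveAll() patch):
```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// removeAllWritable force-removes root even when git-annex's lockdown has made
// intermediate directories read-only: first walk the tree adding the owner
// write bit, then delete as usual.
func removeAllWritable(root string) error {
	// Best-effort chmod pass; chmod failures shouldn't stop the removal attempt.
	_ = filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return nil
		}
		_ = os.Chmod(path, info.Mode().Perm()|0200)
		return nil
	})
	return os.RemoveAll(root)
}

func main() {
	// Example: force-remove a just-deleted repository's directory (placeholder path).
	if err := removeAllWritable("/tmp/example-annex-repo"); err != nil {
		log.Fatal(err)
	}
}
```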
Backport of #27915
Fixes #27819
We have support for two-factor logins with the normal web login and with
basic auth. For basic auth the two-factor check was implemented at three
different places, and you needed to know that this check is necessary. This
PR moves the check into the basic auth itself.
(cherry picked from commit 00705da102be929dfa41519b030be3bdd8c68472)
Backport #27752 by @earl-warren
- The label HTML contained a quote that wasn't being closed.
Refs: https://codeberg.org/forgejo/forgejo/pulls/1651
(cherry picked from commit e2bc2c9a1fff482c49dbeb3a51e4e1c698bf506c)
Co-authored-by: Earl Warren <109468362+earl-warren@users.noreply.github.com>
Co-authored-by: Gusted <postmaster@gusted.xyz>
(cherry picked from commit 63512cd15d14254beadc0fe105d4239708fb758d)
Backport #27655 by @wolfogre
When `webhook.PROXY_URL` has been set, the old code checks whether the
proxy host is in `ALLOWED_HOST_LIST` and otherwise rejects requests through
the proxy. It requires users to add the proxy host to `ALLOWED_HOST_LIST`.
However, that actually allows all requests to any port on that host, even
though the proxy host is probably an internal address.
But things may be even worse: `ALLOWED_HOST_LIST` doesn't really work
when requests are sent to the allowed proxy, because the proxy could forward
them to any host.
This PR fixes it by:
- If the proxy has been set, always allow connections to its host and
port.
- Check `ALLOWED_HOST_LIST` before forwarding (see the sketch below).
Co-authored-by: Jason Song <i@wolfogre.com>
(cherry picked from commit ca4418eff12d92a4da29bba4331451bf6cd0b620)
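A rough illustration of the intended order of checks (not Gitea's actual
hostmatcher/proxy code; names and the proxy address are placeholders):
```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newWebhookClient checks the webhook target against the allow-list before the
// request is handed to the proxy, while connecting to the proxy itself is
// always permitted.
func newWebhookClient(proxyURL *url.URL, allowed func(host string) bool) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			Proxy: func(req *http.Request) (*url.URL, error) {
				if !allowed(req.URL.Hostname()) {
					// Refuse before forwarding: the proxy would happily
					// connect to anything we ask it for.
					return nil, fmt.Errorf("webhook host %q is not allowed", req.URL.Hostname())
				}
				return proxyURL, nil
			},
		},
	}
}

func main() {
	proxy, _ := url.Parse("http://proxy.internal:3128") // placeholder
	client := newWebhookClient(proxy, func(host string) bool { return host == "hooks.example.com" })
	_ = client
}
```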
- The current architecture is inherently insecure, because you can
construct the 'secret' cookie value from values that are available in
the database. It thus provides zero protection when a database is
dumped/leaked.
- This patch implements a new architecture inspired by: [Paragonie Initiative](https://paragonie.com/blog/2015/04/secure-authentication-php-with-long-term-persistence#secure-remember-me-cookies).
- Integration testing is added to ensure the new mechanism works.
- Removes a setting, because it's not used anymore.
(cherry-pick from eff097448b1ebd2a280fcdd55d10b1f6081e9ccd)
Conflicts:
modules/context/context_cookie.go
trivial context conflicts
routers/web/web.go
ctx.GetSiteCookie(setting.CookieRememberName) moved from services/auth/middleware.go
Backport #27310 by @earl-warren
- Modify the deleted branch orphan check to check for the new table
instead.
- Regression from 6e19484f4d
- Resolves https://codeberg.org/forgejo/forgejo/issues/1522
(cherry picked from commit c1d888686fe445e4edecb9d835c5b3893b574b75)
Co-authored-by: Earl Warren <109468362+earl-warren@users.noreply.github.com>
Co-authored-by: Gusted <postmaster@gusted.xyz>
(cherry picked from commit 2138661dae2fbb88eba1a04d48faf27a2cebb934)
Backport of #27205
Fixes #27174
`release` is a reserved keyword in MySQL. I can't reproduce the issue on
my setup, and we have a test for that code, but it seems there can be
setups where it fails.
(cherry picked from commit eae6985b63e332e0f6e63b3922d1eae2f4ec1108)
Backport #26991
Unfortunately, when a system setting hasn't been stored in the database,
it cannot be cached.
Meanwhile, this PR also uses the context cache for push email avatar display,
which should avoid reading the user table via email address again and again.
According to my local test, this should reduce dashboard elapsed time
from 150ms to 80ms.
(cherry picked from commit 9df573bddcc01bc8c3ab495629678ad577071831)
Backport #26999
If the AppURL (ROOT_URL) is an HTTPS URL, then COOKIE_SECURE's
default value should be true.
And, if a user visits an "http" site with an "https" AppURL, they won't be
able to log in, so they should be warned. The only problem is
that the "language" can't be set either in such a case; I think that
is not a serious problem, and it could be fixed easily if needed.
(cherry picked from commit b0a405c5fad2055976747a1c8b2c48dfe2750c9f)
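A minimal sketch of the new default, assuming the configured application URL
is available as a string (the variable name is a placeholder):
```go
package main

import (
	"fmt"
	"strings"
)

// cookieSecureDefault sketches the new behaviour: secure cookies by default
// whenever the configured application URL is HTTPS.
func cookieSecureDefault(appURL string) bool {
	return strings.HasPrefix(strings.ToLower(appURL), "https://")
}

func main() {
	fmt.Println(cookieSecureDefault("https://gitea.example.com/")) // true
	fmt.Println(cookieSecureDefault("http://gitea.example.com/"))  // false
}
```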
- Backport of https://codeberg.org/forgejo/forgejo/pulls/1433
- Currently the repository description uses the same sanitizer as a
normal markdown document. This means that elements such as headings and
images are allowed and can be abused.
- Create a minimal restricted sanitizer for the repository description,
allowing only what the postprocessor currently allows: links and emojis
(a sketch of such a policy follows below).
- Added unit testing.
- Resolves https://codeberg.org/forgejo/forgejo/issues/1202
- Resolves https://codeberg.org/Codeberg/Community/issues/1122
(cherry picked from commit a8afa4cd181d7c31f73d6a8fae4c6a4b9622a425)
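As referenced above, a rough sketch of such a minimal policy using bluemonday;
the exact allow-list in the real sanitizer may differ:
```go
package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

// descriptionPolicy is an illustrative minimal policy for repository
// descriptions: keep plain links and emoji markup, drop everything else.
func descriptionPolicy() *bluemonday.Policy {
	p := bluemonday.NewPolicy()
	p.AllowStandardURLs()
	p.AllowAttrs("href").OnElements("a")
	p.AllowAttrs("class", "alt", "src").OnElements("img") // emoji images
	p.AllowElements("a", "img", "span")
	return p
}

func main() {
	p := descriptionPolicy()
	fmt.Println(p.Sanitize(`<h1>big</h1> <a href="https://example.com">ok</a> <img src="x" onerror="evil()">`))
	// The heading tags and the onerror handler are stripped; the link survives.
}
```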
Backport #26683 by @yp05327
Related to: #8312, #26491
In migration v109, we only added a new column `CanCreateOrgRepo` to the Team
table, but did not initialize its value.
This may cause bugs like #26491.
Co-authored-by: yp05327 <576951401@qq.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit c3d323fd859539678ac10988945cbf7af057f766)
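For illustration, a backfill of this sort can look like the sketch below; the
owner-team condition and the hard-coded access mode are assumptions for the
example, not necessarily the upstream fix:
```go
package migrations

import "xorm.io/xorm"

// backfillCanCreateOrgRepo sets the value that migration v109 forgot to
// initialize. Sketch only: granting the flag to owner-level teams
// (authorize = 4) is an assumption made for this illustration.
func backfillCanCreateOrgRepo(x *xorm.Engine) error {
	_, err := x.Exec("UPDATE `team` SET `can_create_org_repo` = ? WHERE `authorize` = ?", true, 4)
	return err
}
```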
Backport #26634 by @delvh
Previously, `err` was defined above, checked for `err == nil`, and used
nowhere else.
Hence, the result of `convertMinioErr` would always be `nil`.
This leads to an NPE further down the line.
That is not intentional; it should convert the error of the most recent
operation, not one of its predecessors.
Found through
https://discord.com/channels/322538954119184384/322538954119184384/1143185780206993550.
Co-authored-by: delvh <dev.lh@web.de>
(cherry picked from commit a4b14638b50593a94121d2dfbca1d848fbec79b9)
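A distilled illustration of the bug pattern (not the actual minio storage
code; names are made up for the example):
```go
package main

import (
	"errors"
	"fmt"
)

// convertErr stands in for convertMinioErr: it wraps a non-nil error and
// passes nil through unchanged.
func convertErr(err error) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("storage: %w", err)
}

func main() {
	var err error // from an earlier, successful operation: err == nil

	// Buggy shape: a newer operation fails into a different variable, but the
	// stale (nil) err is the one that gets converted and returned.
	opErr := errors.New("bucket does not exist")
	if opErr != nil {
		fmt.Println("buggy:", convertErr(err)) // prints "buggy: <nil>"; the caller then trips over a nil error
	}

	// Fixed shape: convert the error of the most recent operation.
	if opErr != nil {
		fmt.Println("fixed:", convertErr(opErr))
	}
}
```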
Backport #26599 by @yardenshoham
We now include the branch filler in the response.
- Closes #26591
Signed-off-by: Yarden Shoham <git@yardenshoham.com>
Co-authored-by: Yarden Shoham <git@yardenshoham.com>
(cherry picked from commit fe78aabc673daf36655f0cca7e83cf2b057b8361)
- Backport of https://codeberg.org/forgejo/forgejo/pulls/1284
- Databases are one of the most important parts of Forgejo; every
interaction with Forgejo uses the database in one way or another.
Therefore, it is important to maintain the database and recognize when
Forgejo is not doing well with the database. Forgejo already has the
option to log *every* SQL query along with its execution time, but
monitoring becomes impractical for larger instances and takes up
unnecessary storage in the logs.
- Add a QoL enhancement that allows instance administrators to specify a
threshold value beyond which query execution time is logged as a warning
in the xorm logger. The default value is a conservative five seconds to
avoid this becoming a source of spam in the logs.
- The use case for this patch is that with an instance the size of Codeberg, monitoring SQL logs is not very fruitful and most of them are uninteresting. Recently, in the context of persistent deadlock issues (https://codeberg.org/forgejo/forgejo/issues/220), I have noticed that certain queries hold locks on tables like comment and issue for several seconds. This patch helps to identify which queries these are and when they happen.
- Added unit test.
Backport #26494 by @wxiaoguang
"ogg" is just a "container" format for audio and video.
Golang's `DetectContentType` only reports "application/ogg" for
potential ogg files.
Actually it could do more "guess" to see whether it is a audio file or a
video file.
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit 4bdb8dd9cc028a071819afd8f79cf29b30ab4d30)
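A hedged sketch of that extra guessing; the codec markers checked here are
illustrative, not necessarily the exact heuristics of the PR:
```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// sniffOgg refines Go's DetectContentType result for Ogg containers by looking
// for well-known codec markers near the start of the stream.
func sniffOgg(data []byte) string {
	ct := http.DetectContentType(data)
	if ct != "application/ogg" {
		return ct
	}
	head := data
	if len(head) > 512 {
		head = head[:512]
	}
	switch {
	case bytes.Contains(head, []byte("theora")) || bytes.Contains(head, []byte("dirac")):
		return "video/ogg"
	case bytes.Contains(head, []byte("vorbis")) || bytes.Contains(head, []byte("OpusHead")) ||
		bytes.Contains(head, []byte("FLAC")) || bytes.Contains(head, []byte("Speex")):
		return "audio/ogg"
	default:
		return ct
	}
}

func main() {
	fmt.Println(sniffOgg([]byte("OggS\x00...\x01vorbis..."))) // audio/ogg (toy example)
}
```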
Backport #26441 by @lunny
This PR rewrites the function `getStorage` and makes it clearer.
Includes tests from #26435, thanks @earl-warren
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: Earl Warren <contact@earl-warren.org>
(cherry picked from commit f1c5d33d3e8ec3dc5988c130b6899cffe6b0e651)
Backport #26470 by @wxiaoguang
Close stdout correctly for "git blame", otherwise a failed "git blame"
would cause the request to hang forever.
And "os.Stderr" should never (or seldom) be used as a git command's stderr
(there seem to be similar problems in the code; they could be fixed later).
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit fe1b11b639792b601ac5ba78c0378058032ec8a8)
Backport #26468 by @wxiaoguang
When users put the secrets into a file (GITEA__sec__KEY__FILE), a trailing
newline is sometimes hard to avoid (eg: echo/vim/...).
So the last newline is now removed when reading, which makes it easier for
users to maintain the secret files.
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit 80d7288ea45ea69060eb41709f188c09b33ca266)
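A minimal sketch of reading such a secret file with the trailing newline
stripped (illustrative only; the path and helper name are made up):
```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readSecretFile loads a GITEA__section__KEY__FILE style secret and drops a
// single trailing newline (either "\n" or "\r\n") that editors and echo tend
// to add, so the stored value matches what the user intended.
func readSecretFile(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	value := string(data)
	value = strings.TrimSuffix(value, "\n")
	value = strings.TrimSuffix(value, "\r")
	return value, nil
}

func main() {
	v, err := readSecretFile("/run/secrets/gitea_db_password") // hypothetical path
	fmt.Println(v, err)
}
```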
Backport #26420 by @lunny
For some reason, the client_id and secret may not have permission to
create a bucket, so now we will check whether the bucket exists first and
then try to create it only if it doesn't exist.
Try to fix #25984
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: silverwind <me@silverwind.io>
(cherry picked from commit 2d1202b32c1a975fe6cf7b139d1ab4ed292bcca1)
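With minio-go v7 the check-then-create order looks roughly like this sketch
(endpoint, credentials and bucket name are placeholders):
```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

// ensureBucket only attempts MakeBucket when the bucket is not already there,
// so credentials that lack bucket-creation permission still work against a
// pre-created bucket.
func ensureBucket(ctx context.Context, client *minio.Client, bucket, region string) error {
	exists, err := client.BucketExists(ctx, bucket)
	if err != nil {
		return err
	}
	if exists {
		return nil
	}
	return client.MakeBucket(ctx, bucket, minio.MakeBucketOptions{Region: region})
}

func main() {
	client, err := minio.New("play.min.io", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	if err := ensureBucket(context.Background(), client, "gitea", "us-east-1"); err != nil {
		log.Fatal(err)
	}
}
```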
Backport #26412 by @nekrondev
The MinIO client isn't redirecting to the correct AWS endpoint if a
non-default data center is used.
In my use case I created an AWS bucket in the `eu-central-1` region. Because
of the missing region initialization of the client, the default
`us-east-1` API endpoint is used, returning a `301 Moved Permanently`
response that's not handled properly by the MinIO client. This in turn
aborts using S3 storage on AWS, as the `BucketExists()` call will fail
with the HTTP moved error.
MinIO client trace shows the issue:
```text
---------START-HTTP---------
HEAD / HTTP/1.1
Host: xxxxxxxxxxx-prod-gitea-data.s3.dualstack.us-east-1.amazonaws.com
User-Agent: MinIO (windows; amd64) minio-go/v7.0.61
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20230809/accesspoint.eu-central-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20230809T141143Z
HTTP/1.1 301 Moved Permanently
Connection: close
Content-Type: application/xml
Date: Wed, 09 Aug 2023 14:11:43 GMT
Server: AmazonS3
X-Amz-Bucket-Region: eu-central-1
X-Amz-Id-2: UK7wfeYi0HcTcytNvQ3wTAZ5ZP1mOSMnvRZ9Fz4xXzeNsS47NB/KfFx2unFxo3L7XckHpMNPPVo=
X-Amz-Request-Id: S1V2MJV8SZ11GEVN
---------END-HTTP---------
```
Co-authored-by: nekrondev <heiko@noordsee.de>
Co-authored-by: Heiko Besemann <heiko.besemann@qbeyond.de>
(cherry picked from commit 981ab48503997bf3b2cefcfe623fba49c4317450)
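The fix amounts to passing the bucket's region when constructing the client,
so requests are signed for the right regional endpoint instead of the
us-east-1 default seen in the trace above; roughly (credentials are
placeholders):
```go
package main

import (
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Without Region, minio-go signs against us-east-1 and AWS answers with
	// 301 Moved Permanently for buckets living elsewhere.
	client, err := minio.New("s3.amazonaws.com", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
		Region: "eu-central-1", // match the bucket's region
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = client
}
```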
Backport #26392 by @wxiaoguang
Fix #26389
And complete an old TODO: `ctx.Params does un-escaping,..., which is
incorrect.`
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit 2d1a7e1cd42b31a62ca627423d088339809238c8)
Backport #26325 by @wxiaoguang
Fix #26064
Some git commands should use the parent context, otherwise they would exit too
early (at the default timeout, 10m), while "cmd.Wait" waits until the
pipes are closed.
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit 9451781ebea0d9945127c142136e5b6ef373141c)
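The distilled shape of the problem with os/exec (illustrative, not Gitea's
actual git module code; the command and paths are placeholders):
```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	parent := context.Background()

	// Problematic shape: a derived timeout kills a long-running, pipe-driven
	// git command even though the other end is still reading/writing.
	short, cancel := context.WithTimeout(parent, 10*time.Minute)
	defer cancel()
	_ = exec.CommandContext(short, "git", "cat-file", "--batch")

	// Fixed shape: such commands inherit the parent (request) context, so they
	// end when the pipes close or the request itself is cancelled.
	cmd := exec.CommandContext(parent, "git", "cat-file", "--batch")
	fmt.Println(cmd.Args)
}
```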
Backport #26271 by @lunny
This PR will fix #26264, caused by #23911.
The package configuration derivation is totally wrong when the storage type is
local in that PR.
This PR fixes the inheritance logic when the storage type is local, with some
unit tests.
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit 88f6f7579cdaa557333bc86b3e45bf6458d889b6)
Backport #26290 by @Zettat123
Fixes #26270.
Co-Author: @wxiaoguang
Thanks @lunny for providing this solution
As
https://github.com/go-gitea/gitea/issues/26270#issuecomment-1661695151
said, at present we cannot get the names of changed files correctly when
the `OldCommitID` is `EmptySHA`. In this PR, the `GetCommitFilesChanged`
method is added and will be used to get the changed files by commit ID.
References:
- https://stackoverflow.com/a/424142
Co-authored-by: Zettat123 <zettat123@gmail.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit a57568bad7dac3c818d01a5b75fe66ce83bf140f)
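Per the Stack Overflow reference, the changed files of a single commit come
from `git diff-tree --no-commit-id --name-only -r <sha>`; a hedged sketch of
such a helper (not the exact `GetCommitFilesChanged` implementation):
```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// commitFilesChanged returns the paths touched by a single commit, which is
// what's needed when OldCommitID is the empty SHA and there is no "before"
// commit to diff against.
func commitFilesChanged(repoPath, commitID string) ([]string, error) {
	out, err := exec.Command("git", "-C", repoPath,
		"diff-tree", "--no-commit-id", "--name-only", "-r", "--root", commitID).Output()
	if err != nil {
		return nil, err
	}
	files := make([]string, 0)
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if line != "" {
			files = append(files, line)
		}
	}
	return files, nil
}

func main() {
	files, err := commitFilesChanged(".", "HEAD")
	fmt.Println(files, err)
}
```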
Backport #26267 by @wxiaoguang
1. Fix the wrong documentation (add the missing `MODE=`)
2. Add a friendlier log message to tell users to add `MODE=` to their
config
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
(cherry picked from commit a7583370462ecd409485606c3d0379adb320cd05)