Search Jobs
Use Search Jobs to search code at scale across large organizations.
Search Jobs allows you to run search queries across your organization's entire codebase (all repositories, branches, and revisions) at scale. It extends Sourcegraph's existing search capabilities, enabling you to run searches without query timeouts or incomplete results.
With Search Jobs, you can start a search, let it run in the background, and then download the results from the Search Jobs UI when it's done. Site administrators can enable or disable the Search Jobs feature, making it accessible to all users on the Sourcegraph instance.
Using Search Jobs
To use Search Jobs, you need to:
- Run a search query from your Sourcegraph instance
- Click the result menu below the search bar to see if your query is supported by Search Jobs
- If your query is valid, click Create a search job to initiate the search job
- You will be redirected to the Search Jobs UI page at `/search-jobs`, where you can view all your created search jobs. If you're a site admin, you can also view search jobs from other users on the instance
Search results format
The downloaded results are formatted as JSON Lines. The JSON objects have the same format as the match events served by the Stream API.

Here is an example of a JSON Lines results file for a search job that ran the query `r:github.com/sourcegraph/sourcegraph handlePanic`. The search returned 4 results in 2 files:
```json
{"type":"content","path":"cmd/symbols/shared/main.go","repositoryID":399,"repository":"github.com/sourcegraph/sourcegraph","branches":["HEAD"],"commit":"9b47c6430cc3c9cb16695b4fa0a1a277cbbdb327","hunks":null,"chunkMatches":[{"content":"func handlePanic(logger log.Logger, next http.Handler) http.Handler {","contentStart":{"offset":4526,"line":129,"column":0},"ranges":[{"start":{"offset":4531,"line":129,"column":5},"end":{"offset":4542,"line":129,"column":16}}]},{"content":"\thandler = handlePanic(logger, handler)","contentStart":{"offset":3570,"line":98,"column":0},"ranges":[{"start":{"offset":3581,"line":98,"column":11},"end":{"offset":3592,"line":98,"column":22}}]}],"language":"Go"}
{"type":"content","path":"cmd/embeddings/shared/main.go","repositoryID":399,"repository":"github.com/sourcegraph/sourcegraph","branches":["HEAD"],"commit":"9b47c6430cc3c9cb16695b4fa0a1a277cbbdb327","hunks":null,"chunkMatches":[{"content":"func handlePanic(logger log.Logger, next http.Handler) http.Handler {","contentStart":{"offset":6585,"line":190,"column":0},"ranges":[{"start":{"offset":6590,"line":190,"column":5},"end":{"offset":6601,"line":190,"column":16}}]},{"content":"\thandler = handlePanic(logger, handler)","contentStart":{"offset":2844,"line":81,"column":0},"ranges":[{"start":{"offset":2855,"line":81,"column":11},"end":{"offset":2866,"line":81,"column":22}}]}],"language":"Go"}
```
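Because each line is a standalone JSON object, a results file can be processed line by line without loading it all into memory. Here is a minimal sketch in Python; the helper name and the per-file tally are illustrative, not part of the Search Jobs API:

```python
import json

def summarize_results(path):
    """Tally match counts per file from a Search Jobs JSON Lines results file.

    Only "content" events are counted; each range inside a chunkMatch is one match.
    """
    counts = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            if event.get("type") != "content":
                continue
            matches = sum(len(cm["ranges"]) for cm in event.get("chunkMatches", []))
            counts[event["path"]] = counts.get(event["path"], 0) + matches
    return counts
```

For the example above, this would report two matches in each of the two files.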
Configure Search Jobs
Search Jobs requires object storage to store the results of your search jobs. By default, Search Jobs stores results using our bundled `blobstore` service. If the `blobstore` service is deployed and you want to use it to store results from Search Jobs, you don't need to configure anything.
To use a third-party managed object storage service, you must set a handful of environment variables for configuration and authentication against the target service.
- If you are running a `sourcegraph/server` deployment, set the environment variables on the server container
- If you are running via Docker Compose or Kubernetes, set the environment variables on the `frontend` and `worker` containers
Using S3
Set the following environment variables to target an S3 bucket you've already provisioned. Authentication can be done through an access and secret key pair (optionally with a session token) or via the EC2 metadata API.
```
SEARCH_JOBS_UPLOAD_BACKEND=S3
SEARCH_JOBS_UPLOAD_BUCKET=<my bucket name>
SEARCH_JOBS_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
SEARCH_JOBS_UPLOAD_AWS_ACCESS_KEY_ID=<your access key>
SEARCH_JOBS_UPLOAD_AWS_SECRET_ACCESS_KEY=<your secret key>
SEARCH_JOBS_UPLOAD_AWS_SESSION_TOKEN=<your session token> # optional
SEARCH_JOBS_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true # optional; set to use the EC2 metadata API over static credentials
SEARCH_JOBS_UPLOAD_AWS_REGION=us-east-1 # default
```

Ensure that the configured region (and the region in the `AWS_ENDPOINT` value) matches the target bucket's region. Unset the `SEARCH_JOBS_UPLOAD_AWS_ACCESS_KEY_ID` environment variable when using `SEARCH_JOBS_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true`, because role credentials will be automatically resolved.

Using GCS
Set the following environment variables to target a GCS bucket you've already provisioned. Authentication is done through a service account key, either as a path to a volume-mounted file or the contents read in as an environment variable payload.
```
SEARCH_JOBS_UPLOAD_BACKEND=GCS
SEARCH_JOBS_UPLOAD_BUCKET=<my bucket name>
SEARCH_JOBS_UPLOAD_GCP_PROJECT_ID=<my project id>
SEARCH_JOBS_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE=</path/to/file>
SEARCH_JOBS_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT=<{"my": "content"}>
```
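Before restarting services, it can be useful to sanity-check that the upload variables are set. The sketch below is a hypothetical pre-flight helper, not part of Sourcegraph, and the minimal required-variable sets are an assumption drawn from the lists above, not an official schema:

```python
# Minimal required variables per backend (assumption based on the lists above).
REQUIRED = {
    "S3": ("SEARCH_JOBS_UPLOAD_BUCKET",),
    "GCS": ("SEARCH_JOBS_UPLOAD_BUCKET", "SEARCH_JOBS_UPLOAD_GCP_PROJECT_ID"),
}

def missing_upload_vars(env):
    """Return the Search Jobs upload variables that still need values."""
    backend = env.get("SEARCH_JOBS_UPLOAD_BACKEND")
    if backend not in REQUIRED:
        return ["SEARCH_JOBS_UPLOAD_BACKEND"]
    return [name for name in REQUIRED[backend] if not env.get(name)]
```

You could call `missing_upload_vars(dict(os.environ))` in a deployment check and fail fast if the returned list is non-empty.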
Provisioning buckets
If you would like to allow your Sourcegraph instance to control the creation and lifecycle configuration management of the target buckets, set the following environment variables:
```
SEARCH_JOBS_UPLOAD_MANAGE_BUCKET=true
```
Supported result types
Search Jobs supports the following result types:

- `file`
- `path`
- `diff`
- `commit`

The following result types are not supported:

- `symbol`
- `repo`
Limitations
The following elements of our query language are not supported:

- file predicates, such as `file:has.content`, `file:has.owner`, `file:has.contributor`, and `file:contains.content`
- catch-all `.*` regexp search
- multiple `rev` filters
- queries with the `index:` filter
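A rough client-side pre-check for these limitations can be sketched as follows. This is a heuristic illustration only: the authoritative validation happens in the Sourcegraph UI, and the patterns below are assumptions, not the real query parser:

```python
import re

# Heuristic patterns for query elements Search Jobs rejects (illustrative only).
UNSUPPORTED = [
    (re.compile(r"\bfile:(has|contains)\."), "file predicates are not supported"),
    (re.compile(r"(^|\s)\.\*(\s|$)"), "catch-all .* regexp search is not supported"),
    (re.compile(r"\bindex:"), "the index: filter is not supported"),
]

def check_query(query):
    """Return a list of reasons a query may be rejected by Search Jobs."""
    reasons = [msg for pattern, msg in UNSUPPORTED if pattern.search(query)]
    if len(re.findall(r"\brev:", query)) > 1:
        reasons.append("multiple rev: filters are not supported")
    return reasons
```

An empty list means the heuristic found nothing objectionable; it does not guarantee the server will accept the query.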
Alternatively, the search bar supports the `count:all` operator, which increases result limits and timeouts. This works well if the search completes within a few minutes and the number of results is less than the configured display limit. For longer-running searches and searches with huge result sets, Search Jobs is the better choice.
Disable Search Jobs
To disable Search Jobs, set `DISABLE_SEARCH_JOBS=true` in your `frontend` and `worker` services.
FAQ
How can I run a search job against all branches and all repositories?
You can use a combination of the `repo:` and `rev:` filters to search across all branches and all repositories. For example, `r: rev:*refs/heads/* foo` will search for `foo` across all branches of all repositories.
See the search syntax documentation for more details.
Is there a time limit for a search job?
We break down a search job into smaller tasks and run them in parallel. Each task is limited to a single revision and a single repository. This allows us to run searches across large codebases. The chance of hitting timeouts is very low, but it's not impossible. If you do hit a timeout, please reach out to support and we will help you troubleshoot the issue. As an immediate measure, you can break down your search query into smaller queries and run them separately.
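The last suggestion above, breaking one broad query into smaller ones, can be sketched as follows. The helper is purely illustrative and not part of Sourcegraph; `repo:` scoping itself is standard query syntax:

```python
def split_by_repo(query, repos):
    """Turn one broad query into per-repo queries that can run as separate search jobs.

    Anchoring each repo pattern keeps the subqueries disjoint. Note that repo:
    takes a regexp, so real repository names may need escaping (e.g. dots).
    """
    return [f"repo:^{repo}$ {query}" for repo in repos]
```

Each returned string can then be submitted as its own search job, so a failure or timeout in one slice does not affect the others.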
A search job failed, how can I find out what went wrong?
We mark a search job as failed if any of the tasks fail. You can access the log of a search job from the search jobs UI. The log contains one line per task. If a task fails, the log will contain the error message.