
Repo Roulette: Spin the Wheel, Win a Credential

  • Writer: FAIR INTEL

December 10, 2025


Synopsis

The analysis shows that exposed credentials in public Git repositories represent a scalable, low-cost attack vector, as automated tools can enumerate millions of GitLab projects and reliably harvest long-lived API keys, database passwords, and cloud access tokens. Strategically, leadership must treat secret exposure as an enterprise risk, updating governance, funding secret-scanning, and prioritizing secure development practices across all business units. Operationally, teams need to integrate automated secret detection into CI/CD, enforce key rotation, and enhance monitoring for anomalous credential use. Tactically, security and DevOps staff should focus on rapid revocation of exposed keys, targeted threat hunting, and tighter repo hygiene. These conditions materially increase risk posture by raising threat event frequency, susceptibility, and potential impact, especially where controls for authenticator management and information location are weak. Financial resilience is pressured by anticipated incident-response costs, possible data exposure, cloud overages, and regulatory or contractual fallout. Still, it can be strengthened by proactively reducing secret density, shortening key lifetimes, and planning for realistic loss scenarios in FAIR-informed risk and budgeting processes.


Evaluated Source, Context, and Claim

Artifact Title

Public GitLab repositories exposed more than 17,000 secrets


Source Type

News article/web report (security/tech journalism)


Publication Date: November 28, 2025


Credibility Assessment

The article describes a named security researcher’s work, provides concrete methodology, volumes, and costs, and is consistent with prior secret-scanning research, which supports moderate-to-high credibility. However, quantitative findings rely primarily on a single researcher’s automated scans and have limited independent corroboration.


General Claim

A security researcher used automated TruffleHog scans across all 5.6 million public GitLab Cloud repositories and found 17,430 live secrets tied to 2,804 domains—including cloud, database, messaging, and OpenAI keys—showing that many organizations still expose long-lived credentials in public code despite some revocations after notification.


Narrative Reconstruction

A security engineer demonstrated that large numbers of valid credentials are exposed in public Git repositories by automating the enumeration of all 5.6 million GitLab Cloud projects via a public API, queuing them in AWS, and scanning each with TruffleHog for sensitive secrets such as API keys, database credentials, bot tokens, and cloud access keys. While the actor in the article is a benign researcher, the same low-cost, scalable approach could be adopted by low-to-moderately sophisticated, financially motivated, or opportunistic attackers to continuously mine public code for reusable secrets. The assets at risk include cloud platforms, databases, messaging bots, AI platforms, and GitLab itself, because exposed keys can grant direct access to production services, data, and infrastructure. The operational goal for a malicious actor using this method would likely be unauthorized access, data theft, service abuse, or monetization via fraud or bug bounties, leveraging public-source exposures rather than exploiting traditional software vulnerabilities.
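The article does not publish the researcher's pipeline, but the enumeration step it describes can be approximated against GitLab's documented REST API. The sketch below is a minimal illustration, assuming keyset pagination on GET /projects; the function name iter_public_projects and the early break are illustrative, and a real campaign would feed each clone URL into a queue for scanning.

```python
# Minimal sketch of public-project enumeration via the GitLab REST API.
# Assumes keyset pagination on GET /projects (documented GitLab behavior);
# rate limits and operational details of the researcher's pipeline are unknown.
import requests

def iter_public_projects(base_url="https://gitlab.com/api/v4"):
    url = (f"{base_url}/projects?visibility=public"
           "&pagination=keyset&order_by=id&sort=asc&per_page=100")
    while url:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        for project in resp.json():
            # http_url_to_repo is the clone URL a scanner would hand to TruffleHog
            yield project["id"], project["http_url_to_repo"]
        # GitLab returns the next keyset page in the Link header, if any
        url = resp.links.get("next", {}).get("url")

for pid, clone_url in iter_public_projects():
    print(pid, clone_url)
    break  # demo only: stop after the first project
```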


Risk Scenario


External opportunistic threat actors systematically scan public Git hosting platforms for exposed credentials and, upon discovering valid secrets in an organization’s public GitLab repositories, use those credentials to gain unauthorized access to the organization’s cloud services and data, causing financial loss and operational disruption.


Threat

External, opportunistic threat actors (financially motivated or data-theft oriented) who systematically scan public Git-hosting platforms for exposed credentials.


Method

Actors automate the enumeration of public GitLab repositories via public APIs, use secret-scanning tools at scale to discover valid API keys, tokens, and passwords, and then use those credentials to authenticate directly to cloud services, databases, messaging platforms, or other systems.
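To make the scanning step concrete, here is a simplified, TruffleHog-style detector. The regexes are illustrative approximations of well-known key formats, not TruffleHog's actual detector set, and real tools additionally verify candidates against live services before reporting them.

```python
# Simplified secret detector: regex candidates only, no verification step.
# Patterns are rough approximations of common key formats (assumptions).
import re

SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key":    re.compile(r"\bAIza[0-9A-Za-z_\-]{35}\b"),
    "slack_bot_token":   re.compile(r"\bxoxb-[0-9A-Za-z\-]{20,}\b"),
    "generic_password":  re.compile(r"(?i)password\s*[=:]\s*['\"][^'\"]{8,}['\"]"),
}

def scan_text(text: str):
    """Return (label, match) pairs for every candidate secret found in text."""
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits

sample = 'db_password = "hunter2-prod-2019"\naws_key = "AKIA' + "A" * 16 + '"'
print(scan_text(sample))
```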


Asset

Organization-owned cloud accounts, databases, APIs, and services whose authenticators (keys, tokens, passwords) are embedded in public Git repositories associated with the organization’s projects or developers.


Impact

Successful use of exposed secrets can lead to unauthorized access to infrastructure and data, service misuse, increased cloud or service costs, operational disruption, and potential secondary impacts, including regulatory exposure, customer impact, and incident response and remediation costs.

 

Evidentiary Basis for Synopsis and Recommendations

Supporting observations from the analysis help clarify how the threat landscape, control environment, and organizational behaviors interact to shape overall risk exposure. These insights provide the foundation for identifying where controls perform well, where gaps or weaknesses create unnecessary vulnerability, and how attacker methods intersect with real-world operational conditions. Building on these findings, the recommendations that follow focus on strengthening resilience, improving decision-making, and guiding readers toward practical steps that enhance both security posture and risk-informed governance.


FAIR Breakdown

Threat Event Frequency (TEF)

Because OSINT describes widespread, long-lived credential exposure and shows that scanning at GitLab scale is cheap and fast, TEF must be inferred from the prevalence of exposed secrets, the ease of automation, and the attractiveness to attackers. TEF is likely moderate to high for organizations with public repositories, since multiple actors (benign and malicious) can continuously scan public code for exploitable secrets.


Contact Frequency (CF)

Public code on large platforms is globally accessible and can be enumerated via APIs and search, supporting frequent automated scanning. Sector targeting is broad: any organization or individual using public GitLab repositories is potentially exposed, with contact frequency highest for organizations that are technology-heavy, cloud-native, or active in open source.


Probability of Action (PoA)

Motivation is strong for both financially and opportunistically motivated actors because valid secrets provide direct, credential-based access without complex exploits. The low operational cost, high density of secrets, and evidence that some keys remain valid over many years suggest a high PoA for attempting to harvest and test exposed credentials once repositories are scanned.


Threat Capability (TCap)

TCap is moderate-to-high because the required skills and tooling are readily available, but effective exploitation still benefits from technical understanding of target services.


Exploit sophistication: Using TruffleHog or similar tools, public APIs, and cloud automation indicates moderate sophistication; no advanced exploitation is needed beyond scripting and secret validation.


Bypass ability: Attackers can bypass many perimeter defenses by using valid credentials directly against cloud APIs or services, avoiding traditional vulnerability-based detection paths.


Tooling maturity: Secret-scanning tools and cloud pipelines (e.g., queues, serverless functions) are mature, widely documented, and easy to adapt, enabling repeatable campaigns.


Campaign success rate: Given the large absolute number of valid secrets and long-lived keys described, campaigns targeting exposed credentials may achieve moderate success among exposed organizations, even if many secrets are revoked after discovery.


Attack path sophistication: The attack path—enumerate repos, scan for secrets, validate credentials, then access services—is conceptually simple but effective, leveraging cloud-native automation rather than complex exploit chains.


Cost to run attack: Infrastructure costs are low (on the order of hundreds to low thousands of dollars at large scale) and can be amortized across many potential victims, making the attack highly feasible for modestly resourced actors.


Control Strength (CS)

Typical environments combining public Git hosting and cloud services show mixed control strength: some organizations rotate and revoke secrets promptly, while others leave long-lived keys in public code.


Resistive Strength (RS)

Effectiveness of preventive/detective controls:

  • Secure development practices and code review can catch secrets before commit, but the OSINT evidence of thousands of valid secrets indicates frequent failure of such controls.

  • Some organizations clearly respond to notifications and revoke compromised keys, showing that incident response processes exist but are reactive and unevenly applied.

  • Where dedicated secret-scanning and pre-commit hooks are deployed, resistance is higher, but the overall population’s high secret density suggests these controls are not consistently enforced.


Control Failure Rate

  • Developers routinely embed API keys, database passwords, and tokens directly in code or configuration files and push them to public repositories.

  • Organizations lack comprehensive, automated secret-scanning integrated into CI/CD pipelines and repository governance, allowing exposures to persist for years; most leaked secrets postdate 2018, but some still-valid secrets date back to 2009.

  • Credential lifecycle management (rotation, revocation, scoping, and least privilege) is inconsistent, leaving some secrets exposed for years despite being publicly accessible.


Susceptibility

Given moderate-to-high threat capability and uneven control strength, overall susceptibility is estimated at approximately 45–65 percent for organizations that store sensitive code in public Git repositories.

Probability the asset will be harmed is influenced by:


  • Exploitability: Likely in the 60–75 percent range once a valid credential is discovered, because testing and using an exposed key is trivial and often not blocked by additional factors like IP allowlists or MFA on service-to-service paths (a minimal key-liveness check follows this list).

  • Attack surface: For organizations with public GitLab repositories, perhaps 30–50 percent of projects may contain configuration or integration code that could embed secrets, creating a sizable but not universal attack surface.

  • Exposure conditions: Where development teams frequently push code to public repositories without enforced secret-scanning and use long-lived static keys, exposure conditions may push susceptibility toward the higher end of the range.

  • Patch status: Traditional patching has limited impact because the primary weakness is credential hygiene and repository governance, not software flaws; improvements instead depend on secret rotation, scoping, and removal from code.
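As a sketch of why exploitability runs high once a credential is found, the snippet below checks whether an AWS key pair is still live using boto3's STS GetCallerIdentity, a documented, read-only call. The helper name is hypothetical, and such a check should only ever be run against keys your own organization controls.

```python
# Illustrative liveness check for a discovered AWS key pair.
# get_caller_identity is a real, cheap, read-only STS call.
import boto3
from botocore.exceptions import ClientError

def key_is_live(access_key_id: str, secret_access_key: str) -> bool:
    sts = boto3.client(
        "sts",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    try:
        identity = sts.get_caller_identity()
        print("Valid key for account:", identity["Account"])
        return True
    except ClientError:
        return False  # revoked, deactivated, or never valid
```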


Numerical Frequencies and Magnitudes

All dollar values below are illustrative only. Organizations should substitute their own asset values, control strength, and telemetry, and adjust the numbers accordingly.


Loss Event Frequency (LEF)

4/year (estimated)

  • Justification: Public repositories are continuously scannable, secret density is non-trivial, and long-lived valid secrets increase the chance that at least several exposed credentials per year are found and abused by external actors.

Vulnerability (probability of harm per contact): 0.35

  • Justification: Not every contact or scan yields a valid, exploitable secret for a given organization, and some secrets may be revoked or scoped; however, where long-lived static keys exist in public repos, the probability of harmful use upon discovery is material.


Secondary Loss Event Frequency

1.2/year (estimated)

  • Justification: A significant fraction of primary credential-abuse events can lead to secondary consequences, such as data exposure, regulatory notifications, or broader incident response, but not all abuse will reach that threshold (assumed to be ~30 percent of primary events).


Loss Magnitude

Estimated range:

  • Min: $10,000

  • Most Likely: $150,000

  • Maximum: $2,000,000

Justification:

  • Minimum covers investigation, revocation, and rotation of secrets, limited service disruption, and basic consulting or internal labor.

  • Most likely includes triage, broader credential rotations, remediation of cloud or database misuse, engineering rework, and possible customer communication.

  • Maximum contemplates compromise of sensitive data or critical services, significant overage in cloud costs or fraudulent use, and extended operational and legal response.


Secondary Loss Magnitude (SLM)

Estimated range:

  • Min: $25,000

  • Most Likely: $300,000

  • Maximum: $5,000,000

Justification:

  • Secondary losses may include legal and regulatory costs, customer notification and credit monitoring, reputational repair, contractual penalties, and extended forensic work.

  • Maximum values allow for higher-impact scenarios where exposed secrets enable access to high-value systems or regulated data, leading to substantial regulatory or contractual consequences.
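To show how these illustrative inputs could feed a FAIR-informed budgeting exercise, here is a minimal Monte Carlo sketch. The Poisson and triangular distribution choices are assumptions layered on top of the numbers above, not part of the source analysis, and LEF is used directly as the annual event rate (in FAIR, LEF already incorporates vulnerability).

```python
# FAIR-style Monte Carlo sketch using the illustrative estimates above:
# LEF ~4 events/yr, ~30% of primary events escalate to secondary losses,
# triangular (min, most-likely, max) loss-magnitude distributions.
import numpy as np

rng = np.random.default_rng(7)
TRIALS = 100_000

primary_lm   = (10_000, 150_000, 2_000_000)   # min, most likely, max
secondary_lm = (25_000, 300_000, 5_000_000)

annual_loss = np.zeros(TRIALS)
for i in range(TRIALS):
    n_primary = rng.poisson(4.0)                    # LEF: ~4 events/yr
    loss = rng.triangular(*primary_lm, n_primary).sum()
    n_secondary = rng.binomial(n_primary, 0.30)     # ~30% escalate
    loss += rng.triangular(*secondary_lm, n_secondary).sum()
    annual_loss[i] = loss

print(f"Mean annualized loss: ${annual_loss.mean():,.0f}")
print(f"90th percentile:      ${np.percentile(annual_loss, 90):,.0f}")
```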


Mapping, Controls, and Modeling


MITRE ATT&CK Mapping

Reconnaissance

T1593.003 – Search Open Technical Databases (Code Repositories)

Reference: “After scanning all 5.6 million public repositories on GitLab Cloud, a security engineer discovered more than 17,000 exposed secrets across over 2,800 unique domains… Luke Marshall used the TruffleHog open-source tool to check the code in the repositories for sensitive credentials like API keys, passwords, and tokens.”

T1593 – Search Open Websites/Domains (general open-web reconnaissance)

Reference: “He also checked the Common Crawl dataset that is used to train AI models, which exposed 12,000 valid secrets.”

Credential Access

T1552.001 – Unsecured Credentials: Credentials In Files

Reference: “The researcher found 17,430 verified live secrets… The largest number of leaked secrets, over 5,200 of them, were Google Cloud Platform (GCP) credentials, followed by MongoDB keys, Telegram bot tokens, and OpenAI keys.”


NIST 800-53 Affected Controls

IA-5(7) — Authenticator Management | No Embedded Unencrypted Static Authenticators

Exposed API keys, passwords, and tokens stored directly in public repository code.

Reference: “Luke Marshall used the TruffleHog open-source tool to check the code in the repositories for sensitive credentials like API keys, passwords, and tokens… The largest number of leaked secrets, over 5,200 of them, were Google Cloud Platform (GCP) credentials, followed by MongoDB keys, Telegram bot tokens, and OpenAI keys.”

This behavior directly conflicts with IA-5(7)’s requirement to avoid embedding unencrypted static authenticators in applications or other static storage.

IA-5(6) — Authenticator Management | Protection of Authenticators

Failure to protect authenticators at a level commensurate with the sensitivity of the systems they access.

Reference: “Historical data shows that most leaked secrets are newer than 2018. However, Marshall also found some very old secrets dating from 2009, which are still valid today.”

Long-lived, valid secrets in public repositories indicate inadequate protection and lifecycle management for authenticators, undermining IA-5(6).

SC-12 — Cryptographic Key Establishment and Management

Poor management and storage of cryptographic keys used for cloud and API access.

Reference: “The largest number of leaked secrets, over 5,200 of them, were Google Cloud Platform (GCP) credentials, followed by MongoDB keys…”

Leaving cryptographic and service keys in public code suggests weak key generation, storage, and destruction practices, contrary to SC-12’s requirements for managed key lifecycles and controlled access.

SC-28 — Protection of Information at Rest

Authentication information and other sensitive data at rest in code repositories is not protected from unauthorized disclosure.

Reference: “The researcher found 17,430 verified live secrets… associated with 2,804 unique domains…”

Because public repositories are world-readable, storing secrets there undermines SC-28’s mandate to protect the confidentiality and integrity of information at rest, including authentication information.

CM-12 — Information Location

Inadequate awareness and tracking of where sensitive information (secrets and keys) resides within code and repositories.

Reference: “After scanning all 5.6 million public repositories on GitLab Cloud, a security engineer discovered more than 17,000 exposed secrets across over 2,800 unique domains.”

The need for an external researcher to discover and report these secrets indicates that organizations lack effective mechanisms, and supporting tooling, for locating and inventorying sensitive information as envisioned by CM-12.


Monitoring, Hunting, Response, and Reversing

Monitoring

Monitoring should prioritize telemetry that detects misuse of exposed credentials across cloud, identity, and API layers, supported by network logs showing anomalous access patterns, endpoint logs capturing automated secret-scanning tools if run internally, and DNS or email telemetry for alerts triggered by credential-testing activity. Logging sufficiency requires ensuring high-fidelity logs of cloud authentication, API access, repo activity, CI/CD, and identity provider records that capture token use and key creation or revocation events. Key indicators include sudden spikes in API calls, access from new locations, use of legacy keys, service-to-service requests inconsistent with typical workloads, and unexplained increases in repo cloning or enumeration. Monitoring gaps include a lack of visibility into credential use against external cloud services, insufficient logging for secret-rotation workflows, and limited tracking of developer repository behavior. Correlation logic should combine repository exposure evidence with downstream authentication attempts, alerting on first-use-of-key, high-frequency misuse, or access patterns outside established baselines. Dashboards should highlight exposed secrets by age, service type, and revocation status, plus metrics tracking access attempts tied to compromised credentials. Monitoring validation should rely on simulated credential misuse in a controlled environment to ensure alerts trigger as expected.
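A minimal sketch of the first-use-of-key correlation described above: given key IDs known to appear in public repositories and a stream of authentication events, alert the first time each exposed key is used. The event shape (dicts with key_id, timestamp, and source_ip fields) and the key ID shown are assumed placeholders.

```python
# Alert on the first observed use of a key already known to be exposed.
from datetime import datetime

exposed_keys = {"AKIAEXAMPLEEXPOSED01"}   # fed from secret-scanning results
seen_in_use: set[str] = set()

def process_auth_event(event: dict):
    key = event.get("key_id")
    if key in exposed_keys and key not in seen_in_use:
        seen_in_use.add(key)
        print(f"ALERT: first use of exposed key {key} "
              f"at {event['timestamp']} from {event['source_ip']}")

process_auth_event({
    "key_id": "AKIAEXAMPLEEXPOSED01",
    "timestamp": datetime(2025, 11, 28, 3, 14).isoformat(),
    "source_ip": "203.0.113.9",           # TEST-NET placeholder address
})
```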


Hunting

Hunting should begin with the hypothesis that exposed credentials in public repositories may have been discovered and tested by external actors, guiding queries for unusual authentication attempts using older or never-before-seen keys across cloud platforms and APIs. Telemetry sources include cloud authentication logs, API gateway logs, repo audit logs, CI/CD logs, and DNS or network metadata showing repeated low-volume credential-use patterns. Detection logic should focus on improbable geolocation access, rapid-sequence authentication attempts across multiple services, use of long-lived tokens, and activity occurring shortly after repository updates. Noise-to-signal considerations include legitimate developer automation, CI/CD pipelines, and service accounts, which require baselining typical usage patterns to avoid over-alerting on expected behavior.
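One way to express that hunting logic in code, assuming normalized log records that carry key creation time and origin ASN; the two-year threshold and field names are illustrative, and known_origins should be seeded from a historical baseline so first observations are not all flagged.

```python
# Hunt: flag long-lived keys authenticating from an origin never seen
# for that key before. Record shape and threshold are assumptions.
from datetime import datetime, timedelta
from collections import defaultdict

LONG_LIVED = timedelta(days=365 * 2)   # "old key" threshold (assumption)
known_origins = defaultdict(set)        # seed from a historical baseline first

def hunt(records):
    findings = []
    for r in records:
        key_age = r["event_time"] - r["key_created"]
        is_new_origin = r["origin_asn"] not in known_origins[r["key_id"]]
        known_origins[r["key_id"]].add(r["origin_asn"])
        if key_age > LONG_LIVED and is_new_origin:
            findings.append(r)
    return findings

print(hunt([{
    "key_id": "legacy-db-key",
    "key_created": datetime(2009, 5, 1),
    "event_time": datetime(2025, 11, 28),
    "origin_asn": "AS64496",            # documentation-range ASN
}]))
```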


Response

Response activities require collecting cloud authentication logs, API access records, repository audit logs, and CI/CD pipeline logs to determine whether exposed credentials were accessed or abused, along with artifacts such as timestamps of key use, service-account behaviors, and any anomalous cloud workload execution. Anti-forensic behavior is unlikely to be technical, but attackers could rotate or delete compromised keys after use, or blend access patterns into regular service traffic. Event reconstruction depends on correlating repository exposure with first-use timestamps and tracking downstream resource actions tied to the credential. DFIR evidence should inform FAIR loss estimates by quantifying the scope of unauthorized access, data touched, workloads triggered, and remediation labor. Likely containment measures include revoking exposed keys, forcing the regeneration of service credentials, conducting access reviews, and tightening repository governance. Priority artifacts include secret age, last-use metadata, access origin, and privilege level of exposed keys. Telemetry needs include enhanced cloud logs and repo governance instrumentation. IR gaps may stem from missing key-use timestamps or insufficient historical cloud logs. Validation strategies include red-teaming credential use in a controlled manner to confirm containment and the effectiveness of monitoring.
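For the containment step, a minimal sketch of deactivating an exposed AWS access key with boto3 (update_access_key is the documented IAM call); the user name and key ID would come from the investigation, and rotation would follow once downstream dependencies are confirmed.

```python
# Containment sketch: immediately deactivate an exposed AWS access key.
import boto3

def deactivate_key(user_name: str, access_key_id: str):
    iam = boto3.client("iam")
    iam.update_access_key(
        UserName=user_name,
        AccessKeyId=access_key_id,
        Status="Inactive",   # key stops authenticating but remains auditable
    )
    print(f"Deactivated {access_key_id} for {user_name}")
```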


Reverse Engineering

Although no malware loader is present, reverse engineering considerations apply to understanding how automated scanners like TruffleHog detect embedded secrets, how they enumerate repositories, and how attackers could slightly modify or obscure key patterns to evade detection. Evasion may include encoding or fragmenting secrets within code files to avoid matching the patterns used by scanning tools, while persistence in this context means long-lived tokens and credentials that remain valid for years. Indicators include static API keys, tokens, or passwords in code repositories, especially in configuration files or scripts. Dynamic and static analysis would focus on repo contents, commit histories, and automated scanning workflows. Expected artifacts include discovery of long-lived keys, metadata about commit authorship, and patterns of repeated accidental secret inclusion. Further reverse engineering work may explore how adversaries chain secret misuse with cloud API behaviors or privilege escalation via overly permissive service roles.


CTI

CTI efforts should evaluate PIRs, focusing on whether opportunistic actors targeting exposed secrets are operating in the organization’s sector or those of partners, how frequently such scanning campaigns recur, which TTPs (public API enumeration, large-scale secret scanning, credential testing) consistently surface, and which assets (cloud APIs, databases, messaging services) appear repeatedly targeted. SIR work should identify missing IOCs, such as IPs or automation infrastructure used for secret testing, determine whether samples of code containing exposed keys require deeper analysis, understand gaps in infrastructure attribution, and confirm the necessary telemetry sources, such as cloud API usage logs and repo audit logs. Collection efforts should emphasize OSINT monitoring of secret-scanning research, internal scanning results, cloud-service logs, ISAC/ISAO collaboration on related events, and routine scraping of malware or secret-exposure repositories to understand how attackers exploit public code. Mapping should cluster campaigns by scanning method, credential type, and service targeted, map observed behaviors to ATT&CK, compare with past exposures, assess confidence levels, identify recurring attacker patterns such as long-lived token exploitation, and validate hypotheses about actor motivation and capability through correlation with external reporting.


GRC and Testing

Governance

Governance should reassess policy adequacy for software development, cloud access, and repository management to explicitly prohibit hard-coded secrets in public code, require automated secret scanning, and mandate rapid revocation and rotation of exposed credentials. Oversight functions (CISO, DevSecOps, internal audit, risk management) should be given explicit authority to enforce repo governance and regularly review secret-exposure metrics. RA, PM, and PL family documents should be updated to treat exposed credentials as a defined risk scenario, to embed controls such as mandatory pre-commit scanning and CI/CD checks, and to ensure that planning artifacts capture dependencies on Git hosting and cloud keys. The risk register should add or refine entries for “public-source credential exposure,” with likelihood and impact informed by the described TEF/LM estimates and mapped to affected business services. Board and executive communication should summarize the systemic nature of secret exposure, the low cost and ease of automated scanning by adversaries, current exposure levels, and planned remediation milestones, framed in business-impact terms rather than technical detail.


Audit and Offensive Security Testing

Audit and offensive testing should focus on whether policies on credential handling, repository governance, and cloud access are enforced, and on identifying findings where secrets are still embedded in code or where no automated scanning is in place. Evidence gaps such as missing repo audit trails, incomplete cloud key-use logging, or absent secret-rotation records should be documented as vulnerabilities that enable long-lived exposures. Policies and controls for authenticator management, information protection, and configuration management need validation through targeted tests that attempt to commit secrets and confirm they are blocked or immediately detected (a minimal version of this test follows below). Compliance obligations related to key management, access control, and data protection should be mapped to the scenario to show how exposed secrets could lead to non-compliance. Red team exercises should simulate external actors enumerating public repos and testing discovered keys against cloud services, while purple-team exercises should verify that detection and response fire appropriately when test secrets are used. The penetration testing scope should explicitly include public and internal Git repositories, CI/CD pipelines, and cloud entry points accessible via credentials. Exploit reproduction should use non-production, intentionally exposed test keys to validate how quickly such exposures can be discovered and abused, and whether existing controls limit damage. Control validation should close the loop by confirming that secret scanning, key rotation, and access restrictions measurably reduce exposure over time.
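A minimal version of the “commit a secret and confirm it is blocked” test: a git pre-commit hook that scans staged changes and aborts the commit on a match. The regexes are illustrative stand-ins; a production setup would delegate to a maintained scanner such as TruffleHog or gitleaks.

```python
#!/usr/bin/env python3
# Sketch of a pre-commit secret gate (install as .git/hooks/pre-commit,
# executable). Rejects staged additions matching candidate key patterns.
import re
import subprocess
import sys

PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                       # AWS key ID
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[=:]\s*\S"),  # generic
]

staged = subprocess.run(
    ["git", "diff", "--cached", "--unified=0"],
    capture_output=True, text=True, check=True,
).stdout

for line in staged.splitlines():
    if line.startswith("+") and not line.startswith("+++"):
        if any(p.search(line) for p in PATTERNS):
            print(f"Blocked: possible secret in staged change: {line[:80]}")
            sys.exit(1)  # non-zero exit aborts the commit
sys.exit(0)
```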


Awareness Training

Awareness training should emphasize the pattern that credentials placed in code, even temporarily, can be harvested at scale from public repositories and used directly against cloud, database, and messaging services, shifting focus from only phishing-style social engineering to developer-centric hygiene. Human failure modes to address include convenience copying of keys into scripts, misunderstanding of “private vs public” repo status, over-reliance on obscurity, and lax rotation of long-lived tokens. Role-specific training should prioritize developers, DevOps, and admins with practical examples of how secrets are discovered and abused, while finance, customer-facing staff, and executives receive higher-level briefings on business impact and the importance of funding secret-scanning and remediation efforts. Behavioral indicators to teach include spotting credentials in code reviews, recognizing risky configuration patterns, and escalating any suspected key or token exposure. Updated simulations should complement phishing drills with exercises where planted “canary” secrets are used to demonstrate how quickly they could be found, reinforcing the message that any committed secret is effectively public. Communication guidelines should stress careful handling of configuration snippets in tickets, chats, and documentation, and discourage sharing keys through email or informal channels. Training effectiveness should be measured through reductions in discovered secrets per scan, improved time-to-revoke exposed keys, participation rates in secure coding courses, and periodic refresh cycles aligned with release cadences and significant platform changes.
