
How Are You Using AI?

Besides asking it for help debugging issues or providing code templates, is anyone here using AI in a meaningful way at their jobs? I see a lot of posts on AI agents and their capabilities, but I haven't seen any real-world examples of people using AI as anything other than a search engine on steroids.

submitted by /u/DeLoMioFoodie to r/devops

DevSecOps Roadmap

I’m working toward a DevSecOps role and put together this roadmap to guide my learning across cloud, security, automation, and CI/CD. Trying to be intentional about building real-world skills and projects along the way—would love feedback.


🧭 DevOps / Cloud / Security Roadmap (Phased Plan)


Phase 0 – Foundations

Linux + Bash scripting

Git + GitHub

PowerShell (Windows / AD environment)

Python (automation / scripting)

Logging (Linux syslog / Windows Event Logs)

Git commits (clear messages / branches)

Real-world Git usage (code reviews)

Pull request / branching strategies (Git flow)

Linux process management (ps / top / htop)

Linux permissions & users

Linux systemd

Linux networking tools (netstat / ss / curl / tcpdump)

👉 Milestone Project


Phase I – Identity & Access Management + Security

Active Directory

Azure AD (Entra ID)

Okta

Google Workspace

Jira / ServiceNow

IAM fundamentals

MFA + Conditional Access

Zero Trust principles

Security + certs

SC-300 cert

IAM misconfiguration scenarios (privilege escalation)

Practice logging / alerting

👉 Milestone Project

🎓 Certifications

CCNA

AZ-104 / SC-300

AZ-500

Terraform Associate

AWS Cloud Practitioner / DevOps Engineer

CKA


Phase II – Databases + Automation + IaC

PostgreSQL (queries, joins, ~150MB datasets)

pgvector (vector DB + text search)

Python (boto3, psycopg2)

Terraform (IaC fundamentals)

Store DB creds securely (no hardcoding)

Secrets management (env vars / Vault intro)

Deeper Python (clean code / advanced scripts)

Build small app (Flask / FastAPI)

Cost awareness (AWS cost optimization)

Use tags in Terraform

👉 Milestone Project
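For the "store DB creds securely" item, here is a minimal Python sketch, assuming credentials are injected as environment variables (by the CI system, a .env file kept out of Git, or a Vault agent). The function and exception names are illustrative, not from any library:

```python
import os
from typing import Mapping, Optional


class MissingSettingError(RuntimeError):
    """Raised when a required database setting is not present in the environment."""


# Standard libpq variable names; nothing is hardcoded in the source.
REQUIRED = ("PGHOST", "PGDATABASE", "PGUSER", "PGPASSWORD")


def db_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build PostgreSQL connection settings from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail fast rather than silently connecting with a default or empty password.
        raise MissingSettingError("missing settings: " + ", ".join(missing))
    return {
        "host": env["PGHOST"],
        "port": int(env.get("PGPORT", "5432")),  # PostgreSQL's default port
        "dbname": env["PGDATABASE"],
        "user": env["PGUSER"],
        "password": env["PGPASSWORD"],
    }
```

The returned keys match psycopg2.connect keyword arguments, so psycopg2.connect(**db_settings_from_env()) works. libpq also reads these same variables itself when they aren't passed explicitly; the explicit check mainly buys a clear error message.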


Phase III – Containers & AWS

Docker (Dockerfile / Compose)

Kubernetes (Pods / Deployments / Services)

AWS:

IAM

EC2

S3

VPC

CloudWatch

CI/CD pipeline

Least-privilege IAM roles

CloudWatch for suspicious activity

Networking Fundamentals:

DNS

HTTP / HTTPS

TLS

Load balancers (ALB / NLB)

NAT

Routing

Subnets

How traffic flows in Kubernetes

👉 Milestone Project


Phase IV – Automation & Configuration

Ansible (playbooks / roles)

Terraform + Ansible integration

Configuration drift detection

Immutable infrastructure concepts

👉 Milestone Project


Phase V – CI/CD Pipelines + DevSecOps

Jenkins / GitHub Actions

CI/CD pipelines (build → test → deploy)

Trivy (container scanning)

Snyk / Checkov / tfsec (IaC scanning)

HashiCorp Vault (secrets)

OPA / Kyverno (policy as code)

Azure Security (Defender / Key Vault)

AWS pipelines

LLM security (prompt injection / PII protection)

Pipeline Security:

Fail pipelines on vulnerabilities

Block deploys if insecure

Generate security reports automatically

Observability:

Prometheus + Grafana

Logs: ELK stack / Loki

Alerting & IR:

Alerting basics

Incident response basics

Runbooks (incident scenario → response steps)

👉 Milestone Project
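For the "fail pipelines on vulnerabilities" item: Trivy can already do this on its own with --exit-code 1 --severity HIGH,CRITICAL. If you want custom reporting on top, a minimal Python gate over Trivy's JSON report (trivy image --format json) might look like this. It assumes findings sit under Results[].Vulnerabilities[] with a Severity field; the severity set is an example policy:

```python
import json
import sys

BLOCKING = frozenset({"CRITICAL", "HIGH"})


def blocking_findings(report: dict, blocking=BLOCKING) -> list:
    """Return (id, package, severity) for every finding that should fail the build."""
    findings = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" is absent or null for targets with no findings.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append((vuln.get("VulnerabilityID"), vuln.get("PkgName"), vuln["Severity"]))
    return findings


def main(path: str) -> int:
    with open(path, encoding="utf-8") as fh:
        findings = blocking_findings(json.load(fh))
    for vuln_id, pkg, severity in findings:
        print(f"{severity}: {vuln_id} in {pkg}")
    # A non-zero exit code fails the CI step, which blocks the deploy after it.
    return 1 if findings else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

In a pipeline this would run as trivy image --format json -o report.json <image>, followed by python gate.py report.json (the script name is hypothetical).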


Phase VI – Integration + Job Prep

3–5 portfolio projects

Practice Jira-style documentation

Combine everything:

Terraform (AWS + Azure)

Docker + Kubernetes

CI/CD pipelines

IAM

Security scanning

👉 Milestone Project


⏱️ Weekly Structure

Day 1–4: Learning + Labs

Day 5: Build project

Weekend: Documentation + GitHub


submitted by /u/AnalystFew5888 to r/devops

How are you handling AI quality checks in your deployment pipeline?

Wanted to see if anyone at a Seed - Series A startup has found success with AI eval platforms? We’re shipping new/improving existing AI features pretty regularly and our existing workflows are pretty solid except we don’t have much testing or tracing for our AI-generated outputs.

We’re finding that even small prompt tweaks or swapping to the newest model can quietly break output quality in ways that don't surface until a user notices. And right now we’ve got nothing automated that catches that before it ships. I've started looking into eval checks as an actual CI step, with the hope that we can block merges if outputs fall below some threshold. There are obviously a lot of eval platforms out there, but I haven’t seen many startups our size adopting those tools yet.

Not trying to add a bunch of work to the team but just hoping to get some core testing in place.
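One low-effort starting point is a deterministic eval gate that runs as a required CI check. Here is a minimal Python sketch, assuming you keep a fixed set of prompts with the key facts each answer must mention, plus a function that calls your model. EvalCase, pass_rate, gate, and the substring check are all illustrative; eval platforms typically add graded or LLM-as-judge scoring on top of checks like this:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    must_contain: List[str] = field(default_factory=list)  # key facts the answer must mention


def pass_rate(cases: List[EvalCase], generate: Callable[[str], str]) -> float:
    """Fraction of cases whose output mentions every required term (case-insensitive)."""
    if not cases:
        return 0.0  # an empty suite should never count as passing
    passed = 0
    for case in cases:
        output = generate(case.prompt).lower()
        if all(term.lower() in output for term in case.must_contain):
            passed += 1
    return passed / len(cases)


def gate(cases: List[EvalCase], generate: Callable[[str], str], threshold: float = 0.9) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, 1 otherwise."""
    rate = pass_rate(cases, generate)
    print(f"eval pass rate {rate:.0%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1
```

In CI you would call sys.exit(gate(cases, generate)) with your real model call and mark that job as a required status check, so a failing run blocks the merge. Keeping the cases in the repo means a prompt tweak and its eval results get reviewed in the same PR.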

submitted by /u/TangerineTrue8757 to r/devops

Is it a problem if I'm only learning on-prem Kubernetes and never touch AWS/Azure?

I'm a junior DevOps engineer and I'm a bit worried about the direction I'm learning in, so I wanted to get some outside opinions.

At my job (and in my personal projects) I work almost entirely with on-prem / self-managed infrastructure. The stack I'm learning is roughly:

  • K3s (self-managed Kubernetes on VMs)
  • Cilium as the CNI (incl. Gateway API)
  • ArgoCD for GitOps
  • Ansible for provisioning
  • Terraform
  • Longhorn for storage, CloudNativePG for Postgres
  • etc...

The thing is, I've never used a public cloud — no AWS, Azure, or GCP. No EKS/AKS/GKE, no managed databases, no Terraform against a cloud provider. Everything I do is bare VMs and self-hosted components.

My question: is this a problem? A few things I'm wondering:

  1. Will I be at a disadvantage in the job market by not knowing the big clouds?
  2. Are the concepts I'm learning (Kubernetes internals, networking, GitOps, storage, etc.) transferable to cloud-managed setups, or is it a different world?
  3. Should I make an effort to learn a cloud on the side, or is deep on-prem experience valuable enough on its own?

I genuinely enjoy the on-prem / "build it yourself" side of things, I just don't want to accidentally box myself in. Any honest perspective from people who've been in the field longer would be really appreciated. Thanks

submitted by /u/Low-Response-5711 to r/devops

Is Azure capacity this constrained or am I doing it wrong?

I've been working with AWS for many years, and I'm currently working on a product that is supposed to be cloud agnostic.

I started with AWS, and now it's time to spin it up on Azure (because many enterprises use Azure for some reason).

I started in the East US region on Azure, and right at the beginning I hit an issue with Postgres Flexible Server. I raised a support ticket, and in the end they recommended that I move to another region. Just getting that answer took about a day.

I moved to East US 2, and after the AKS deployment I got stuck on the vCPU quota (100) for Standard Dasv7 Family vCPUs, and here we go again... They sent me the same message template as for the previous ticket...

> ...
> Your ask for quota has been reviewed and backlogged at this time. It will be reviewed again when additional capacity becomes available. We do not have an ETA for when your request can be fulfilled but please be assured that we will continue working on it and update you as soon as we have more details to share and/or process the request.
> ...

I've already been waiting for more than a day, and there has been no response from their support.

Long story short: I don't want to wait days, weeks, or months just to be able to test infrastructure on Azure. If it were my decision, I'd just stop and forget about this nightmare. Please suggest regions and instance types that won't give me these issues.

submitted by /u/lanycrost to r/devops

GitHub - protect Actions yml file from devs

Quick background: we are using Azure DevOps, but migrating to GitHub Enterprise for both code repos and deployments. In Azure DevOps, all files related to the deployment pipeline live in the same project, but in a separate repo. This lets me control who can modify pipeline files, and developers are excluded.
I am having trouble achieving the same in GitHub with Actions. There is a .github folder in the repo that I would like to protect. I tried using CODEOWNERS with rules and branch policies. It works, but not as cleanly as in Azure DevOps. I would like to avoid requiring pull requests for every commit, which so far is the only way I've been able to achieve what I want.

Please share how you designed this in your setup.

submitted by /u/pneteng to r/devops

I Built a Retro Terminal Game to Make Kubernetes Less Boring


Hi lovely people of r/devops,

Hope you all are doing well. I’ve posted here before about Project Yellow Olive - my small attempt at making Kubernetes practice feel less boring and more game-like.

I’m learning Kubernetes myself for CKAD/CKA, and staring at YAML all day can get tiring. So I built a retro terminal game where you solve Kubernetes challenges inside a story.

The latest update adds Signal Town, a new section focused on Kubernetes Services. Team Evil has cut the signals between Pokepods, and your job is to fix them using concepts like ClusterIP, NodePort, Ingress, and selectors.
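A Service only routes to pods whose labels contain every key/value pair in its spec.selector, so a single mistyped label silently leaves it with no endpoints. Here is a simplified Python sketch of that matching; it ignores namespaces, pod readiness, and everything else the real endpoint controller does:

```python
from typing import Dict, List


def selects(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """True if the pod carries every key/value pair in the Service's selector.

    Extra labels on the pod don't matter; a missing or different value does.
    """
    return all(pod_labels.get(key) == value for key, value in selector.items())


def endpoints(selector: Dict[str, str], pods: Dict[str, Dict[str, str]]) -> List[str]:
    """Names of pods a Service with this selector would send traffic to."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return []
    return sorted(name for name, labels in pods.items() if selects(selector, labels))
```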

It’s open source and runs locally.

Would love for you to try it and share feedback. Please star the repo if you find it interesting :)

Repo URL: https://github.com/Anubhav9/Yellow-Olive

It can also be installed from PyPI (pip) with the following command:

pip install yellow-olive

Thanks!

submitted by /u/Content_Ad_4153 to r/devops

cracked job interview - applied for dev role, got hired for DevOps skills


I was recently interviewed by a product company for a Full-Stack dev role. They required building a demo assignment.

I initially planned to build a conventional monolithic app and deploy it on Render or Railway, but I had learned a decent amount of AWS Serverless in my current role, so I thought, why not leverage that?

The company planned to assess code quality, but got more interested in my DevOps skills, since I had put special emphasis on them.

- GitHub Actions CI/CD
- AWS CloudFormation IaC
- OIDC for secrets
- kill switch for DDoS
- guardrails for DoW (Denial of Wallet)

Surprisingly, the demo assignment + explanatory rounds impressed them enough that I landed the job.

I have open sourced the entire codebase for any newbies to learn.

submitted by /u/harsh611 to r/devops

Controlling Telemetry explosion at the Edge with OtelCol and OTTL

Telemetry volume has been exploding due to all these new AI workloads, and I feel like there hasn’t been much guidance on controlling it. Everybody’s observability bill is up and the backend vendors are raking it in; Datadog stock went up almost 100% in the last 30 days (yes, some of the rise is due to their new AI observability tooling, but if you read the earnings report, revenue from their backend business, which they call non-AI revenue, is booming even more). And all these vendors are selling you a paid solution for it. They give you levers and knobs to drop/sample telemetry after ingest, but that’s baked into the price, because of course it is! They have to make their money somehow, and once your telemetry has been shipped, landed in their backend, and then deleted, you’ve already paid for it. Edge reduction itself isn't new: Cribl, Vector, and collector processors have done it for years, but doing it in the collector with OTTL means no proprietary agent and no lock-in.

With OTel graduating last month and OpAMP becoming a very real thing, it’s easy to drop/sample telemetry at the edge. It saves you egress, shipping, and ingestion costs. Not to mention, you’re not using a vendor’s proprietary tooling to control your telemetry, so you’re not locked in. Want to switch backends tomorrow? You can: all your config is based on OSS standards. Anyway, I wrote up a practical guide on how to actually do it, with real config examples, if anyone's interested.
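The two levers described above (drop noisy telemetry outright, sample the rest before it leaves the edge) can be sketched in a few lines of Python. This illustrates the logic only; it is not OTTL and not the collector's exact algorithm. In the collector itself you'd express it with processors such as filter (OTTL conditions) and probabilistic_sampler. The dropped routes are hypothetical:

```python
from typing import Dict, Iterable, List

DROP_ROUTES = {"/healthz", "/readyz"}  # hypothetical noisy endpoints
LOW_56_BITS = (1 << 56) - 1


def keep_trace(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deciding only from the trace ID.

    Uses the low 56 bits of the 128-bit trace ID as the random value, similar
    in spirit to OpenTelemetry's consistent probability sampling.
    """
    randomness = int(trace_id_hex, 16) & LOW_56_BITS
    return randomness < ratio * (1 << 56)


def reduce_at_edge(spans: Iterable[Dict], ratio: float) -> List[Dict]:
    kept = []
    for span in spans:
        if span.get("http.route") in DROP_ROUTES:
            continue  # filter: dropped before it is ever shipped or billed
        if keep_trace(span["trace_id"], ratio):
            kept.append(span)  # sample: whole traces are kept or dropped together
    return kept
```

Deciding from the trace ID rather than a per-span random number means every collector applying the same ratio makes the same keep/drop call for a given trace, so you don't end up with half-traces.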

submitted by /u/Broad_Technology_531 to r/devops

After the tj-actions supply chain attack I wrote up the 7 hardening techniques that would have prevented it

The March 2025 tj-actions incident, in which a single compromised Action (tj-actions/changed-files, used by over 23,000 repos) leaked CI secrets into workflow logs, stuck with me. Here are the 7 specific things that would have prevented it.

1. Pin Actions to commit SHAs not tags

A tag like @v4 can be silently moved to point at malicious code.

A full commit SHA can't be silently repointed to different code. This one change protected every team that had already done it during CVE-2025-30066.

2. Use OIDC instead of stored secrets

Long lived credentials stay valid until manually rotated.

OIDC-issued credentials are short-lived and expire soon after the job ends, so there is far less worth stealing.

3. Lock down GITHUB_TOKEN permissions

Add permissions: {} at the top of every workflow and grant each job only what it specifically needs.

4. Treat workflow files like production code

Use CODEOWNERS to require security team review on every .github/workflows/ change before it merges.

5. Scan with Zizmor

Run:

pip install zizmor && zizmor .github/workflows/

It catches dangerous pull_request_target configs and script injection risks automatically. Free, and takes 2 minutes.

6. Mirror critical Actions into your own org

Fork the Actions you depend on so you are not trusting a stranger's account security.

7. Enforce environment gates

Even a compromised workflow needs human approval before reaching production. That pause catches anomalies.
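Techniques 1 and 3 are easy to check mechanically. Here is a rough Python sketch that flags uses: refs not pinned to a full 40-character SHA and workflows without a top-level permissions: key. It's line-based (no YAML parsing), so treat it as a quick pre-commit nudge, not a replacement for zizmor:

```python
import re
from pathlib import Path

# Matches "uses: owner/repo@ref" (optionally as a list item "- uses:" or quoted).
USES_RE = re.compile(r"""^\s*-?\s*uses:\s*['"]?([^'"\s#]+)""")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def lint_workflow(text: str) -> list:
    """Return human-readable problems for one workflow file's contents."""
    problems = []
    # Technique 3: a top-level (unindented) permissions key sets the default for all jobs.
    if not re.search(r"^permissions:", text, re.MULTILINE):
        problems.append("no top-level permissions: block")
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1)
        if ref.startswith(("./", "docker://")):
            continue  # local actions and container images are out of scope here
        _, _, version = ref.partition("@")
        # Technique 1: tags and branches can be moved; only a full commit SHA is fixed.
        if not FULL_SHA_RE.match(version):
            problems.append(f"line {lineno}: {ref} is not pinned to a commit SHA")
    return problems


def lint_repo(root: str = ".") -> int:
    """Print problems for every workflow under .github/workflows; return how many were found."""
    count = 0
    for path in sorted(Path(root, ".github", "workflows").glob("*.y*ml")):
        for problem in lint_workflow(path.read_text(encoding="utf-8")):
            print(f"{path}: {problem}")
            count += 1
    return count
```

Pinned lines conventionally keep the readable version as a trailing comment (actions/checkout@<sha> # v4), which the regex ignores.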

I wrote a full breakdown with before and after YAML examples for each technique here if anyone needs.

Happy to answer questions in the comments.

submitted by /u/wizvinay to r/devops

Any native Harness templates for OpenClaw or Hermes yet?

Not sure if there is a better subreddit for this, but we are trying to set up an automated release pipeline where an AI agent can review our Terraform plan outputs, check them against our internal security policies, and automatically approve staging deployments.

The problem is we need the agent to run natively within our CI/CD context so it can securely read the repository state and secrets without exposing our infrastructure code to an external API wrapper. I know Harness has some AI features built in now, but does anyone know if there are official pipeline templates or integrations specifically for OpenClaw or Hermes?

Right now we are considering just using gitagent as the runtime to execute the loop inside a standard Harness step. It seems like the cleanest fallback because it lets you structure the agent purely as code and handles the OpenTelemetry tracing. But I would much rather use a native Harness template if one exists, to avoid maintaining the custom step ourselves (unless it's simpler than I think; please correct me there too).
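Whatever runtime ends up hosting the agent, it may help to put a deterministic policy check in front of it so the obvious cases never depend on model output. Here is a minimal Python sketch over terraform show -json output, assuming the plan's resource_changes[].change.actions/after layout. Both rules (no destroys or replacements, no aws_security_group_rule ingress from 0.0.0.0/0) are hypothetical examples of internal policy:

```python
import json


def plan_violations(plan: dict) -> list:
    """Check `terraform show -json` output against two example policy rules."""
    violations = []
    for rc in plan.get("resource_changes") or []:
        change = rc.get("change") or {}
        actions = change.get("actions") or []
        after = change.get("after") or {}  # null when the resource is being destroyed
        # Rule 1 (hypothetical): no destroys, including replacements ("delete" + "create").
        if "delete" in actions:
            violations.append(f"{rc['address']}: destroys or replaces a resource")
        # Rule 2 (hypothetical): no security group rule allowing ingress from anywhere.
        if (
            rc.get("type") == "aws_security_group_rule"
            and after.get("type") == "ingress"
            and "0.0.0.0/0" in (after.get("cidr_blocks") or [])
        ):
            violations.append(f"{rc['address']}: allows ingress from 0.0.0.0/0")
    return violations


def auto_approvable(plan_json: str) -> bool:
    """Auto-approve staging only when no rule fires; everything else needs review."""
    return not plan_violations(json.loads(plan_json))
```

The input would come from terraform plan -out=tfplan followed by terraform show -json tfplan > plan.json; only plans where plan_violations returns an empty list would be handed to the agent or auto-approval.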

This is a new field with a lot of gaps and not much material online, so any expert advice would help tremendously.

submitted by /u/Vedantagarwal120 to r/devops

What should I do to be taken seriously in the job market?

I'm a European developer with 6 years of development experience who started coding for fun. One day I wanted to know how computers do stuff, and since then I've been developing my personal projects and just building things because I like to.

Naturally, I've picked up a lot of 'sysadmin'/'devops(?)' skills. It started with a GitHub Action that cloned my repos onto a VPS and restarted them. Then I started using Linux, distro-hopping, and learning more deeply how Linux and computers work.

Eventually, I got into OSS and set up a home server. I deployed some stuff on it with Docker on Debian. Then I switched to Proxmox and started hosting some of my own stuff on it, containerized. After that, I got into Nix(OS) and started declaratively defining my systems on my desktop and some of my VMs...
And for the last year and a half, I've been doing some 'volunteer' developer work at a non-profit, which has had me touch high-availability/k8s stuff.

I really never did this looking for a job. I really like learning by myself.

But now I would like to get into the job market, and DevOps seems like a great path. I also like development, but there's something intrinsically nice about deploying stuff and managing machines.

For the last few weeks, I've been applying for development jobs, but the only responses I get are silence or a rejection because of my lack of 'real job' experience. I guess my lack of formal education in development also affects these outcomes.

And I don't know why, but I get the feeling that even if I had a repo on my GH profile with a giant IaC orchestration system built on 20 of the most relevant technologies, it wouldn't change the outcome.

So, yeah. What could I do about it?

submitted by /u/Victorioxd to r/devops

I have 4 years of .NET dev experience. How do I get into DevOps?

I really want to become a DevOps Engineer. I’m planning to shift careers because I feel like I have become stagnant in my current role as a desktop and web app developer.

The passion I once had for developing applications is gradually fading, and I want to try something new in the IT industry.

However, I’m not sure how to start or how to land a career in DevOps.

Thank you in advance.
Peace. Yow

submitted by /u/Zues_1997 to r/devops

Need suggestion on CKA certification

Hi guys, I'm planning to switch jobs in the next few months and have been preparing for the last 3-4 months. I've only gotten a handful of calls in the last 3 months, around 5 or 6, and only 2 of them led to scheduled interviews.

Now I'm planning to get the CKA certification this month.

Will adding this certification to my profile increase my chances of getting calls?

Anyone experienced this before?

submitted by /u/Honest_Respond_2973 to r/devops

TLS certs are dropping to 47 days

The CA/Browser Forum voted to cut TLS certificate lifespans down to 47 days by 2029, with shorter limits already rolling in before that.

Certbot + Let's Encrypt is the obvious answer for automation, but that still leaves a blind spot — you don't always know when a renewal silently fails until a client is already down.

For those of you managing infrastructure across multiple domains or clients: how are you actually staying on top of this? Is there a tool that gives you a proper overview, or have you cobbled something together yourself?

Asking because I'm validating whether this is a problem worth solving properly. Would love to hear how people are handling it today.
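Part of the blind spot is checking what's actually being served, not just whether Certbot ran. Here is a minimal Python sketch using only the standard library; the 14-day threshold and the host list you pass in are placeholders:

```python
import socket
import ssl
import time
from typing import Dict, Iterable, Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a notAfter timestamp in the format ssl.getpeercert() returns."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Do a normal, verified TLS handshake and return the served certificate's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check(hosts: Iterable[str], warn_days: float = 14) -> Dict[str, str]:
    """Map each problematic host to a short reason; healthy hosts are omitted."""
    problems = {}
    for host in hosts:
        try:
            remaining = days_left(fetch_not_after(host))
        except OSError as exc:
            # Covers DNS failures, timeouts, and verification errors (including expired certs).
            problems[host] = f"handshake failed: {exc}"
            continue
        if remaining < warn_days:
            problems[host] = f"expires in {remaining:.1f} days"
    return problems
```

Running it on a schedule from somewhere other than the servers themselves means a host that's down still shows up, as a failed handshake. An already-expired certificate fails verification during the handshake, so it's reported as an error rather than a negative day count.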

EDIT: Thanks for the info, guys. I wasn't aware of enough tools for this, I guess.

submitted by /u/mrehanabbasi to r/devops

Projects to practice manifest files

Recently came across the "mother of all demo apps". It promised to be a large blog app where multiple frontends and backends can be mixed and matched.
But it turned out to be a maintainability fever dream. No frontend/backend pair works properly together: if the backend works, the frontend isn't configured. The most recently maintained project is the Angular one, and it has a hardcoded backend URL baked in.
If you know of a stable, publicly available three-tier app (it doesn't even need to be dockerized), it would be a great help. I just want a stable app with a few user flows that I can later run some stress and smoke tests against. Thank you.

submitted by /u/EnvironmentalRun4163 to r/devops