DevOps is a failure (leebriggs.co.uk)
291 points by blopeur on June 26, 2022 | 391 comments



2005: your infrastructure is automated using a handful of Bash, Perl and Python scripts written by two system administrators. They are custom, sometimes brittle and get rewritten every 5 years.

2022: your infrastructure is automated using 10 extremely complex devops tools. You automated the two system administrators away - but then had to hire 5 DevOps engineers paid 2x more. The total complexity is 10x.

They wrote YAML, TOML, plus Ansible, Pulumi, Terraform scripts. They are custom, sometimes brittle and get rewritten every 3 years.

EDIT: to the people claiming that today's infra does more things... No, I'm comparing stuff with the same levels of availability, same deployment times, same security updates.


Not pictured:

2005 your average box served some php and static assets, connecting to some generic relational database. Reading logs means grepping files over ssh.

2022 your architecture runs in the cloud, has multiple flavors of databases, queues, caches, and so on. You have at least two orders of magnitude more complexity because you aren’t just serving a web page anymore - you handle payments, integrate with other services, queue tasks for later, and so on. Your automation may be an order of magnitude more complex than 2005, but it enables two orders of magnitude more functionality.


My classic C programmer curmudgeon take is that the root of the problem, as with everything else in this industry, is bad software built on top of bad software built on top of bad software... and on it goes.

The systems in your 2022 world are hard to test and maintain because they are bad, and the tools we built to test and maintain them are largely built on the same foundational ideas and technologies, so they are even worse.

We're going to have to rip everything back down to the foundation in order to make progress beyond finger-pointing (X is DevOps but Y is software engineering and Z is IT admin).


> My classic C programmer curmudgeon take is that the root of the problem, as with everything else in this industry, is bad software built on top of bad software built on top of bad software... and on it goes.

No, not really. Nowadays things are way, way better than they were a decade ago. You can get pipelines that build and test on multiple target platforms with a dozen or so lines of code, and things run so well that they have become reliable infrastructure integrated with your prod system.

Back then you had none of this, and had to pay the salary for a "build engineer" who worked like a wizard stirring a cauldron just to get a single production build out of the door.


I’ve been cultivating this opinion that it’s the other way around. In any other domain, few people use crap tools to make something amazing, and I think it comes down to this: what you surround yourself with informs what you make.

We need better tools if we want better things.


Back in the late 90's I worked on a system that shipped on 4 different chip architectures, 4 different (unix-based) operating systems, dealt with different endianness and was more reliable and easier to understand. And it was more responsive to users with 1990's hardware than stuff is today. :shrug:


The bad things are still around in your infra even if you manage to never see them, because you only see the "dozens of lines of code" at the front.


Anyone who really buys that "dozens of lines of code" claim definitely has no idea what kind of complexity is actually involved.


Oh sure. Can you try to set up ANY CI pipeline and get it more or less working (don't worry about "pipelines that build and test on multiple target platforms"), then get back to us and share your experience?


What exactly do you want to hear? Not op, but I’ve been doing it for some time.


How much code did you write to get it barely working all by yourself? Or were you just building on top of someone else's work? I did it a couple of times, and it was never something like "dozens of lines of code".


I don't think we need to muddy the water by saying bad software. Just write the word software. It's multiple layers upon layers. This isn't automatically bad.

There's so much more going on. I don't know if the complexity is all required but we're doing more now. There's just more going on. That's how it is. We need the abstractions. Maybe not all of them. But we need more abstractions and tools running now. There's so much to manage.

I don't see us reducing this complexity that much other than consolidation and some simplification to clean things as they get sorted further. But there is still a need for the layers. It's not 2005. Things are more complex.

The nearest analogy is: would you like to code everything in line-numbered BASIC, or would you like to do it using some modern equivalent? Even with the additional layers there has been real progress in the tools. Improvements across a multitude of metrics. It's not all complexity for the sake of complexity.

Smile. It's another stage of progress. The old mess will fall away. At the next stage what we see as an improvement now will be the next stage's mess. There's a whole slew of problems that will need even newer tools.

We haven't even scratched the surface of the automation we'll require in ten years time.


What software is bad precisely? The modern browsers? The tools we use to build modern browser apps? Perhaps your gripe is with something else entirely, like modern databases? Or is it the OS that you don't like?

Really not sure what you're commenting about here


Some people feel anything not written or created by them is "bad".

It doesn't matter to me who turned the wrench, but in my career I've encountered these types where it's only good enough if it's theirs. And IMHO their output is generally quite atrocious. Root cause: Psychological issues.


Shit most of the stuff I write is bad too.

What’s missing a lot is empathy, and that takes slowing down and observing.


We have way too much empathy for bad software. Honestly, it's almost all total shit. If my car was as bad as the median level of software, it would last 30 days and then sit broken down in my driveway until the scrapper hauls it off.


What part of the stack are you thinking of? Because my experience is that the underlying stuff Just Works^TM. Sure, it has the occasional hiccup, but 999 times out of 1000 it is some higher-level tool that has the bug.

Just to share the terminology: by underlying stuff I mean the Linux kernel, the JVM, etc., while higher-level stuff could be certain build tools (khm, looking at you, npm) and of course end-user tools, which are unfortunately way too buggy. I have a hard time listing end-user applications where I haven't noticed a bug yet.


I feel like there's an opportunity missed here with the transition to clustering for us to build a small kernel for running simple services and moving things over to run on top of these things.

Give me a tight, highly coherent API, with a culture of avoiding feature factory work. At least get "Make it work, make it right, make it fast" to not skip "make it right" anymore. That's sort of the core illness in software today.

Open source doesn't have to be pushed by business concerns. There's a degree of regulatory capture, but the biggest problem is that we just don't know any better. We repeat what we see, and make sure our own pain points are handled. I've seen this play out in API design where an awful API is introduced, and each person who fixes it only fixes 20% of the pain, and so it's 10 years before we get from a mediocre library to one you could actually call 'good', because nobody made 'good', they made better than better than better than ridiculous.


The sad part is when encountering this situation and the person who you wish could develop some empathy is completely unreachable and hostile. Terminal.


It’s fine. Works as expected. Closed due to no activity for 30 days. File a PR if you’re so smart.


I wish I got to work with you everyday!

In my past at $medalyaorwherever it was a fucking nightmare everyday, the Architect wouldn't let the PR close.


Honestly it's not always a gift to have someone point out the 'movie continuity errors' to you all the time.

I tell my boss you want 2-3 people like me on the project. If everyone was like me we'd make each other miserable. I have worked on trying to frame these things in a positive manner. Unfortunately the guy who would have been the best mentor in that regard went into semi-retirement.


They said that the world is bad software on top of bad software, so it’s not one thing. It’s everything being bad and working around everything else being bad.


> They said that the world is bad software on top of bad software, so it’s not one thing.

I don't know about OP, but I personally feel like software like GCC/clang/MSVC and Debian/Ubuntu and Docker and SSH and Git are pretty great, and were never better.

Heck, the whole dotnet ecosystem is turning some significant problems in developer experience into at most minor nuisances.

Even Firefox+Chrome are stellar, and extremely solid as-is.

Where exactly is this bad software OP is talking about?


In my experience it's the custom scripts written in an attempt to separate infra from developers.

If you are using one of the major cloud providers and, instead of just using native IaC scripts, you have decided that it would be better if the developers only had access to your custom Kubernetes operators to manage their infrastructure, then you are the problem.

If you are using a deployment pipeline that developers have zero involvement in, and they just "import your Jenkins scripts" then you are the problem.

If you are using a major cloud provider and, rather than just using their managed Kubernetes, you have decided to deploy your own, then the chances are that you are the problem.

Rarely have these approaches ever been stable; in my 20+ years of experience they have never been stable. And they are the reason that "real DevOps" came along, because developers could do a better job with less downtime if people actually engaged them.


Yeah, but they didn't explain why it's bad. Software devs complaining about "bad software" is basically an industry trope at this point, but often it just means "it's too complex and I don't understand why that complexity is probably necessary" or "it's not written the way I would have written it".


Why not all of the above?


To whom does "we" refer? The classic C programmer curmudgeon? Those commenting on HN frequently use the term "we", but reading HN one can see that, amongst those commenting, there is a divergence of opinions about software. Who is "we"?

Asking prospective "DevOps", "software engineer" and "IT admin" candidates to program in C is an effective way to weed out those who are not truly capable of writing "good" software. It is easy to find mistakes and expose the incompetent. That may offend the incompetent who believe they are "good" programmers. Thus there are new programming languages created every year, more people using them to write "bad" software, and more people who feel emboldened to attack C as the source of problems, instead of "bad" programmers.

C is truth serum. It is difficult for people programming in C to pretend they are "good" programmers. They will be found out. It is funny how people on HN attack the language for exposing so many "bad" programmers. Under this theory, programmers are absolved of all responsibility for their mistakes. The language is at fault. "Good" programmers do not blame languages for their own mistakes.

IMHO, there is sometimes a benefit to languages that make it more difficult, not easier, to create and sustain Rube Goldberg complexity. Not to mention languages that compile to smaller, faster programs. "Programmer productivity" is ambiguous. For example, it could be a positive for a programmer trying to justify the expense their salary presents to an employer or it could be a negative to people who are forced to use an ever-increasing quantity of "bad" software. It could be a positive to a programmer who wants to keep implementing new "features" or it could be a negative to a software user who dislikes "feature creep".

The dichotomy of good versus bad software is such a subjective topic that the term "we" really needs to be defined. Different groups of people have different interests and therefore different opinions.


Hey, I've written more C code than anything else and was generally considered pretty competent.

> The language is at fault. "Good" programmers do not blame languages for their mistakes.

I also know that my time to get something functional in higher level languages is often 10x less, and my probability of having very subtle bugs to hunt is much lower.

There's a super-weird tradeoff here. There are all kinds of modern, high-level techniques that improve programmer productivity and the reliability of bigger systems. They let people who couldn't be productive C programmers do OK work.

But they're also slow and a bit opaque in how they work. And it's really easy to run out of performance and have to do exotic things to get them to scale, in which case you have to build bigger systems still and cede a lot of those advantages.

Whereas, if you write to the metal, you can get a lot of performance out of a single large computer.


Reminder that even absolute expert C programmers can get better performance from completely naive Rust than optimised C: [1].

[1]: http://dtrace.org/blogs/bmc/2018/09/28/the-relative-performa...


I kinda knew this comment would show up. It's completely irrelevant / orthogonal to what I'm saying. But I knew someone would have to beat their favorite horse :D

Yes, Rust may have somewhat better programmer productivity than C, but it's not really at a massively higher level of abstraction.


Not requiring pointer arithmetic on a day-to-day basis and having a proper module system absolutely puts it at a higher level of abstraction.


That truth serum sure has a long list of memory vulnerabilities causing a tremendous amount of damage.

Come on, C is not an exceptional language from any point of view. It just pretends that everything is a fast PDP-11, and thanks to the insane amount of work and mental gymnastics done by the compiler, it will output good code. It has shitty abstractability that hinders performance if anything (just look at the small string optimizations done by C++) and hurts maintainability and productivity. At least it should have a proper macro system, but preprocessor macros are just a disgusting hack.

Even if it is not outright a "bad language", it should definitely be left behind outside of maintaining the litany of programs already written in it, and one should start new programs in either managed languages (if absolute control over the hardware is not required), or at least in C++ or Rust.


Interpreters for popular languages such as Perl, Python, Ruby, etc. are all written in C. The original Go compiler was written in C. Even the people developing Rust used Flex and Bison when trying to formalise a grammar.

The question is whether C is useful. For example, in building things from the ground up. Or vetting programmers.

Another way to vet programmers is to observe how many negatives they try to cram into their sentences. For example, "Not being... doesn't... non-popular."

If they are unable to make clear, concise, positive statements the same deficiency is likely to carry over to writing programs.


UNIX and C are still being taught at Stanford.

https://web.stanford.edu/class/archive/cs/cs107/cs107.1222


Not being an exceptional language doesn't make it non-popular. There are plenty of programs written in it, and certain domains are simply 100% C (particularly operating systems).


The reason is this: To make money in a gold rush, sell shovels.

Everyone wants to bank on the software craze, and so everyone develops pointless middleware solutions.


That's a bit disingenuous to actual engineers trying to solve problems and genuinely believing their solution can make things better.

Also, looking at principles over tooling, DevOps has a lot to offer that trivially make sense.


You're forgetting the conferences, books, trainings, consulting gigs.


Talk about a low-effort take on the state of modern software. You know what’s also “just bad”? Most systems software written in C.


I was put off by the low effort take as well.

Personally, I’ve worked through having this mindset myself. I “grew” up with C and its descendant family of languages. I wrote shellcode in C and ASM as an exploit dev. I later learned Python and now Ruby, arguably some of the most abstracted languages.

At each iteration I openly opined about how “C would let me do X without the handcuffs I put myself in by using language Y”. It’s really just senseless complaining. There is a reason why someone chose Ruby over PHP, PHP over Pascal, Pascal over Ada, and so on. The point is, to scoff at the product as a bystander is just immature. Throwing shade on an entire industry based on some weird pet purist outlook is just immature :/.

Of course, the irony here is that this type of mindset is no different from the apprentice carpenter who scoffs at every new house he walks through that he didn’t build. It’s no different from the wine snob who cringes at what his mom puts out when company is over. It’s no different from the tens of submissions HN accumulated in a week with something of the tune “I rewrote X in Rust”.


We could do a case study or something, but that would be a blog post or a book, not an HN comment. How much effort are you looking for here?


Your classic C programmer take comes from a time and place where servers were pets and nobody had to, or could, scale very big.

The fact is that businesses either are or are converting to cloud-first because why incur the risks of maintaining the one special-snowflake server that runs your business when you can turn a crank and spin up as many instances as you need -- theoretically very many as you take on more customers and transactions and need to handle the load? And you can just restart them if they go down?

Stop thinking technology and start thinking BUSINESS. The cloud may be more janky and complicated than what you're used to but it serves the business's needs better.


Lots of companies excel at playing big business without actually being a big business (or having a remote chance of becoming one). They simply lug around the kind of infrastructure that might be suitable for a company ten times their size. Ironically, it isn't rare that that extra infrastructure and the cost and effort to maintain it limit them in the marketplace.


> The fact is that businesses either are or are converting to cloud-first because why incur the risks of maintaining the one special-snowflake server that runs your business when you can turn a crank and spin up as many instances as you need -- theoretically very many as you take on more customers and transactions and need to handle the load? And you can just restart them if they go down?

I'm pretty certain you can do the same with software written in C (or any other language) without resorting to AWS/Azure/GCP/etc.

You can do the same cranking on your own hardware, except you keep the hardware after the peak has passed and you need to plan for the peak.

Or, you can simply get a DO droplet, and crank out as many new instances as you need, and turn them off when you don't, with software written in C (or anything else).

There's no need to go all-in on the AWS/Azure/GCP solution, with the ability to orchestrate instances as cattle, when you have, at peak, 3 instances[1].

[1] If you wrote your software in C, or Rust, or Go instead of Python or Ruby, you could potentially get away with far fewer instances, thereby getting back to treating the system as a system of pets because there are so few of them.


> We're going to have to rip everything back down to the foundation in order to make progress beyond finger-pointing (X is DevOps but Y is software engineering and Z is IT admin).

1. Never going to happen unless everything collapses, which is extremely unlikely.

2. We are progressing. AR, VR, GPUs with raytracing, complex web applications and global services, global cloud, etc.


You are not alone in thinking that; see 'The Thirty Million Line Problem' talk by Casey Muratori.

(Long, but worth it, assuming you haven't seen it already)

https://www.youtube.com/watch?v=kZRE7HIO3vk


Expecting all software in a properly written stack to be "good" is a particularly bad take, and one that leads to brittle software. Best to accept reality, deal with it and move on.


The big question is: Do you need that kind of functionality? I agree that very large and complex infrastructures have their place - the problem is just that they import a ton of complexity and usually cost a lot.

People are always surprised when they see a minimal webserver instance on a ten-year-old Debian [0] handling tens of thousands of requests without issue. It might go down once a decade because the disk failed, but it cost $1,200 to run over that decade. I don't think that's the perfect way, but modern infrastructures love to include a lot of complexity when it's not needed. The problem is that including something is very easy and the cost is only paid once it breaks. Also, hardware is really cheap (you might not think so, but if you compare it to western IT salaries, it is).

[0] I know outdated instances with no update strategy are not really a benchmark, but I encourage you to go out and ask people what their container base image upgrade strategy is - the situation really did not change that much.


> The big question is: Do you need that kind of functionality? I agree that very large and complex infrastructures have their place - the problem is just that they import a ton of complexity and usually cost a lot.

I think that's the key. Most of us aren't Google/Facebook/whatever. Yet lots of people want to mimic those. A simple stack with a relational database on a single machine still goes a long way. In fact, it goes a lot longer way today than it did 20 years ago. It's ok to scale when you need to, but I think premature scaling is the wrong way to go, especially since most things never get to FB/Google/etc size.


You stick with the old and known since it's getting the job done, the customer is happy, and you can get a good night's sleep too.


I understand trying to keep complexity low. But I don’t understand how a relational database would be less complex than a simple key value store.

Every time this argument pops up, I end up wondering if it’s really about architectural complexity, or if it’s about sticking with the old and known.


First, I'd say the key value store is actually older than the relational database, by at least a couple of decades. Of course, not the current ones, but we're also not using the relational database from the 80s so ...

Now to try and answer your question, I think a relational database is a lot more complex than a simple key value store, but if it's a good implementation, it will hide this complexity from the user, and using the database will be rather simple. It can be as simple as a key value store since you can just use a relational database like one. Note that this hiding of complexity is by design, it is one of the main goals of the relational model, it's not accidental.
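To make that concrete, here's a minimal sketch of using a relational database as nothing more than a key-value store (SQLite via Go's database/sql; the driver choice, table and key names are just illustrative):

  package main

  import (
      "database/sql"
      "fmt"
      "log"

      _ "github.com/mattn/go-sqlite3" // any database/sql driver works the same way
  )

  func main() {
      db, err := sql.Open("sqlite3", "kv.db")
      if err != nil {
          log.Fatal(err)
      }
      defer db.Close()

      // One two-column table is the entire "schema" a KV workload needs.
      if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)`); err != nil {
          log.Fatal(err)
      }

      // put: insert or overwrite the value for a key
      _, err = db.Exec(`INSERT INTO kv (k, v) VALUES (?, ?)
          ON CONFLICT(k) DO UPDATE SET v = excluded.v`, "user:42", "alice")
      if err != nil {
          log.Fatal(err)
      }

      // get: read the value back by key
      var v string
      if err := db.QueryRow(`SELECT v FROM kv WHERE k = ?`, "user:42").Scan(&v); err != nil {
          log.Fatal(err)
      }
      fmt.Println(v) // alice
  }

And the moment the data stops being just keys and values, the same table can grow columns, indexes and joins without switching databases.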


A lot of relational databases offer a better (or more useful) consistency model than k-v stores, assuming your data isn't just a bunch of keys and values. Having to worry about consistency can significantly worsen the complexity of your application code.


MongoDB is complex, that’s true. But DynamoDB or GCP Bigtable are pretty simple.


1. MongoDB is not complex, the driver structure sucks.

For example, Go's mgo driver by Gustavo Niemeyer was simple and effective, but abandoned. The official driver is unnecessarily complicated, and Go's need to have "context" everywhere adds to this, but MongoDB itself is not complex. Idk, I'm not some super genius and picking up MongoDB was really easy. Querying (aka aggregation pipelines): you have to think of that as "pipes in bash". It's in the words: aggregation pipelines. find | groupby | sort | filter | select. Something like that. It's not SQL, it's different, but not complex, sorry.
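For what it's worth, here's a rough sketch of that "pipes in bash" style with the official Go driver (database, collection and field names are made up for illustration, not from any real schema):

  package main

  import (
      "context"
      "fmt"
      "log"

      "go.mongodb.org/mongo-driver/bson"
      "go.mongodb.org/mongo-driver/mongo"
      "go.mongodb.org/mongo-driver/mongo/options"
  )

  func main() {
      ctx := context.Background()
      client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
      if err != nil {
          log.Fatal(err)
      }
      defer client.Disconnect(ctx)

      orders := client.Database("shop").Collection("orders")

      // match | group | sort -- each stage feeds the next, like a shell pipe.
      pipeline := mongo.Pipeline{
          {{Key: "$match", Value: bson.D{{Key: "status", Value: "paid"}}}},
          {{Key: "$group", Value: bson.D{
              {Key: "_id", Value: "$customer"},
              {Key: "total", Value: bson.D{{Key: "$sum", Value: "$amount"}}},
          }}},
          {{Key: "$sort", Value: bson.D{{Key: "total", Value: -1}}}},
      }

      cur, err := orders.Aggregate(ctx, pipeline)
      if err != nil {
          log.Fatal(err)
      }
      var results []bson.M
      if err := cur.All(ctx, &results); err != nil {
          log.Fatal(err)
      }
      fmt.Println(results)
  }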


Simple indeed, they don't even offer anything that matches SQL flexibility.


Exactly my point. If the argument is to avoid unnecessary complexity, then how come a SQL database is the better choice?


Because not having a proper query language turns simple wishes into complex solutions.


But it does have a way to query. If you only know SQL and didn't bother to learn the MongoDB way to query, then for you, the uninformed outsider, it might seem complex. But so does ArangoDB or Neo4j or GraphQL.

Like, if you were never exposed to rxjs and are now trying to build things with it doing

  $stream.pipe(
    switchMap(),
    filter(),
    etc(),
  )
It does seem more complex than

  stream.map().filter().etc()
but it's only so because you haven't put in the effort to learn that way.


Sure, whatever.

Now write DBA scale ISO SQL 2016 / Transact SQL / PL/SQL in that interesting Mongo flavoured language, including database engine debugger integration, and JIT compilation.


In what context? A hosted database might be "simpler" to operate, but BigTable is quite complicated and does a lot of complex, low-level work to be very fast.


Throw ACID requirements on top of a simple kv store and it is not a simple kv store any more


> tens of thousands of requests,

a raspberry pi can pretty much serve 100k req. / second. An average dev laptop should be able to handle 1M req. / second without much issue


> a raspberry pi can pretty much serve 100k req. / second. An average dev laptop should be able to handle 1M req. / second without much issue

Depends on what you do. It's quite easy to hit these numbers with a low number of static pages, but apps that can easily work with static pages usually don't have DevOps people. That's why I went with thousands, to have a reasonable number for an app that actually does some complex processing.


> a raspberry pi can pretty much serve 100k req. / second

I appreciate the sentiment, but my intuition says that number is too high. What would that look like? I can't picture nginx or haproxy serving up a static 'hello world' response at that volume on a Pi3.

And if you do anything with a slow-ish 3rd party API you're probably going to hit the TCP socket limit before you can respond to 100k/s.

Legitimately curious about whether my intuition is wrong. I don't have an unused Pi handy to test it myself.

Edit: It's possible you're citing https://levelup.gitconnected.com/serving-100k-requests-secon... ? Pretty interesting article in any case, especially this claim:

> This test pitted a native C++ server against a scripted Node.js one, so of course the outcome was given. Well, not really. If I run the same test using µWebSockets.js for Node.js, the numbers are a stable 75k req/sec (for cleartext). That’s still 8.5x that of Node.js with Fastify and just shy of 75% of µWebSockets itself.

Edit 2: Doing a bit more research and finding some other benchmarks online, it seems like 100k with µWebSockets is plausible. I recant my skepticism.
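For anyone who wants to check their own intuition, these figures are usually measured against something like a minimal static-response server; a sketch in Go (port and body are arbitrary, and a real app doing real work will be far slower):

  package main

  import (
      "log"
      "net/http"
  )

  func main() {
      // Serve a tiny fixed body; a real app doing real work will be far slower.
      http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
          w.Write([]byte("hello world"))
      })
      log.Fatal(http.ListenAndServe(":8080", nil))
  }

Then point something like wrk -t4 -c256 -d30s http://<pi>:8080/ at it from another machine and see where the numbers land on your hardware.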


That doesn't seem right. https://www.diva-portal.org/smash/get/diva2:1439759/FULLTEXT... According to this paper, it looks like thousands of requests per second is more realistic, both for the Pi and a laptop.


Still enough for 99% of websites.


> Still enough for 99% of websites.

I'm not sure you have much experience working on websites. Multiple deployments are required for performance reasons, which is the main driving force for paying hefty bills for CDNs, and their recent investment in edge computing.

You deploy to multiple regions in good part to drive down those 300ms requests to sub-50ms. Otherwise your user gets fed up and drops you in favor of a competitor.

But if you work on a service that has only a dozen users in the same region, who can wait in turns to see who gets to make a request, go right ahead with your production raspberry pi setup.


> You deploy to multiple regions in good part to drive down those 300ms requests to sub-50ms

one raspberry pi per continent :p

e.g. I'm in a small rural village in southern France right now and I get roughly 2Gb down / 1Gb up with a 9ms speedtest ping (which is slow, it's often 2ms) - anyone from Europe would get < 50ms ping, notwithstanding the latency between them and their ISP

by the time my website has, say, > 500k req/second, I can definitely invest in, idk, some used 2015 i7 laptop to give myself a 10x boost. For reference, 1M requests per second is Instagram in 2015, one of the most used websites in the world - by the time you're at that scale you've been bought by Facebook anyway.


Those 99% of websites don’t have the performance problem you’re talking about. You’re talking about the 1%.

I bought artisanal knitted socks for Christmas. If their website is faster than their knitting that’s good enough. That website is part of the 99%.


https://en.m.wikipedia.org/wiki/Name_calling

I ultimately agree with you, but in between a raspberry pi and “hefty-bill CDNs” is something like a half rack at a colo, an F5, and a couple of PowerEdges.


And then you serve megabytes of JS libs that cause multi-second load times and log every single mouse movement, autostart a video with sound, and pop up 10 different bullshit dialogs, so... you would be at the exact same spot as serving from a shitty dev laptop.

(Half kidding)


Heyo, I worked at a CDN for a few years on edge cache stuff. Almost every web site would be well served with a single Pi and a CDN. For almost every web site, the CDN is free.


My user gets fed up because the page takes a second to load?

Not every page loads 50 different resources. Not every page serves people globally. Not every page has competitors.


> I'm not sure you have much experience working on websites.

I might say the same about you. Be respectful.


> a raspberry pi can pretty much serve 100k req. / second. An average dev laptop should be able to handle 1M req. / second without much issue

How big are these requests? If you're returning more than 1kB including all the headers, you're not beating 100k req/s on a standard consumer-level 1Gbit link.
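(Rough arithmetic: 1 Gbit/s is about 125 MB/s, so at 1 kB per response the wire itself tops out around 125k responses/s, before TCP/IP framing and the inbound request traffic eat into that.)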


The argument is about the power of a single machine. If your enterprise is somehow limited by 1Gb links, no amount of scaling is going to address that.


I think the parent's argument is that a Raspberry Pi is limited to a 1Gb link, so how can it do 100k reqs/sec?


That, plus if I open dev tools and look at requests for even this site, the only things that small are three of the images (the orange Y in the corner, the vote arrow, and a 1x1 somewhere).

It wouldn't be possible to get that request rate outside of a microbenchmark, just because of the network speed the pi supports.


it's not, people have been able to put >1G network cards on a Pi:

- 2.5G: https://www.jeffgeerling.com/blog/2020/testing-25-gbps-ether...

- 10G (in practice 3.6G max which is more than what most can expect from their ISP): https://www.hackster.io/news/10-gbps-ethernet-on-a-raspberry...


you can buy faster network cards for a RPi but this is missing the point: 100k req/sec is not a huge volume and a decent server should be able to handle it without issues.


The thing is that sometimes you do.

At my current company we're at the cusp where we need to address scalability to be able to grow as a business, and the business is growing. It took several years to get to this point, but here we are.

I prefer simplicity as much as the next guy, but at some moment it becomes simpler overall to use cloud services controlled by k8s than to maintain large amounts of Ansible scripts and Jenkins jobs.

With that, by the way, comes an inevitable untangling of various parts in the software, and simplification of it. To be fast you need the processing path to be simple.


> The big question is: Do you need that kind of functionality?

What functionality? Automated unit tests to ensure you're not shipping code that is so broken that it even breaks basic invariants? Automated integration tests that ensure you're not shipping code that fails to call other components/services? UI tests to ensure that your latest commit didn't make your prod service unusable for the end user? Performance tests that help you determine whether your latest change suddenly exhausts your computational resources at peak demand?


Kind of depends on what you’re building, right? Avoiding building new features in order to limit complexity is probably a good idea in select cases but not in others. Using more complex, purpose-built technology over standard boilerplate is often very cost-efficient, but only if you run at a scale where the additional complexity is worth the savings…


The people paying my salary sure seem to think that we do need it.


I partially agree with what you're saying, but let's not pretend we weren't handling payments in 2005; some of us were, anyway. I think what changed is the scale of things: we had a lot fewer people online back then.

I think the increased complexity in our architectures is correlated with the DevOps role coming into the picture, but I'm not sure there's a causal link there. My recollection from having lived through the initial years of DevOps (I began working professionally in 2000) is that an important goal was to reduce the terrible friction between dev and ops (the latter including sysadmins, DBAs, etc). Whatever the extra complexity we have now, I would not want to go back to the days when I carried a pager and responded to find out the problem was caused by an application I had no insight into, and there was no corresponding on-call person on the side of the dev team.

Another important goal was to manage infrastructure a bit closer to how we develop software, including the big step of having the configuration (or the code that generates it) in SCM. Another thing I don't miss is logging into a server to troubleshoot something and finding /etc/my.cnf, /etc/my.cnf.<some date>, /etc/my.cnf.test, etc. It's much easier to just run tig on either the file or the ansible/chef/whatever that generates it IMHO.


This times one million. Back in 2005 there was so much shit you wouldn't even dream of doing, and a lot of things you did do, you sure as shit wouldn't do now... we used to write our own cache systems. Migrations were all custom scripts. Fucking nightly builds were a thing because we didn't kick off builds on commit.

Unit tests weren't even common practice back then. Yeah, most places had tests but there was no common language to describe them.

And as much as git can be a big complex pain. . . merging was a BIG thing back then too. I seldom deal with long lived branches and the nightmarish merges they often needed.

Also, to all the young folks who "want to work with physical servers" again. Have fun with the next set of device driver patches you need to roll out.

I heart my containerized IaC buzzword laden cloud existence. #FuckThatNoiseOps


It's as much a cultural problem as a 2005-vs-now problem, because I have experience working at a joint that uses AWS yet chooses to roll their own Kubernetes, uses GitHub yet flails around with long-lived branches and 20-minute CI stages, uses containers but still has 20-minute build times, uses ArgoCD yet has tiresome "release events" that generally can't be rolled back and take at least an hour to "fix-forward" after a bad release. Sometimes you can lead a horse to water and even push it into the trough, but it would rather eat dust than let go of its last-decade ideas about operations.


2005 Your app takes some user data, stores it in a database, and presents a filtered view of it to the user on request.

2022 Your app takes some user data, stores it in a database, and presents a filtered view of it to the user on request.


Well if that’s all you’re doing in 2022, there’s nothing wrong with deploying your code and grepping logs over ssh. Bash scripts and offloading some of the complexity of management to AWS (e.g. use autoscaling) goes pretty far


In 2002 I built a website that got about 3m page views a week, with about 50k concurrent users, and a whole bunch of fun things like payments, outbound email, etc. It ran on a single server.

Since then I've worked on things that use several servers, things that use no servers with edge functions instead (which have servers, but still), and things that use autoscaling across lots of servers.

No doubt some businesses need autoscaling but most don't. It definitely shouldn't be something you reach for before you actually anticipate a need for it.


Autoscaling? Do you mean facing so much cloud management complexity that it invites automation, in order to spend more money for a herd of excitingly ephemeral server instances than for a reasonably overprovisioned boring actual computer?


The fun part is that the 2005 architecture is still plenty sufficient for 99% of deployments but everybody architects for 100x the scale they actually need.


This is a good point, but I'd say that's just the ops version of premature optimization.


No, it's the ops version of premature abstraction - when devs build a whole extensible framework around a simple use case just because "requirements might change later" (which 90% of the time they never do, so on average the work is wasted much more often than not).


I mean yes. . . premature abstraction driven by premature optimization. Or more simply, solving a problem you don't have now, and might not ever have.

Everyone talks FAANG scale on day 1. It's common and absurd.


Usually because most businesses say they need highly available, scalable infra from day 1.


I typically see it happen in the other direction myself. Business folks describe the problem/product and the engineering team says “oh yeah, we can do that, we'll need K8s” when they do not, in fact, need K8s.


> you handle payments,

Yeah, because obviously nobody handled any payment in 2005 … The same can be said for everything in your list.

And more importantly, it's unlikely that your business is more complex than it used to be in 2005. And if you're spending more resources to deliver the same business value, you're wasting them. (That's why PHP is still ubiquitous btw; it sucks by most standards, but it's good enough to do business and that's what matters).


Extremely complicated architectures are a liability, not a feature to be proud of; increasing complication (i.e. costs) doesn't mean increasing functionality (i.e. value).

For example, why do you "queue tasks for later"? Do you have exceptionally punishing latency and throughput requirements that can only be met at a reasonable cost by not answering synchronously, or is it because your database doesn't do transactions?

Similarly, what do you do with "queues, caches, and so on"? Meet further extreme performance and availability requirements, or attempt to mask with additional complications the poor performance of inefficient complicated components?

In 2005, but also in 2000, web applications had already progressed past "just serving a web page", mostly without complicated architectures and therefore without the accompanying tools.

I think tool improvements made cargo-culting genuinely advanced state-of-the-art software architectures and processes easy and affordable, creating a long-term feedback loop between unnecessary demand for complexity (amateurs dreaming of "scaling up") and unnecessary development and marketing of advanced (but not necessarily good) complexity-management tools, often by the same amateurs.


I was working on large scale deployments with multiple datacenters, multiple databases, message passing networks and running multiple customer-facing products.

Load balancers and HA setups were already in use. LVS existed. VMs were popular. "CI/CD" existed, without that name.

> 2005 your average box ...

Speak for yourself.


And I had all this in place by 1999. Including the multiple databases, the caches, the load balancing, edge processing nodes all over the world, etc.

Sure we built a bunch of it ourselves, and we had 4 sysadmins, but please don't tell me that this wasn't already going on over 20 years ago.


The future is already here, it’s just unevenly distributed.

By 2004 we were doing perforce and CI. No Continuous Deployment because we were building on-prem software, but by 2008 I was doing svn and Continuous Delivery (similar problem, not hosting).

We would have been if it had been appropriate.

To be fair though, I think a lot of my coworkers suspect I'm talking out of my ass because they've been doing this stuff for 5-8 years and think everyone else has too. As tempting as it sometimes is to launch into a lecture, it wouldn't really help. Everybody knows what they know, and they're going to know it until they watch it catch on fire, and then they will either want to hear about alternatives or they'll share ideas for fixing it, not realizing they're repeating something you told them two years ago. Hey, that's a great idea, wish I'd thought of it.


[flagged]


Indeed, all of those things existed, and they weren't in widespread use because 99% of organizations did not need them. And that hasn't changed.

> 2 sysadmins were absolutely not managing “large scale deployments with multiple data centers”

Please don't make things up because I was there.

> But today it’s actually super plausible that 2 or 3 Devops with modern tooling can manage all of that.

Not really.


[flagged]


I was there too and some things were better, some thing were worse.

The problem is that we’re being sold a lie and it’s hard to swallow.

“Cloud will save you on staffing costs” - no, it just means you have specialists working on proprietary solutions, just like IBM mainframes or storage appliances or load-balancers/traffic-shapers of yore.

“This technology will make rollouts easier” - until it breaks down and is not easy to put together again, you’re at the mercy of your upstream and you had better hope you keep a really solid update cadence and not introduce something you can’t consume later.

“Layers on layers means things are composable” - sometimes, but you need more people to know more things to put it together properly. Running everything from QuickStarts is doomed to fail, but I see so much of it.

Our config management tools back in the day were crummy as hell, cfengine being the only notable one (which was awful), our monitoring systems required a lot more handholding, and truthfully, people were not happy to talk through requirements, which was probably why “devs run production” was appealing.

these things have gotten better, but nearly everything else has definitely gotten worse.


All you did is describe the increase in complexity of the infrastructure. Which does not rebut the point, because that is the point. The infrastructure people are using now is much more complicated than what was common 15 years ago.

If you want to rebut this you need to demonstrate the kind of capabilities we have now thanks to this complexity that we could not have before.

> you handle payments

No you don't. Most people handle payments by integrating with a 3rd party service, most likely stripe or paypal.


Good point re payments.

Your system today doesn’t even handle left pad, outsourcing even that to a non deterministic tree of geo-distributed modules.


I find this take apologetic. "But we're running on the cloud, we need this complexity", is a wrong take. Yes, the underlying services may need this, but the infrastructure managing these services don't need to be this complex.

Every complex tool can be configured to work in a simpler manner, and all the configuration repositories directing these tools can be organized much better. They're akin to a codebase, after all. Many code quality processes and best practices apply to these, but sysadmins don't like to think like developers, and things get hairy (I'm a sysadmin who does development; I see both sides of the fence).

The sad part is, these allegedly more robust platforms do not provide transparency to developers, sysadmins, or users. In the name of faster development, logging/debugging goes awry, management becomes more complex, and users can't find the features they are used to having.

Why? We decoupled systems and decided to add this most-used feature at a later date, because it's not something making money. Now my account page doesn't even show my e-mail address, and I can't change my subscription renewal date or see my past purchases. Why? They're in distant tables, or these features are managed by other picoservices which aren't considered worth extending or adding more communication channels to.

Result? Allegedly mature web applications with weird latencies and awkwardly barren pages, with no user preferences or details.

Adding layers and layers of complexity to patch/hide shortcomings of your architecture doesn't warrant a more complex toolset to manage it.


Classic over-engineered DevOps, adding complexity for complexity sake. When you have a hammer...

Everything you described was 2005.


In 2005 there were, for instance, fancy J2EE application servers instead of fancy container roulette tools.


Also not pictured:

* 2005: ship it if it builds.

* 2022: run unit testing, deployment to beta, run integration testing, run UI testing, deployment to preprod, run internationalization testing, run regional/marketplace-specific testing, run consistency checks, run performance tests, deployment to prod.


I think the variable here isn't year, but where you work(ed).


Maybe, but the pipeline required to do all that in 2005 would be considerably more complex and brittle than the 2022 version using tools and infra designed for the job.


2005 was the full-on enterprise WebSphere/WebLogic era, so no, if anything it was much more complex (from the architectural side) than today's Python/Node.js or even Spring Boot solutions. Automation (bash/ant) plus CI like CruiseControl was already there.


> you handle payments, integrate with other services, queue tasks for later

I distinctly remember buying things on Amazon in 2005. Actually come to think of it, I did everything you list in 2005, at multiple companies.


you handle payments, integrate with other services, queue tasks for later, and so on

This just means "use APIs", which adds {base_url, secret_key} pairs to a config file. Where are the orders of magnitude, devops-wise?


Integrating with other services makes your software more brittle, as there are more points of failure, from hardware and networking on your end, to API gateways, networking, and hardware on their end. Periodically those things tend to go sideways, which means whatever it was your system was trying to do doesn't happen, or worse, only partially happens. Usually some poor engineer is then tasked with figuring out: is this an ongoing problem or a one-off problem? Did it affect a single customer, or is it affecting all customers? Is it affecting a single instance, or is the problem happening across all instances? Is it a problem on our end, or is the 3rd party API misbehaving (again) today?

Generally you need solid APM and logging to support the operations of this.

Then there's queueing tasks for later and making sure you never lose messages between systems. All sorts of fun happens when you hit a capacity limit in your queueing system due to an unintended/unnoticed design flaw, and all of a sudden system performance starts to tank as things aren't getting processed because there's too much stuff in the queue and it isn't draining at a sufficient rate. You need to figure out: is the queue actually draining or is it still growing? Are all the worker nodes still running, or did one (or several) of them hang, not die, and simply stop processing jobs out of the queue while still consuming a spot in your auto-scaling group, blocking a new healthy instance from coming online? Is it a problem in your software, or is it actually a blip in operations from your cloud provider?

It's way, way, way more complex than simply "chuck some API keys in a file and she'll be right!". That's hobby project level stuff, not realistic ops of a production system.
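To make the queue part concrete, here's a hedged sketch of the kind of health check that paragraph implies; fetchQueueDepth and fetchProcessedCount are placeholders for whatever your queue or APM system actually exposes (SQS attributes, a Prometheus counter, whatever):

  package main

  import (
      "log"
      "time"
  )

  // Placeholders: wire these up to whatever your queue/metrics backend exposes.
  func fetchQueueDepth() int     { return 0 }
  func fetchProcessedCount() int { return 0 }

  func main() {
      prevDepth := fetchQueueDepth()
      prevDone := fetchProcessedCount()
      for range time.Tick(time.Minute) {
          depth := fetchQueueDepth()
          done := fetchProcessedCount()
          // Backlog growing while nothing gets processed usually means a worker
          // is wedged but still "healthy" enough to keep its auto-scaling slot.
          if depth > prevDepth && done == prevDone {
              log.Printf("ALERT: queue depth %d and rising, no jobs processed in the last minute", depth)
          }
          prevDepth, prevDone = depth, done
      }
  }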


I’m literally working in automated finance, and we are offloading these cases to client support and to late/next-day status requests for B2B peers. The only devops-level technicality there is an in-process keep-alive system which prevents overwhelming the support repeatedly. What you're describing tries to deal with these issues at the level where it is hard, and it is, but that's exactly the road to complexity and infra costs (and infra issues as well). As a result, it trades simple client issues for hard technical ones, and you still need support. It's cool to have an ideal incident-free system, but only when you can cover its costs - and more importantly its evolution barriers - by exactly that quality. Iow, don't replace humans with machines until machines learn to wipe their own ass, or until it's just too much ass to manage efficiently anyway.

I’d have fought at your side a few years ago, but since starting to work in this field and asking around, I realized nobody cares about that ideal way, because it's so expensive and hard to maintain, and the cost of change is prohibitive. It even left a few scars on my mental health, exactly because I was too anxious to go this way after rotating in environments where devops is praised as something inevitable for survival. If you squint at us at the right angle you can still see devops, but I think we are just expressing two opposite sets of ideas, each applicable in its own type of business environment. You may still call that hobby-level, and I'd even agree (because it essentially is) if that hobby didn't bring in enough to sustain the company and turn a profit at levels many companies would wish they were at. If it works and brings revenue, who cares how it's categorized.


Also not pictured--

2005: two system administrators wrote all automation using a handful of Bash, Perl and Python scripts AND THEN LEAVE THE COMPANY a couple of years later

20xx: newly hired system administrators continue to rewrite the scripts; no shared knowledge, because a scorched-earth policy is in effect

2022: HN: DevOps is a failure -- you just need two system administrators...


Wanted to add: today's pipelines include a whole new level of security that was pretty much ignored in 2005.

Security adds complexity


> todays pipelines include a whole new level of security that was pretty much ignored in 2005.

The very opposite.

> Security adds complexity

If anything, complexity is the enemy of security


The question is, do you need all that, or was/is PHP and MySQL enough for the job?


> Your automation may be an order of magnitude more complex than 2005, but it enables two orders of magnitude more functionality.

And can create more than 2 orders of magnitude in terms of economic value (i.e. profit) as a result.


If that was true, why has the economy since 2000 never hit the speed of growth that it had in the 1990s? It seems the big economic gains of the Internet came early, and what we've seen since is the Law Of Diminishing Returns: increasing inputs for decreasing outputs. Bigger investments for less economic growth.


> hit the speed of growth that it had in the 1990s? It seems the big economic gains of the Internet came early

It did, just not in the way you're describing it. This is how it's manifested itself:

> Apple, Amazon, Alphabet, Microsoft and Facebook all account for just under 20 percent of the market value for the entire S&P 500. With a collective value of nearly $5 trillion, these top tech companies easily dwarf other entire industries in the index, with companies like Berkshire Hathaway and JPMorgan Chase falling well short. Currently, the total valuation of the S&P 500 is almost $27 trillion.

[0] - https://www.statista.com/chart/20794/tech-companies-highly-v...


That is not economic growth. The valuation of some companies on the stock market has nothing to do with economic growth. "Create more than 2 orders of magnitude in terms of economic value" never actually happened.


> 2 orders of magnitude in terms of economic value (i.e. profit)

Then you clearly misread my initial point. economic value != overall economy growth


>> Your automation may be an order of magnitude more complex than 2005, but it enables two orders of magnitude more functionality.

This!

The primary problems with DevOps are...

1. It is still in its infancy therefore it's changing quickly.

2. Bad (or no) documentation is much more painful than before. A single-family house without blueprints can usually be adequately serviced, whereas a 27-story office building cannot.


Missing the point:

That code in 2005 was better in every way.

Context: just learning the LoopBack API framework. What a steaming pile of overengineered garbage.


So someone overengineered something in 2022, and therefore, nothing's better?

How about my anecdata:

2010: your infrastructure is automated using a handful of Bash, Perl and Python scripts written by two system administrators. They are custom, brittle in the face of scaling needs, and get rewritten continuously as your market share and resulting traffic grow. Outages happen far too often, you think, but you would, because you're someone who gets paged for this shit, because you know a core system well... ...that got broken by a Perl script you didn't write.

2019: your infrastructure runs on EKS, applications are continuously deployed as soon as they're ready using Jenkins and Flux. You wrote some YAML, but it's far better than that Perl stuff you used to have to do. The IDE support is like night vs day. You have two sysops, or devops, or whatever, who watch over the infra. You've had to attend to a system outage once in the past two years, because an AWS datacentre overheated, and the sysops just wanted to be sure.

You write some YAML, the sysops write some CDK. Your system is far more dynamically scalable, auditable, and reliable.

My anecdote can totally beat up your anecdote. (In other words, this is a silly line of argument.)


My 2020 looks like this:

no docker no k8s

1 server

  # initial setup: clone the git repo under /var/www/domain_name and build
  git clone git_url /var/www/domain_name/backend/
  cd /var/www/domain_name/backend/
  go build
  
Updates

  git pull
  go build
  systemctl restart domain_name.backend.service
I pay 46€/month and I'm looking forward to halving those costs. Server load is mostly <0.5. I call this the incubation server. If a project takes off I rent a more expensive, but dedicated, server. It's very unlikely that I'll ever need more than 1 single server per project.

I will never write microservices, I can scale fine with a monolith. Lately I even moved away from JS frontends to render everything with Go on the server. Yeah it requires more resources but I'll gladly offer those resources for lower response times and a consistent experience.
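In case anyone wonders what "render everything with Go on the server" looks like in practice, a minimal sketch using only the standard library (the page data and template are made up for illustration, not from my actual projects):

  package main

  import (
      "html/template"
      "log"
      "net/http"
  )

  // Hypothetical page data, purely for illustration.
  type page struct {
      Title string
      Items []string
  }

  var tmpl = template.Must(template.New("index").Parse(`<!doctype html>
  <title>{{.Title}}</title>
  <ul>{{range .Items}}<li>{{.}}</li>{{end}}</ul>`))

  func main() {
      http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
          // Render HTML on the server; no JS frontend involved.
          tmpl.Execute(w, page{Title: "Hello", Items: []string{"a", "b"}})
      })
      log.Fatal(http.ListenAndServe(":8080", nil))
  }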

Sadly companies that are hiring don't see it that way. That's ok. I'll just stay unemployed and try building my own stuff until something succeeds again.

I had a 7-year-long project that brought in 5-7k€/m. The server cost 60€/m. I can do that again. I know it's not your kind of scale or income level, but it allowed me to have a good life living it my way.


I think it's somewhat disingenuous to compare DevOps requirements of 5-7k/m projects with systems run and operated by companies in the mid market.

That said, something I often wonder about is: if you could strip out 100% of the cruft from systems run by realistically sized companies, exactly how cheaply could you run them, and with what DX? Half of the problem is that things built by 100 people with competing and shifting priorities will never result in a clean, tidy, sensible system, and it's mighty difficult to subtract out the effects that organizational scale has on the end result.

I'm currently working through building a hobby project that, as far as I know, will only ever have one user, but I'm enjoying the total freedom to take my sweet time building it exactly as nice as I wish the systems I wrangle in my day job were, and I'm 100% looking to run it for free, or as close to free as possible but with as much performance as I can get, because why the hell not? It's a totally different ballgame.


I didn't understand half the things you wrote. What's a company in the mid market? (nm, I looked it up) What cruft? I'm not a native English speaker; your way of expressing yourself is hard for me to understand. DX?

Are there really things built by 100 people, and if so, why do you need 100 people? Why can't you do it with 1 lead, 1-2 database guys, and 1-2 code monkeys? Why can't this be done in a monorepo and a monolith? Why does it have to run -in-the-cloud- on other people's computers?

I had a project that was ranked in the top 1000, according to Alexa, on a single server. But it didn't bring in enough revenue and took up most of my time so I shut it down. I re-launched it 15 years later but it's dead. No one knows or remembers the domain anymore save for bots who keep hitting those "fkk-sex-nude-porn" etc spam links.

Back then you could make a site popular with 1€/day on Adwords, nowadays ... lol, this won't even lead to your ads being displayed, anywhere.

Go can handle 500 million visitors per month on 10 year old hardware (= 1 server). Which "mid market" company has 500 million visitors per month?

Writing websites/apps isn't complicated. For me anyway. You figure out the data models, write the use cases and you're half way. The other half is writing the display layer to make use of that data.

You make it all sound like the requirements or the product are different. They're not. It's all the same. You can have observability without k8s. You can scale without k8s. You don't need a managed database. Man, this stuff is simple. It's the people who are trying to sell you cloud and microservices and whatnot that make it all sound so hard. A good software developer is spending his knowledge and lifetime to build something for you that is built to last, because you apparently can't do it yourself or don't want to. It will last even when he isn't part of your company anymore. He could've built the same thing for himself and monetized it; instead he bowed down to you (not you personally) and opted for steady income.

Let's be honest: when we talk DevOps we mean k8s, so I understand how sweet a siren's song k8s sings. But it's ultimately a waste of resources, a solution in search of a problem. Until you reach proportions that require k8s, you'll be completely satisfied with a 3-server setup, that is, Go plus whatever database you pick. I promise, I guarantee.

That 5-7k€/m project had 30 concurrent people at most. It had 30TB/m outgoing traffic at most. How much does 30TB egress cost in GCP, AWS, etc.? It used to be about 3k; I believe it's a third of that now.

Why would the principles that are valid for a "small" project not apply to a "mid market company"? More features? So what? The principle remains the same. Boring, same old. Data models, wrappers, display layer. It's the same for all, your beloved FAANG, mid market, small business, single owner.

Whatever.


No one ever said a VPS with a shell script is terrible. You're thinking about scale the wrong way. Scaling is not only about going from 10 requests/second to 1000 requests/second. Scaling is about organizational scale too, i.e. how do you ensure that going from 2 to 20 developers increases productivity by close to 10x and not 1.5x?

Tools like Docker, Kubernetes and whatever absolutely help in that regard.


This is a build deployment perspective.

I for one do not miss hosts never being patched because of all the slight modifications to system files that were tweaked several builds ago and that everyone is now too scared to touch.

I won't miss the 12 month projects to upgrade some dated software to a slightly less dated version of that same software.

From my perspective in Security, DevOps has made life much better.


The ability to spin up a box, have it run insecure code, and then spin it down; and the ability to do that all day long, is worth it for the security benefits that all this complexity entails.


> The ability to spin up a box, have it run insecure code, and then spin it down; and the ability to do that all day long

What's the best way to do that? I have some insecure code that needs to run about 6x a day, and so far my best thought has been an isolated box outside my network that does the internet-based fetches, translates the data, and then submits it over the web to another service that verifies/checks the output.


At my first company, our builds happened whenever the release engineer (he was friends with the milk man and chimney sweep) felt like "doing a build".

As another example, CI/CD adds a lot more work and maintenance but it results in better overall hygiene.


I run 50+ smallish applications on AWS using Bitbucket Pipelines, Fargate, Aurora MySQL, S3, CloudFront and a few other services. Most of the setup is scripted using very simple CloudFormation scripts. I estimate that I spend maybe 10% of my time on this and the rest of my time on ordinary dev/architecture tasks.

Before Docker and AWS this would have taken me so much more time.

The only drawback is that we have a hard time finding other developers in the company who want, and have the time, to learn the setup. It's not very complicated, but it requires some familiarity with the AWS ecosystem. It can seem daunting to someone who has to learn it from scratch.


> The only drawback is that we have a hard time finding other developers in the company that want and have the time to learn the setup

This is my experience as well in enterprise cloud. I don't get it. Have these people seen what cloud jobs pay for less work?


1997: a team of 5 sys admins, 3 of them functional alcoholics, writing only in Bash scripts, manage over 2000 internet facing SPARC machines.

A lot of DevOps is actually CVOps - stuff that people get familiar with so they can put it in their resume.


Those bash scripts are probably still in place and working; meanwhile, the modern ways of managing the servers have been recreated a dozen times over the last 25 years, because every time a new person comes in there's always a better way of doing things.

The difference is now you're a failure if you stay at the same job for more than 2 or 3 years.


CVOps - hahahaha. I've always heard and used the term "Resume Driven Development" but I guess CVOps is a nice term too.


What was wrong with the other 2?


Dysfunctional alcoholics.


It's almost always a solution with an overly complex chain of tools that each only do a small part of the deployment/security tasks, because it's a food chain where each vendor can consistently eat a part of the company's budget, based on a problem that never really gets solved...

Apps are still no more secure, because there are several points where they can be compromised, rather than just a few involved in a less automated but more easily replicable process. Also, I don't need "flavor of the month" skills to get things done. There is always a revolving door of fly-by-night hype tools and brands that regularly rise and fall in the IT world... I avoid them (new hyped products) like the plague. I'm fine with being the stubborn middle-aged IT guy now. :P

It's all a food chain based on making money. What matters to me most is whether money is being made from the product that is deployed, and whether it's simple, reliable, and secure enough to be worth developing. I don't do my job to make a bunch of companies money by using their DevOps tools.

Screw impressing other engineers with solution complexity every time. Functional reliability always wins at the end of the day. Leveraging a massive list of Ops tools only creates a huge backlog of update work; designing efficiency and simplicity into most of my solutions is what ultimately pleases most of my clients.


I think you mean:

> your infrastructure is automated using 10 extremely complex devops tools

... held together by ...

> a handful of Bash, Perl and Python scripts written by two system administrators. They are custom, sometimes brittle and get rewritten every 5 years.

We learned nothing in the last decades. If anything, the complexity for the same things has multiplied.


In 2005 your infrastructure provisioning wasn’t automated. The complexity has increased, but so has what we get. Being able to provision new hardware stacks like software is amazing, in 2005 I had to get quotes from hosting providers.


Whereas now nobody cares about the cost because it's all hidden away in a massive bill at the end of the month?


True but to be fair modern infra does a hell of a lot more


2005: no cloud, had to order 1Us, wait, rack them up. Needed a DBA for the database, a network sysadmin for the networking, all to serve a simple website without the same level of HA. We are doing way more now, which needs some more complexity, and yes, in many cases we are overengineering it.


2005 Linode sold me VMs, no need for a DBA or network admin for a simple main/reserve website

17 years later and getting more than 364 days uptime out of AWS is apparently "not worth the cost"


2005: you had 4 people who understood what everything did

2022: you have a team of monkeys clicking buttons

Joking aside, it seems like the developers these days don't have the understanding that they did a while back. Not being involved with the nitty-gritty causes them to just write code willy-nilly.


Today's shit has 10x or 100x more throughput, so it makes sense that upgrading data response and availability requires more people.

But today's devops has become a proprietary mix of AWS protocols and constantly changing standards and languages.

I still use bash scripting wherever I can; it is much simpler and has remained essentially unchanged for decades, which is nice for compatibility.


The author's claim is more about how to make people work together on deployment, rather than a rant about devops tools.

You can do simple things with modern devops tools. You can go off the rails with simple scripts. It's not the tooling; it's about engineering maturity and the requirements of what you're building.


IaC wasn't even prevalent or a production-ready thing back in 2005. I'm unsure how magical bash scripting would do any of that, maybe it built the data centres too!


IaC wasn't a thing because you honestly didn't need code to solve the vast majority of deployment problems. It was a configuration issue.

Not 2005, but a year later in 2006 I was using cfengine to deploy code and configuration to servers from an svn repository. The same svn repository had dhcpd configs that described pretty much every device on the network. The dhcp configs also pointed to a tftp service from which new nodes pxe booted to an installer which pulled down the node specific kickstart, and provisioned the machine.

We didn't call it infrastructure as code, but it sure fucking smells the same.


Perl/Bash/Python? Maybe for non-mission-critical things. By 2005 everyone who needed stability/scalability was using J2EE.


2005: your infrastructure powers a webshop with $1m annual revenue

2022: your infrastructure powers a $5b startup


All you’ve demonstrated is that some folks are bad at SRE.


>to the people claiming that today's infra does more things... No, I'm comparing stuff with the same levels of availability, same deployment times, same security updates.

OK, but in my experience more things are being done in the places I see with devops nowadays versus back then. I know you say it's the same, but it's hard to believe a statement in a comment over my lying eyes. It seems more likely to me that your two examples are both actually fictitious, which makes it easy to say they produce exactly the same output. Or have you been at the same place for 17 years, seen the changes, yet had no input on the company to stop the madness? Because if the latter, that would also seem... weird.


Were microservices a thing back in 2005? Honest question, I always assumed that SOA was more of a newer philosophy in web software. The scale of what we build has changed a lot over the years, as well as the need to handle the variance of scale through techniques like auto-scaling. All of that adds an incredible amount of complexity in systems that surely didn't exist 18 years ago.


I don't think SOA was in common discussion in 2005; more like 2007 would sound right. REST was pretty much the winner by 2009. But perhaps my memories here are warped by my personal career and by not having had to have those arguments after 2009.

No, I don't think microservices were anywhere at that time, although one could argue that they are a repackaging of the ideas of Small Pieces, Loosely Joined https://en.wikipedia.org/wiki/Small_Pieces_Loosely_Joined or an architectural expression of loose coupling https://en.wikipedia.org/wiki/Loose_coupling for corporate data.


> If you look at a DevOps engineer job description, it looks remarkably similar to a System Administrator role from 2013, but...

> If DevOps was supposed to be about changing the overall culture, it can’t be seen as a successful movement. People on the operations side of the fence will...

As someone who was keenly watching this stuff 15 years ago, parts of this article connect with my understanding, but the core problem I have is that the article itself has somehow bought into the mistake that led to the failure, and so it almost can't see the failure for what it is: the entire point of DevOps was that "operations" isn't a job or role anymore and has instead become a task that should be done by the developers.

Ergo, if you even still have operations people to comment on it--or certainly if you are somehow hiring dedicated "DevOps" people--you aren't doing DevOps and have already failed. The way to do DevOps is to fire all of the Ops and then tell all of the Devs that they are now doing DevOps; you simply can't have it both ways, as that's just renaming the same two camps instead of merging them into a single unified group.


I've worked in ops roles since about 2000 after a few years in backend corp IT stuff.

I agree that what you've described was the original intent and goal of "devops" but in light of that failure, the "cross functional team" definition took over and then in light of that failure, the SRE was born and we're basically back where we started but now the ops people use git instead of rcs.

In my experience and opinion, developers are really bad at ops and sysadmins/ops are really bad a development. Anyone that is truly good at both is a unicorn that is probably carrying their team.


Hey, this comment inspired me to create a poll. I'd like to know what the distribution of unicorns is among the HN crowd :)

Poll: https://news.ycombinator.com/item?id=31891675


Why not sit down and figure out how we can train more unicorns instead?

And maybe make tooling that helps with this.


because it's not "profitable", it just doesn't make sense for 95+% of teams.

most teams are not building the next facebook. most teams have at least some stability, and most teams benefit from delegating specific tasks to specialists (eg at project kickoff they talk to the various leads, they sketch out a design, agree, and get back to their respective turf, and when it comes time to deploy it they again talk to whoever and in a few iterations it gets deployed, it goes into testing and then into production, and that's it)

sure, there's always bitching about how it's not agile, but ... like I said, they don't have next-facebook-like money.

maybe Netflix is the best example for this. everyone was in awe of them for how they are going all in with Cassandra on AWS, microservices, flamegraphs, circuit breakers, 30% of the Internet traffic, and many groups/startups started copying them. but forgot a few tiny details like paying half a million dollars a year to new recruits and having a cashflow that can sustain all the aforementioned things.

almost every group would benefit from a more holistic knowledge of whatever they are doing as a whole. but just as there seems to be a natural limit to how many peers one can comfortably have at the same time (eg Dunbar's number) it seems people naturally like to set up softer or harder boundaries for their IT knowledge. ¯\_(ツ)_/¯

... and yes, tooling seems to be the place where this kind of complexity should live, but then maintaining that tool becomes the real challenge :)


I would argue that it is profitable, as it is far less expensive than the current shitshow, in particular for smaller companies. I have seen teams of "unicorns", and you can do the equivalent of a 100-dev organisation with a couple of teams. WhatsApp or Discord come to mind.

Also, that tooling could be open-sourced and shared, you know :D It does not have to be a price paid by everyone building their own.

But it would indeed have to handle the real problem. Not like the current tooling that our ops people force down everyone's throat :cough: K8s :cough:


> the entire point of DevOps was that "operations" shouldn't exist, and that operations is a task that should be done by the developers. Ergo, if you even have operations people to comment on it, or if you are hiring dedicated "DevOps" people, you aren't doing DevOps and have already failed.

This. My first thought when I was reading the article. Spot on


> the entire point of DevOps was that "operations" isn't a job or role anymore and has instead become a task that should be done by the developers.

This is akin to saying "frontend developer isn't a role anymore - both frontend and backend should be handled by a full-stack developer". This works for small companies/projects, but bigger ones can benefit from specialization and division of labor. The body of knowledge required to be a decent software developer and a decent ops engineer is too big to fit into one head. I've seen ops work being done by developers without ops experience and more often than not it was ugly - they didn't have enough experience/knowledge (and the time/incentives to gain them) to do the ops work well.

To me the best part of DevOps isn't about roles but about team structure: splitting all Dev into one department and all Ops into another is usually a bad idea, and the failure of this split was a motivation to start the DevOps movement. Having Ops embedded in Dev teams works much better in my experience.


> but bigger ones can benefit from specialization and division of labor

I think it is central to the DevOps concept that dev vs. ops segregation (at least at the small-team level, perhaps not at the individual level) is a division of labor that inherently fosters micro-optimizations on both sides of the divide which are counterproductive to effective value delivery. On a continuously available software service, the lowest-level product team should own its components soup to nuts, rather than having a dev team throw hopefully-deployable code over the wall to an ops team.


And that's great until you have three-ish "lowest-level" product teams and they're all managing not only their own production systems, but their own deployments and testing pipelines and monitoring stacks and secrets management. At some point, you need to start unifying that stuff to keep things manageable, and if those systems are everyone's responsibility, they're no one's. So you make another team for whom the internal shared infrastructure is the product, and the rest of engineering are the customers - they still write their own Terraform modules and whatnot, but they run them on DevOps-supplied platforms. That seems to be what modern DevOps is becoming.


these are not mutually exclusive. if those product teams are under one big umbrella then nothing stops whoever is holding the umbrella (C-level or whoever) from setting the standard: you can use these languages, these CI tools, these package repositories, etc. if you need something exceptional, ask.

the important aspect is that there should be enough working knowledge about these tools/processes/systems in those teams that they can work efficiently, and they can respond as the whole business evolves. (eg. scale up/down, extract and hand over or accept and integrate components, integrate other APIs, etc)

> and if those systems are everyone's responsibility, they're no one's.

again, there's a cut off eventually. for example many companies just use GitLab for CI. at that point it's up to the big umbrella to decide whether they want to be in one or many GL orgs.

> That seems to be what modern DevOps is becoming.

sure, but that cutoff seems to be infrastructure vs product, which seems a bit healthier than cutting the software lifecycle in half.


One can have both: team(s) which own shared infra like k8s, monitoring servers, CI/CD servers, etc., and ops embedded in development teams.


> Body of knowledge required to be a decent software developer and a decent ops engineer is too big to fit into one head.

the usual answer is that in a good agile team everyone should be a little bit T-shaped

https://www.cybermedian.com/scrum-team-i-shaped-vs-t-shaped-...

of course domain experts are real (or what's the term nowadays?), so specialization makes sense. comparative advantage and all. but the idea is to lower the (coordination, communication, conflict due to inevitable misalignment between separate teams) overhead by "onshoring" the basics (eg. writing tests, basic CI stuff, deploying)

the devops manifesto (which allegedly does not exist, but you get the point) basically calls for giving people the tools, permissions and authority to do these basic things, giving teams ownership of their stuff. and of course this doesn't mean fire every sysadmin on sight :D (even if that would definitely help with the process of re-owning some ops tasks to dev people)


> The way to do DevOps is to fire all of the Ops and then tell all of the Devs that they are now doing DevOps

That's like saying "agile is firing your scrum masters and tell your developers you're agile now".

The idea behind DevOps is that applications are provisioned by the people who know the application best, the developers. If everything works out as it should, this adds some load in creating the deployments, but also removes the overhead of dealing with operations when deploying, updating and debugging - so a net zero in workload, but a gain in that the application is hosted better and fixing bugs is easier. You still need operations, both for providing the underlying platform (getting a server ready is not a developer's core business and it shouldn't be) and for guiding the developers. It should be leaner, but you still need it.

Of course, you can also fire all of infra and tell the developers "that's your job now", but that's like calling biweekly deadlines scrum (and leads to equally bad outcomes).


> That's like saying "agile is firing your scrum masters and tell your developers you're agile now".

This would be an excellent idea in many teams.


Yes - and it's also true for many DevOps people. Some would argue that's even true for many managers. It's just not a good idea if you want to get where these people were supposed to get you when you hired them.


I’d say that is absolutely untrue in the case of agile.

Those promoting scrum (and especially those with certifications in it) are going to lead to scrum, not an agile process that values producing useful, working software each iteration.


This can be true, but I would argue not always. Some DevOps teams work in the old mode of “throwing code over to Ops to run” - this isn’t what DevOps intended, but happens.

When they work well, they’re doing things like authoring reusable (by product eng. teams) infrastructure modules, or helping to build “you build it, you run it” tooling like monitoring stacks etc. They’re also helpfully/hopefully subject matter experts on CI/CD, your cloud/hosting of choice, security stuff - things that general developers have mixed levels of interest or competence in.


That is utter BS. DevOps means Ops and Devs working hand in hand in a crossfunctional team. Nothing more, nothing less. The main idea was to tear down silos.


Well, you can't simultaneously tear down silos and continue to have two silos... I would hope that would be obvious? The new cross-functional team is made up of the Dev people who are now doing Ops and the Ops people who are now doing Dev, with everyone else no longer fitting into the new DevOps reality. That reality was itself born from the premise that cloud computing was simply obsoleting the floor of dedicated systems administrators you previously had building machines and coordinating workloads, and replacing it with a new deployment paradigm where a developer could develop their operations as easily as they can develop anywhere else in the product's stack. If you have a special DevOps team you hire people into, that is simply a renaming of the people you previously had doing Ops; you either haven't internalized this future or are actively rejecting it (which, I will emphatically state, is a perfectly fair position to maintain) and of course you are going to "fail" at DevOps.


> Well, you can't simultaneously tear down silos and continue to have two silos... I would hope that would be obvious? The new cross-functional team is made up of the Dev people who are now doing Ops and the Ops people who are now doing Dev, with everyone else no longer fitting into the new DevOps reality,

There's a project I'm tangentially involved in, that has app tiers written in different languages where the devs in either tier don't actually know the language of the other tier. They're still on the same team; they talk at the morning meeting, they work together to get things done, they have the same overall set of goals.

Despite not doing each other's work (or even being able to), they're not in silos. There's no throwing stuff over the wall.


> DevOps means Ops and Devs working hand in hand in a crossfunctional team.

Not sure how typical my experience is, but for me it has always meant "ops is yet another task for the Devs".

Seems like this is a good interview question: "What does 'DevOps' specifically mean in your company?"


Coming from the other direction, I've hired for DevOps roles a couple of times, and as a hiring manager one of my very first questions to a candidate is to ask them to define DevOps, then ask them to define their previous role(s) against that ideal, because DevOps is so broadly used as a title.

It's a great question because it finds misalignment rapidly. Post a job for DevOps and you'll get everything from old-school sysadmins to software developers of all flavors to AWS Certified Somethings applying.


> That is utter BS.

This is not conducive to the desired environment.

On a more even note, I would prefer you spend some time looking at the origins of devops and what it means; it's a contentious term because it means different things to different people.

The original “Patrick Debois” (founder of the term) meaning was Systems Administration in an agile fashion.

I suspect that you're repeating what someone else told you and you've just adopted their definition, which is fine, but part of the issue I have with the term myself is that everyone means something different by it.


At all the places I worked previously in the last ~20 years there was always a sharp separation of development/testing and production environments. I as a developer never had access to any system in production, apart from one place which had a very sophisticated security system in place which could grant you temporary access during deployments. Just think about customer data, and you'll understand why.

So when I hear that someone thinks devops is developers running their own systems in production I always wonder where this is actually possible, let alone whether it is a good idea at all.


I have the opposite experience.

Given that I've been perfectly capable of doing ops work in the previous 5 companies I've worked at, suddenly being unable to do so in my current company because I'm classified as a 'dev', is supremely frustrating.

Especially when you have more experience by yourself than the entire ops team combined.


You're just replacing one set of people with access to customer data with another.

In either case you should be implementing least-privileges. Only the access to data that a person needs to get their job done. More frequently this is developers than operations people.


> The way to do DevOps is to fire all of the Ops and then tell all of the Devs that they are now doing DevOps

That's a way to say you're doing DevOps, but it's not going to work very well.

> merging them into a single unified group

That's the right way to describe it. Just like the old "programmers build it to spec, then throw it over the wall to QA" is out of style, and now good teams have testing specialists in the same room (conceptually) as developers. The goal with DevOps was to stop doing the old "here's a build, deploy it" routine that caused so much wailing and gnashing of teeth, and instead bring ops skills into the team as a first-class expertise. The CI/CD pipeline can mean that development, testing, and release are all together, rapidly iterating and responding to change.

By the way, pretty soon the AppSec/CyberSec people will be folded in too: instead of the old "it's done/deployed, run your pentests/analysis tools", secure by design will require those skills to be integrated as well.

Little by little, chipping away at the waterfall


> the goal with DevOps was to stop doing the old "here's a build, deploy it" that caused so much wailing and gnashing of teeth

From an ops point of view that might be the main selling point. From a dev point of view a key selling point was not having to wait a month for every minor configuration change to go though change management processes.


True. It's been a while since I worked at a place where there was a traditional change management process.


I think you’re onto something. From the article…

> our role is to enable and facilitate developers in getting features into the hands of customers

The problem here is that this creates the wrong kind of incentives for developers… somehow elevating them to a level where they don't have to care about how their code works in production.

As someone that remembers being a developer back in the days of sysadmins, we were AFRAID of upsetting the operations people. If your code brought a server down, you were at least going to face some very awkward conversations. The cartoon “The Bastard Operator from Hell” immortalized that era.

Meanwhile at one company I worked at years ago - an airline - the development team was responsible for keeping the system running 24/7. Nothing makes you think more carefully about your code in production than meeting a colleague on Monday morning who got woken up at 2am by your code failing.

While I'm not arguing for hostility in the workplace, failing to give developers incentives to care about their code in production seems to me to be one of the things devops got wrong.


Why is getting yelled at the next day more of an incentive than actually getting paged at 2am?


Well either way there are direct consequences that the author of the code will feel - which is the point here.

And usually pager duty is done in rotation, rather than you only getting paged for your own code. It's one thing to ruin your own sleep, but if you ruin the sleep of the person who sits next to you, you start to think about consequences in production.

It amazes me, for example, how often I've seen developers leave their application logging, via something like log4j, in a default configuration where eventually it WILL fill up a disk and bring down a server, rather than investing the 10 minutes it takes to switch the configuration to rotation so that only a finite amount of space is used, or just writing a bash script to clean up old files. And this is something that's very hard to pass on as best practice without sounding condescending - like the only way to learn to take stuff like this seriously is by dealing with the consequences in production.


There's a reason people end up doing plain ops and calling it devops: it's often too costly to handle this as “a task that should be done by the developers”.

Specialization makes people much more productive, because they face the same kind of issues over and over and know how to fix them quickly. When you distribute the load in your organization, everybody is going to face problems, struggle, learn and never reuse that knowledge again.


Developers should have some grasp of ops work, and be able to deal with some ops-related issues, and take part of the design of the ops-side of the software delivery, so as to make sure the infra and deployment workflow they deal with works for them.

But yes it makes sense to me to still have people specialized in ops and infra in teams, collaborating with developers.

Basically, instead of having developers doing everything or just developing and throwing code out to an ops team, we should have developers educated in operations, working in teams with at least one operation specialist (or "DevOps engineer"). That way, you should end up with infra, deployment workflow that really works for the team and is optimized for the needs of the team.


Exactly. It's not about making developers do more work, it's about having some "DevOps engineer" or Ops/Infra guy working closely with the Dev team, being involved in the day-to-day development and decision making. Instead of having to open tickets and wait X amount of time for them to be resolved by the Infra team (who, by the way, a lot of the time don't fully understand the applications, their constraints, tech debt, needs, etc.).


I was just reading the "Building Scalable Websites" book [1] released in 2006. At that time, "DevOps" was called SysAdmins. And there were also DBAs, Network engineers, among others.

> The way to do DevOps is to fire all of the Ops and then tell all of the Devs that they are now doing DevOps; you simply can't have it both ways,

I think this points at what happened: startup scrappy culture started permeating new technology companies, which meant no budget for DBAs, QAs, SysAdmins and other similar roles. So decision-makers fired all those roles and asked Programmers to fill the voids. At the same time "cloud computing" started to mature, so there was a shift from hardware/operating-system tinkering to software-related tinkering.

One just has to see the decline of "SlashDot" which was a very SysAdmin/Operating-System focused website, in favor of news.ycombinator and similar more software-oriented forums.

[1] https://www.oreilly.com/library/view/building-scalable-web/0...


You're right, but in reality DevOps teams in 2022 are managing Kubernetes clusters and acting as gatekeepers to all kinds of cloud services to facilitate development.


Yes. I was confused reading this article because the author seems to miss an important point. Devops culture is about "you build it, you run it", not about having a dedicated devops team that tries to make developers do things. I read a book lately on that topic, Team Topologies. It explains this concept pretty well.


> The way to do DevOps is to fire all of the Ops and then tell all of the Devs that they are now doing DevOps

That's slightly hyperbolic, and I'd also argue that there's a fundamental error there, since you throw away all your platform operations engineering. Now you have dozens of operations engineers working in separate silos with no coordination.

What you really need to do is give all your developer teams pagers and point all their monitoring alerts at those pagers. Whether they set up an on-call rotation of the developers for their own software, or they panic and hire a small team of operations engineers and hand them the pager, doesn't really matter. Then you have the problem of coordinating platform ops engineering with the ops team members in the software teams. Whoever handles Ops for the software teams acts as a kind of PM to interface with the centralized platform engineering roles, which are responsible for coordinating across development teams to make things look consistent across the enterprise.

The problem with the bad old ways was that software teams would write code and then toss it over the wall for operations to run, and all the shitty disk-full pages and whatever other crashing-software badness fell onto operations, and dev teams would each individually choose to ship software that was shitty to run, and all the monitoring alerts would fall into a silo under a completely separate VP where the dev teams weren't responsible for their ops metrics and would choose to work on features to deliver for their management. Give the dev teams pagers and make them accurately feel the pain of running their software, and then they can make choices about how much they want to abuse their own embedded operations people that they have to chat face to face with every morning at a standup. If you then fire centralized ops, though, you wind up with dozens of different operational fiefdoms inside of one company, with everyone repeating the same mistakes and nobody doing the exact same thing anywhere.

I learned that back in 2006 before "DevOps" was a "word" and I don't know what you call it or if that is DevOps.

And it seems like Kubernetes is trying to be Conway's law applied to that. So you have DevOps embedded in Dev teams shipping containers which run on Kube clusters that provide compute as a service to the enterprise (often another company entirely) and the platform ops teams maintain those clusters. Except now you're entirely missing the communication that I outlined needed to happen between the operations virtual team composed of the platform ops and the embedded ops in every dev team. SREs will claim they don't need that any more and that old school SA operations is a dinosaur except that every now and then I see some SRE begging to know how to ship tcpdump to one of their containers to do some debugging and I know that they're dirty little fucking liars...


It was not hyperbolic, that was what happened to my team. We had two system administrators, one who specialized in Unix and one in Windows. I was the lone programmer, having escaped a previous sysadmin position. I can do it, I don't like it, but I can do it.

Then one day I was told that our two sysadmins had been traded off and we were getting two new programmers, but they wouldn't be programmers and neither would I, we were going to be doing DevOps. I had just escaped that!


The fact that someone calls that DevOps doesn't make that DevOps. That is a good example of how DevOps as a term has failed though.


I just quit a job partly because we lost our key DevOps guy and no serious effort was made to replace them. As a result I ended up wasting huge amounts of my time dealing with operations-level stuff that made it impossible to focus on the key parts of my role (feature development etc.). I subsequently turned down a job offer from elsewhere that explained their policy was not to have dedicated DevOps resources for their SaaS platform (devs themselves being responsible for all deployment and system maintenance), and would do so again. Good DevOps people are worth their weight in gold, and at least in many verticals (e.g. those involving payments ) it's virtually mandated that there is a separation of responsibilities between those writing the code and those responsible for delivering the product to customers. I can't see the need for dedicated DevOps resources going away any time soon.


> I can't see the need for dedicated DevOps resources going away any time soon.

But is what you're describing just... "ops" without any of the "dev"? I'm not saying that there is not a need for dedicated infrastructure and operations teams at a certain size (and in some industries), but that's not an excuse for devs to feel like they can chuck a new feature over the wall and say "Well, I've done my job. Please run it and make sure it doesn't break".


No silly, that's what the QA team are for! Anyway, our DevOps guy spent probably most of his time "developing" - just not application-level features.


Agreed. It also goes back to the sysadmin problem which devops optimised away, in that the Dev team owns their code in production.

I've seen success with a dedicated devops member on a team but having a dedicated devops team just introduces delays and latency when fixing pipelines or releasing. The very same problem we had with sys admins.


At a certain point, hiring dedicated Devops frees up x number of developers to continue to develop features depending on the amount of time each developer is spending on performing those devops duties. It’s just another area management can split up job roles to capture more value and allow deeper specialization among professionals.


Sure, but if you're hiring people in a different role that does all of the ops work for developers, this isn't "devops". This is just "ops".


Sounds like a terminology issue then, I consider it "devops" because they're mostly writing code that goes into our git repo etc. etc., they still have to do PRs and code reviews etc. But the code is to handle deployments, not to implement features.


Yes: the terminology problem is that you are using the terminology wrong.

DevOps is about cross-functional teams, and has been co-opted by vendors to sell products. And delivery of software is the ultimate feature: without it, nothing of value can be produced.


Then everybody else I've worked with over the last 6 or 7 years is also using it wrongly.


Quite likely.


Well then we're just getting into debates about what determines the "correct" meanings of words. I read the original article and most of the comments here using my understanding of the term, and it made sense that way...


(not to beat a dead horse, but I just finished some interviews I was involved in as an advisor, and the term DevOps was used a lot in all of them. And in every single case the term was used exactly the way I've understood it - i.e., DevOps engineers are those who primarily look after the CI/CD pipeline, generally don't write application-level code or develop features, but do very much spend the majority of their time developing scripts and tools to enable CI/CD).


I thought the "dev" part was because the "ops" were using code to build the infrastructure.

In my experience that leads to meh code if the "devops" comes from a former sysadmin, or meh infrastructure if s/he comes from the dev side.


I keep hearing "we are all devops" from my PM. The real kicker is that there is a dedicated DevOps team in my organization; they "just have too much to do already".


“Dedicated devops” is not devops, never has been. Devops is a culture where, super simplified, you run what you built. “Dedicated devops” is just ops


I get some people use the term that way, but in our case, our guy really did spend ~50-60% of his time developing IaC and other deployment scripts/tools (in various languages), and the rest handling the operational side of things. He occasionally touched the application code as needed (renaming configuration variables etc.), but he didn't do feature-level development, nor was he interested in doing so.


This was also my experience.

I used to work in an org with about 200 engineers, supported by a DevOps team of ~6 devops engineers.

They spent (very roughly) half of their time doing operations, and half of their time writing software to make devops easier.

Every other month they’d announce some new tool for us to run automated tests or speed up builds. It was awesome, and made our developers more likely to do their own portion of the ops work.


> They spent (very roughly) half of their time doing operations, and half of their time writing software to make devops easier.

This is what happens in my (non-tech-industry, but heavily invested in tech stacks) employer. A previous manager decided we were no longer System Administrators but now DevOps Engineers. But none of us develops code for our features and none of us wants to develop code for features. We want to build and maintain the infrastructure on which the code runs. Some of it has to be run in-house because we deal with medical information and all that entails.

Our current manager describes it not as Developer Operations but as Developing Operations. We develop the infrastructure and tooling that our code writers need and do everything we can to get the moving parts out of their way. If the code written by the code writers fails, it is their responsibility to deal with it, but if the infrastructure underpinning that code fails, my team has failed.


IME this unravels quickly as the team and stack complexity grows and product engineering starts shipping software that's very difficult/expensive to operate. The whole idea behind devops was to avoid this death spiral by giving engineers ownership of and responsibility for running their own software. The problem was (and still is) that off-the-shelf tooling is just not there and requires very skilled/experienced devs to use it effectively. Hence dedicated teams got created and the whole thing died on the vine. Devops as a movement simply doesn't scale.


I don't see it as massively different from the specialisation between front- and back-end developers though - we're all devs, we're all comfortable with reading/writing code, but whereas my specialty and focus is on application-level (typically backend) code, the DevOps guys are specialised in writing IaC scripts etc. Yes, in principle, any dev could do it all, but there comes a point where the mental load of juggling too many technologies outweighs your ability to be productive.

And some types of development just take different mindsets - I've done my share of front-end/UI-type development in the past, and can easily pick it up again if needed, but I can't say I find it super satisfying, and it's nearly always going to produce a better result for everyone to hand it off to someone who does. Likewise for CI/CD scripting. And it also happens that if your primary focus is on the scripts and tools necessary to get software deployed to a particular hosting environment, you're likely to have the skills necessary to take care of the day-to-day sysadmin side of things for that environment (indeed, most "manual" changes to the environment are only done as a last resort, and would be wiped out by the next deployment anyway, unless the scripts are modified appropriately too).


'devops' means I'm going to hire you ostensibly to develop software, but in reality that's just going to be your '10%' time, if there aren't any operational fires burning too brightly.


“Devops” means build engineering to some

Sysadmins who know a bit of python to others

Developers who can configure nginx to others.

Or a culture; to yet more people.

Clearly it doesn’t have meaning if it’s so undefined.

FWIW the progenitor of the word defined it as "agile systems administration": what you're talking about is the 10+ deploys a day talk from Flickr, which doesn't mention devops at all (despite the conference existing prior).


If you are hired as DevOps, or if your organisation does DevOps without any dedicated personnel, and all the devs are now DevOps, you are damn right that everyone'd be fighting all the fires first.

The idea being that the people building and running the application are incentivized to prevent fires as much as possible.

As opposed to completely separate teams, where the devs have zero incentive to prevent fires. Everyone judges them on features/sprint, so optimizing for that is perfectly logical.


That's not been my experience at all. Developers absolutely do need to be roped in to help put out fires when the application is misbehaving, and most of us quite assuredly want to keep that to a minimum. And having reliable, smooth and well-regulated DevOps processes is a huge enabler for ensuring robustness and minimizing the chance of bad code getting deployed to production.


What percentage that works up to is totally up to you and that’s the whole point


I think what it is is that nobody wants to force more responsibilities and complexity on Developers, so they hire DevOps people. Then the DevOps systems are so complex that it takes an ultra-high-quality engineer to run them.

The whole seed of the DevOps movement was that developers needed to do more or the company would fail. Over time, management lost their conviction when developers pushed back; they didn't want to risk losing devs, so they "outsourced" the devops skills.


Not hiring a dedicated DevOps resource and making your developers do it. Guess what? You've just made your developers do operations.

The work doesn't go away just because you've shifted it. I've seen those places too and worked in some; the developers aren't very productive, let's just say.


If the dedicated guy helps devs write deployment scripts, write monitoring scripts, set up backup-restore-verify cycles, etc.. then it's devops. if the devs proclaim that the devops guy should do it, then it's just the old siloed workflow again.

note that the old flow was not a total shitshow with absolutely zero productivity ... it worked for quite a while in many places, but it was bad enough in enough places that a whole "movement" grew out of the recommended solution. it's about keeping the communications/coordination/responsibility-tennis overhead down. sometimes that's best done by saying that you deploy what you wrote in any way you see fit, but here's the SLA, and so on. sometimes it makes sense to create infrastructure teams and let dev teams use internal tools to deploy, sometimes this requires experts at the team level, sometimes not. and ... of course this can be implemented in the most employee-hostile way possible, and sometimes in better ways too :)


This is a hot take.

The reason that you need to do things the ops way is because ops knows how to run applications in production. There's a reason the meme "worked in dev, ops problem now" exists. You need to meet all of the requirements of an app that's running in production from a technical, availability, security, and policy point-of-view. It's not easy and that's why this will never work.

Software is hard, it's just that a lot of developers used to cut their code, run it on their laptop, and let someone else worry about it. It's different these days (although not as much as I'd like).

We don't make you use these tools because we want to, we use these tools because we're required to. No one cared about ISO 27001, SOC 2, or PCI DSS compliance for your crappy PHP app you ran on your cPanel. They didn't care back then that you were using MD5 hashes to "secure" passwords. The world is fundamentally different from what it was 10-15 years ago, and the requirements from the business are astronomically different.

Edit: and to people saying "oh you could just run it on a single server", no you can't because certifications like ISO27001 require certain levels of availability and DR. You're not going to be able to guarantee that with a single server running in a rack somewhere.


> This is a hot take.

I'm assuming you mean your comment, not the post itself.

> The reason that you need to do things the ops way is because ops knows how to run applications in production

Stability in production is one metric. Ops overindexing on this metric is exactly what causes the friction with developers.

Developers are trying to ship value to customers.

Uptime is only one part of that equation and for most businesses, it's not even a very important one.

The author points this out near the end. DevOps can't convince devs to use ops techniques if all the reasons for using those techniques are based on the flawed assumption that development velocity isn't important.


> DevOps can’t convince devs to use ops techniques

If “DevOps” is the name of a role, and part of the function of that role is to “convince devs to use ops techniques”, then I feel like the concept of DevOps is lost. Devs need to own ops, including its costs, which is what convinces them to use ops-appropriate techniques, not some outsider jawing at them.


> ops-appropriate techniques

The true DevOps.


> “Developers are trying to ship value to customers.”

I’ve also seen this be fairly rare. Devs shipping nothing - not even aware whether what they’re merging will even run.

They write something they haven’t really tested, merge it, and call it done - a user may never see it and they don’t have any knowledge about how the thing actually gets built and shipped.

Obviously this is worst-case, but in my experience this is a common default. The complaints about friction are because they’re actually forced to reason about how the machine works in order to ship something beyond merge.


> They write something they haven’t really tested, merge it, and call it done

This is a problem of incentives. For all intents and purposes, the dev organization ceases to care the moment something is merged. Nobody is rewarded for making sure everything is fine all the way to production.

Now if you ship two extra Jira tickets this sprint however...


I don't generally make a habit of basing my software development lifecycle methodologies on the lowest common denominator engineering org. Some shops ship value to customers with tests and metrics every day.


> You're not going to be able to guarantee that with a single server running in a rack somewhere.

The way most “enterprises” deploy distributed systems, I’d be surprised if a single server didn’t typically result in better uptime to be honest.


Certifications are a very good point, because AFAIK ISO 27001 is now achievable for far more companies of smaller sizes without that many IT staff. Sometimes even 3 good engineers can set up everything needed to pass ISO at a small company in half a year or so.


Meh… DevOps is just System Administration, and System Administration is just Sys Ops. They keep changing the title/role but the work remains largely the same. I think it's a bit disingenuous to throw “dev” in the title; as a “DevOps Engineer” myself I don't consider anything I ever do “dev”. Ansible is not “dev”, Terraform is not “dev”, CI/CD pipelines are not “dev”, Helm charts aren't “dev”. But for some reason companies seem to love the term.


> But for some reason companies seem to love the term.

It's possible that I'm just getting very pessimistic, but at this point I'm fairly confident that companies love it because it makes it way easier to attract candidates and describe one set of responsibilities/position in an interview process, and then bait-and-switch it into what is effectively a systems administrator role.


I've certainly had interviews like that. In fact my first full time job out of uni was one of those and I made the error (in hindsight) of sticking it out until I could transfer into another role. Now I'm much more careful to screen for sys admin keywords in job descriptions.


Just happened to me. Now I'm trying to find a way to make an internal transfer happen or decide how long to stick around before applying elsewhere.


Depends on what you think Developer Operations should be. Our developers instantiate their buckets, databases, cache instances etc. themselves, deploy microservices themselves and update configuration, traffic management and scaling parameters themselves. No 'system' people required. The system people are mostly just keeping the automation running and add features as needed.

The work also really isn't the same. Unless you're stuck in the 90's we aren't building servers, installing operating systems, installing applications and installing patches anymore.


> Depends on what you think Developer Operations should be. Our developers instantiate their buckets, databases, cache instances etc. themselves, deploy microservices themselves and update configuration, traffic management and scaling parameters themselves. No 'system' people required. The system people are mostly just keeping the automation running and add features as needed.

When I read this though, I just think about how much time your developers are not actually developing because they're doing operational-side work.

I have the situation where my developers do this stuff, then things break or need debugging and they don't really know how to dig into any of this stack in any meaningful way, so the problems tend to compound. Meanwhile, they're not writing code. The cadence of development seems massively slower to me (coming from a traditional background where they're writing to a clear Ops-set target environment).

The logical outcome is to hire someone who is an expert in all this infrastructure stuff to help manage it - ostensibly, a "DevOps" person, but really, a classic Operations person, just for cloud.


> When I read this though, I just think about how much time your developers are not actually developing because they're doing operational-side work.

Cool. Cool cool cool.

Now, instead of doing these things yourself (the first time takes a few days, the second a few hours, and after that they're barely noticeable), you dev your thing, then it goes into an Ops queue, which (hopefully the next day) comes back with a big fat "unfortunately, not today" because you didn't understand how the queue works or something. You fix your thing, it goes back into the queue, etc etc.

So your flow is still messed up and your Time to First Eyeball has increased quite a bit, since you now have high-latency round-trips with external parties.

Obviously there is a tradeoff somewhere: does the dev know all the ops-y things, does she know just enough to only reach for ops when the paved road isn't enough, or is it the good old "over the wall it goes"?


So instead of having a team of people own something (a product), you split ownership up in so many ways that nobody really owns it, and now everyone has to waste their time on KPI tracking, because somehow that's better than someone just owning their small product as part of a larger business?

This is pretty much a waterfall vs. iterations discussion where the reality is that the world doesn't stop moving while you're working.

If a product team needs to deliver something (i.e. "allow users to select a delivery window for an order"), they might update their microsite with the components and API calls to do this, and then update their API microservice to exchange that information based on the customer's identity (so they can only update their own order delivery data). None of that knowledge is going to exist in a classic ops team, nor should it be if such a team were still relevant.

At the same time, that team might need to take into account that there might be order peaks during the day and they have to collect and act on the right metrics to know how well their code works, how many resources they are consuming and what their scaling policy should be. None of that has anything to do with ops either.

Then, at the end of the day, they'll use all of this information and domain specific knowledge to decide:

  - is the feature complete
  - can users make proper use of it 
  - do we need to invest time and effort in code optimisations
  - do we need to extract this feature into a separate scalable entity to prevent disruptions to neighbouring features
None of this is relevant ops knowledge either. Nobody will be able to manage a feature like that better than a product team with a few developers, a few features, a (shared) product owner, and perhaps a (shared) domain expert or tester. The people who wrote the code own the implementation (though usually they don't own the business requirements).

What ops (or devops enablers) would be doing in this case is:

  - make sure metrics aggregation is working as expected
  - make sure scaling is working as expected
  - make sure deployments and configuration updates are working
  - construct new deployment options if needed
  - construct new metric aggregation systems if needed
  - construct new scaling and traffic management systems if needed
  - notify the teams/owners if they are using deprecated resources or features for too long

Perhaps a simple but too vague summary of the above would be: small feature-owning teams build and maintain their features, devops people enable many small teams to do their work by providing shared systems.

A developer isn't just a 'writes some code, clocks out' kind of value to a company. If it were that simple, robots could do it, and there would not be a shortage of skilled people. There are many more dimensions to the work, and unless you work at a very large-scale company, or a very restrictive one, it's highly unlikely that you'll be isolated from the world and just write a bit of anonymous source code with no idea of what happens before or after.


> What ops (or devops enablers) would be doing in this case is:

> small feature-owning teams build and maintain their features, devops people enable many small teams to do their work by providing shared systems.

So I don't think I get your point. The things you've listed here just look like what plain old Ops would do: set the parameters for Dev, and then things roll on. Saying "ops (or devops enablers)" like that confuses me, because the whole point of the original article (I thought) was that DevOps has basically become Ops, so what have we actually achieved?

i.e., that as soon as you distinguish between "devs" and "devops" people, you've basically already drawn a line between traditional dev and ops roles.


You are indeed not getting the point. Ops people do not 'set the parameters'.

DevOps doesn't become ops, and devs don't become ops either. The only ops left is perhaps contract management that really doesn't have anything to do with the developers.

DevOps can mean the practice of intersecting tasks that would classically be categorised as either development or operations, but it can also mean the people with enough skills to understand the needs of the developed features as well as the requirements of a useful infrastructure.

Take a very dated example:

  - Operations might request new server hardware purchase orders
  - Then they rack it, install an OS
  - Someone else, say, application management, installs Jenkins on it
  - Someone else yet again configures Jenkins so Developers can use it
  - A developer now logs in to Jenkins, and clicks build on their product or entire project
  - Operations gets a fax from Jenkins about a new build being ready to be installed
  - Operations sends the build to a different application management person
  - Different application management installs the new build
  - Developer can finally see the end result
If at any step something goes wrong dependency-wise, the person involved is not allowed or not able to do anything about it because it is outside of their scope. This is classic operations. On top of that, all those layers are annoying and inefficient.

Instead of all that, we have new tools and a new grouping of activities, which run in parallel:

DevOps-enabling person:

  - Notices a shared CI system would benefit everyone (developers, customers, finance)
  - Proposes such a system and perhaps presents a demonstration
  - Configures a virtual or container system from any XaaS provider to provide this CI system to anyone who wants it
  - Adds integration with SCM
  - Tries out a few builds to test SCM lifecycle

Developer:

  - Writes code, commits to SCM
  - SCM fires a CI pipeline (a rough sketch of such a pipeline follows this list)
  - Developer can see the end result
  - Makes additional modifications, immediately sees the result again because nobody else is required to perform those tasks
  - Maybe additional CI options are needed, create commit with CI changes and update CI to have those extra options available
  - Instant results yet again
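
To make the "SCM fires a CI pipeline" step concrete, here is a minimal sketch of what a repo-owned pipeline definition might look like. The GitHub Actions syntax, the file path and the assumption of a Go service are all illustrative (none of them come from this thread); any SCM-integrated CI works the same way, since the definition lives next to the code and the developer who needs "additional CI options" just commits them:

  # .github/workflows/ci.yml -- illustrative sketch, assumes a Go service
  name: ci
  on: [push, pull_request]          # every commit and pull request triggers the pipeline
  jobs:
    build-and-test:
      runs-on: ubuntu-latest        # label of a shared runner image
      steps:
        - uses: actions/checkout@v4 # fetch the commit that triggered the run
        - uses: actions/setup-go@v5
          with:
            go-version: '1.22'
        - run: go build ./...       # compile every package
        - run: go test ./...        # a failure here is the "instant result" the developer sees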

Classical operations only existed because there were no tools and no shared responsibilities. The world has changed, and now shared responsibilities and the tools to act on them exist. Net result: far fewer people waiting on each other, because nobody is stuck in their own fiefdom anymore.

Operations is dead, unless you are still playing datacenter at the office. The closest remaining thing is the service desk and perhaps desktop management for end-users that are locked into some Citrix hell. But we were talking about DevOps which is the intersection of Development tasks and Operational tasks, not end-users and office equipment ;-)


> The work also really isn't the same. Unless you're stuck in the 90's we aren't building servers, installing operating systems, installing applications and installing patches anymore.

I guess I am stuck in the 90’s then; I absolutely still do all of those.

Not everything is “serverless”, you know.


I agree with most of what you say, and in particular with companies' love for the term, but I disagree that "the work remains largely the same". When I got started in this line of work, we used CVS to track program code and we used backups to 'track' infrastructure, including the code used to manage that infra (mostly shell scripts, though not only that).

There's a long path from that to ansible and terraform on SCM.

Another big difference I have experienced: we used to literally celebrate server uptime (I mean an actual celebration; I have a distinct memory of gathering around an IBM "fridge" for the uptime birthday of a particular RS/6000), while now a piece of infra with too much uptime is a red flag for potential vulnerabilities.

What does largely remain the same, I think, are the skills needed to be good at this. Then, as now, we need people who don't mind reading manuals or searching online (this was already a thing when I started; I guess you'd have to go back to the mid 90s for it not to be the case?), who can keep track of where they've been during a debugging/troubleshooting session, that sort of thing.

Another thing that changed is that in the past some people considered it a badge of honor to be assholes to people outside sysadmin, and even more so to people outside IT (remember those "select * from users where clue > 0" t-shirts, or the BOFH stories?), while now that's typically frowned upon and quite a few companies are explicit about a no-assholes policy in their hiring material (or perhaps I've just been lucky with my teammates and smarter about picking where to work than when I was younger).


> but I disagree that "the work remains largely the same"

I meant at a very high level. The basic responsibilities haven't changed.

    * Deploy/configure infrastructure
    * Deploy applications into infrastructure
    * Monitor/Secure/Maintain infrastructure
    * Scale infrastructure as needed
Sure in the 90's there was no Terraform, and deploying infrastructure meant getting physical hardware and racking it up. Now, you can use Terraform to deploy infrastructure to the cloud, on hardware you rent. So yeah, of course over the years the tools have changed. And sure, as you pointed out, even mentalities have changed (being proud to have a server with 300 days uptime, vs. being ashamed of that).
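
As a concrete (and hedged) illustration of that last point, here is roughly what "deploy infrastructure to the cloud" looks like today as code checked into SCM. The comment names Terraform; the sketch below uses Ansible instead (the other infra tool this thread keeps mentioning) so it stays in YAML. It assumes the amazon.aws collection is installed and AWS credentials/region are available in the environment; the bucket name and AMI id are placeholders:

  # provision.yml -- illustrative sketch, assumes the amazon.aws collection and AWS credentials in env vars
  - name: Provision a bucket and an application instance
    hosts: localhost
    connection: local
    gather_facts: false
    tasks:
      - name: Create an S3 bucket for build artifacts
        amazon.aws.s3_bucket:
          name: example-artifacts-bucket        # placeholder name
          state: present

      - name: Launch a small application instance
        amazon.aws.ec2_instance:
          name: example-app-01                  # placeholder name
          instance_type: t3.micro
          image_id: ami-0123456789abcdef0       # placeholder AMI id
          state: running

Run it with ansible-playbook provision.yml; rerunning it is (mostly) idempotent, which is the property that replaced racking hardware by hand.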

You can call it "Sys Admin", "DevOps", "Site Reliability Engineer", or whatever; they are all largely the same job: "Make sure the infrastructure works, is secure and scalable, and help deploy to it." Even with "cloud managed" things, you still need to set up, configure, and secure them. You can have "cloud managed" k8s; it isn't going to stop developers from using bad practices, like running containers as root, or from not having a standard deployment process (because each dev is just doing their own thing).
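
To make the "running containers as root" example concrete: even on "cloud managed" k8s, somebody still has to insist on manifests like the minimal sketch below. The names and image are illustrative, and in practice a platform team would usually enforce this cluster-wide with an admission policy rather than per deployment:

  # deployment.yml -- illustrative sketch
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-service
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: example-service
    template:
      metadata:
        labels:
          app: example-service
      spec:
        securityContext:
          runAsNonRoot: true        # kubelet refuses to start a container that would run as uid 0
          runAsUser: 10001          # arbitrary unprivileged uid
        containers:
          - name: app
            image: registry.example.com/example-service:1.0.0   # placeholder image
            securityContext:
              allowPrivilegeEscalation: false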


I think the main problem is that the "DevOps" role differs significantly from company to company. A company that desperately needs a solid administrator might not be able to attract the right talent and, as a result, ends up classifying the open positions as "DevOps engineer". At the same time, there are companies out there legitimately trying to bridge the divide between the two job families: software development and system administration.


From my understanding, DevOps was never about technical solutions/processes. It was about giving the same business goals to Dev and Ops. The idea was to eliminate the tension between these departments, which existed because the goal of Dev was to ship features while the goal of Ops was to ensure the system stayed up.


I think devops means something here: sysops would be about running the infrastructure rather than dev infrastructure, while devops would focus on producing dev envs, test envs, and CI/CD, not just setting up the runtime hardware / OS configuration.


Depends on the job role. I find myself writing more Go than anything else...


So you’re going to use the wrong definition then complain that name doesn’t work anymore?


90+% of companies with "DevOps" mean what the GP is describing.

The biggest tip-off: any company with a "DevOps" group or dept isn't doing DevOps. Yet there are a ton of DevOps departments out there.


CEOs like to boast how many Engineers they have.


Meh... Dev is just clicking on whatever your IDE autocompletes for you. With copilot you do even less. I think some programmers out there have some big heads that need popping.

Building out and automating cloud infrastructure so your simple code can work is way more complex than most things you do every day. But ya, keep telling yourself how smart you are as you write "connect to database, return a value" for the 1000th time.

