Screaming in the Cloud

Corey Quinn

About

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

546 episodes

The Current State of Serverless with Kristi Perreault

On this week’s episode of Screaming in the Cloud, Corey is joined by Kristi Perreault. Given Kristi’s title of AWS Serverless Hero, Corey and Kristi discuss the origins and current state of the serverless world, the similarities between AI and serverless as the tech world moves into this next era, and why she emphasizes that serverless is not always the right solution for every issue. Kristi also opens up about her role as Principal Software Engineer at Liberty Mutual, and what she enjoys most about jet setting around the globe giving speeches. HIGHLIGHTS: (00:00) - Introducing Kristi Perreault (00:39) - The Unconventional Path to Becoming an AWS Serverless Hero (05:05) - Exploring the Boundaries of Cloud Education (10:53) - The Challenges of Keeping Up with Rapid Tech Changes (11:51) - Redefining Serverless: Beyond the Hype (13:12) - The Evolution of Serverless and Its Impact (21:55) - Staying Grounded Amidst Technological Zealotry (27:18) - Python Development in the Cloud (29:31) - Upcoming Talks and Where to Connect with Kristi ABOUT KRISTI Kristi Perreault is an AWS Serverless Hero and a Principal Software Engineer at Liberty Mutual Insurance, where her focus is serverless-first cloud enablement. She has over 5 years of industry experience, holds an M.S. in Electrical & Computer Engineering, and is very passionate about promoting women in technology. She is an established speaker, appearing in over 35 conferences, podcasts, panels, and more. Kristi founded the Serverless Denver meetup, and currently co-organizes the Portsmouth, NH AWS User Group and CDK Day. Outside of work and the serverless tech space, Kristi can be found reading a good book in her tiny home, enjoying a good poke bowl, or jet setting all over the world.

33m
Mar 27, 2024
Networks and Sustainability in Computing with George Porter

George Porter, a computer science professor at the University of California, San Diego, talks to us about advanced networking and the effects of computing on the environment in this episode of Screaming in the Cloud. George explores the shift towards optical networking in data centers to meet growing bandwidth needs and discusses the significant carbon footprint associated with computing, from data centers to device production. In addition to providing a look into the future of scalable, sustainable computing systems, George mentions the difficulties and benefits of incorporating cloud computing into academic research. Show Highlights: (00:00) - Introduction (03:15) - The Shift to Optical Networking (07:50) - The Efficiency of Cloud Networks (12:06) - Adaptable Networks for Different Uses (16:19) - Reducing Computing's Carbon Footprint (20:25) - Highlighting Computing's Environmental Impact Through Art (26:51) - Cloud Computing Challenges in Academia (31:18) - The benefits of cloud computing for academic research (34:14) - Closing thoughts About George: A Computer Science Professor at UC San Diego focusing on high-performance and sustainable computer systems.

35m
Mar 21, 2024
Open Source, AI, and Business Insights with AB Periasamy

Join Corey Quinn and MinIO's co-founder and CEO, AB Periasamy, for a look into MinIO's strategic approach to integrating open-source contributions with its business objectives amidst the AI evolution. They discuss the effect of AI on data management, highlight the critical role of data replication, and advocate for the adoption of cloud-native architecture. Their conversation examines data replication and its pivotal role in ensuring efficient data management and storage. Overall, a recurring theme throughout the episode is the importance of simplifying technology to catalyze a broader understanding and utilization that can remain accessible and beneficial to all. SHOW HIGHLIGHTS: (00:00) - Intro (03:40) - MinIO's evolution and commitment to simplicity and scalability. (07:25) - The significance of data replication and object storage's versatility. (12:12) - Challenges and innovations in data backup and disaster recovery. (15:21) - Launch of MinIO's Enterprise Object Store and its comprehensive features. (20:50) - Balancing open-source contributions and commercial objectives. (30:32) - AI's growing influence on data storage strategies and MinIO's role. (34:33) - The shift towards software-defined data infrastructure driven by AI and cloud technologies. (39:40) - Resources and the future of tech (43:31) - Closing thoughts ABOUT A.B PERIASAMY: AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement, AB was a co-founder and CTO of GlusterFS which was acquired by RedHat in 2011. Following the acquisition, he served in the office of the CTO at RedHat prior to founding MinIO in late 2015. AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India. He earned his BE in Computer Science and Engineering from Annamalai University.

44m
Mar 14, 2024
A Beginner's Guide to Surviving AWS re:Invent with Chris Hill

EPISODE SUMMARY Corey Quinn is joined by HumblePod CEO Chris Hill to dissect Chris's debut experience at AWS re:Invent. Together, they tackle the challenges of attending one of the biggest conferences in the IT industry, discussing its immense reach, logistical hurdles, and invaluable insights for anyone considering attending in the future. Beyond the event itself, Chris provides an intimate glimpse into the crucial behind-the-scenes efforts involved in producing exceptional content amid the chaos of AWS re:Invent, emphasizing the importance of kindness, professionalism, and superior audio quality. Discover how partnering with an experienced podcast production team can elevate any content to new heights of polish and engagement. FULL DESCRIPTION / SHOW NOTES (00:00) - Introduction to the Episode (01:25) - Chris's First Impressions of AWS re:Invent (02:09) - The Surprising Scale of AWS re:Invent (04:13) - Lessons Learned and Things Chris Would Do Differently at Future AWS re:Invent Events (07:52) - Balancing Content Creation, Networking, and Professionalism Under Stress (13:42) - Chris and Corey’s Humorous Encounters with Security While Filming at AWS re:Invent (15:35) - Exploring AWS Services and Billing Surprises (21:12) - Significance of Professional Podcast Production (25:04) - Closing Thoughts & HumblePod Contact Information (26:19) - Closing Thoughts ABOUT CHRIS: Chris Hill is a Knoxville, TN native and owner of the podcast production company, HumblePod. He helps his customers create, develop, and produce podcasts and is working with clients in Knoxville as well as startups and entrepreneurs across the United States, Silicon Valley, and the world. In addition to producing podcasts for nationally-recognized thought leaders, Chris is the co-host and producer of the award-winning Our Humble Beer Podcast. He also lectures at the University of Tennessee, where he leads non-credit courses on podcasts and marketing.
He received his undergraduate degree in business at the University of Tennessee at Chattanooga where he majored in Marketing & Entrepreneurship, and he later received his MBA from King University. Chris currently serves his community as the President of the American Marketing Association in Knoxville. In his spare time, he enjoys hanging out with the local craft beer community, international travel, exploring the great outdoors, and his many creative pursuits.

28m
Mar 07, 2024
The Nuanced Power of Headless Browsers with Joel Griffith

Episode Summary On this week’s episode of Screaming in the Cloud, Corey Quinn is joined by Joel Griffith. Joel is the CEO of Browserless.io, a company focused on providing headless browser automation without the pains of hosting. Corey and Joel discuss the most common use cases for headless browsers, the spectrum of web scraping ethics over the last decade, and why it’s so important to always do what you are passionate about no matter how high you climb on the corporate ladder. Joel also gives us his insight into why so many engineers come from creative backgrounds and shares his story of moving from jazz trumpet player to CEO. ABOUT JOEL Master of puppets and the browsers they run! I'm Joel Griffith, and for over a decade I've helped run, destroy, and make manageable things related to browser automation. I've had the pleasure of working on this in big companies and small, and more recently started Browserless to bring the power of automation to teams of all sizes.

30m
Mar 05, 2024
The Complexities of Cloud Networking with William Collins

EPISODE SUMMARY Corey is joined by William Collins, Alkira's head cloud architect, to discuss the obstacles and possibilities of cloud networking. They discuss the evolution, challenges, and necessity of cloud networking, highlighting why this fundamental part of cloud design often goes unrecognized yet truly deserves attention. From William's early days of cloud skepticism to the incredible influence of services such as AWS Transit Gateway, William shares his experiences and insights into how network planning can make a big difference in cloud installations in this episode of Screaming in the Cloud. Show Notes: About William Collins: William Collins is a principal cloud architect at Alkira, where he plays a pivotal role in evangelizing the company's vision, building customer relationships, and leading thought in the network, security, and automation spaces within the cloud ecosystem. With a rich background in enterprise technology across financial services and healthcare, including a significant tenure as Director of Cloud Architecture at Humana, William has made substantial contributions to cloud adoption and network modernization. Beyond his professional pursuits, William is passionate about content creation, hosting The Cloud Gambit Podcast, and teaching as a LinkedIn Learning Instructor. His expertise spans automation, cloud computing, and network engineering. An advocate for continuous learning and innovation, William's outside interests include woodworking, playing ice hockey, and guitar. While his insights are influential, they reflect his personal views and not those of his employer. SHOW HIGHLIGHTS: (00:00) Introduction (03:24) William Collins shares his initial skepticism towards cloud computing (07:28) The evolution of cloud networking (13:50) The role of upfront planning in cloud network deployment to avoid scalability and complexity issues. (21:10) The shift from complicated, manual network setups to simple, effective cloud systems.
(24:13) William uses Netflix's network design as an example of how cloud networking powers seamless user experiences (27:44) The future of cloud networking and the ongoing need for innovation (30:23) Closing remarks

30m
Feb 29, 2024
The Hidden Costs of Cloud Computing with Jack Ellis

On this week’s episode of Screaming in the Cloud, Corey Quinn is joined by Jack Ellis. He is the technical co-founder of Fathom Analytics, a privacy-first alternative to Google Analytics. Corey and Jack talk in-depth about a wide variety of AWS services, which ones have a habit of subtly hiking the monthly bill, and why Jack has moved towards working with consultants instead of hiring a costly DevOps team. This episode is truly a deep dive into everything AWS and billing-related led by one of the best in the industry. Tune in. ABOUT JACK ELLIS Technical co-founder of Fathom Analytics, the simple, privacy-first alternative to Google Analytics.

35m
Feb 27, 2024
How Scaling Turns Rare Occurrences Into Common Ones with Jason Cohen

Today Corey Quinn is joined by Founder and Chief Innovation Officer at WP Engine, Jason Cohen. Jason breaks down the biggest issues he has seen throughout his career hosting millions of websites including why seemingly rare problems should be expected at scale, how moving on after attaining a “good enough” metric can save time and money, and what it means to be proud of your work in the world of cybersecurity. Check it out! About Jason Founder of unicorn WP Engine (200,000 customers, 1,200 employees). Previously founder of bootstrapped Smart Bear (sold 2008; re-sold in 2021 at ~$2B) and ITWatchDogs (sold 2004). Original mentor and angel investor with Austin-based Capital Factory since 2009. Written about startups for seventeen years, most recently at https://longform.asmartbear.com; Twitter: @asmartbear.

52m
Feb 22, 2024
Overcoming Cloud Development Obstacles with Elad Ben-Israel

Corey Quinn talks with Elad Ben-Israel, CEO and Co-founder of Wing Cloud, about the creation of Wing, a revolutionary programming language designed to simplify cloud application development. Elad shares his experiences at AWS and the journey to developing Wing Cloud, highlighting the challenges developers face with existing cloud paradigms and how Wing aims to seamlessly integrate infrastructure and application code. The conversation goes further into Wing's open-source nature, its design philosophy focused on making cloud development more accessible, and the delicate balance between commercial interests and open-source contributions. SHOW HIGHLIGHTS: (00:17) - Corey Quinn introduces Elad Ben-Israel (02:27) - Elad Ben-Israel discusses the motivation behind creating Wing (06:28) - Elad presents Wing as a programming language designed to add an architectural dimension to cloud programming (09:45) - The demarcation between application and platform is explored (13:27) - Introduction of the "platform provider" within Wing (22:18) - The Importance of Choice in Cloud Development (31:22) - Getting started on Wing (33:14) - Closing remarks ABOUT ELAD BEN-ISRAEL: Elad has been coding since he remembers himself, which is quite a long time ago, and always had an unexplained attraction to developer tools. He created the AWS CDK when working at AWS and is now the co-founder and CEO of Wing Cloud, which is building Winglang, a programming language for the cloud.

34m
Feb 20, 2024
A Conversation on Cloud WAN with Kris Gillespie

Kris Gillespie, lead platform engineer for Silverflow, joins Corey Quinn on "Screaming in the Cloud" to talk about Cloud WAN's exciting new role in cloud networking. Kris explains Silverflow's journey, from the original problems with network scalability and the resolution of IP conflicts, to fully utilizing Cloud WAN for global connectivity and easier network management. Kris, who enjoys simplifying complex network architectures, discusses how Cloud WAN has enabled Silverflow to seamlessly integrate between regions and cloud providers, meeting their mission-critical needs for low latency and reliable transaction processing. Listen in to see how Cloud WAN has transformed the approach to solving fundamental network problems, demonstrating the importance for companies and engineers of knowing how to navigate the constantly evolving cloud landscape. ABOUT KRIS Kris is a 28-year industry veteran. He started in '95 back in Australia on the help desk for the first ISP in the country. Since then has moved to the Netherlands, switching roles between network, systems and storage engineering. During this time has been involved in developing certifications for both IBM and (the now defunct) EMC, among others. Worked heavily in the finance/banking sector. The last 10 years has been keenly focused on the cloud space and as is the term these days, combined these skills into what's popularly coined, a "Platform Engineer." Currently works for a payments processing startup, Silverflow, as their Principal Platform Engineer, leading their Platform team and ensuring the platform can scale globally.

38m
Feb 15, 2024
Understanding the Future of Cloud Technology with Anthony Esper

From a systems admin to a cloud computing pioneer, Anthony Esper illustrates the dynamic landscape of cloud technology and its impact on businesses in this episode of Screaming in the Cloud. Using his vast experience and extensive expertise, Anthony shares his insights on developing the Golden VPC module, the intricacies of cloud consulting across various industries, and the pivotal role of strategic planning in cloud adoption. Tune in for practical advice and expert insights! ABOUT ANTHONY Anthony Esper is a seasoned Chief Technology Officer with over two decades in technology consulting. His pioneering work includes developing self-showing real estate technology with Occupi Inc and leading over 20 AWS projects across major US corporations. Esper's expertise spans cloud computing, security, and big data, contributing to his reputation as a tech industry influencer.

30m
Feb 13, 2024
SmugMug's Cloud Adventure with Andrew Shieh

Andrew Shieh shares the thrilling story of SmugMug’s bold leap into AWS’s cloud technology, marking it as one of the pioneering companies to harness the cloud for digital photography storage. This episode offers a unique perspective into the type of strategy and groundbreaking tech advancements that catapulted SmugMug’s success. Listen to the full episode for a masterclass in innovation and adaptation! SHOW HIGHLIGHTS: (00:00) Corey introduces the show & Guest Andrew Shieh (00:54) Andrew shares the story of how SmugMug became AWS's first enterprise customer. (02:17) Discussion on the evolution of AWS's customer service (04:31) Reflections on the expansion of AWS services. (06:08) The critical role of Amazon S3 in SmugMug's operations (12:24) AWS's interest in unique customer stories and feedback (09:32) SmugMug's cloud strategy and optimization (13:50) Andrew discusses challenges and solutions in cloud adoption (17:38) Andrew shares his experiences at AWS re:Invent, offering thoughts on the conference's evolution (21:09) A look into AWS's pricing formulas and business insights (31:55) Closing thoughts ABOUT ANDREW Andrew "shandrew" Shieh is a multidisciplinary engineer, focused today on making the AWS cloud do what it promises to. Andrew started as an environmental engineer, focused on energy efficiency and air pollution modeling, but quickly got dragged into tech after spending most of college at the help desk of the Unix computer cluster. Andrew's current interests include sustainability, cost efficiency, and economics. Most AWS service teams are his friends and he enjoys (a bit too much) talking to his SmugMug and Flickr coworkers about AWS. He recently spoke at AWS re:Invent about how his children (9 and 11) helped to teach him the value of trivia as a means of learning backwards. He also wrote a keynote for re:Invent's pandemic year, and has rescued billions of precious photos from extinction.

32m
Feb 08, 2024
Exploring Advanced Cybersecurity with Michael Isbitski

Cybersecurity leader Mike Isbitski explores the intricacies of cloud-native security and vulnerability management in today's technological landscape. With over 25 years of experience, he provides valuable insights into the challenges and complexities organizations face in securing ephemeral infrastructure and machine identities in the cloud. This episode also explores the cautious adoption of AI in cybersecurity, emphasizing the need for a balanced approach that maintains operational functionality while addressing evolving security concerns. ABOUT MICHAEL Michael Isbitski is a former Gartner analyst, cybersecurity leader, and practitioner with more than 25 years of experience, specializing in application, cloud, and container security. Michael learned many hard lessons on the front lines of IT working on application security, vulnerability management, enterprise architecture, and systems engineering. He's guided countless organizations globally in their security initiatives as they support their businesses.

35m
Feb 06, 2024
Empowering Economic Growth Through Tech Innovations with Angie Jones

Technology meets economic empowerment in this episode featuring Angie Jones, Global Vice President of Developer Relations at TBD, a Block division. Angie sheds light on the role of decentralized technologies in shaping the future of digital identity and cross-border payments. Her journey from software engineering to a leadership role in tech innovation illustrates her profound impact on the industry. This episode offers valuable insights into how technological advancements are driving economic growth and changing the financial landscape. Angie's expertise and unique perspective make this a must-listen for anyone interested in the cutting-edge intersection of technology, finance, and innovation. ABOUT ANGIE Angie Jones is the Global Vice President of Developer Relations for TBD, Block’s new business unit focused on decentralized technologies. She is an award-winning teacher and international keynote speaker who shares her wealth of knowledge at software companies and conferences all over the world. As a Master Inventor, Angie is known for her innovative and out-of-the-box thinking style which has resulted in 27 patented inventions in the areas of metaverses, collaboration software, social networking, smarter planet, and software development processes. SHOW NOTES: (00:25) Introduction to Angie Jones and Her Role at TBD (01:25) Angie’s Recognition in a USA Today Crossword (02:50) Career Journey and Transition into Developer Relations (06:04) Block’s Mission and Services in Economic Empowerment (10:09) Convenience vs. Decentralization in Technology (16:49) Innovations in Cross-Border Payments (25:01) Decentralized Tech Stories and Reflections on Tech Innovation (30:22) Challenging Tech Industry Norms and Global Perspectives

37m
Feb 01, 2024
Mastering Tech Transitions with Ceora Ford

Join us for a fascinating talk with Ceora Ford, a Developer Advocate at Okta, as she explores the changing world of tech. Ceora shares her unique journey through different tech roles and talks about the importance of keeping technical skills sharp, even when focusing on advocacy. She also gives us a sneak peek into the exciting AI developments happening at Okta. Tune in to this episode to get a better understanding of the fast-paced tech industry and what's coming next. ABOUT CEORA Ceora Ford is a Developer Advocate from Philadelphia, renowned for her expertise in making complex computer science concepts accessible to a broad audience. With a rich history of creating educational content, she has significantly contributed to the tech community, working with leading companies like CodeSandbox, DigitalOcean, egghead.io, and Apollo GraphQL. Ceora's career is marked by her unique ability to simplify technical topics, making them understandable for everyone, from students to professionals in tech-adjacent roles. Her non-traditional path into tech and her current role at Okta showcase her commitment to making the tech industry more inclusive and approachable for all.

32m
Jan 30, 2024
Working to Live Instead of Living to Work with Jeremy Tanner

Jeremy Tanner joins Corey on Screaming in the Cloud to discuss why his career in tech is the least interesting thing about himself, and why he feels everyone should be able to say the same thing. Corey and Jeremy discuss raising kids, their antics on motorcycles, and much more throughout this episode. Jeremy reveals what truly gives his life fulfillment, meaning, and what drives him in his career. Jeremy and Corey also discuss the importance of engaging your online audience the right way. ABOUT JEREMY Jeremy is a motorcyclist. An advocate (Developer, Community, BBQ). Not Questlove.

33m
Jan 25, 2024
How Snyk Gets Buy-In to Improve Security with Chen Gour Arie

Chen Gour Arie, Director of Engineering at Snyk, joins Corey on Screaming in the Cloud to discuss how his company, Enso Security, got acquired by Snyk and what drew him to Snyk’s mission as a partner. Chen expands on the challenges currently facing the security space, and shares what he feels are likely outcomes for challenges like improving compliance across value-add on security tools and the increasing scope of cybersecurity at such a relatively early phase of the industry’s development. Corey and Chen also discuss what makes Snyk so appealing to developers and why that was an important part of their growth strategy, as well as Chen’s take on recent security incidents that have hit the news. ABOUT CHEN Chen is the Co-founder of Enso Security (part of Snyk) - the world's 1st ASPM platform. With decades of hands-on experience in cybersecurity and software development, Chen has focused his career on building effective application security tools and practices. TRANSCRIPT Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. This promoted guest episode is brought to us by our friends at Snyk https://snyk.io, and as a part of that they have given me someone rather distinct as far as career paths and trajectories go. Chen Gour Arie is currently a director of engineering over at Snyk, but in a previous life—read as about six months or so ago—he was a co-founder of Enso Security, which got acquired. Chen, thank you for joining me. Chen: Thank you for having me, Corey. Corey: So, I guess an interesting place to begin is, what has the past couple of years been like? And let’s dive in with, what is or was Enso Security? Chen: Yeah.
So, Enso started for me first as friendship because I joined the team that I was working with as a contractor for a while. There was such an excellent and interesting team with a very interesting environment. And then after a while, they asked me to join that team, and then I became part of the security team of a company called Wix.com. It’s quite a large company, web do-it-yourself kind of platform, that you can build your own website with a presentation style kind of interface, and our job was to secure that. And we formed a very, very nice friendship throughout it, but we also gained a lot of experience because you work with such a large company, and you experience many challenges, including real-time attempts to penetrate, and the complexity of social engineering at large scale. You go through a lot of things. So, this was the start. And after a couple of years, we decided that we have some interesting ideas that can do good to the community in the cybersecurity industry, and we embarked on a new journey together to start Enso. Corey: I can see why you aligned with Snyk. It sounds like a lot of what you were aimed at is very much in step with how they tend to approach things. I have a number of sponsors that I can say this about, but Snyk is a particularly fun one, in that, obviously, you folks pay me to run advertisements and featured guest episodes like this, which is appreciated, but we also pay you as a customer of Snyk because it does a lot of things that we find both incredibly useful and incredibly valuable. The thread that I’ve seen running through everything coming out of Snyk has been this concept of, I think, what some folks would say shifting left, but it comes down to the idea of flagging issues as early in the process as possible rather than trying to get someone to remember what they did three months ago, and oh, yeah, go back and address that. 
That alone has made it one of the best approaches to things that are truly important—and yes, I consider security to be one of those things—that I’ve seen in a while on the dev tool space. Chen: Yeah, and this has been the mission of Snyk for a very long time. And when we started Enso, our mission was to help in some additional elements of the same problem space in introducing additional tools to help drive this shift left, this democratization of the security effort around and in the organization, and resolving some of the friction that is created with the, kind of, confusing ownership of security and software development. So, this was kind of the mission of Enso. The category introduced by it and the ASPM category to bring the notion of postural security, postural management to applications. And it really is a huge fit with the journey of Snyk, and we were very excited to be approached by them to join their journey and help them do further shift left and extend on problem space on the complexity of this collaboration between security and developers. Corey: A question I have around this is that it seems to me that viewing security posture management from an application perspective, and then viewing other parts of it from a cloud provider perspective and other parts of it from a variety of different things—you know, go to RSA and walk up and down the endless rows of booths, and you know, look at the 12 different things that they’re all selling because it’s all the same stuff around 12 categories or so, with different companies and logos and the rest—it feels like, on some level, that can lead very quickly to a fractured security posture where, well this is the app side of the security, and then we have the infrastructure security folks, but those groups don’t really collaborate because they’re separate and distinct. How do you square that circle? 
Chen: Yeah, it’s not an easy problem, and I think that the North Star of many vendors exists this notion of sometimes I think we call it CNAP or something that will unify all of it. Cloud as a solution, and the offering that exists with cloud computing enables a lot of it, enables a lot of this unification, but we have to remember that the industry is young. The software security industry in general is young. If we will look at any other industry with that size, all of them have much more history and time to mature. And inside this industry, the security itself is even younger. It has become a real problem much later than then when software started. It has become a huge problem when cloud emerged and became, like, the huge deal that it is now. And when more and more businesses are based on digital services, and more people are writing software, a lot of it is young, and it needs time to mature, and it’s time to get to—to accomplish some big parts like this unification that you are pointing out missing. Corey: I have to confess my own bias here. A lot of the stuff that I build is very small-scale, leverages serverless technologies heavily, and even when I’m dealing with things like the CDK, where I start to have my application and the infrastructure that powers it coalesce into the same sort of thing, it becomes increasingly difficult, if not outright impossible for some of these co...

28m
Jan 23, 2024
Continuing to Market After the Product Has Sold with Kim Harrison

Kim Harrison, a freelance content marketing strategist and author, joins Corey on Screaming in the Cloud to talk about asking the right questions to find your target demographic, why she has such a deep love for storytelling, and how marketing extends after the product has been sold. Kim shares her unique experiences with solving urgently painful problems that customers are experiencing and subsequently building a relationship with those customers that allows her to solve more pain points down the line. ABOUT KIM Kim is a professional storyteller focused on strategic communications. She translates complex ideas into compelling narratives, helping teams share their perspectives. She enjoys building impactful stories, and using a range of mediums and channels to reach specific audiences. For 10+ years Kim has worked closely with teams focused on big data and developer tooling. They have brought new methodologies forward, impacted the language used to describe technologies, and even established new industry categories. Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud, I’m Corey Quinn. One of the unpleasant-to-some-folk realizations that people sometimes have is, “Wait a minute. Corey, you’ve been doing marketing all this time.” To which the only response I can come up with is a slightly more professional version of, “Well, duh.” And I think that’s because people misunderstand what marketing is and what it means. Here to talk about that, and presumably other things as well, is Kim Harrison https://www.kimber.kim/, a freelance content marketing strategist. Kim, thank you for agreeing to listen to me.
Kim: [laugh] Thank you for having me, Corey. It’s great connecting with you today. Corey: You’ve worked at a number of different places over the course of your career, the joys of freelancing. You have periodically been involved in getting folks from the companies at which you’ve been working onto this show, but it’s sort of the ‘always a bridesmaid, never a bride’ type of philosophy. You were somewhat surprised when I reached out and said, “Hey, why don’t you come on the show yourself?” Which is always the sign it’s going to be a fascinating episode because some of the most valuable conversations that I find I have here are with people who don’t think at first that they have much to say. And then I love proving them wrong. But you’re in marketing. Presumably, you have many things to say. Kim: [laugh] It’s funny, you say that I feel like in marketing, we’re always behind the scenes, we are the ones building and crafting the image, and bringing that story forward of, who is this? What is this company? What is this product? What do they do? Why should I care about it? And, “Wow, those are amazing stickers. I want five of them, please.” So, I’m kind of used to being behind the curtain rather than in the foreground talking about what I do. Corey: People tend to hate marketing, especially developers, when you talk to them, but when you really drill down into it, it’s not marketing that they hate. It is, on some level, a marketing straw man—or straw person, whatever the current term of art is—because they think of the experience through the lens of the worst examples of it. And everyone who has been in the industry for five minutes knows what I’m talking about. Billboards that make no sense where a company spent $20 million on an ad buy and seven bucks over the lunch counter trying to figure out what to say once you have all of that attention, or bad email blasts that are completely irrelevant, untargeted, misspell your name, and are clearly written by a robot. 
That’s not what marketing is, at least in my mind. What is it to you? Kim: For me, marketing is how you communicate who you are, what have you built, what is the value that it provides, and how can somebody use it. There’s many ways in which you can share that, that can be all of those activities that you just talked about. And I think it’s easy to sometimes lose the story in all of that and talk about things that may not be as important. I think a lot of times people get excited about what they’ve built, and love to talk about what they’ve built but not why it provides value, and what value it provides. And so, staying focused and really sharing that clear story is—it’s a lot harder than I think people give it credit for. Corey: A very senior, well-known engineering leader whose name I will not mention because I—I can tell stories, or I can name names, but I don’t believe in doing both—once said, out of what was otherwise like this—like, this person just dispenses wisdom like a vending machine. It’s amazing, but one of the dumbest things I ever heard this person say was, “I never want to get marketing outreach, or show me ads or the rest. If you’ve built something awesome. I will find it on my own.” Which is a terrific recipe to follow if you’d like to starve to death. Kim: Yeah, I agree with that. And I think there is this… I don’t know, maybe it feels great to imagine that what you’ve built is just so interesting that people would automagically find their way to you and pop up in your DMs and beg to throw money at you for what your product is. But I mean, truly if nobody knows that the thing exists, or even what it does, how could they? I’ve seen this happen quite often in technology where there’s actually an amazing product that maybe they are sharing who they are, they are promoting themselves, but the messaging just doesn’t quite land, and so there’s a lot of confusion and misunderstanding about an amazing product. 
And so, not sharing, but also not sharing a very accurate, complete picture of who you are can also hurt you. Corey: When I first started going out independently in the fall of 2016, I did not know whether it was going to work, whether I was going to succeed or have to go do something else, but what I knew very obviously, was that, one way or another, 18 months from now, I was going to want to have an audience to tell about whatever I was doing. Like, the best time to build an audience is five years ago; the second-best time is today, just like planting a tree. So, I started building out the email newsletter. It was something I wish existed, no one else had built it, I figured I’d give it a shot, and it resonated, and that’s where the newsletter came from. But it means that I can reach out and talk to 32,000 people in their inbox, more or less whenever I want to, tell them whatever is on my mind, and I do that in the form of my newsletters. And that more than anything else has really led to anything that could be equated to be… me as a brand, so to speak. It took work to get there, but I view it as something that, in hindsight or to someone who had spent 20 minutes thinking about marketing, was obvious, but it took me a while to get there from first principles. Kim: Yeah, for sure. And, you know, as a person who receives your newsletter, as somebody who has collaborated with you in the past, something I know you do really well is you are very clear about who you are, what you stand for, and you’re consistent. And so, I think… in my opinion, I think you’ve done a great job of earning your audience’s trust, and that’s a huge part of this, right? As a marketer, it’s very easy to say, you know, “My thing is bigger, better, faster,” but if it’s pure conjecture, if it’s not—if there’s no there there, people will find out, you will lose that trust, and it can become difficult. And so, it does take time. 
And I think—I imagine, and I would ask you—I imagine you were very intentional about what you did. It took time, and you understood that, and it’s like, okay, put your head down and be patient because this will reap rewards in the end. Corey: That’s the curse, on some level, of having succeeded at something. You look back in hindsight, and everything looks like one thing clearly led to another, and where you are now is sort of inevitable when viewed through that lens. It does not feel like that on the day-to-day. I promise. Kim: [laugh] What—okay, so as you built your audience, what was the hardest part for you? Corey: Figuring out who the audience was, to be perfectly honest. It didn’t take long before Datadog came sniffing around, six issues in, asking if they could sponsor. And it was, “You want to give me money to talk about you? Of course, you can give me money. How much money?” And I inadvertently found myself with a sponsor-driven media business. But that led to a bit of a crisis of faith for me of, who is my audience? Is it the sponsors because that—like, I like money, and I wish to incentivize the behavior of giving it to me, but if I do that, then suddenly, I’m more or less just a mouthpiece or a shill for whoever pays me enough, and that means the audience loses interest. It has to be the community is my target because that’s what I consider myself a part of. I write content that I want to read, that I want to exist, and if sponsors like that, great. If they don’t, then well, okay, it’s not for everyone. But the audience is around because they either agree with what I say, or they appreciate the authenticity of it. And it goes down to the old saw of would you rather have a pile of money, or would you rather have a relationship with someone? It’s like, “Well, I can turn a relationship into money way more easily than I can the opposite.” So yeah, I would much rather build a working rapport with the people who support me. Kim: Interesting. 
Yeah, I agree with you. And I would ask another question about your audience. Who was in that audience? Is this one kind of person? Is this many kinds of people? How do you think about who you’re speaking to? Is it a unified group, or are you considering that there are three or four different kinds of people within this body, and you try to address all of them at different points in a week or month? Corey: If you try to write for everyone, you wind up writing for no one— Kim: Yeah. Corey: —and every time I think I have a grasp on who my audience is—like, if you’re listening to this show, for example, I have some baseline assumptions about you in the aggregate, but if you were to reach out—which again, everyone is welcome to do—I would be probably astounded to learn some of the things that you folks are working on, how you view these things, what you like, what you don’t like about the show. On some level, I operate in a vacuum here, just because feedback to a podcast is a rare thing. I suspect it’s because it’s like listening to an AM radio show, and who calls into an AM radio show? Lunatics, obviously. And most people—except on Twitter—don’t self-identify as lunatics, so that’s not something that they want to do. I encourage you to buck that trend. Reach out. I promise, I drag multi-trillion-dollar companies, not individuals who dare to reach out. Some of my best friendships started off with someone reaching out like, “Hey, I like what you’re doing, and I’d like to learn more about it.” One thing leads to another, and there are no strangers; just friends we haven’t met yet. Kim: Yeah, yeah. In the world of developer marketing, sometimes that audience can be a range of people. It can be the user versus your buyer. So, when I think about content marketing and I think about telling the story of a platform or a brand to, you know, this range of people, maybe I want to tell that same story, but I’ve got to do it in slightly different ways. 
Because to your point, if you try to be, you know, one thing for everybody or nothing to everyone, it just, it doesn’t work. And so, how do you talk to that buyer who can actually sign the check versus the individual contributor, the person who’s using the product day-to-day? What part of that story do they want to hear? What makes sense to them? What is engaging to them? Corey: Part of the challenge I’ve had is that I always assume that the audience was largely comprised of people who vaguely resemble me, namely relatively senior engineering folks who have seen way too many cycles where today’s shiny new shit becomes tomorrow’s legacy garbage that they needed to maintain. But that is not true. In practice, about 60% of the audience is individual contributing engineers, and the remaining 40 is almost entirely some form of management, ranging from team leads to C-level executives of Fortune 50s and everything in between. And every piece that I write is written for someone. And by that I mean, a specific person or my idea of that person as I go. Now, I don’t mention them by name, but that means that different pieces are targeted at different audiences and presuppose different baseline levels of knowledge. And sometimes that works, sometimes that doesn’t, but it means that everything that I write should ideally resonate with some constituency. Kim: Yeah. Yeah. And, again, as a person who has collaborated with you, you have a range of channels that you share content across. And so, I think when I first met you and first started working with you, I very quickly started to understand where that made sense to me, not just as a collaborator, but as somebody who enjoys the people that you bring in to interview, the stories that you tell, the conversations that you start. But I’ve noticed there’s areas that I tend towards, and would listen to or read more. 
I don’t know if that was intentional, if there are certain areas that you focus on for different segments of your audience. Corey: Partially. And this is a weird thing for me to say, particularly in this medium. I don’t listen to podcasts myself. I read extremely quickly, I do not have the patience to sit through a conversation. It makes sense when I’m driving somewhere, but I barely do that. My drive home from dropping off my toddler at preschool is all of seven minutes, which is not long enough for basically anything, so it’s not for me. I don’t watch videos. I don’t listen to podcasts. I read. That’s part of the reason that every episode of this show has a transcript. It’s also part of the reason, though, that I have the podcast entirely, as that I am not the common case in a bunch of things. An awful lot of people do listen to the podcast. I’ve talked to listeners who are surprised to learn I have an email newsletter, but I view it as the newsletter came first and then the podcast. Occasionally, I find people who only know me through my YouTube videos—which are sporadic because it’s a lot of effort to get one of those up—and no one sees all of it. This did lead to a bit of a weird crisis for me early on of, okay, so I have a Twitter account, I have a LinkedIn page, I have the podcast, I have the newsletter, and I have the blog, and of course, I have my day job at The Duckbill Group where we fix AWS bills. That is seven or eight different URLs. Where do I tell people to go? Kim: Yeah. Corey: It’s a very hard problem. Kim: Do you do that? How do you do that? Or do you allow people to find their own way? Corey: Whether you allow people to or not, they’re going to do it on their own. My default of where do I send people is lastweekinaws.com https://lastweekinaws.com. That talks a little bit about who I am, it has a prominently featured ‘newsletter signup’ widget there, give me your email address and you will get an opt-in confirmation.
Click that, and you will start receiving my newsletters, which talk in the bottom about other things that I do, and let people find their way to different places, like slack.lastweekinaws.com https://slack.lastweekinaws.com, for the community Slack channel, which is sort of the writer's room for some of these conversations. There’s a bunch of different ways, but not everyone wants to engage in the same way, and that’s okay. Kim: Yeah. That is something that’s come up a lot for me, managing content programs. You said it yourself: not everybody learns the same way, and so thinking about different ways to share a story, I would say right now a lot of people are really burnt out on webinars. I think the past couple of years of being at home and staring at screens has done a number on us all. But still, there are ways in which some people do prefer video. Maybe shorter format is better, or audio, or reading. And it’s great that you put the transcript in because I know I’m a person who really values that. Sometimes I can’t listen to an episode, and it’s great that I can, you know, kind of skim through and read through parts of the interview that I knew that were going to come up. And so, being attuned to the fact that there’s many different ways to tell a story, and having fun with that—dare I say [laugh]—is, I think, a huge part of it. Corey: You have to have fun, otherwise, you aren’t going to be able to stay the course, at least that’s my philosophy. I am very fortunate in that what I do is technically marketing for the consultancy because an overwhelming percentage of our leads come from, people have heard of me and that leads them here. It’s never clear to me where was the original point of contact, how did you get into the orbit, who recommended you, but that is functionally what it is. I’m fortunate in that the media side of our business with sponsorships turns this into a business unit that generates a profit. But it is functionally still a marketing department. 
That is not mandatory. Kim: Yeah. So, an interesting thing that I’ve seen happen within developer marketing is when thinking about this audience and how you market your consultancy, you spoke about how many people are individual contributors in your audience. I—did you say it was like 60%? Corey: 60% engineers, although it’s also how people view what their role is changes rather drastically. And I’ve never found that any of these things that are categorizations of roles or company styles or what have ever fit me well. I don’t fit anywhere I go. And that’s okay. I assume that there’s a lot of slop and wiggle room in there, but it gives me a direction to go in. I would have guessed before that, that 95% of the audience was engineering hands-on coding-type practitioners. Kim: Right. Corey: Clearly I’m wrong. Kim: Well, in understanding that, I mean, what you’ve got is an understanding of who can take what action. I mean, yeah, at some point, you do want sponsors, right? If you are marketing for your consultancy, you probably do want to reach those executives that would be the person that would actually bring you in—your team in—to evaluate and give them advice and feedback, and that’s not always the individual contributor. However, having a presence within the community is equally beneficial to your brand. And so, for me, as a person who has worked in-house at teams, often the demand gen team is telling me, “Oh, we just want to do things that will get leads in the door,” you know, leads that will actually turn into customers, but addressing your community and having a presence there, and showing up there, and participating is just as important. You know, that’s brand awareness. And so, there will sometimes be activities that you do that really are just about participating, and showcasing yourself and your team as the experts that you are. And sometimes it will be a direct, “We have this feature. We have this product. 
Here's how you can do a trial and sign up to become a customer.” Corey: That is, I think, something that gets missed a lot. With so much marketing in this industry slash sector slash whatever it is that you want to call it is, in larger companies in particular, you wind up with people who are writing some of the messaging around this that are too far removed from the actual customer journey. You see it very early startup phase, too, where… I see it on the show, sometimes, with very early stage technical co-founders. They want to talk about the internals of this very hard thing that they built and how it works. Great. That’s not your customer. That is not something that anything other than your competitor or your prospective hires are really going to be that interested in. Kim: Yeah. Corey: Talk about the painful problem that you solve. Kim: Absolutely. Show—oh, my gosh, I just had a conversation with a colleague about this very thing. Show the return on investment, show the value you provide, and do it explicitly, do it very clearly. Do not assume that people understand. Give numbers if you can, metrics. Just really put it out there because I think in this moment right now, in this economy… budgets are tight. And so, if you can’t clearly show what value you provide and why you should be there, you know, why somebody should bring your product into their stack, you’re just not going to make it through, or you’re not going to last long. Corey: Yeah. It’s hard. None of this stuff is easy, and marketing is way, way, way harder than it looks. Done well, it looks like you barely did anything at all. Do it badly, and suddenly the entire internet lines up to dunk on you. Kim: Oh, that is so true. Gosh, and that’s really difficult for marketers because, as you said, we’ve done well, it just feels natural. Like, of course, this would happen. But there’s so much that goes on behind the scenes to execute and make it look seamless and flawless. 
That is something that I like to advise onto my fellow marketers and content marketers is, don’t forget to remind your team what you’ve been up to and what it took to get there so that they appreciate the value of what you’re providing, and will continue to do those things that help keep that momentum moving forward. As you said, how many years did you work on getting that audience together where it is today? This was not six months. This was a real time and effort for you to build this following, and to earn this trust, and to have the brand that you have now. Corey: The funny part is, I didn’t do most of it. My entire time doing this, I have been unable to materially alter the trajectory of growth. It is all word of mouth, people in the audience telling other people about whatever it is that I do. I have run a number of experiments across almost every medium that was within my reach, and none of them seem to materially tip anything other than being authentic and being there for the audience, and then just letting the rest sort of handle itself. Kim: Mm-hm. I like that you said that, that you’re running experiments. You’re in conversation with your audience. You’re really thinking about how your message lands, and what they like or don’t like, or what resonates. Corey: It’s a hard problem. How do you view marketing? You’ve been working in this space a lot. You have specifically in your title of Freelance Content Marketing Strategist a derivation of the word strategy, which has always been something that I’m not great at. It’s longer-term, big picture thinking. I’m much better tactically in the weeds. What do you see as the broad sweep of how it’s being done in this industry? Kim: I can speak to myself. I studied sociology. I really love thinking about what influences people, I love stories and storytelling, and so my focus is strategic communications. 
And that’s a fancy way of just saying, you know, taking these complex ideas, these products that people built, and turning them into compelling narratives so we can showcase the value they provide. And I think it’s especially interesting and challenging doing that in technology when a lot of times you’re bringing forth completely new products that never existed before, so how do you speak to that? How do you help people understand that a thing they’ve never been able to do before they can now do, and it could be a part of their life, and it could be part of their workflow, and change how they think about their own practices? And so, for me, it really is storytelling. I’m a sucker for, you know, a good podcast and a good book on the side. That’s how I think about it, but I also do appreciate that at the end of the day, this is marketing, we are, you know, a business, and so I also enjoy being a part of a team. So, I can help build the beautiful story and think about how to share that effectively, get that in front of the right people at the right time so that they can have an understanding of who you are, what you are, what you offer, be a part of the larger conversation that is in place that you can become a trusted brand, and doing that within, you know, a larger marketing team, those people that make sure that, you know, ultimately we’re getting those people into the marketing and sales funnel, and the appropriate activities that happen next. So I’m, I tend to hang out in my storytelling realm of marketing, but fully well appreciate and know that this is—to your point, this is—marketing is a large effort, and there are a lot of people that contribute to the different moving parts. And it’s like a dance making it all come together. Corey: Something I found as well is a complete lack of awareness outside of marketing itself, in the differences between all of the marketing sub-functions.
It’s the engineering equivalent of lumping mobile developers, and front-end developers, and SREs, and back-end developers, and DBAs, and so on, and so on, and so on, all into the same bucket. Like, “You’re just an engineer. Can you fix my printer?” Style stuff. Kim: Yeah. Corey: Marketing is a vast landscape, and you start subdividing it further and further, and there’s a reason that it’s an entire organization within companies and not a person. Kim: Yeah, for sure. And gosh, some of the people that I’ve worked with at earlier-stage companies that are capable of covering more than one area, really creative, flexible, nimble fingers, you know, they are quick on their feet and can see that, you know, larger vision and help contribute to that. So, you know, building out messaging is one thing. Thinking about how to get that in front of your audience is another. How to guide your customers through that journey, like, what does the learning process look like, and how do you make sure that you continue to drive those conversations so that somebody can go through that learning process? How are you showing up in the real world at an event? How is your team talking to [media 00:25:23] to analysts? I mean, the list can go on, as you begin to think about the more and more people in the world that you want to touch and interact with, who should know who you are? They should understand who you are, what is your brand, what product have you built, and why it’s important to the conversation right now. And so yeah, you start to bring in more team members who specialize in that, who can help you make sure that you’re doing that particular function really well. And it’s fascinating being inside of a small startup and then watching that operation scale into something larger, and really watching that effort take off. It’s pretty cool to see. Corey: Something I’m curious about that you have been rather vocal about is that marketing extends after the product is sold. 
What do you mean by that? Kim: The way that I think about that is, in my opinion, customers should be a part of the customer journey. So, the customer journey is from point zero where this person or team or organization was not aware of who you are to, “Oh, apparently, there’s a solution that fits my need,” to, “Oh, and I want this particular brand, I want this tool in my stack, I want to work with these people,” to, they’ve signed on to become a customer. Even after that point, in my opinion, marketing efforts should continue, in that perhaps that customer came in to solve one or two use cases, but your platform or product can help with many others. And so, making sure that customer is onboarded appropriately so that they’re getting the full value out of the product that they should, and they’re keeping them educated so that they’re aware of other parts of the product that maybe they didn’t learn about in their discovery journey, as well as, you know, as your product evolves, new features that are offered. So, as I think about marketing, the existing customer base is also a group of people that I’m always thoughtful about. So, let’s say that, you know, if I were to plan out a product release announcement, that is a segment that I would absolutely want to make sure that we include in our strategy. And where are the touchpoints for that? How can we make sure that segment is also understanding and aware of this new announcement, and how it can affect them? And what resources would I provide to them so that they know about it, they will use it well, perhaps become a power user, and you know, very selfishly… sorry to say this out loud, but maybe they’ll become a power user and want to come on a webinar with me, or be featured in an article about how much they enjoy using it. But again, just because you’ve got a customer in-house doesn’t mean that journey is finished. 
There’s, as your product continues to grow and evolve, your relationship with that customer should also continue. Corey: There are two schools of thought on taking money from customers. One of them is you get them as much money as you possibly can upfront, once. And there’s also the idea of, all right, I want to have an ongoing relationship in which they broaden their relationship in the fullness of time and grow as a customer. Some of our best sources of business have come from folks who either—not just—don’t tell their peers at other companies about us, but come back to us when their situation changes, or wind up doing business with us as they land somewhere else in the ecosystem. Like there is, “Yeah, we like working with you,” is all well and good, “And I want to do it again; here’s money,” is a different level of endorsement. Kim: Absolutely. And some of the companies that I’ve worked with, often customers will come in because they have some extreme point of pain, and they want to solve that one thing. They do not have time to think about the dozen different interesting use cases. “I have this thing that I need to solve, and I need to get it done now.” And so, work with them on that, and later on, that opportunity to expand their understanding of what else is possible. And even coach and provide guidance on, especially with some newer products where people are learning new development techniques. “Did you know that this is also possible? Have you considered this?” And so, thinking about that, like, not everybody is just twiddling their thumbs, “Oh, I have free time. 
I’d love to learn a thing.” They’re usually coming to you because they have a very painful thing that they need solved, hence why it’s great to talk about the value you provide: “I can help you solve that, I can help this pain go away, and help your business do what it needs to get done.” And so, when they’re our customer, that next moment is that great, great opportunity to talk about other use cases, other parts of the platform. Corey: I really want to thank you for taking the time to speak with me. If people want to learn more, where’s the best place for them to find you? Kim: Right now, I’m mostly active on LinkedIn https://www.linkedin.com/in/kimberh/, and I believe—would you be able to provide a link to that in the show notes? Corey: Oh, we absolutely will put that in the show notes, whether you want us to or not. That’s the beautiful part of having show notes for folks. Kim: Awesome. Yeah, I think that’s the best place to find me today. Unfortunately, I don’t use Twitter https://twitter.com/kittyriot as much as I used to. So, I do exist there, but I’m not— Corey: That’s such a smart decision. Kim: I know, I feel terrible about it. And I got to say, I miss the community that it was. Corey: Yeah, that’s the reason I focus on the newsletter as the primary means of audience building. Because email is older than I am. It will exist after I’m gone—and that’s fine—but it means that it’s not going to be purchased by some billionaire man-child who’s going to ruin the thing. I don’t need to worry about algorithmic nonsense in the same way. I can reach out and talk to people with something to say. I’m in that very rarefied space where when a company blocks an email that I send out, they get yelled at by their internal constituencies of, “Hey, where’d that email go? I was looking for it.” Kim: That’s awesome. Corey: Thank you so much for taking the time to speak with me. I appreciate it. Kim: Thank you, Corey. It’s a pleasure talking with you. 
Corey: It really is because I—like you—am delightful. Kim Harrison, freelance content marketing strategist, has been my guest today. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry and insulting comment. Don’t worry about telling me about it. If your comment was any good, I’m sure I’ll find it on my own. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com to get started.

32m
Jan 18, 2024
The Future of Entertaining Developer Content with Jason Lengstorf

Jason Lengstorf, a developer media producer and host of the show Learn With Jason, joins Corey on this week’s episode of Screaming in the Cloud to lay out his ideas for creative developer content. Jason explains how devTV can have way more reach than webinars, the lack of inspiration he experiences at conferences these days, and why companies should be focused on hiring specialists before putting DevRels on the payroll. Plus, Corey and Jason discuss walking the line between claiming you’re good at everything and not painting yourself into a corner as a DevRel and marketer. ABOUT JASON Jason Lengstorf helps tech companies connect with developer communities through better media. He advocates for continued learning through collaboration and play and regularly live streams coding with experts on his show, Learn With Jason. He lives in Portland, Oregon. Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Before I went to re:Invent, I snuck out of the house for a couple of days to GitHub Universe. While I was there, I discovered all kinds of fascinating things. A conference that wasn’t predicated on being as cheap as humanly possible was one of them, and a company that understood how developer experience might play out was another. And I also got to meet people I don’t normally get to cross paths with. My guest today is just one such person. Jason Lengstorf is a developer media producer at Learn with Jason https://www.learnwithjason.dev/, which I have to assume is named after yourself. Jason: [laugh] It is yes.
Corey: Or it’s a dramatic mispronunciation on my part, like, no, no, it’s ‘Learn with JSON’ and it’s basically this insane way of doing weird interchange formats, and you just try to sneak it through because you know I happen to be an XML purist. Jason: [laugh] Right, I’m just going to throw you a bunch of YAML today. That’s all I want to talk about. Corey: Exactly. It keeps things entertaining, we’re going to play with it. So, let’s back up a sec. What do you do? Where do you start and where do you stop? Jason: I’m still learning how to answer this question, but I help companies do a better job of speaking to developer audiences. I was an engineer for a really long time, I went from engineering into developer advocacy and developer experience, and as of the last year, I’m doing that independently, with a big focus on the media that companies produce because I think that what used to work isn’t working, and that there’s a big opportunity ahead of us that I am really excited to help companies move into. Corey: It feels like this has been an ongoing area of focus for an awful lot of folks. How do you successfully engage with developer audiences? And if I’m being direct and more than a little bit cynical, a big part of it is that historically, the ways that a company marketed to folks was obnoxious. And for better or worse, when you’re talking about highly technical topics and you’re being loudly incorrect, a technical audience is not beholden to some of the more common business norms, and will absolutely call you out in the middle of you basically lying to them. “Oh, crap, what do we do now,” seemed to be a large approach. And the answer that a lot of folks seem to have come up with was DevRel, which… I’ve talked about it before in a bunch of different ways, and my one-liner is generally, “If you work in DevRel, that means you work in marketing, but they’re scared to tell you that.” Jason: [laugh] I don’t think you’re wrong. 
And you know, the joke that I’ve made for a long time is that they always say that developers hate marketing. But I don’t think developers hate marketing; they just hate the way that your company does it. And— Corey: Oh, wholeheartedly agree. Marketing done right is engaging and fun. A lot of what I do in public is marketing. Like, “Well, that’s not true. You’re just talking about whatever dumb thing AWS did this week.” “Well, yes, but then you stick around to see what else I say, and I just become sort of synonymous with ‘Oh, yeah, that’s the guy that fixes AWS bills.’” That is where our business comes from, believe it or not. Jason: Ri—and I think this was sort of the heart of DevRel is that people understood this. They understood that the best way to get an audience engaged is to have somebody who’s part of that audience engage with them because you want to talk to them on the level that they work. You’re not—you know, a marketing message from somebody who doesn’t understand what you do is almost never going to land. It just doesn’t feel relatable. But if you talk to somebody who’s done the thing that you do for work, and they can tell you a story that’s engaging about the thing that you do for work, you want to hear more. You—you know, you’re looking for a community, and I think that DevRel, the aim was to sort of create that community and give people a space to hang out with the added bonus of putting the company that employs that DevRel as an adjacent player to get some of that extra shine from wherever this community is doing well. Corey: It felt like 2019 was peak DevRel, and that’s where I started to really see that you had, effectively, a lot of community conferences were taken over by DevRel, and you wound up with DevRel pitching to DevRel. And it became so many talks that were aligned with almost imagined problems. 
I think one of the challenges of working in DevRel is, if you’re not careful, you stop being a practitioner for long enough that you can no longer relate to what the audience is actually dealing with. I can sit here and complain about data center travails that I had back in 2011, but are those still accurate in what’s about to be 2024? Probably not. Jason: And I think the other problem that happens too is that when you work in DevRel, you are beholden to the company’s goals, if the company employs you. And where I think we got really wrong is companies have to make money. We have to charge customers or the company ceases to exist, so when we go out and tell stories, we’re encouraged by the company to focus on the stories that have the highest ROI for the company. And that means that I’m up on stage talking about some, like, far-future, large-scale enterprise thing that very few companies need, but most of the paying customers of my company would need. And it becomes less relatable, and I think that leads to some of the collapse that we saw that you mentioned, where dev events feel less like they’re for devs and more like they’re partner events where DevRel is talking to other DevRel, trying to get opportunities to schmooze partners, and grow our partner pipeline. Corey: That’s a big part of it, where it seems, on some level, that so much of what DevRel does, when I see them talking about DevRel, it doesn’t get around to what DevRel is. Instead, it gets stuck in the weeds of what DevRel is not. “We are not shills for our employer.” Okay, I believe you, but also, I don’t ever see you saying anything that directly contravenes what your employer does. Now, let me be clear: neither do I, but I’m also in a position where I can control what my employer does because I have the control to move in directions that align with my beliefs.
I’m not saying that it’s impossible to be authentic and true to yourself if you work for an employer, but I have seen a couple of egregious examples of people changing companies and then their position on topics they’ve previously been very vocal on pulled an entire one-eighty, where it’s… it really left a bad taste in my mouth. Jason: Yeah. And I think that’s sort of the trick of being a career DevRel is you have to sort of walk this line of realizing that a DevRel career is probably short at every company. Because if you’re going to go there and be the face of a company, and you’re not the owner of that company, they’re almost inevitably going to start moving in a direction as business develops, that’s not going to line up with your core values. And you can either decide, like, okay that’s fine, they pay me well enough, I’m just going to suck it up and do this thing that I don’t care about that much, or you have to leave. And so, if you’re being honest with yourself, and you know that you’re probably going to spend between 12 and 24 months at any given company as a DevRel, which—by the history I’m seeing, that seems to be pretty accurate—you need to be positioning and talking about things in a way that isn’t painting you into that corner where you have to completely about-face, if you switch companies. But that also works against your goals as a DevRel at the company. So, it’s—I think we’ve made some big mistakes in the DevRel industry, but I will pause to take a breath here [laugh]. Corey: No, no, it’s fine. Like, it’s weird that I view a lot of what I do as being very similar to DevRel, but I would never call myself that. And part of it is because, for better or worse, it is not a title that tends to engender a level of respect from business owners, decision makers, et cetera because it is such a mixed bag. You have people who have been strategic advisors across the board becoming developer advocates. That’s great.
You also see people six months out of a boot camp who have decided they don’t like writing code very much, so they’re going to just pivot to talking about writing code, and invariably, they believe, more or less, whatever their employer tells them because they don’t have the history and the gravitas to say, “Wait a minute, that sounds like horse pucky to me.” And it’s a very broad continuum. I just don’t like blending in. Jason: Where I think we got a lot of this wrong is that we never did define what DevRel is. As you say, we mostly define what DevRel is not, and that puts us in a weird position where companies see other companies do DevRel, and they mostly pay attention to the ones who do DevRel really well. And they or their investors or other companies say, “You need a great DevRel program. This is the secret to growth.” Because we look at companies that have done it effectively, and we see their growth, and we say, “Clearly this has a strong correlation. We should invest in this.” But they don’t—they haven’t done it themselves. They don’t understand which part of it is that works, so they just say, “We’re hiring for DevRel.” The job description is nine different careers in a trench coat. And the people applying— Corey: Oh, absolutely. It’s nine different things and people wind up subdividing into it, like, “I’m an events planner. I’m not a content writer.” Jason: Right. Corey: Okay, great, but then why not bill yourself as a con—as an events planner, and not have to wear the DevRel cloak? Jason: Exactly. And this is sort of what I’ve seen is that when you put up a DevRel job, they list everything, and then when you apply for a DevRel job, you also don’t want to paint yourself into a corner and say, “My specialty is content,” or, “My specialty is public speaking,” or whatever it is. And therefore you say, “I do DevRel,” to give yourself more latitude as an employee. Which obviously I want to keep optionality anywhere I go.
I would like to be able to evolve without being painted into a small box of, like, this is all I’m allowed to do, but it does put us in this really precarious position. And what I’ve noticed a lot of companies do is they hire DevRel—undefined, poorly written job description, poor understanding of the field. They get a DevRel who has a completely different understanding of what DevRel is compared to the people with the role open. Both of them think they’re doing DevRel, they completely disagree on what those fundamentals are, and it leads to a mismatch, to burnout, to frustration, to, you know, this high turnover rate in this field. And everybody then starts to say, well, “DevRel is the problem.” But really, the problem is that we’re not—we’re defining a category, not a job, and I think that’s the part that we really screwed up as an industry. Corey: Yeah. I wish there were a better way around there, but I don’t know what that might be. Because it requires getting a bunch of people to change some cornerstone of what’s become their identity. Jason: This is the part where I—this is probably my spiciest take, but I think that DevRel is marketing, but it is a different kind of marketing. And so, in a perfect world—like, where things start to fall apart is you try to slot DevRel into engineering, or you try to slot it into marketing, as a team on these broader organizations, but the challenge then becomes, if you have DevRel, in marketing, it will inevitably push more toward marketing goals, enterprise goals, top-of-funnel, qualified leads, et cetera. If you put them into engineering, then they have more engineering goals. They want to do developer experience reviews. They want to get out there and do demos. You know, it’s much more engineering-focused—or if you’re doing it right, is much more engineering-focused. But the best DevRel teams are doing both of those with a really good measure, and really clear metrics that don’t line up with engineering or marketing. 
So, in a perfect world, you would just have an enterprise marketing team, and a developer marketing team, and that developer marketing team would be an organization that is DevRel today. And you would hire specialists—event planners, great speakers, great demo writers, probably put your docs team in there—and treat it as an actual responsibility that requires a larger team than just three or four ex-developers who are now speaking at conferences. Corey: There were massive layoffs across DevRel when the current macroeconomic correction hit, and I’d been worried about it for years in advance because— Jason: Mm-hm. Corey: So many of these folks spent so much time talking about how they were not marketing, they were absolutely not involved in that. But marketing is the only department that really knows how to describe the value of these sorts of things without having hard metrics tied to it. DevRel spent a lot of time talking about how every metric used to measure them was somehow wrong, and if you took it to its logical conclusion, you would basically give these people a bunch of money—because they are expensive—and about that much money again in annual budget to travel more or less anywhere they want to go, and every time something good happened, as a result, to the company, they had some hand in it nebulously, but you could never do anything to measure their performance, so just trust that they’re doing a good job. This is tremendously untenable. Jason: Mm-hm. Yeah, I think when I was running the developer experience org at Netlify, most of my meetings were justifying the existence of the team because there weren’t good metrics. You can’t put sales qualified leads on DevRel.
It doesn’t make any sense because there are too many links in the chain after DevRel opens the door, where somebody has to go from, ‘I’m aware of this company’ to ‘I’ve interacted with the landing page’ to ‘I’ve actually signed up for something’ to ‘now I’m a customer,’ before you can get them to a lead. And so, to have DevRel take credit is actually removing credit from the marketing team. And similarly, if somebody goes through onboarding, a lot of that onboarding can be guided by DevRel. The APIs that new developers interface with can be—the feedback can come from DevRel, but ultimately, the engineering team did that work, the product team did that work. So, DevRel is this very interesting thing. I’ve described it as a turbocharger, where if you put it on an engine that runs well, you get better performance out of that engine. If you just plop one on the table, not a lot happens. Corey: Yeah, it’s a good way of putting it. I see very early stage startups looking to hire a developer advocate or DevRel person in their seed stage or Series A, and it’s… there’s something else you’re looking for here. Hire that instead. You’re putting the cart before the horse. Jason: What a lot of people saw is they saw—what they’re thinking of as DevRel is what they saw from very public founders. And when you get a company that’s got this very public-facing, very engaging, charismatic founder, that’s what DevRel feels like. It is, you know, this is the face of the company, we’re showing you what we do on the inside, we’re exposing our process, we’re sharing the behind the scenes, and proving to you that we really are great engineers, and we care a lot. Look at all this cool stuff we’re doing. And that founder up on stage was, I think, the original DevRel.
That’s what we used to love about conferences is we would go there and we would see somebody showing this thing they invented, or this new product they had built, and it felt so cool because it was these inspirational moments of watching somebody brilliant do something brilliant. And you got to follow along for that journey. And then we try to— Corey: Yeah I mean, that’s natural, but you see booths at conferences, the small company startup booths, a lot of times you’ll be able to talk to the founders directly. As the booths get bigger, your likelihood of being able to spend time talking to anyone who’s materially involved in the strategic direction of that company gets smaller and smaller. Like, the CEO of GitHub isn’t going to be sitting around at the GitHub booth at re:Invent. They’re going to be, you know, talking to other folks—if they’re there—and going to meetings and whatnot. And then you wind up with this larger and larger company. It’s a sign of success, truly, but it also means that you’ve lost something along the way. Jason: Yeah, I think, you know, it’s the perils of scale. And I think that when you start looking at the function of DevRel, it should sort of be looked at as, like, when we can’t handle this anymore by ourselves, we should look for a specialty the same way that you do for any other function inside of a company. You know, it wouldn’t make sense on day one of a startup to hire a reliability engineer. You’re not at the point where that makes sense. It’s a very expensive person to hire, and you don’t have enough product or community or load to justify that role yet. And hopefully, you will. And I think DevRel is sort of the same way. Like, when you first start out your company, your DevRel should be the founding team. It should be your engineers, sharing the things that they’re building so that the community can see the brilliance of your engineering team, sharing with the community, obviously, being invested in that community. 
And when you get big enough that those folks can no longer manage that and their day-to-day work, great, then look into adding specialists. But I think you’re right that it’s cart before the horse to, you know, make a DevRel your day-one hire. You just don’t have enough yet. Corey: Yeah, I wish that there were an easy way to skin the cat. I’m not sure there is. I think instead we wind up with people doing what they think is going to work. But I don’t know what the truth is. Jason: Mmm. Corey: At least, that’s where I land on it. Jason: [laugh] Yeah, I mean, every company is unique, and every experience is going to be unique, so I think to say, “Do it exactly like this,” is—that’s got a lot of survivorship bias, and do as I say—but at the same time, I do think there’s some universal truths. Like, it doesn’t really make sense to hire a specialist before you’ve proven that specialty is the secret sauce of your business. And I think you grow when it’s time to grow, not just in case. I think companies that over-hire end up doing some pretty painful layoffs down the road. And, you know, obviously, there’s an opposite end of that spectrum where you can grow too slowly and bury your team and burn everybody out, but I think, you know—we, [laugh] leading into the pandemic, I guess, we had a lot of free money, and I think people were thinking, let’s go build an empire and we’ll grow into that empire. And I think that is a lot of why we’re seeing this really painful downsizing right now, is companies hired just in case and then realized that actually, that in case didn’t come to be. Corey: What does the future of this look like? Easy enough to look back and say, well, that didn’t work. Well, sure. What is the future? Jason: The playbook that we saw before—in, like, 2019 and before—was very event-driven, very, like, webinar-driven. And as we went into 2020, and people were at home, we couldn’t travel, we got real sick of Zoom calls.
We don’t want to get on another video call again. And that led to that playbook not working anymore. You know, I don’t want to get on a webinar with a company. I don’t want to go travel to a company event, you know, or at least not very many of them. I want to go see the friends I haven’t seen in three years. So, travel priorities changed, video call fatigue is huge, so we need something that people want to do, that is interesting, and that is, you know, it’s worth making in its own right, so that people will engage with it, and then you work in the company goals as an incidental. Not as a minor incidental, but you know, it’s got to be part of the story; it can’t be the purpose. People won’t sign up for a webinar willingly these days, I don’t think, unless they have exactly the problem that your webinar purports to solve. Corey: And even if they do, it becomes a different story. Jason: Right. Corey: It’s [high buying 00:19:03] signal, but people are constantly besieged by requests for attention. This is complicated by what I’ve seen over the last year. When marketing budgets get—cut, arguably too much, but okay—you see now that there’s this follow-on approach where, okay, what are we going to cut? And people cut things that in many cases work, but are harder to attribute success to. Events, for example, are doing very well because you have someone show up at your booth, you scan their badge. Three weeks later, someone from that company winds up signing up for a trial or whatnot, and ah, I can connect those dots. Whereas you advertise on I don’t know, a podcast as a hypothetical example that I’m pulling out of what’s right in front of me, and someone listening to this and hearing a message from a sponsor, they might be doing something else. They’ll be driving, washing dishes, et cetera, and at best they’ll think, “Okay, I should Google that when I get back to a computer.” And they start hearing about it a few times, and, “Oh. 
Okay, now it’s time for me to go and start paying serious attention to this because that sounds like it aligns with a problem I have.” They’re not going to remember where they initially heard it. They’re going to come in off of a Google search, so it sounds like it’s all SEO’s benefit that this is working, and it is impossible to attribute. I heard some marketer once say that 50% of your marketing budget is wasted, but you’ll go bankrupt trying to figure out which half. It all ties together. But I can definitely see why people bias for things that are more easily attributed to the metric you care about. Jason: Yes. And I think that this is where I see the biggest opportunity because I think that we have to embrace that marketing signal is directional, not directly attributable. And if you have a focus campaign, you can see your deviation from baseline signups, and general awareness, and all of the things that you want to be true, but you have to be measuring that thing, right? So, if we launch a campaign where we’re going to do some video ads, or we’re going to do some other kind of awareness thing, the goal is brand awareness, and you measure that through, like, does your name get mentioned on social media? Do you see a deviation from baseline signups where it is trending upward? And each of those things is signal that the thing you did worked. Can you directly attribute it? No, but I think a functional team can—you know, we did this at Netlify all the time where we would go and look: what were the efforts that were made, what were the ones that got discussion on different social media platforms, and what was the change from baseline? And we saw certain things always drove a non-trivial deviation from baseline in the right direction. And that’s one of the reasons that I think the future of this is going to be around how do you go broader with your reach? And my big idea—to nutshell it—is, like, dev TV. 
I think that developers want to see the things that they’re interested in, but they want it to be more interesting than a straight webinar. They want to see other developers using tools and getting a sense of what’s possible in an entertaining way. Like, they want stories, they don’t want straight demos. So, my thinking here is, let’s take this and steer into it. Like, we know that developers love when you put a documentary together. We saw the Vue documentary, and the React documentary, and the GraphQL documentary, and the Kubernetes documentary coming out of the Honeypot team, and they’ve got hundreds of thousands, and in some cases, millions of views because developers really want to see good stories about us, about our community. So, why not give the dev community a , but for web devs? Why not create an Anthony Bourdain-style travel show that highlights various web communities? Why not get out there and make reality competition shows and little docuseries that help us highlight all the things that we’re learning and sharing and building? Every single one of those is going to involve developers talking about the tools they use, talking about the problems they solve, talking about what they were doing before and how they’ve made it better. That’s exactly what a webinar is, that’s what a conference talk is, but instead of getting a small audience at a conference, or you know, 15 to 30 people signing up for your webinar, now we’ve got the potential for hundreds of thousands or even millions of people to watch this thing because it’s fun to watch. And then they become aware of the companies involved because it’s presented by the company; they see the thing get used or talked about by developers in their community. I think there’s a lot of magic and potential in that, and we’ve seen it work in other verticals.
Corey: And part of the problem comes down as well to the idea that, okay, you’re going to reach some people in person at events, but the majority of engineers are not going to be at any event or— Jason: Right. Corey: Any event at all, for that matter. They just don’t go to events for a variety of excellent reasons. How do you reach out to them? Video can work, but I always find that requires a bit of a different skill than, I don’t know, podcasting or writing a newsletter. So many times, it feels like it’s, oh, and now you’re just going to basically stare at the camera, maybe with someone else, and it looks like the Zoom call to which the viewer is not invited. Jason: Right. Corey: They get enough of that. There has to be something else. Jason: And I think this is where the new skill set, I think, is going to come in. It exists in other places. We see this happen in a lot of other industries, where they have in-house production teams, they’re doing collaborations with actors and athletes and bringing people in to make really entertaining stories that drive underlying narratives. I mean, there’s the ones that are really obvious, like, the Nikes of the world, but then there are far less obvious examples. Like, there was this show called Making It. It was… Nick Offerman and Amy Poehler were the hosts. It was the same format as the but around DIY and crafting. And one of the permanent judges was the Etsy trend expert, right? And so, every single episode, as they’re judging this, the Etsy trend expert is telling all of these crafters and contestants, “You know, what you built here is always a top seller on Etsy. This is such a good idea, it’s so well executed, and people love this stuff. It flies off the shelves in Etsy stores.” Every single episode, just perfectly natural product placement, where a celebrity that you know—Nick Offerman and Amy Poehler—are up there, lending—like, you want to see them.
They’re so funny and engaging, and then you’ve got the credibility of Etsy’s trend expert telling the contestants of the show, “If you do DIY and crafting, you can make a great living on Etsy. Here are the things that will make that possible.” It’s such subtle, but brilliant product placement throughout the entire thing. We can do that. Like, we have the money, we just spend it in weird places. And I think that as an industry, if we start getting more creative about this and thinking about different ways we can apply these marketing dollars that we’re currently dumping into very expensive partner dinners or billboards or getting, you know, custom swag or funding yet another $150,000 conference sponsorship, we could make a series of a TV show for the same cost as throwing one community event, and we would reach a significantly larger group. Corey: Yeah. Now, there is the other side of it, too, where Lord knows I found this one out the fun way, that creating content requires significant effort and— Jason: Yes. Corey: Focus. And, “Oh, it’s a five-minute video. Great, that could take a day or three to wind up putting together, done right.” One of the hardest weeks of my year is putting together a bunch of five-minute videos throughout the course of re:Invent. So much that is done in advance that is basically breaking the backs of the editing team, who are phenomenal, but it still turns into more than that, where you still have this other piece of it of the actual content creation part. And you can’t spend all your time on that because pretty soon I feel like you become a talking head who doesn’t really do the things that you are talking to the world about. And that content gets pretty easy to see when you start looking at, okay, what did someone actually do? Oh, they were a developer for three years, and they spent the next seven complaining about development, and how everyone is— Jason: [laugh]. Corey: Doing it wrong on YouTube. 
Hmm… it starts to get a little, how accurate is this really? So, for me, it was always critical that I still be hands-on with things that I’m talking about because otherwise I become a disaster. Jason: And I agree. One of the things that my predecessor at Netlify, Sarah Drasner, put in place was a, what she called an exchange program, where we would rotate the DevRel team onto product, and we rotate product onto the DevRel team. And it was a way of keeping the developer experience engineers actually engineers. They would work on the product, they didn’t do any DevRel work, they were exclusively focused on doing actual engineering work inside our product to just help keep their skills sharp, keep them up to date on what’s going on, build more empathy for the engineers that we talk to every day, build more empathy for our team instead of us—you know, you never want to hear a DevRel throw the engineering team under the bus for not shipping a feature everybody wants. So, these sorts of things are really important, and they’re hard to do because we had to—you know, that’s a lot of negotiation to say, “Hey, can we take one of your engineers for a quarter, and we’ll give you one of our engineers for a quarter, and you got to trust us that’s going to work out in your favor.” [laugh] Right? Like, there’s a lot that goes into this to make that sort of stuff possible. But I absolutely agree. I don’t think you get to make this type of content if you’ve fully stepped out of engineering. You have to keep it part of your practice. Corey: There’s no way around it. You have to be hands-on. I think that’s the right way to do it, otherwise, it just leads to, frankly, disaster. Very often, you’ll see people who are, like, “Oh, they’re great in the DevRel space. What do they do?” And they go to two or three conferences a year, and they have a blog post or so. It’s like, okay, what are they doing the rest of that time? Sometimes the answer is fighting internal political fires. 
Other times it’s building things and learning these things and figuring out where they stand. There are some people, I don’t want to name names, although an easy one is Kelsey Hightower, who has since really left the stage, that he’s retired, but when he went up on stage and said something—despite the fact that he worked at Google—it was eminently clear that he believed in what he was saying, or he would not say it. Jason: Right. Corey: He was someone who was very clearly aware of the technology about which he was speaking. And that was great. I wish that it were not such a standout moment to see him speak and talk about that. But unfortunately, he kind of is. Not as many people do that as well as we’d like. Jason: Agreed. I think it was always a treat to see Kelsey speak. And there are several others that I can think of in the community who, when they get on stage, you want to be in that audience, and you want to sit down and listen. And then there are a lot of others who when they get on stage, it’s like that this book could have been a blog post, or this—you know, this could have been an email, that kind of thing. Like you could have sent me this repo because all you did was walk through this repo line-by-line, or something that—it doesn’t feel like it came from them; it feels like it’s being communicated by them. And I think that’s, again, like, when I criticize conferences, a lot of my criticism comes from the fact that, coming up, I feel like every speaker that I saw on stage—and this is maybe just memory… playing favorites for me, but I feel like I saw a lot of people on stage who were genuinely passionate about what they were creating, and they were genuinely putting something new into the world every time they got on stage. And I have noticed that I feel less and less like that. Also, I feel like events have gotten less and less likely to put somebody on stage unless they’ve got a big name DevRel title. 
Like, you have to work at a company that somebody’s heard of because they’re all trying to get that draw because attendance is going down. And— Corey: Right. It’s a—like, having run some conferences myself, the trick is, is you definitely want some ringers in there. People you know will do well, but you also need to give space for new voices to arise. And sometimes it’s a—it always bugs me when it seems like, oh, they’re here because their company is a big sponsor. Of course, they have the keynote. Other times, it’s a… like, hate the actual shill talks, which I don’t see as much, which I’m thankful for; I’d stop going to those conferences, but jeez. Jason: Yeah, and I think it’s definitely one of those, like, this is a thing that we can choose to correct. And I have a suspicion that this is a pendulum not a—not, like, the denouement of—is that the right—how do you say that word? De-NOW-ment? De-NEW-ment? Whatever. Corey: Denouement is my understanding, but that might be the French acc— Jason: Oh, me just— Corey: The French element. Jason: —absolutely butchering that. Yeah [laugh]. I don’t think this is the end of conferences, like we’re seeing them taper into oblivion. I think this is a lull. I think that we’re going to realize that we want to—we really do love being in a place with other developers. I want to do that. I love that. But we need to get back to why we were excited to go to conferences in the first place, which was this sharing of knowledge and inspiration, where you would go see people who were literally moving the world forward in development, and creating new things so that you would walk away with insider info, you had just seen the new thing, up close and personal, had those conversations, and you went back so jazzed to build something new. 
I feel like these days, I feel more like I went and watched a handful of product demos, and now I’m really just waiting for the hallway track, which is the only, like, actually interesting part at a lot of events these days. Corey: I really want to thank you for taking the time to speak with me. If people want to learn more, where’s the best place for them to find you? Jason: Most of what I share is on learnwithjason.dev (https://www.learnwithjason.dev), or if you want a big list of links, I have jason.energy/links (https://jason.energy/links), which has a whole bunch of fun stuff for you to find. Corey: Awesome. And we will, of course, include links to that in the show notes. Thank you so much for taking the time to speak with me. I really appreciate it. Jason: Yeah, thanks so much for having me. This was a blast. Corey: Jason Lengstorf, developer media producer at Learn with Jason. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that will no doubt become the basis for somebody’s conference talk. Jason: [laugh]. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com (https://duckbillgroup.com) to get started.

33m
Jan 16, 2024
Championing CDK While Accepting the Limits of AWS with Matthew Bonig

Matthew Bonig, Chief Cloud Architect at Defiance Digital, joins Corey on Screaming in the Cloud to discuss his experiences in CDK, why developers can’t be solely reliant on AI or coding tools to fill in the blanks, and his biggest grievances with AWS. Matthew gives an in-depth look at how and why CDK has been so influential for him, as well as the positive work that Defiance Digital is doing as a managed service provider. Corey and Matthew debate the need for AWS to focus on innovating instead of simply surviving off its existing customer base. About Matthew: Chief Cloud Architect at Defiance Digital. AWS DevTools Hero, co-author of The CDK Book, author of the Advanced CDK Course. All things CDK and Star Trek. Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. And I’m back with my first recording that was conducted post-re:Invent and all of its attendant glory and nonsense; we might talk a little bit about what happened at the show. But my guest today is the Chief Cloud Architect at Defiance Digital, Matthew Bonig. Matthew, thank you for joining me. Matthew: Thank you, Corey. Thanks for having me today. Corey: So, you are deep into the CDK. You’re one of the AWS DevTools Heroes, and you’re the co-author of the CDK Book (https://www.thecdkbook.com/); you’ve done a lot, really. You have a course now for Advanced CDK work. Honestly, at this point, it starts to feel like when I say the CDK is a cult, you’re one of the cult leaders, or at least very high up in the cult. Matthew: [laugh] Yes, it was something that I discovered— Corey: Your robe has a fringe on it. Matthew: Yeah, yeah. 
I discovered this at re:Invent, and it kind of hit me a little surprised that I got called out by a couple people as being the CDK guy. And I didn’t realize that I’d hit that status yet, so I got to get myself a hat, and a cloak, and maybe some fun stuff to wear. Corey: For me, what I saw on the—it was in the run-up to re:Invent, but the big CDK-sized announcement was the fact that the new version of Amplify now is much closer tied to the CDK than it was in previous incarnations, which is great. It sort of solves the problem, how do I build a thing through a variety of different tools? Great, and how do I manage that thing programmatically? It seems, according to what it says on the tin, that it narrows that gap. Of course, here in reality, I haven’t had time to pick anything like that up, and I won’t for months, just because so much comes out all at the same time. What happened in the CDK world? What did I miss? What’s exciting? Matthew: Well, you know, the CDK world has been, I’ve said, fairly mature for a while now. You know, fundamentally the way the CDK works and the functionality within it hasn’t changed drastically. Even when 2.0 came out a couple of years ago, there wasn’t a drastic fundamental change in the way that the API worked. Really, the efforts that we’ve been seeing for the last year or so, and especially the last few months, is trying to button up some functionality, hit some of those edge cases that have been rough for some users, and ultimately just continue to fill out things like L2 constructs and maybe try to build out some L3s. I think what they’re doing with Amplify is a good sign that they are trying to, sort of, reach across the aisle and work with other frameworks and work with other systems within AWS to make the experience better, and it shows their commitment to making the CDK really the first-class citizen for doing IaC work in AWS. 
Corey: I think that that is a—that’s a long road, and it’s also a lot of work under the hood that’s not easily appreciated. You’ve remarked at one point that my talk at the CDK Community Day was illuminating, if nothing else, if for no other reason than I dressed up as a legitimate actual cultist and a robe to give the talk— Matthew: Yeah. Loved it. Corey: Because I have deep-seated emotional problems. But it was fun. It talked a bit about my journey with it, where originally I viewed it as, more or less, this thing that was not for me. And a large part of that because I come from a world of sysadmin ops types, where, “I don’t really know how to code,” was sort of my approach to this. Because I was reaff—I had that reaffirmed every time I talked to a developer. Like, “You call this a bash script? It’s terrible.” And sure, but it worked, and it tied into a different knowledge set. Then, when I encountered the CDK for the first time, I tried to use it in Python, which at the time was not really well-supported and led to unfortunate outcomes—I do not know if that’s still the case—what got me into it, in seriousness, was when I tried it a few months later with TypeScript and that started to work a little bit more clearly, with the caveat that I did not know JavaScript, I did not know TypeScript, I had to learn it as I went in service to the CDK. And it works really well insofar as it scratched an itch that I had. There’s a whole class of problems that I don’t have to deal with, which include getting someone who isn’t me involved in some of that codebase, or working in environments where you have either a monorepo or a crap ton of tiny repos scattered everywhere and collaborating with other people. I cannot speak authoritatively to any of that. 
I will say it’s incredibly annoying when I’m trying to update something written in the CDK, and then I haven’t touched it in a year-and-a-half, and the first thing I have to do is upgrade a whole bunch of dependencies, clear half a day just to get the warnings to clear before I can go ahead and deploy the things, let alone implement the tiny change I’m logging into the thing to fix. Matthew: Oh, yeah, yes. Yeah, the dependency updates are probably one of the most infuriating things about any Node.js system, and I don’t think that I’ve ever run across any application project framework, anything in which doing dependency upgrades wasn’t a nightmare. And I think it’s because the Node.js community, more so than I’ve seen any other, doesn’t care about semantic versioning. And unfortunately, the CDK doesn’t technically care about semantic versioning, either, which makes it very tricky to do upgrades properly. Corey: There also seems to be the additional problem layered on top, which is all of the various documentation sources that I stumble upon, the official documentation, not terrific at giving real-world use cases. It feels like it’s trying to read the dictionary to learn how English works, not really its purpose. So, I find a bunch of blog posts, and all of them tend to approach this ecosystem slightly differently. One talks about using NPM. Another talks about Yarn. If you’re doing anything that involves a web app, as seems to be increasingly common, some will say, “Oh, use Webpack,” others will recommend using Vite. There’s the whole JavaScript framework wars, and the only unifying best practice seems to be, “Oh, there’s another way to do it that you should be using instead of the way you currently are on.” And if you listen to that, you wind up in hell. Matthew: Oh, horribly so. Yeah, the split in the ecosystem between NPM and Yarn, I think, has been incredibly detrimental to the overall comfort level in Node.js development. 
You know, I was an NPM guy for many, many years, and then actually, the CDK got me more using Yarn, simply because Yarn handles cross-library dependency resolution a bit different from NPM. And I just ran into fewer errors and fewer problems if I use Yarn along the way. But NPM then came a long way since then. Now, there’s also a PNPM, which is good if you’re using monorepos. But then if you’re going to be using monorepos, there’s another 15 tools out there that you can use for those sorts of things. And ultimately, I think it’s going to be what is the thing that causes you the least amount of problems when dealing with them. And every single dependency issue that I’ve ever run into when upgrading any project, whether it be a web application, a back-end API, or the CDK, it’s always unique enough that there isn’t a one-size-fits-all answer to solving those problems. Corey: The most recent experience I had with the CDK—since you know, you’re basically Mr. CDK at this point, whether you want to be or not, and this is what I do, instead of filing issues anywhere or asking for help, I drag people onto this show, and then basically assault them with my weird use cases—I’m in the process of building something out in the service of shitposting, because that is my nature, and I decided, oh, there’s a new thing called the Dynamo table v2— Matthew: Yes. Corey: Which is great. I looked into it. The big difference is that it addresses it from the beginning as a global table, so you have optionality. Cool. Trying to migrate something that is existing from a Dynamo table to a Dynamo v2 table started throwing CloudFormation issues, so my answer was—this was pre-production—just tear down the stack and rebuild it. That feels like that would be a problem if this had been something that was actually full of data at this point. Matthew: There’s a couple of ways that you could maybe go about it. 
Now, this is a very special case that you mentioned because you’re talking about fundamentally changing the CloudFormation resource that you are creating: the CDK is an abstraction layer on top of CloudFormation, and the Dynamo table v2 uses the global table resource rather than just the table resource. If you had a case where you have to do that migration—and I’ve actually got a client right now who’s very much looking to do that—the process would probably be to orphan the existing table so that you can retain the data and then use an import routine with CloudFormation to bring that in under the new resource. I haven’t tried it yet— Corey: In this case, the table was empty, so it was easy enough to just destroy and then recreate, but it meant that I also had to tear down and recreate everything else in the stack as well, including CloudFront distributions, ACM certificates, so it took 20 minutes. Matthew: Yes. And that is one of the reasons why I often will stick any sort of stateful resource into its own stack so that if I have to go through an operation like this, I know that I’m not going to be modifying things that are very painful to drop and recreate, like, CloudFront distributions, which can take a half an hour or more to re-initialize. Corey: Yeah. So, that was fun. The problem got sorted out, but it was still a bit challenging. I feel like at some level, the CDK is hobbled by the fact that under the hood, it really is just CloudFormation once all is said and done, and CloudFormation has never been the speediest thing. I didn’t understand that until I started playing with Terraform and I saw how much more quickly it could provision things just by calling the service APIs directly. It sort of raises the question of what the hell the CloudFormation service is doing when it takes five times longer to do effectively the same thing. 
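Matthew's habit of keeping stateful resources in their own stack can be illustrated with a small, dependency-free TypeScript model. This is only a sketch under stated assumptions: `Resource`, `StackPlan`, and `planStacks` are hypothetical names invented for illustration, not CDK APIs, and a real project would express the same split as separate stack classes in the CDK.

```typescript
// Hypothetical model of a deployable resource; this is NOT a CDK type.
interface Resource {
  name: string;
  // True for resources that hold data, such as a DynamoDB table.
  stateful: boolean;
}

// Hypothetical result type: resource names grouped by the stack that owns them.
interface StackPlan {
  statefulStack: string[];
  statelessStack: string[];
}

// Partition resources so that replacing one stack leaves the other untouched.
function planStacks(resources: Resource[]): StackPlan {
  const plan: StackPlan = { statefulStack: [], statelessStack: [] };
  for (const r of resources) {
    (r.stateful ? plan.statefulStack : plan.statelessStack).push(r.name);
  }
  return plan;
}

// Example inputs are assumptions based on the resources named in the conversation.
const plan = planStacks([
  { name: "UsersTable", stateful: true },
  { name: "CloudFrontDistribution", stateful: false },
  { name: "AcmCertificate", stateful: false },
]);
console.log(plan);
```

With this split, swapping the table resource only affects the stack that owns the table, so slow-to-recreate resources such as the CloudFront distribution stay in place.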
Matthew: Yeah, and the big thing that I appreciate about Terraform versus CloudFormation—speed being kind of the big win—is the fact that Terraform doesn’t obfuscate or hide state from you. If you absolutely need to, you can go in and change that state that relates your Terraform definitions to the back-end resources. You can’t do that with CloudFormation. So, CloudFormation did release that import routine a few years ago, and that was pretty good—not great, but pretty good; it’s getting better all the time—whereas this was a completely unneeded feature with Terraform because if it came down to the point where you already had a resource, and you just want to tie it to your IaC, you just edit a state file. And they’ve got their import routines and tie-in routines as well, but having that underlying state exposed was a big advantage, in my mind, to Terraform that I missed going to CloudFormation, and still to this day frustrates me that I can’t do that underlying state change. Corey: It becomes painful and challenging, for better or worse. Matthew: Yep. Corey: But yeah, that was what I ran into. Things have improved, though. When I google various topics, I find that the v2 documentation comes up instead of the v1. That was maddening for a little while. I find that there are still things that annoy me, but they become less all the time, partially because I feel like I’m getting better at knowing how to search for them, and also because I think I’m becoming broken in the right ways that the CDK tends to expect. Matthew: Oh, like how? Corey: Oh, easy example here: I was recently trying to get something set up and running, and I don’t know why this is the case, I don’t know if it holds true in other programming languages, but I’m getting more used to the fact that there are two files in TypeScript-land that run a project. One is generally small and in a side directory that no one cares about, I think it’s in a lib or the bin subdirectory. 
I don’t remember which because I don’t care. And then there are things you have to do within the other equivalent that basically reference each other. And I’ve gotten better at understanding that those aren’t one file, for example. Though they seem to sure be a lot in all the demos, but it’s not how the init process, when you’re starting something new, spins up. Matthew: Yeah, this is the hell of TypeScript, the fact that Node.js, as a runtime, cannot process TypeScript files, so you always have to pass them through a compiler. This is actually one of the things that I like about using Projen for all of my projects instead of using CDK init to start them is that those baseline configurations handle the TypeScript nature of the runtime—or I should say, the anti-TypeScript nature of the runtime a little bit better, and you run into fewer problems. You never have to worry about necessarily doing build routines or other things because they actually use the ts-node runtime to handle your CDK files instead of the node runtime. And I think that’s a big benefit in terms of the developer experience. It just makes it so I generally never have to care about those JavaScript files that get compiled from TypeScript. In the, you know, two years or so I’ve been using Projen, I never have to worry about a build routine to turn that into JavaScript. And that makes the developer experience significantly better. Corey: Yeah, I still miss an awful lot of things that I feel like I should be understanding. I’ve never touched Projen, for example. It’s on my backlog of things to look into. Matthew: Highly recommend it. Corey: Yeah, I also am still in that area of… my TypeScript knowledge has not yet gotten to a point where I see the value of it. It feels like I’ve spent far more time fighting with the arbitrary restrictions that are TypeScript than it has saved me from typing errors in anything that I’ve built. 
I believe it has to come back around at some point of familiarity with the language, but I’m not there yet. Matthew: Got you. So, Python developer before this? Corey: Ish. Mostly brute force and enthusiasm, but yeah, Python. Matthew: Python, and I think you said bash scripting and other things that have no inherent typing built into it. Corey: Right. Matthew: Yeah, that is a problem, I think… that I thankfully avoided. I was an application developer for many years. My background and my experience has always been around strongly typed languages, so when it came to adopting the CDK, everything felt very natural to me. But as I’ve worked with people over the years, both internally at Defiance as well as people in the community that don’t have a background in that, I’ve been exposed to how problematic TypeScript as a language truly can be for someone who has never had this experience of, I’ve got this thing and it has a well-defined shape to it, and if I don’t respect that, then I’m going to bang my head against to these weird errors that are hard to comprehend and hard to grok way more than it feels like I’m getting value from it. Corey: There’s also a lack of understanding around how to structure projects, in my case, where all right, I have a front-end and I have a back-end. Is this all within the context of the CDK project? And this, of course, also presupposes that everything I’m doing is effectively greenfield, in which case, great, do I use the front-end wizard tutorial thing that I’m following, and how does that integrate when I’m using the CDK to deploy it somewhere, and so on and so forth. It’s stuff that makes sense once you have angry and loud enough opinions, but I don’t yet. Matthew: Yeah, so the key thing that I tell people about project structure—because it does often come up a lot—is that ultimately, the CDK itself doesn’t really care how you structure things. 
So, how you structure, where you put certain files, how you organize them, is your personal preference. Now, there are some exceptions to that. When it comes to things like Lambda functions that you’re building or Docker files, there are probably some better practices you can go through, but it’s actually more dependent on those systems rather than the CDK directly itself. So I go through, in the Advanced CDK course, you know, my basic starting directory structure for everything, which is stacks, constructs, apps, and stages all go into their own specific directories. But then once those directories start growing—because I’ve added more stacks, more constructs, and things—once I get to around five to maybe seven files in a directory, then I look at them and go, “Okay, how can I group these together?” I create subdirectories, I move those files around. My development tool of choice, which is WebStorm—JetBrains’s long-running tool—handles the moving of those files for me, so all of my imports, all of my references automatically get updated accordingly, which is really nice, and I can refactor things as much as I want to without too much of a problem. So, as a project grows over time, my directory structure can change to make sure that it is readable, well organized, and understandable, and it’s never been too much of a problem. Corey: Yeah, it’s one of those things that does take some getting used to. It helps, I think, having a mentor of sorts to take you under their wing and explain these things to you, but that’s a hard thing to scale as well. So, in the absence of that we wind up defaulting to oh, whatever the most recent blog post we read is. Matthew: Yeah. Yeah, and I think one of the truest, I think, and truthful complaints I’ve heard about the CDK and why it can be fundamentally very difficult is that it has no guardrails. It is a general-purpose languages, and general purpose languages don’t have guardrails. 
They don’t want to be in the way of you building whatever you need to build. But when it comes to an Infrastructure as Code project, which is inherently very different from an API or a website or other, sort of, more typical programming projects, having guardrail—or not having guardrails is a bad thing, and it can really lead you down some bad paths. I remember working with a client this last year who had leveraged context instead of properties on classes to hand configuration value down through code, down through stacks and constructs and things like that. And it worked. It functionally got them what they needed, up until a point, and then all of sudden, they were like, “Well, now we want to do X with the CDK, and we simply cannot because we’ve now painted ourselves into a corner.” And that’s the downside of not having these good guard rails. And I think that early, they needed to do this early on. When the CDK was initially released, and it got popular back around the 0.4, 0.5 timeframe—I think I picked it up right around 0.4, too—when it officially hit a 1.0 release, there should have been a better set of guidelines and best practices published. You can go to the documents and see them, and they have been published, but it really didn’t go far enough to really explain how and why you had to take the steps to make sure you didn’t screw yourself six months later. Corey: It’s sort of those one-way doors you don’t realize you’re passing through when you first start building something. And I find, especially when you follow my development approach of more or less used to be copying and pasting for various places, now it’s copying and pasting from one place which is Chat-Gippity-4, then—although I’ve seen increasingly GitHub’s Copilot has been great at this and Code Whisperer, in my experience, has not yet been worth the energy it takes to really go diving into it. Your mileage may of course vary on that. 
But I found it was not making materially better suggestions on CDK stuff than Copilot was. Matthew: Yeah, I haven’t tried Code Whisperer outside of the shell. I’ve been using Copilot for the last year and absolutely adore it. I think it has completely changed the way that I felt about coding. I saw writing code for the last couple of years as being very tedious and very boring in terms of there weren’t interesting problems to solve, and Copilot, as I’ve seen it, is autocomplete on steroids. So, it doesn’t keep me from having to solve the interesting problems; it just keeps me from having to type out the boring solutions, and it’s the thing that I love about it. Now, hopefully, Code Whisperer continues to get better over time. I’m hoping all of Amazon’s GenAI products continue to get better over time and I can maybe ditch a subscription to Copilot, but for now, Copilot is still my thing. And it’s producing good enough results for me. Thankfully, because I’ve been working with it for four years now, I don’t rely on it to answer my questions about how to use constructs. I go back to the docs for those if I need to. Corey: It occurs to me that I can talk about this now because this episode will not air until after this has become generally available, but what’s really spanked it from my perspective has been Google’s Duet. And the key defining difference is, as I’m in one of these files—in many cases, I’m doing something with React these days due to an escalating series of weird choices—and— Matthew: My apologies, by the way. My condolences, I should say. Corey: Well, yeah. Well, things like Copilot Chat are great when they say, “Oh yeah, assuming that you’re handling the state this way in your component, now…” What I love about Duet is it goes, and it actually checks, which is awesome. And it has contextual awareness of the entire project, not just the three lines that I’m talking about, or the file that I’m looking at this moment. 
It goes ahead and does the intelligent thing of looking at some of these things. It still has some problems where it’s confidently wrong about things that it really shouldn’t be, but okay, early days. Matthew: Sure. Yeah, I’ll need to check that out a little bit more because I still, to this day, despise working with React. It is still my framework of choice because the ecosystem around it is so good and so established that I know that whatever problem I have, I’ll find 14 blogs, and maybe one of them is the answer that I want, versus any other framework where it still feels so very new and so very immature that I will probably beat my head more than I want to. Web development now is a hobby, not a job, so I don’t want to bang my head against a hobby project. Corey: I tend to view, on some level, that these AI coding assistants are good enough to get me almost anywhere I need to go, to the point where a beginner or enthusiastic amateur will be able to get sorted out. And for a lot of what I’m building, that’s all I really need. I don’t need this to be something that will withstand the rigors of production at a bank, for example. One challenge I have seen with all these things is there’s a delay in something being released and their training data growing to understand those things. Very often it’ll wind up giving me recommendations for—I forget the name of it, but there was a state manager in React that the first thing you saw when you installed it was, “This has been deprecated. This is the new replacement.” And if you explicitly ask about the replacement, it does the right thing, but it just cheerfully goes ahead and tells you to use ancient stuff or apply poor security practices or the rest. Matthew: Yeah, that’s very scary to me, to be honest because I think these AI development tools—for me, it’s revitalized my interest in doing development, but where I get really, really scared is where they become a dependency in writing the right code. 
And every time I ever use Copilot to fill out stuff, I’m always double-checking, and I’m always making sure that this is right or that is right. And what I worry about is those developers who are maybe still learning some things, or are having to write in-line SQL into their back-end and let Copilot, or Code Whisperer, or whatever tool they pick fill this stuff out, and that answer is based on a solution that works for a 10,000 record database, but fails horribly on a 100 million record database. And now, all of a sudden, you’ve got this problem that is just festering in through a dev environment, in through a QA environment, and even maybe into a prod environment, and you don’t find out that failure until six months later, when some database table runs past its magical limit and now, all of a sudden, you’ve got these queries that are failing, they’re crashing databases, they’re running into problems, and this developer that didn’t really know what they built in the first place is now being asked, “Why doesn’t your code work,” and they just sort of have to go, “Maybe ChatGPT can tell me why my code doesn’t work.” And that’s the scariest part of these things to me: they’re a little bit too good at answering difficult questions with a simple answer. There is no, “It depends,” with these answers, and there needs to be for a lot of what we do in complex systems. In the AWS world, for example, we’re expected to build complex systems, and ChatGPT and these other tools are bad at that. Corey: We’re required to build complex systems, and, on some level, I would put that onus on Amazon in many respects. I mean, the challenge I keep smacking into is that they’re building—they’re giving you a bunch of components and expecting you to assemble them all yourself to achieve even relatively simple things. 
It increasingly feels like this is the direction that they want customers to go in because they’re bad at moving up the stack and develop—delivering integrated solutions themselves. Matthew: Well, so I would wonder, what would you consider a relatively simple system, then? Corey: Okay, one of the things I like to do is go out in the evenings, and sometimes with a friend, I’ll have a few too many beers. And then I’ll come up with an idea for I want to redirect this random domain that I want to buy to someone else’s website. The end. Now, if you go with Namecheap, or GoDaddy, or one of these various things, you can set that up in their mobile app with a couple of clicks and a payment, and you’re done. With AWS, you have a minimum of six different services you need to work with, many of which do not support anything on a mobile basis and don’t talk to one another relatively well. I built a state machine out of step functions that will do a lot of it for me, but it’s an example of having to touch so many different things just for a relatively straightforward solution space that is a common problem. And that’s a small example, but you see it across the board. Matthew: Yeah, yeah. I was expecting you to come up with a little bit of a different answer for what a simple system is, for example, a website. Everyone likes to say, “Oh, a static website with just raw HTML. That’s a simple”— Corey: No, that’s hard as hell because the devil is in the details, and it slices you to ribbons whenever you go down that path. Matthew: Exactly. Corey: No, I’m talking things that a human being would do without needing to be an expert in getting that many different AWS services to talk to one another. Matthew: Yeah, and I agree that AWS traditionally is very bad at moving up that stack and getting those things to work. You had mentioned at the very top of this about Amplify. 
Amplify is a system that I have tried once or twice, and I generally think that, for the right use case, is an excellent system and I really like a lot of what it does. Corey: It is. I agree. Having gone down that, building up my scavenger hunt app that I’ll be open-sourcing at some point next year. Matthew: Yeah. And it’s fantastic, but it has a very steep cliff where you hit that point where all of a sudden, you go, “Okay, I added this, and I added this, and I added this, and now I want to add this one other thing, but to do it, now all of a sudden, I have to go through a tremendous amount of work.” It wasn’t just the simple push button that the previous four steps were. Now, I have this one other thing that I need to do, and now it’s a very difficult thing to incorporate into my system. And I’m having to learn all new stuff that I never had to care about before because Amplify made it way too easy. And I don’t think this is necessarily an AWS problem. I think this is just a fundamentally difficult software problem to solve. Microsoft, I spent years and years in the Microsoft world, and this was my biggest complaint about Microsoft was that they made extremely difficult things, far too simple to solve. And then once those systems became either buggy, problematic, misconfigured, whatever you want to call it, once they stopped working for some reason, the people who were responsible for figuring those answers out didn’t have the preceding knowledge because they didn’t need it. And then all of a sudden, they go, “Well, I don’t know how to solve this problem because I was told it was just this push-button thing.” So, Amplify is great, and I think it’s fantastic, but it is a very, very difficult problem to solve. Amazon has proven to be very, very good at building the fundamentals, and I think that they function very well as a platform service, as a building blocks. But they give you the Lego pieces, and they expect you to build the very complex Batmobile. 
And they can maybe give you some custom pieces here and there, like the fenders, and the tires, and stuff like that, but that’s not their bread and butter. Corey: Well, even starting with the CDK is a perfect example. Like, you can use cdk init to create a new project from scratch, which is awesome. I love the fact that that exists, but it doesn’t go far enough. It doesn’t automatically create a repo you store the thing in that in turn hooks up to a CI/CD process that will wind up doing the build and deploy. Instead, it expects you to do that all locally, which is a counter pattern. That’s an anti-pattern. It’ll lead you down the wrong path. And you always have to build these things from scratch yourself as you keep going. At least that’s what it feels like. Matthew: Yeah, it is. And I think that here at Defiance Digital, our job as an MSP is to talk to the customer and figure out, what are those very specific things you need? So, we do build new CDK repos all the time for our customers. But some of our customers want a trunk-based system. Some of them want a branching or a development-branch-based system. Some of them have a very complex SDLC process within a PR stage of code changes versus a slightly less complex one after things have been merged into trunk. So, we fundamentally look at it like we’re that bridge between the two, and in that case, AWS works great. In fact, all SaaS solutions are really nice because they give us those building blocks and then we provide value by figuring out which one of those we need to incorporate in for our clients. But every single one of our clients is very different. And we’ve only got, you know, less than a dozen right now. But you know, I’ve got project managers and directors always coming back to me and saying, “Well, how do we cookie-cutter this process?” And you can’t do it. It’s just very, very difficult. Not at a small scale. 
Maybe when you’re really big, and you’re a company like AWS who has thousands, if not potentially millions of customers, you can find those patterns, but it is a very fundamentally difficult problem to solve, and we’ve seen multiple companies over the last two decades try to do these things and ultimately fail. So, I don’t necessarily blame AWS for not having these things or not doing them well. Corey: Yes and no. I mean, GitHub delivers an excellent experience for the user, start to finish. There’s—Vercel does something very similar over in the front-end universe, too, where it is clearly possible, but it seems that designing user interfaces and integrating disparate things together is not in Amazon’s DNA, which makes sense when you view the two-pizza teams assembling to build larger things. But man, is that a frustration. Matthew: Yeah. I really wonder if this two-pizza team mentality can ever work well for products that are bigger than just the fundamental concepts. I think Amplify is pretty good, but if you really want something that is this service that works for 80% of customers, you can’t do it with five people. You can’t do it with six. You need to have teams like what GitHub and what Vercel and others have, where teams are potentially dozens of people that really coordinate things and have a good project manager and product owner and understand the problem very well. And it’s just very difficult with these very, very small teams to get that going. I don’t know what the future of AWS looks like. It feels very much like Microsoft in the mid-2000s, which is, they’re running off of their existing customers, they don’t really have a need to innovate significantly because they have a lot of people locked in, they would be just fine for years on end with the products they have. So, there isn’t a huge driver for doing it, not like, maybe, GCP or Azure, who really need to continue to innovate more strongly in this space to pick up more customers. 
AWS doesn’t have a problem getting customers. And if there isn’t a significant change in the mentality, like what Microsoft saw at the end of the 2000s with getting rid of Ballmer, bringing in Satya and really changing the mentality inside the company, I don’t see AWS breaking out from this anytime soon. But I think that’s actually a good thing. I think AWS should stick to just building the fundamentals, and I think that they should rely on their partners and their third parties to bridge that gap. I think Jeremy Daly at Ampt and what they’re building over there is a fantastic product. Corey: Yeah. The problem is that Amazon seems to be in denial about a lot of this, at least with what they’re saying publicly. Matthew: Yeah, but what they say publicly and how they feel internally could be very, very different. I would say that, you know, we don’t know what they’re thinking internally. And that’s fine. I don’t necessarily need to. I think more specifically, we need to understand what their roadmap looks like and we need to understand, you know, what, are they going to change in the future to maybe fill in some of these gaps. I would say that the problem you said earlier about being able to do a simple website redirect, I don’t think that’s Amazon’s desire to build those things. I think there should be a third-party that’s built on top of AWS, and maybe even works directly within your AWS account as a marketplace product for doing that, but I don’t think that’s necessarily in the benefit of AWS to build that directly. Corey: We’ll see. I’m very curious to see how this unfolds because a lot of customers want answers that require things that have to be assembled for them. I mean, honestly, a lot of the GenAI stuff is squarely in that category. 
Matthew: Agreed, but is this something where AWS needs to build it internally, and then we’ve got a product like App Composer, or Copilot, or things where they try, and then because they don’t get enough traction, it just feels like they stall out and get stagnant? I mean, App Composer was a keynote product announcement during last year’s re:Invent, and this year, we saw them introduce the ability to do Step Functions editing within it, and introduce the functionality into your IDE, VS Code directly. Both good things, but a year’s worth of development effort to release those two features feels slow to me. The integration to VS Code should have been simple. Corey: Yeah. They are not the innovative company that would turn around and deliver something incredible three months after something had launched, “And here’s a great new series of features around it.” It feels like the pace of innovation and pace of delivery has massively slowed. Matthew: Yeah. And that’s the scariest thing for me. And, you know, we saw this a little bit with a discussion recently in the cdk.dev https://cdk.dev/ server because if you take a look at what’s been happening with the CDK application for the last six months and even almost a year now, it feels like the pace of changes within the codebase has slowed. There have been multiple releases over the course of the last year where the release at the end of the week—and they hit a pretty regular cadence of a release every week—that release at the end of the week fixes one bug or adds one small feature change to one construct in some library that maybe 10% of users are going to use. And that’s troublesome. 
One of the main reasons why I ditched the Terraform and went hard on the CDK was that I looked at how many issues were open on the Terraform AWS provider, and how many missing features were, and how slow they were to incorporate those in, and said, “I can’t invest another two years into this product if there isn’t going to be that innovation.” And I wasn’t in a place to do the development work myself—despite the fact that you can because it’s open-source and providers are forkable—and the CDK is getting real close to that same spot right now. So, this weekend—and I know this is going to come out, you know, weeks later—but you know, the weekend of December 10th, they announced a change to the way that they were going to take contributions from the CDK community. And the long and short of it right now—and there’s still some debate over exactly what they said—is, we’re not going to accept brand-new L2 constructs from the community. Those have to be built internally by AWS only. That’s a dr—step in the wrong direction. I understand why they’re taking that approach. Contributions in the CDK have been very rough for the last four or five months because of the previous policies they put into place, but this is an open-source product. It’s supposed to be an open-source product. It’s also a very complex set of code because of all of the various AWS services that are being hit by it. This isn’t just Amplify, which is hitting a couple of things here and there. This is potentially— Corey: It touches everything. Matthew: It touches everything. Corey: Yeah, I can see their perspective, but they’ve got to get way better at supporting things rapidly if they want to play that game. Matthew: And they can’t do that internally with AWS, not with a two-pizza team. Corey: No. And there’s an increasing philosophy I’m hearing from teams of, “Well, my service supports it. 
Other stuff, that’s not my area of responsibility.” The wisdom that I’ve seen that really encapsulates this is written on Colm MacCárthaigh’s old laptop in 2019: “AWS is the product.” That’s the truth. It’s not about the individual components; it’s about the whole, collectively. Matthew: Right. And so, if we’re not getting these L2 constructs and these things being built out for all of the services that CloudFormation hits, then the product feels stalled, there isn’t a good initiative for users to continue trying to adopt it because over time, users are just going to hit more and more services in AWS, not fewer as they use the products. That’s what AWS wants. They want people to be using VPC Lattice and all the GenAI stuff, and Glue, and SageMaker, and all these things, but if you don’t have those L2 constructs, then there’s no advantage of the CDK over top of just raw CloudFormation. So, the step in the right direction, in my opinion, would have been to make it easier and better for outside contributions to get into CDK, and they went the opposite way, and that’s scary. Now, they basically said, go build these on your own, go publish them on the Construct Hub, and if they’re good, we’ll incorporate them in. But they also didn’t define what good was, and what makes a good API. API development is very difficult. How do you build a construct that’s going to hit 80% of use cases and still give you an out for those other 20 you missed? That’s fundamentally hard. Corey: It is. And I don’t know if there are good answers, yet. Maybe they’re going in the right direction, maybe they’re not. Matthew: Time will tell. My hope is that I can try to do some videos here after the new year to try to maybe make this a better experience for people. What does good API design look like? What is it like to implement these things well so they can be incorporated in? 
There has been a lot of pushback already, just after the first couple of days, from some very vocal users within the CDK community saying, “This is bad. This is fundamentally bad stuff.” Even from big fanboys like myself, who have supported the CDK, who co-authored the , and they said, “This is not good.” So, we’ll see what happens. Maybe they change direction after a couple of days. Maybe this is—turns out to be a great way to do it. Only time will really tell at this point. Corey: Awesome. And where can people go to find out more as you continue your exploration in this space and find out what you’re up to in general? Matthew: So, I do have a Twitter account at @mattbonig on Twitter https://twitter.com/mattbonig, however, I am probably going to be doing less and less over there. Engagement and the community as a whole over there has been problematic for a while, and I’ll probably be doing more on LinkedIn https://www.linkedin.com/in/matthewbonig/, so you can find me there. Just search for Matthew Bonig. It’s a very unique name. I’ve also got a website, matthewbonig.com https://matthewbonig.com/, and from there, you can see blog articles, and a link to my Advanced CDK course, which I’m going to continue adding sessions to over the course of the next few months. I’ve got one coming out shortly about the deadly embrace and how you can work through that problem and hopefully not be so scared about multi-stack applications. Corey: I look forward to that because Lord knows, I’m running into that one myself increasingly frequently. Matthew: Well, good. I will hopefully be able to get this video out and solve all of your problems very easily. Corey: Awesome. Thank you so much for taking the time to speak with me. I appreciate it. Matthew: Thank you for having me. I really appreciate it. Corey: Matthew Bonig, Chief Cloud Architect at Defiance Digital, AWS Dev Tools Hero, and oh so much more. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that you will then have to wind up building the implementation for the constructs that power that comment yourself because apparently we’re not allowed to build them globally anymore. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com/ to get started.

43m
Jan 11, 2024
The Importance of the Platform-As-a-Product Mentality with Evelyn Osman

Evelyn Osman, Principal Platform Engineer at AutoScout24, joins Corey on Screaming in the Cloud to discuss the dire need for developers to agree on a standardized tool set in order to scale their projects and innovate quickly. Corey and Evelyn pick apart the new products being launched in cloud computing and discover a large disconnect between what the industry needs and what is actually being created. Evelyn shares her thoughts on why viewing platforms as products themselves forces developers to get into the minds of their users and produces a better end result. About Evelyn Evelyn is a recovering improviser currently role playing as a Lead Platform Engineer at Autoscout24 in Munich, Germany. While she says she specializes in AWS architecture and integration after spending 11 years with it, in truth she spends her days convincing engineers that a product mindset will make them hate their product managers less. Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. My guest today is Evelyn Osman, engineering manager at AutoScout24. Evelyn, thank you for joining me. Evelyn: Thank you very much, Corey. It’s actually really fun to be on here. Corey: I have to say one of the big reasons that I was enthused to talk to you is that you have been using AWS—to be direct—longer than I have, and that puts you in a somewhat rarefied position where AWS’s customer base has absolutely exploded over the past 15 years that it’s been around, but at the beginning, it was a very different type of thing. Nowadays, it seems like we’ve lost some of that magic from the beginning. Where do you land on that whole topic? 
Evelyn: That’s actually a really good point because I always like to say, you know, when I come into a room, you know, I really started doing introductions like, “Oh, you know, hey,” I’m like, you know, “I’m this director, I’ve done this XYZ,” and I always say, like, “I’m Evelyn, engineering manager, or architect, or however,” and then I say, you know, “I’ve been working with AWS, you know, 11, 12 years,” or now I can’t quite remember. Corey: Time becomes a flat circle. The pandemic didn’t help. Evelyn: [laugh] Yeah, I just, like, a look at that the year, and I’m like, “Jesus. It’s been that long.” Yeah. And usually, like you know, you get some odd looks like, “Oh, my God, you must be a sage.” And for me, I’m… you see how different services kind of, like, have just been reinventions of another one, or they just take a managed service and make another managed service around it. So, I feel that there’s a lot of where it’s just, you know, wrapping up a pretty bow, and calling it something different, it feels like. Corey: That’s what I’ve been low-key asking people for a while now over the past year, namely, “What is the most foundational, interesting thing that AWS has done lately, that winds up solving for this problem of whatever it is you do as a company? What is it that has foundationally made things better that AWS has put out in the last service? What was it?” And the answers I get are all depressingly far in the past, I have to say. What’s yours? Evelyn: Honestly, I think the biggest game-changer I remember experiencing was at an analyst summit in Stockholm when they announced Lambda. Corey: That was announced before I even got into this space, as an example of how far back things were. And you’re right. That was transformative. That was awesome. Evelyn: Yeah, precisely. Because before, you know, we were always, like, trying to figure, okay, how do we, like, launch an instance, run some short code, and then clean it up. 
AWS is going to charge for an hour, so we need to figure out, you know, how to pack everything into one instance, run for one hour. And then they announced Lambda, and suddenly, like, holy shit, this is actually a game changer. We can actually write small functions that do specific things. And, you know, you go from, like, microservices, like, to like, tiny, serverless functions. So, that was huge. And then DynamoDB along with that, really kind of like, transformed the entire space for us in many ways. So, back when I was at TIBCO, there were a few innovations around that, even, like, one startup inside TIBCO that quite literally, their entire product was just Lambda functions. And one of their problems was, they wanted to sell in the Marketplace, and they couldn’t figure out how to sell Lambda on the marketplace. Corey: It’s kind of wild when we see just how far it’s come, but also how much they’ve announced that doesn’t change that much, to be direct. For me, one of the big changes that I remember that really made things better for customers—though it took a couple of years—was EFS. And even that’s a little bit embarrassing because all that is, “All right, we finally found a way to stuff a NetApp into us-east-1,” so now NFS, just like you used to use it in the 90s and the naughts, can be done responsibly in the cloud. And that, on some level, wasn’t a feature launch so much as it was a concession to the ways that companies had built things and weren’t likely to change. Evelyn: Honestly, I found the EFS launch to be a bit embarrassing because, like, you know, when you look closer at it, you realize, like, the performance isn’t actually that great. Corey: Oh, it was horrible when it launched. It would just slam to a halt because the IOPS you got scaled with how much data you stored on it. The documentation explicitly said to use dd to start loading a bunch of data onto it to increase the performance. 
It’s like, “Look, just sandbag the thing so it does what you’d want.” And all that stuff got fixed, but at the time it looked like it was clown shoes. Evelyn: Yeah, and that reminds me of, like, EBS’s, like, gp2 when we’re, like you know, we’re talking, like, okay, provision IOPS with gp2. We just kept saying, like, just give yourself a really big volume for performance. And it feels like they just kind of kept that with EFS. And it took years for them to really iterate off of that. Yeah, so, like, EFS was a huge thing, and I see us, we’re still using it now today, and like, we’re trying to integrate, especially for, like, data center migrations, but yeah, you always see that a lot of these were first more for, like, you know, data centers to the cloud, you know. So, first I had, like, EC2-Classic. That’s where I started. And I always like to tell a story that in my team, when we were talking about using AWS, I was the only person fiercely against it because we did basically large data processing—sorry, I forget the right words—data analytics. There we go [laugh]. Corey: I remember that, too. When it first came out, it was, “This sounds dangerous and scary, and it’s going to be a flash in the pan because who would ever trust their core compute infrastructure to some random third-party company, especially a bookstore?” And yeah, I think I got that one very wrong. Evelyn: Yeah, exactly. I was just like, no way. You know, I see all these articles talking about, like, terrible disk performance, and here I am, where it’s like, it’s my bread and butter. I’m specialized in it, you know? I write code in my sleep and such. [Yeah, the interesting thing is, I was like, first, it was like, I can 00:06:03] launch services, you know, to kind of replicate what you get in a data center to make it feature comparable, and then it was taking all these complex services and wrapping them up in a pretty bow for—as a managed service. 
Like, EKS, I think, was the biggest one, if we’re looking at managed services. Technically Elasticsearch, but I feel like that was the redheaded stepchild for quite some time. Corey: Yeah, there was—Elasticsearch was a weird one, and still is. It’s not a pleasant service to run in any meaningful sense. Like, what people actually want as the next enhancement that would excite everyone is, I want a serverless version of this thing where I can just point it at a bunch of data, I hit an API that I don’t have to manage, and get Elasticsearch results back from. They finally launched a serverless offering that’s anything but. You have to still provision compute units for it, so apparently, the word serverless just means managed service over at AWS-land now. And it just, it ties into the increasing sense of disappointment I’ve had with almost all of their recent launches versus what I felt they could have been. Evelyn: Yeah, the interesting thing about Elasticsearch is, a couple of years ago, they came out with OpenSearch, a competing Elasticsearch after [unintelligible 00:07:08] kind of gave us the finger and changed the licensing. I mean, OpenSearch has actually become a really great offering if you run it yourself, but if you use their managed service, it can kind—you lose all the benefits, in a way. Corey: I’m curious, as well, to get your take on what I’ve been seeing that I think could only be described as an internal shift, where it’s almost as if there’s been a decree passed down that every service has to run its own P&L or whatnot, and as a result, everything that gets put out seems to be monetized in weird ways, even when I’d argue it shouldn’t be. The classic example I like to use for this is AWS Config, where it charges you per evaluation, and that happens whenever a cloud resource changes. What that means is that by using the cloud dynamically—the way that they supposedly want us to do—we wind up paying a fee for that as a result. 
And it’s not like anyone is using that service in isolation; it is definitionally being used as people are using other cloud resources, so why does it cost money? And the answer is because literally everything they put out costs money. Evelyn: Yep, pretty simple. Oftentimes, there’s, like, R&D that goes into it, but the charges seem a bit… odd. Like from an S3 lens, was, I mean, that’s, like, you know, if you’re talking about services, that was actually a really nice one, very nice holistic overview, you know, like, I could drill into a data lake and, like, look into things. But if you actually want to get anything useful, you have to pay for it. Corey: Yeah. Everything seems to, for one reason or another, be stuck in this place where, “Well, if you want to use it, it’s going to cost.” And what that means is that it gets harder and harder to do anything that even remotely resembles being able to wind up figuring out where’s the spend going, or what’s it going to cost me as time goes on? Because it’s not just what are the resources I’m spinning up going to cost, what are the second, third, and fourth-order effects of that? And the honest answer is, well, nobody knows. You’re going to have to basically run an experiment and find out. Evelyn: Yeah. No, true. So, what I… at AutoScout, we actually ended up doing is—because we’re trying to figure out how to tackle these costs—is they—we built an in-house cost allocation solution so we could track all of that. Now, AWS has actually improved Cost Explorer quite a bit, and even, I think, Billing Conductor was one that came out [unintelligible 00:09:21], kind of like, do a custom tiered and account pricing model where you can kind of do the same thing. But even that also, there is a cost with it. I think that was trying to compete with other, you know, vendors doing similar solutions. 
But it still isn’t something where we see that either there’s, like, arbitrarily low pricing there, or the costs itself doesn’t really quite make sense. Like, AWS [unintelligible 00:09:45], as you mentioned, it’s a terrific service. You know, we try to use it for compliance enforcement and other things, catching bad behavior, but then as soon as people see the price tag, we just run away from it. So, a lot of the security services themselves, actually, the costs, kind of like, goes—skyrockets tremendously when you start trying to use it across a large organization. And oftentimes, the organization isn’t actually that large. Corey: Yeah, it gets to this point where, especially in small environments, you have to spend more energy and money chasing down what the cost is than you’re actually spending on the thing. There were blog posts early on that, “Oh, here’s how you analyze your bill with Redshift,” and that was a minimum 750 bucks a month. It’s, well, I’m guessing that that’s not really for my $50 a month account. Evelyn: Yeah. No, precisely. I remember seeing that, like, entire ETL process is just, you know, analyze your invoice. Cost [unintelligible 00:10:33], you know, is fantastic, but at the end of the day, like, what you’re actually looking at [laugh], is infinitesimally small compared to all the data in that report. Like, I think oftentimes, it’s simply, you know, like, I just want to look at my resources and allocate them in a multidimensional way. Which actually isn’t really that multidimensional, when you think about it [laugh]. Corey: Increasingly, Cost Explorer has gotten better. It’s not a new service, but every iteration seems to improve it to a point now where I’m talking to folks, and they’re having a hard time justifying most of the tools in the cost optimization space, just because, okay, they want a percentage of my spend on AWS to basically be a slightly better version of a thing that’s already improving and works for free. 
That doesn’t necessarily make sense. And I feel like that’s what you get trapped into when you start going down the VC path in the cost optimization space. You’ve got to wind up having a revenue model and an offering that scales through software… and I thought, originally, I was going to be doing something like that. At this point, I’m unconvinced that anything like that is really tenable. Evelyn: Yeah. When you’re a small organization you’re trying to optimize, you might not have the expertise and the knowledge to do so, so when one of these small consultancies comes along, saying, “Hey, we’re going to charge you a really small percentage of your invoice,” like, okay, great. That’s, like, you know, like, a few $100 a month to make sure I’m fully optimized, and I’m saving, you know, far more than that. But as soon as your invoice turns into, you know, it’s like $100,000, or $300,000 or more, that percentage becomes rather significant. And I’ve had vendors come to me and, like, talk to me and be like, “Hey, we can, you know, for a small percentage, you know, we’re going to do this machine learning, you know, AI optimization for you. You know, you don’t have to do anything. We guarantee buybacks of your RIs.” And as soon as you look at the price tag with it, we just have to walk away. Or oftentimes we look at it, and there are truly very simple ways to do it on your own, if you just kind of put some thought into it. Corey: While we were talking a bit before this show, you taught me something new about GameLift, which I think is a different problem that AWS has been dealing with lately. I’ve never paid much attention to it because it is the—as I assume from what it says on the tin, oh, it’s a service for just running a whole bunch of games at scale, and I’m not generally doing that. My favorite computer game remains Twitter at this point, but that’s okay. 
What is GameLift, though, because you wound up shining a different light on it, which makes me annoyed that Amazon Marketing has not pointed this out. Evelyn: Yeah, so I’ll preface this by saying, like, I’m not an expert on GameLift. I haven’t even spun it up myself because there’s quite a bit of a price. I learned this fall while chatting with an SA who works in the gaming space, and it kind of like, I went, like, “Back up a second.” If you think about, like, I’m, you know, like, , all you have are thousands of game clients all over the world, playing the same game, you know, on the same server, in the same instance, and you need to make sure, you know, that when I’m running, and you’re running, that we know that we’re going to reach the same point at the same time, or if there’s one object in that room, that only one of us can get it. So, all these servers are doing is tracking state across thousands of clients. And GameLift, when you think about your dedicated game service, it really is just multi-region distributed state management. Like, at the basic level, that’s really what it is. Now, there’s, you know, quite a bit more happening within GameLift, but that’s what I was going to explain is, like, it’s just state management. And there are far more use cases for it than just for video games. Corey: That’s maddening to me because having a global session state store, for lack of a better term, is something that so many customers have built themselves repeatedly. They can build it on top of primitives like DynamoDB global tables, or alternately, you have a dedicated region where that thing has to live and everything far away takes forever to round-trip. If they’ve solved some of those things, why on earth would they bury it under a gaming-branded service? Like, offer that primitive to the rest of us because that’s useful. Evelyn: No, absolutely. 
And honestly, I wouldn’t be surprised if you peeled back the curtain with GameLift, you’ll find a lot of—like, several other, you know, AWS services that it’s just built on top of. As I kind of mentioned earlier, what I see now with innovation is, like, we just see other services packaged together and released as a new product. Corey: Yeah, IoT had the same problem going on for years where there was a lot of really good stuff buried in there, like IoT Events. People were talking about using that for things like browser extensions and whatnot, but you need to be explicitly told that that’s a thing that exists and is handy, but otherwise you’d never know it was there because, “Well, I’m not building anything that’s IoT-related. Why would I bother?” It feels like that was one direction that they tended to go in. And now they take existing services that are, mmm, kind of milquetoast, if I’m being honest, and then say, “Oh, like, we have Comprehend that does, effectively, detection of themes, keywords, and whatnot, from text. We’re going to wind up re-releasing that as Comprehend Medical.” Same type of thing, but now focused on a particular vertical. Seems to me that instead of being a specific service for that vertical, just improve the baseline service and offer HIPAA compliance if it didn’t exist already, and you’re mostly there. But what do I know? I’m not a product manager trying to get promoted. Evelyn: Yeah, that’s true. Well, I was going to mention that maybe it’s the HIPAA compliance, but actually, a lot of their services already have HIPAA compliance. And I’ve stared far too long at that compliance section on AWS’s site to know this, but you know, a lot of them actually are HIPAA-compliant, they’re PCI-compliant, and ISO-compliant, and you know, and everything. So, I’m actually pretty intrigued to know why they [wouldn’t 00:16:04] take that advantage. Corey: I just checked.
Amazon Comprehend is itself HIPAA-compliant and is qualified and certified to hold Personal Health Information—PHI—Private Health Information, whatever the acronym stands for. Now, what’s the difference, then, between that and Medical? In fact, the HIPAA section says for Comprehend Medical, “For guidance, see the previous section on Amazon Comprehend.” So, there’s no difference from a regulatory point of view. Evelyn: That’s fascinating. I am intrigued because I do know that, like, within AWS, you know, they have different segments, you know? There’s, like, Digital Native Business, there’s Enterprise, there’s Startup. So, I am curious how things look over the engineering side. I’m going to talk to somebody about this now [laugh]. Corey: Yeah, it’s the—like, I almost wonder, on some level, it feels like, “Well, we wound up building this thing in the hopes that someone would use it for something. And well, if we just use different words, it checks a box in some analyst’s chart somewhere.” I don’t know. I mean, I hate to sound that negative about it, but it’s… increasingly when I talk to customers who are active in these spaces around the industry vertical targeted stuff aimed at their industry, they’re like, “Yeah, we took a look at it. It was adorable, but we’re not using it that way. We’re going to use either the baseline version or we’re going to work with someone who actively gets our industry.” And I’ve heard that repeated about three or four different releases that they’ve put out across the board of what they’ve been doing. It feels like it is a misunderstanding between what the world needs and what they’re able to or willing to build for us. Evelyn: Not sure.
I wouldn’t be surprised, if we go far enough, it could probably be that it’s just a product manager saying, like, “We have to advertise directly to the industry.” And if you look at it, you know, in the backend, you know, it’s an engineer, you know, kicking off a build and just changing the name from Comprehend to Comprehend Medical. Corey: And, on some level, too, they’re moving a lot more slowly than they used to. There was a time where they were, in many cases, if not the first mover, the first one to do it well. Take CodeWhisperer, their AI-powered coding assistant. That would have been a transformative thing if GitHub Copilot hadn’t beaten them to every punch, come out with new features, and frankly, in head-to-head experiments that I’ve run, come out way better as a product than what CodeWhisperer is. And while I’d like to say that this is great, it’s too little too late. And when I talk to engineers, they’re very excited about what Copilot can do, and the only people I see who are even talking about CodeWhisperer work at AWS. Evelyn: No, that’s true. And so, I think what’s happening—and this is my opinion—is that first you had AWS, like, launching really innovative new services, you know, that kind of like, it’s like, “Ah, it’s a whole new way of running your workloads in the cloud.” Instead of, you know, basically, hiring a whole team, I just click a button, you have your instance, you use it, sell software, blah, blah, blah, blah. And then they went towards serverless, and then IoT, and then it started targeting large data lakes, and then eventually that kind of ran backwards towards security, after the umpteenth S3 data leak. Corey: Oh, yeah. And especially now, like, so they had a hit in some corners with SageMaker, so now there are 40 services all starting with the word SageMaker. That’s always pleasant. Evelyn: Yeah, precisely.
And what I kind of notice is… now they’re actually having to run it even further back because they caught all the corporations that could pivot to the cloud, they caught all the startups who started in the cloud, and now they’re going for the larger behemoths who have massive data centers, and they don’t want to innovate. They just want to reduce this massive sysadmin team. And I always like to use the example of Bare Metal. When that came out in 2019, everybody—we all kind of scratched our heads. I’m like, really [laugh]? Corey: Yeah, I could see where it makes some sense just for very specific workloads that involve things like specific capabilities of processors that don’t work under emulation in some weird way, but it’s also such a weird niche that I’m sure it’s there for someone. My default assumption, just given the breadth of AWS’s customer base, is that whenever I see something that they just announced, well, okay, it’s clearly not for me; that doesn’t mean it’s not meeting the needs of someone who looks nothing like me. But increasingly as I start exploring the industry and these services have time to percolate in the popular imagination and I still don’t see anything interesting coming out of it, it really makes you start to wonder. Evelyn: Yeah. But then, like, I think, like, roughly a year or something, right after Bare Metal came out, they announced Outposts. So, then it was like, another way to just stay within your data center and be in the cloud. Corey: Yeah. There’s a bunch of different ways they have that, okay, here’s ways you can run AWS services on-prem, but still pay us by the hour for the privilege of running things that you have living in your facility. And that doesn’t seem like it’s quite fair. Evelyn: That’s exactly it.
So, I feel like now it’s sort of in diminishing returns and sort of doing more cloud-native work compared to, you know, these huge opportunities, which is everybody who still has a data center for various reasons, or they’re cloud-native, and they grow so big, that they actually start running their own data centers. Corey: I want to call out as well before we wind up being accused of being oblivious, that we’re recording this before re:Invent. So, it’s entirely possible—I hope this happens—that they announce something or several some things that make this look ridiculous, and we’re embarrassed to have had this conversation. And yeah, they’re totally getting it now, and they have completely surprised us with stuff that’s going to be transformative for almost every customer. I’ve been expecting and hoping for that for the last three or four re:Invents now, and I haven’t gotten it. Evelyn: Yeah, that’s right. And I think there’s even a new service launches that actually are missing fairly obvious things in a way. Like, mine is the Managed Workflow for Amazon—it’s Managed Airflow, sorry. So, we were using Data Pipeline for, you know, big ETL processing, so it was an in-house tool we kind of built at Autoscout, we do platform engineering. And it was deprecated, so we looked at a new—what to replace it with. And so, we looked at Airflow, and we decided this is the way to go, we want to use managed because we don’t want to maintain our own infrastructure. And the problem we ran into is that it doesn’t have support for shared VPCs. And we actually talked to our account team, and they were confused. Because they said, like, “Well, every new service should support it natively.” But it just didn’t have it. 
And that’s, kind of, what, I kind of found is, like, there’s—it feels—sometimes it’s—there’s a—it’s getting rushed out the door, and it’ll actually have a new managed service or new service launched out, but they’re also sort of cutting some corners just to actually make sure it’s packaged up and ready to go. Corey: When I’m looking at this, and seeing how this stuff gets packaged, and how it’s built out, I start to understand a pattern that I’ve been relatively down on across the board. I’m curious to get your take because you work at a fairly sizable company as an engineering manager, running teams of people who do this sort of thing. Where do you land on the idea of companies building internal platforms to wrap around the offerings that the cloud service providers that they use make available to them? Evelyn: So, my opinion is that you need to build out some form of standardized tool set in order to actually be able to innovate quickly. Now, this sounds counterintuitive because everyone is like, “Oh, you know, if I want to innovate, I should be able to do this experiment, and try out everything, and use what works, and just release it.” And that greatness [unintelligible 00:23:14] mentality, you know, it’s like five talented engineers working to build something. But when you have, instead of five engineers, you have five teams of five engineers each, and every single team does something totally different. You know, one uses Scala, and other on TypeScript, another one, you know .NET, and then there could have been a [last 00:23:30] one, you know, comes in, you know, saying they’re still using Ruby. And then next thing you know, you know, you have, like, incredibly diverse platforms for services. And if you want to do any sort of like hiring or cross-training, it becomes incredibly difficult. 
And actually, as the organization grows, you want to hire talent, and so you’re going to have to hire, you know, a developer for this team, you going to have to hire, you know, Ruby developer for this one, a Scala guy here, a Node.js guy over there. And so, this is where we say, “Okay, let’s agree. We’re going to be a Scala shop. Great. All right, are we running serverless? Are we running containerized?” And you agree on those things. So, that’s already, like, the formation of it. And oftentimes, you start with DevOps. You’ll say, like, “I’m a DevOps team,” you know, or doing a DevOps culture, if you do it properly, but you always hit this scaling issue where you start growing, and then how do you maintain that common tool set? And that’s where we start looking at, you know, having a platform… approach, but I’m going to say it's Platform-as-a-Product. That’s the key. Corey: Yeah, that’s a good way of framing it because originally, the entire world needed that. That’s what RightScale was when EC2 first came out. It was a reimagining of the EC2 console that was actually usable. And in time, AWS improved that to the point where RightScale didn’t really have a place anymore in a way that it had previously, and that became a business challenge for them. But you have, what is it now, 2, 300 services that AWS has put out, and out, and okay, great. Most companies are really only actively working with a handful of those. How do you make those available in a reasonable way to your teams, in ways that aren’t distracting, dangerous, et cetera? I don’t know the answer on that one. Evelyn: Yeah. No, that’s true. So, full disclosure. At AutoScout, we do platform engineering. So, I’m part of, like, the platform engineering group, and we built a platform for our product teams. It’s kind of like, you need to decide to [follow 00:25:24] those answers, you know? Like, are we going to be fully containerized? Okay, then, great, we’re going to use Fargate. 
All right, how do we do it so that developers don’t actually—don’t need to think that they’re running Fargate workloads? And that’s, like, you know, where it’s really important to have those standardized abstractions that developers actually enjoy using. And I’d even say that, before you start saying, “Ah, we’re going to do platform,” you say, “We should probably think about developer experience.” Because you can do a developer experience without a platform. You can do that, you know, in a DevOps approach, you know? It’s basically build tools that makes it easy for developers to write code. That’s the first step for anything. It’s just, like, you have people writing the code; make sure that they can do the things easily, and then look at how to operate it. Corey: That sure would be nice. There’s a lack of focus on usability, especially when it comes to a number of developer tools that we see out there in the wild, in that, they’re clearly built by people who understand the problem space super well, but they’re designing these things to be used by people who just want to make the website work. They don’t have the insight, the knowledge, the approach, any of it, nor should they necessarily be expected to. Evelyn: No, that’s true. And what I see is, a lot of the times, it’s a couple really talented engineers who are just getting shit done, and they get shit done however they can. So, it’s basically like, if they’re just trying to run the website, they’re just going to write the code to get things out there and call it a day. And then somebody else comes along, has a heart attack when see what’s been done, and they’re kind of stuck with it because there is no guardrails or paved path or however you want to call it. Corey: I really hope—truly—that this is going to be something that we look back and laugh when this episode airs, that, “Oh, yeah, we just got it so wrong. Look at all the amazing stuff that came out of re:Invent.” Are you going to be there this year? 
Evelyn: I am going to be there this year. Corey: My condolences. I keep hoping people get to escape. Evelyn: This is actually my first one in, I think, five years. So, I mean, the last time I was there was when everybody’s going crazy over pins. And I still have a bag of them [laugh]. Corey: Yeah, that did seem like a hot-second collectable moment, didn’t it? Evelyn: Yeah. And then at the—I think, what, the very last day, as everybody’s heading to re:Play, you could just go into the registration area, and they just had, like, bags of them lying around to take. So, all the competing, you know, to get the requirements for a pin was kind of moot [laugh]. Corey: Don’t you hate it at some point where it’s like, you feel like I’m going to finally get this crowning achievement, it’s like or just show up at the buffet at the end and grab one of everything, and wow, that would have saved me a lot of pain and trouble. Evelyn: Yeah. Corey: Ugh, scavenger hunts are hard, as I’m about to learn to my own detriment. Evelyn: Yeah. No, true. Yeah. But I am really hoping that re:Invent proves me wrong. Embarrassingly wrong, and then all my colleagues can proceed to mock me for this ridiculous podcast that I made with you. But I am a fierce skeptic. Optimistic nihilist, but still a nihilist, so we’ll see how re:Invent turns out. Corey: So, I am curious, given your experience at more large companies than I tend to be embedded with for any period of time, how have you found that these large organizations tend to pick up new technologies? What does the adoption process look like? And honestly, if you feel like throwing some shade, how do they tend to get it wrong? Evelyn: In most cases, I’ve seen it go… terrible. Like, it just blows up in their face. And I say that is because a lot of the time, an organization will say, “Hey, we’re going to adopt this new way of organizing teams or developing products,” and they look at all the practices. They say, “Okay, great. 
Product management is going to bring it in, they’re going to structure things, how we do the planning, here’s some great charts and diagrams,” but they don’t really look at the culture aspect. And that’s always where I’ve seen things fall apart. I’ve been in a room where, you know, our VP was really excited about Team Topologies and said, “Hey, we’re going to adopt it.” And then an engineering manager proceeded to say, “Okay, you’re responsible for this team, you’re responsible for that team, you’re responsible for this team,” talking to, like, a team of, like, five engineers, which doesn’t really work at all. Or, like, I think the best example is DevOps, you know, where you say, “Ah, we’re going to adopt DevOps, we’re going to have a DevOps team, or have a DevOps engineer.” Corey: Step one: we’re going to rebadge everyone with existing job titles to have the new fancy job titles that reflect it. It turns out that’s not necessarily sufficient in and of itself. Evelyn: Not really. The Spotify model. People say, like, “Oh, we’re going to do the Spotify model. We’re going to do skills, tribes, you know, and everything. It’s going to be awesome, it’s going to be great, you know, and nice, cross-functional.” The reason I say it fails on us every single time is because somebody wants to be in control of the process, and if the process is meant to encourage collaboration and innovation, that person actually becomes a chokehold for it. And it could be somebody that says, like, “Ah, I need to be involved in every single team, and listen to know what’s happening, just so I’m aware of it.” What ends up happening is that everybody defers to them. So, there is no collaboration, there is no innovation. DevOps, you say, like, “Hey, we’re going to have a team to do everything, so your developers don’t need to worry about it.” What ends up happening is you’re still an ops team, you still have your silos.
And that’s always a challenge is you actually have to say, “Okay, what are the cultural values around this process?” You know, what is SRE? What is DevOps, you know? Is it seen as processes, is it a series of principles, platform, maybe, you know? We have to say, like—that’s why I say, Platform-as-a-Product because you need to have that product mindset, that culture of product thinking, to really build a platform that works because it’s all about the user journey. It’s not about building a common set of tools. It’s the user journey of how a person interacts with their code to get it into a production environment. And so, you need to understand how that person sits down at their desk, starts the laptop up, logs in, opens the IDE, what they’re actually trying to get done. And once you understand that, then you know your requirements, and you build something to fill those things so that they are happy to use it, as opposed to saying, “This is our platform, and you’re going to use it.” And they’re probably going to say, “No.” And the next thing, you know, they’re just doing their own thing on the side. Corey: Yeah, the rise of Shadow IT has never gone away. It’s just, on some level, it’s the natural expression, I think it’s an immune reaction that companies tend to have when process gets in the way. Great, we have an outcome that we need to drive towards; we don’t have a choice. Cloud empowered a lot of that and also has given tools to help rein it in, and as with everything, the arms race continues. Evelyn: Yeah. And so, what I’m going to continue now, kind of like, toot the platform horn. So, Gregor Hohpe, he’s a [solutions architect 00:31:56]—I always f- up his name. I’m so sorry, Gregor. He has a great book, and even a talk, called , that if somebody is actually curious about understanding of why platforms are nice, they should really watch that talk. If you see him at re:Invent, or a summit or somewhere giving a talk, go listen to that, and just pick his brain. 
Because that’s—for me, I really kind of strongly agree with his approach because that’s really how, like, you know, as he says, like, boost innovation is, you know, where you’re actually building a platform that really works. Corey: Yeah, it’s a hard problem, but it’s also one of those things where you’re trying to focus on—at least ideally—an outcome or a better situation than you currently find yourselves in. It’s hard to turn down things that might very well get you there sooner, faster, but it’s like trying to effectively cargo-cult the leadership principles from your last employer into your new one. It just doesn’t work. I mean, you see more startups from Amazonians who try that, and it just goes horribly because without the cultural understanding and the supporting structures, it doesn’t work. Evelyn: Exactly. So, I’ve worked with, like, organizations, like, 4000-plus people, I’ve worked for, like, small startups, consulted, and this is why I say, almost every single transformation, it fails the first time because somebody needs to be in control and track things and basically be really, really certain that people are doing it right. And as soon as it blows up in their face, that’s when they realize they should actually take a step back. And so, even for building out a platform, you know, doing Platform-as-a-Product, I always reiterate that you have to really be willing to just invest upfront, and not get very much back. Because you have to figure out the whole user journey, and what you’re actually building, before you actually build it. Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where’s the best place for them to find you? Evelyn: So, I used to be on Twitter, but I’ve actually got off there after it kind of turned a bit toxic and crazy. Corey: Feels like that was years ago, but that’s beside the point. Evelyn: Yeah, precisely. 
So, I would even just say because this feels like a corporate show, but find me on LinkedIn https://www.linkedin.com/in/evelyn-osman/ of all places because I will be sharing whatever I find on there, you know? So, just look me up by my name, Evelyn Osman, and give me a follow, and I’ll probably be screaming into the cloud like you are. Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me. I appreciate it. Evelyn: Thank you, Corey. Corey: Evelyn Osman, engineering manager at AutoScout24. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, and I will read it once I finish building an internal platform to normalize all of those platforms together into one. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com to get started.

35m
Jan 09, 2024
Benchmarking Security Attack Response Times in the Age of Automation with Anna Belak

Anna Belak, Director of the Office of Cybersecurity Strategy at Sysdig, joins Corey on Screaming in the Cloud to discuss the newest benchmark for responding to security threats, 5/5/5. Anna describes why it was necessary to set a new benchmark for responding to security threats in a timely manner, and how the Sysdig team did research to determine the best practices for detecting, correlating, and responding to potential attacks. Corey and Anna discuss the importance of focusing on improving your own benchmarks towards a goal, as well as how prevention and threat detection are both essential parts of a solid security program. ABOUT ANNA Anna has nearly ten years of experience researching and advising organizations on cloud adoption with a focus on security best practices. As a Gartner Analyst, Anna spent six years helping more than 500 enterprises with vulnerability management, security monitoring, and DevSecOps initiatives. Anna's research and talks have been used to transform organizations' IT strategies and her research agenda helped to shape markets. Anna is the Director of Thought Leadership at Sysdig, using her deep understanding of the security industry to help IT professionals succeed in their cloud-native journey. Anna holds a PhD in Materials Engineering from the University of Michigan, where she developed computational methods to study solar cells and rechargeable batteries. LINKS REFERENCED: __ Sysdig: https://sysdig.com/ Sysdig 5/5/5 Benchmark: https://sysdig.com/555 __ Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn.
I am joined again—for another time this year—on this promoted guest episode brought to us by our friends at Sysdig https://sysdig.com/, returning is Anna Belak, who is their director of the Office of Cybersecurity Strategy at Sysdig. Anna, welcome back. It’s been a hot second. Anna: Thank you, Corey. It’s always fun to join you here. Corey: Last time we were here, we were talking about your report that you folks had come out with, the, “Cybersecurity Threat Landscape for 2022.” And when I saw you were doing another one of these to talk about something, I was briefly terrified. “Oh, wow, please tell me we haven’t gone another year and the cybersecurity threat landscape is moving that quickly.” And it sort of is, sort of isn’t. You’re here today to talk about something different, but it also—to my understanding—distills down to just how quickly that landscape is moving. What have you got for us today? Anna: Exactly. For those of you who remember that episode, one of the key findings in the Threat Report for 2023 was that the average length of an attack in the cloud is ten minutes. To be clear, that is from when you are found by an adversary to when they have caused damage to your system. And that is really fast. Like, we talked about how that relates to on-prem attacks or other sort of averages from other organizations reporting how long it takes to attack people. And so, we went from weeks or days to minutes, potentially seconds. And so, what we’ve done is we looked at all that data, and then we went and talked to our amazing customers and our many friends at analyst firms and so on, to kind of get a sense for if this is real, like, if everyone is seeing this or if we’re just seeing this. Because I’m always like, “Oh, God. Like, is this real? Is it just me?” And as it turns out, everyone’s not only—I mean, not necessarily everyone’s seeing it, right? 
Like, there’s not really been proof until this year, I would say because there’s a few reports that came out this year, but lots of people sort of anticipated this. And so, when we went to our customers, and we asked for their SLAs, for example, they were like, “Oh, yeah, my SLA for a [PCRE 00:02:27] cloud is like 10, 15 minutes.” And I was like, “Oh, okay.” So, what we set out to do is actually set a benchmark, essentially, to see how well are you doing. Like, are you equipped with your cloud security program to respond to the kind of attack that a cloud security attacker is going to—sorry, an anti-cloud security—I guess—attacker is going to perpetrate against you. And so, the benchmark is—drumroll—5/5/5. You have five seconds to detect a signal that is relevant to potentially some attack in the cloud—hopefully, more than one such signal—you have five minutes to correlate all such relevant signals to each other so that you have a high fidelity detection of this activity, and then you have five more minutes to initiate an incident response process to hopefully shut this down, or at least interrupt the kill chain before your environments experience any substantial damage. Corey: To be clear, that is from a T0, a starting point, the stopwatch begins, the clock starts when the event happens, not when an event shows up in your logs, not once someone declares an incident. From J. Random Hackerman, effectively, we’re pressing the button and getting the response from your API. Anna: That’s right because the attackers don’t really care how long it takes you to ship logs to wherever you’re mailing them to. And that’s why it is such a short timeframe because we’re talking about, they got in, you saw something hopefully—and it may take time, right? 
Like, some of the—which we’ll describe a little later, some of the activities that they perform in the early stages of the attack are not necessarily detectable as malicious right away, which is why your correlation has to occur, kind of, in real time. Like, things happen, and you’re immediately adding them, sort of like, to increase the risk of this detection, right, to say, “Hey, this is actually something,” as opposed to, you know, three weeks later, I’m parsing some logs and being like, “Oh, wow. Well, that’s not good.” [laugh]. Corey: The number five seemed familiar to me in this context, so I did a quick check, and sure enough, allow me to quote from chapter and verse from the CloudTrail documentation over in AWS-land. “CloudTrail typically delivers logs within an average of about five minutes of an API call. This time is not guaranteed.” So effectively, if you’re waiting for anything that’s CloudTrail-driven to tell you that you have a problem, it is almost certainly too late by the time that pops up, no matter what that notification vector is. Anna: That is, unfortunately or fortunately, true. I mean, it’s kind of a fact of life. I guess there is a little bit of a veiled [unintelligible 00:04:43] at our cloud provider friends because, really, they have to do better ultimately. But the flip side to that argument is CloudTrail—or your cloud log source of choice—cannot be your only source of data for detecting security events, right? So, if you are operating purely on the basis of, “Hey, I have information in CloudTrail; that is my security information,” you are going to have a bad time, not just because it’s not fast enough, but also because there’s not enough data in there, right? Which is why part of the first, kind of, benchmark component is that you must have multiple data sources for the signals, and they—ideally—all will be delivered to you within five seconds of an event occurring or a signal being generated.
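The 5/5/5 timings discussed in this exchange can be sketched as a simple check against an incident timeline. This is only a minimal illustration under stated assumptions, not Sysdig's implementation: the `IncidentTimeline` class, its field names, and the `check_555` function are hypothetical, and the sketch assumes that detection is measured from the moment the event happens (T0) and that each later stage's clock starts when the previous stage finishes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Limits as described in the conversation: five seconds to detect,
# five minutes to correlate, five more minutes to initiate a response.
DETECT_LIMIT = timedelta(seconds=5)
CORRELATE_LIMIT = timedelta(minutes=5)
RESPOND_LIMIT = timedelta(minutes=5)


@dataclass
class IncidentTimeline:
    """Hypothetical record of when each stage of one incident completed."""
    event_at: datetime             # T0: when the activity actually happened
    detected_at: datetime          # when the first relevant signal arrived
    correlated_at: datetime        # when signals were tied together
    response_started_at: datetime  # when incident response was initiated


def check_555(timeline: IncidentTimeline) -> dict:
    """Return, per stage, whether the timeline met the 5/5/5 limits.

    Assumption: each stage is timed from the end of the previous stage.
    """
    return {
        "detect": timeline.detected_at - timeline.event_at <= DETECT_LIMIT,
        "correlate": timeline.correlated_at - timeline.detected_at <= CORRELATE_LIMIT,
        "respond": timeline.response_started_at - timeline.correlated_at <= RESPOND_LIMIT,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 9, 12, 0, 0)
    # A signal that only shows up about five minutes after the event,
    # as with a log source that takes that long to deliver.
    log_driven = IncidentTimeline(
        event_at=t0,
        detected_at=t0 + timedelta(minutes=5),
        correlated_at=t0 + timedelta(minutes=7),
        response_started_at=t0 + timedelta(minutes=10),
    )
    print(check_555(log_driven))
```

In this example the detect stage fails even though correlation and response are quick, which is the point made above about relying only on a log source with a delivery delay of several minutes.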
Corey: And give me some more information on that because I have my own alerter, specifically, it’s a ClickOps detector. Whenever someone in one of my accounts does something in the console, that has a write aspect to it rather than just a read component—which again, look at what you want in the console, that’s fine—if you’re changing things that are not being managed by code, I want to know that it’s happening. It’s not necessarily bad, but I want to at least have visibility into it. And that spits out the principal, the IP address it emits from, and the rest. I haven’t had a whole lot where I need to correlate those between different areas. Talk to me more about the triage step. Anna: Yeah, so I believe that the correlation step is the hardest, actually. Corey: Correlation step. My apologies. Anna: Triage is fine. It’s [crosstalk 00:06:06]— Corey: Triage, correlations, the words we use matter on these things. Anna: Dude, we argued about the words on this for so long, you couldn’t even imagine. Yeah, triage, correlation, detection, you name it, we are looking at multiple pieces of data, we’re going to connect them to each other meaningfully, and that is going to provide us with some insight about the fact that a bad thing is happening, and we should respond to it. Perhaps automatically respond to it, but we’ll get to that. So, a correlation, okay. The first thing is, like I said, you must have more than one data source because otherwise, I mean, you could correlate information from one data source; you actually should do that, but you are going to get richer information if you can correlate multiple data sources, and if you can access, for example, like through an API, some sort of enrichment for that information. Like, I’ll give you an example.
For SCARLETEEL, which is an attack we describe in the threat report, and we actually described before, this is—we’re, like—on SCARLETEEL, I think, version three now because there’s so much—this particular threat actor is very active [laugh]. Corey: And they have a better versioning scheme than most companies I’ve spoken to, but that’s neither here nor there. Anna: [laugh]. Right? So, one of the interesting things about SCARLETEEL is you could eventually detect that it had happened if you only had access to CloudTrail, but you wouldn’t have the full picture ever. In our case, because we are a company that relies heavily on system calls and machine learning detections, we are able to connect the system call events to the CloudTrail events, and between those two data sources, we’re able to figure out that there’s something more profound going on than just what you see in the logs. And I’ll actually tell you which things, for example, are being detected. So, in SCARLETEEL, one thing that happens is there’s a crypto miner. And a crypto miner is one of these events where you’re, like, “Oh, this is obviously malicious,” because as we wrote, I think, two years ago, it costs $53 to mine $1 of Bitcoin in AWS, so it is very stupid for you to be mining Bitcoin in AWS, unless somebody else is— Corey: In your own accounts. Anna: —paying the cloud bill. Yeah, yeah [laugh] in someone else’s account, absolutely. Yeah. So, if you are a sysadmin or a security engineer, and you find a crypto miner, you’re like, “Obviously, just shut that down.” Great. What often happens is people see them, and they think, “Oh, this is a commodity attack,” like, people are just throwing crypto miners whatever, I shut it down, and I’m done. But in the case of this attack, it was actually a red herring. So, they deployed the miner to see if they could. 
They could, then they determined—presumably; this is me speculating—that, oh, these people don’t have very good security because they let random idiots run crypto miners in their account in AWS, so they probed further. And when they probed further, what they did was some reconnaissance. So, they type in commands, listing, you know, like, list accounts or whatever. They try to list all the things they can list that are available in this account, and then they reach out to an EC2 metadata service to kind of like, see what they can do, right? And so, each of these events, like, each of the things that they do, like, reaching out to a EC2 metadata service, assuming a role, doing a recon, even lateral movement is, like, by itself, not necessarily a scary, big red flag malicious thing because there are lots of, sort of, legitimate reasons for someone to perform those actions, right? Like, reconnaissance, for one example, is you’re, like, looking around the environment to see what’s up, right? So, you’re doing things, like, listing things, [unintelligible 00:09:03] things, whatever. But a lot of the graphical interfaces of security tools also perform those actions to show you what’s, you know, there, so it looks like reconnaissance when your tool is just, like, listing all the stuff that’s available to you to show it to you in the interface, right? So anyway, the point is, when you see them independently, these events are not scary. They’re like, “Oh, this is useful information.” When you see them in rapid succession, right, or when you see them alongside a crypto miner, then your tooling and/or your process and/or your human being who’s looking at this should be like, “Oh, wait a minute. Like, just the enumeration of things is not a big deal. 
The enumeration of things after I saw a miner, and you try and talk to the metadata service, suddenly I’m concerned.” And so, the point is, how can you connect those dots as quickly as possible and as automatically as possible, so a human being doesn’t have to look at, like, every single event because there’s an infinite number of them. Corey: I guess the challenge I’ve got is that in some cases, you’re never going to be able to catch up with this. Because if it’s an AWS call to one of the APIs that they manage for you, they explicitly state there’s no guarantee of getting information on this until the show’s all over, more or less. So, how is there… like, how is there hope? Anna: [laugh]. I mean, there’s always a forensic analysis, I guess [laugh] for all the things that you’ve failed to respond to. Corey: Basically we’re doing an after-action thing because humans aren’t going to react that fast. We’re just assuming it happened; we should know about it as soon as possible. On some level, just because something is too late doesn’t necessarily mean there’s not value added to it. But just trying to turn this into something other than a, “Yeah, they can move faster than you, and you will always lose. The end. Have a nice night.” Like, that tends not to be the best narrative vehicle for these things. You know, if you’re trying to inspire people to change. Anna: Yeah, yeah, yeah, I mean, I think one clear point of hope here is that sometimes you can be fast enough, right? And a lot of this—I mean, first of all, you’re probably not going to—sorry, cloud providers—you don’t go into just the cloud provider defaults for that level of performance, you are going with some sort of third-party tool. On the, I guess, bright side, that tool can be open-source, like, there’s a lot of open-source tooling available now that is fast and free. 
Our favorite example, of course, is Falco, which is looking at system calls on endpoints, and containers, and can detect things within seconds of them occurring and let you know immediately. There is other eBPF-based instrumentation that you can use out there from various vendors and/or open-source providers, and there’s of course, network telemetry. So, if you’re into the world of service mesh, there is data you can get off the network, also very fast. So, the bad news or the flip side to that is you have to be able to manage all that information, right? So, that means—again, like I said, you’re not expecting a SOC analyst to look at thousands of system calls and thousands of, you know, network packets or flow logs or whatever you’re looking at, and just magically know that these things go together. You are expecting to build, or have built for you by a vendor or the open-source community, some sort of detection content that is taking this into account and then is able to deliver that alert at the speed of 5/5/5. Corey: When you see the larger picture stories playing out, as far as what customers are seeing, what the actual impact is, what gave rise to the five-minute number around this? Just because that tends to feel like it’s a… it is both too long and also too short on some level. I’m just wondering how you wound up at—what is this based on? Anna: Man, we went through so many numbers. So, we [laugh] started with larger numbers, and then we went to smaller numbers, then we went back to medium numbers. We align ourselves with the timeframes we’re seeing for people. 
Like I said, a lot of folks have an SLA of responding to a P0 within 10 or 15 minutes because their point basically—and there’s a little bit of bias here into our customer base because our customer base is, A, fairly advanced in terms of cloud adoption and in terms of security maturity, and also, they’re heavily in, let’s say, financial industries and other industries that tend to be early adopters of new technology. So, if you are kind of a laggard, like, you probably aren’t as close to meeting this benchmark as you are if you’re, say, financial, right? So, we asked them how they operate, and they basically pointed out to us that, like, knowing 15 minutes later is too late because I’ve already lost, like, some number of millions of dollars if my environment is compromised for 15 minutes, right? So, that’s kind of where the ten minutes comes from. Like, we took our real threat research data, and then we went around and talked to folks to see kind of what they’re experiencing and what their own expectations are for their incident response and SOC teams, and ten minutes is sort of where we landed. Corey: Got it. When you see this happening, I guess, in various customer environments, assuming someone has missed that five-minute window, is it game over, effectively? How should people be thinking about this? Anna: No. So, I mean, it’s never really game over, right? Like until your company is ransomed to bits, and you have to close your business, you still have many things that you can do, hopefully, to save yourself. And also, I want to be very clear that 5/5/5 as a benchmark is meant to be something aspirational, right? So, you should be able to meet this benchmark for, let’s say, your top use cases if you are a fairly high maturity organization, in threat detection specifically, right? So, if you’re just beginning your threat detection journey, like, tomorrow, you’re not going to be close. Like, you’re going to be not at all close. 
The point here, though, is that you should aspire to this level of greatness, and you’re going to have to create new processes and adopt new tools to get there. Now, before you get there, I would argue that if you can do, like, 10-10-10 or, like, whatever number you start with, you’re on a mission to make that number smaller, right? So, if today, you can detect a crypto miner in 30 minutes, that’s not great because crypto miners are pretty detectable these days, but give yourself a goal of, like, getting that 30 minutes down to 20, or getting that 30 minutes down to 10, right? Because we are so obsessed with, like, measuring ourselves against our peers and all this other stuff that we sometimes lose track of what actually is improving our security program. So yes, compare it to yourself first. But ultimately, if you can meet the 5/5/5 benchmark, then you are doing great. Like, you are faster than the attackers in theory, so that’s the dream. Corey: So, I have to ask, and I suspect I might know the answer to this, but given that it seems very hard to move this quickly, especially at scale, is there an argument to be made that effectively prevention obviates the need for any of this, where if you don’t misconfigure things in ways that should be obvious, if you practice defense-in-depth to a point where you can effectively catch things that the first layer meets with successive layers, as opposed to, “Well, we have a firewall. Once we’re inside of there, well [laugh], it’s game over for us.” Is prevention sufficient in some ways to obviate this? Anna: I think there are a lot of people that would love to believe that that’s true. Corey: Oh, I sure would. It’s such a comforting story. 
Anna: And we’ve done, like, I think one of my opening sentences in the benchmark, kind of, description, actually, is that we’ve done a pretty good job of advertising prevention in Cloud as an important thing and getting people to actually, like, start configuring things more carefully, or like, checking how those things have been configured, and then changing that configuration should they discover that it is not compliant with some mundane standard that everyone should know, right? So, we’ve made great progress, I think, in cloud prevention, but as usual, like, prevention fails, right? Like I still have smoke detectors in my house, even though I have done everything possible to prevent it from catching fire and I don’t plan to set it on fire, right? But like, threat detection is one of these things that you’re always going to need because no matter what you do, A, you will make a mistake because you’re a human being, and there are too many things, and you’ll make a mistake, and B, the bad guys are literally in the business of figuring ways around your prevention and your protective systems. So, I am full on on defense-in-depth. I think it’s a beautiful thing. We should only obviously do that. And I do think that prevention is your first step to a holistic security program—otherwise, what even is the point—but threat detection is always going to be necessary. And like I said, even if you can’t go 5/5/5, you don’t have threat detection at that speed, you need to at least be able to know what happened later so you can update your prevention system. Corey: This might be a dangerous question to get into, but why not, that’s what I do here. This could potentially be an argument against Cloud, by which I mean that if I compromise someone’s Cloud account on any of the major cloud providers, once I have access of some level, I know where everything else in the environment is as a general rule. 
I know that you’re using S3 or its equivalent, and what those APIs look like and the rest, whereas as an attacker, if I am breaking into someone’s crappy data center-hosted environment, everything is going to be different. Maybe they don’t have a SAN at all, for example. Maybe they have one that hasn’t been patched in five years. Maybe they’re just doing local disk for some reason. There’s a lot of discovery that has to happen that is almost always removed from Cloud. I mean, take the open S3 bucket problem that we’ve seen as a scourge for 5, 6, 7 years now, where it’s not that S3 itself is insecure, but once you make a configuration mistake, you are now in line with a whole bunch of other folks who may have much more valuable data living in that environment. Where do you land on that one? Anna: This is the ‘leave cloud to rely on security through obscurity’ argument? Corey: Exactly. Which I’m not a fan of, but it’s also hard to argue against from time-to-time. Anna: My other way of phrasing it is ‘the attackers are moving up the stack’ argument. Yeah, so—and there is some sort of truth in that, right? Part of the reason that attackers can move that fast—and I think we say this a lot when we talk about the threat report data, too, because we literally see them execute this behavior, right—is they know what the cloud looks like, right? They have access to all the API documentation, they kind of know what all the constructs are that you’re all using, and so they literally can practice their attack and create all these scripts ahead of time to perform their reconnaissance because they know exactly what they’re looking at, right? On-premise, you’re right, like, they’re going to get into—even to get through my firewall, whatever, they’re getting into my data center, they do not know what disaster I have configured, what kinds of servers I have where, and, like, what the network looks like, they have no idea, right? 
In Cloud, this is kind of all gifted to them because it’s so standard, which is a blessing and a curse. It’s a blessing because—well for them, I mean, because they can just programmatically go through this stuff, right? It’s a curse for them because it’s a blessing for us in the same way, right? Like, the defenders… A, have a much easier time knowing what they even have available to them, right? Like, the days of there’s a server in a closet I’ve never heard of are kind of gone, right? Like, you know what’s in your Cloud account because, frankly, AWS tells you. So, I think there is a trade-off there. The other thing is—about the moving up the stack thing, right—like no matter what you do, they will come after you if you have something worth exploiting you for, right? So, by moving up the stack, I mean, listen, we have abstracted all the physical servers, all of the, like, stuff we used to have to manage the security of because the cloud just does that for us, right? Now, we can argue about whether or not they do a good job, but I’m going to be generous to them and say they do a better job than most companies [laugh] did before. So, in that regard, like, we say, thank you, and we move on to, like, fighting this battle at a higher level in the stack, which is now the workloads and the cloud control plane, and the you name it, whatever is going on after that. So, I don’t actually think you can sort of trade apples for oranges here. It’s just… bad in a different way. Corey: Do you think that this benchmark is going to be used by various companies who will learn about it? And if so, how do you see that playing out? Anna: I hope so. My hope when we created it was that it would sort of serve as a goalpost or a way to measure— Corey: Yeah, it would just be marketing words on a page and never mentioned anywhere, that’s our dream here. Anna: Yeah, right. Yeah, I was bored. So, I wrote some—[laugh]. Corey: I had a word minimum to get out the door, so there we are. 
It’s how we work. Anna: Right. As you know, I used to be a Gartner analyst, and my desire is always to, like, create things that are useful for people to figure out how to do better in security. And my, kind of, tenure at the vendor is just a way to fund that [laugh] more effectively [unintelligible 00:21:08]. Corey: Yeah, I keep forgetting you’re ex-Gartner. Yeah, it’s one of those fun areas of, “Oh, yeah, we just want to basically talk about all kinds of things because there’s a—we have a chart to fill out here. Let’s get after it.” Anna: I did not invent an acronym, at least. Yeah, so my goal was the following. People are always looking for a benchmark or a goal or standard to be like, “Hey, am I doing a good job?” Whether I’m, like a SOC analyst or director, and I’m just looking at my little SOC empire, or I’m a full on CSO, and I’m looking at my entire security program to kind of figure out risk, I need some way to know whether what is happening in my organization is, like, sufficient, or on par, or anything. Is it good or is it bad? Happy face? Sad face? Like, I need some benchmark, right? So normally, the Gartner answer to this, typically, is like, “You can only come up with benchmarks that are—” they’re, like, “Only you know what is right for your company,” right? It’s like, you know, the standard, ‘it depends’ answer. Which is true, right, because I can’t say that, like, oh, a huge multinational bank should follow the same benchmark as, like, a donut shop, right? Like, that’s unreasonable. So, this is also why I say that our benchmark is probably more tailored to the more advanced organizations that are dealing with kind of high maturity phenomena and are more cloud-native, but the donut shops should kind of strive in this direction, right? 
So, I hope that people will think of it this way: that they will, kind of, look at their process and say, “Hey, like, what are the things that would be really bad if they happened to me, in terms of threat detection?” Like, “What are the threats I’m afraid of where if I saw this in my cloud environment, I would have a really bad day?” And, “Can I detect those threats in 5/5/5?” Because if I can, then I’m actually doing quite well. And if I can’t, then I need to set, like, some sort of roadmap for myself on how I get from where I am now to 5/5/5 because that implies you would be doing a good job. So, that’s sort of my hope for the benchmark is that people think of it as something to aspire to, and if they’re already able to meet it, then that they’ll tell us how exactly they’re achieving it because I really want to be friends with them. Corey: Yeah, there’s a definite lack of reasonable ways to think about these things, at least in ways that can be communicated to folks outside of the bounds of the security team. I think that’s one of the big challenges currently facing the security industry is that it is easy to get so locked into the domain-specific acronyms, philosophies, approaches, and the rest, that even coming from, “Well, I’m a cloud engineer who ostensibly needs to know about these things.” Yeah, wander around the RSA floor with that as your background, and you get lost very quickly. Anna: Yeah, I think that’s fair. I mean, it is a very, let’s say, dynamic and rapidly evolving space. And by the way, like, it was really hard for me to pick these numbers, right, because I… very much am on that whole, ‘it depends’ bandwagon of I don’t know what the right answer is. Who knows what the right answer is [laugh]? So, I say 5/5/5 today. Like, tomorrow, the attack takes five minutes, and now it’s two-and-a-half/two-and-a-half, right? Like it’s whatever. You have to pick a number and go for it. 
So, I think, to some extent, we have to try to, like, make sense of the insanity and choose some best practices to anchor ourselves in or some, kind of like, sound logic to start with, and then go from there. So, that’s sort of what I go for. Corey: So, as I think about the actual reaction times needed for 5/5/5 to actually be realistic, people can’t reliably get a hold of me on the phone within five minutes, so it seems like this is not something you’re going to have humans in the loop for. How does that interface with the idea of automating things versus giving automated systems too much power to take your site down as a potential failure mode? Anna: Yeah. I don’t even answer the phone anymore, so that wouldn’t work at all. That’s a really, really good question, and probably the question that gives me the most… I don’t know, I don’t want to say lost sleep at night because it’s actually, it’s very interesting to think about, right? I don’t think you can remove humans from the loop in the SOC. Like, certainly there will be things you can auto-respond to some extent, but there’d better be a human being in there because there are too many things at stake, right? Some of these actions could take your entire business down for far more hours or days than whatever the attacker was doing before. And that trade-off of, like, is my response to this attack actually hurting the business more than the attack itself is a question that’s really hard to answer, especially for most of us technical folks who, like, don’t necessarily know the business impact of any given thing. So, first of all, I think we have to embrace other response actions. Back to our favorite crypto miners, right? Like there is no reason to not automatically shut them down. There is no reason, right? Just build in a detection and an auto-response: every time you see a crypto miner, kill that process, kill that container, kill that node. I don’t care. Kill it. Like, why is it running? This is crazy, right? 
I do think it gets nuanced very quickly, right? So again, in SCARLETEEL, there are essentially, like, five or six detections that occur, right? And each of them theoretically has a potential auto-response that you could have executed depending on your, sort of, appetite for that level of intervention, right? Like, when you see somebody assuming a role, that’s perfectly normal activity most of the time. In this case, I believe they actually assumed a machine role, which is less normal. Like, that’s kind of weird. And then what do you do? Well, you can just, like, remove the role. You can remove that person’s ability to do anything, or remove that role’s ability to do anything. But that could be very dangerous because we don’t necessarily know what the full scope of that role is as this is happening, right? So, you could take, like, a more mitigated auto-response action and add a restrictive policy to that role, for example, to just prevent activity from that IP address that you just saw, right, because we’re not sure about this IP address, but we’re sure about this role, right? So, you have to get into these, sort of, risk-tiered response actions where you say, “Okay, this is always okay to do automatically. And this is, like, sometimes, okay, and this is never okay.” And as you develop that muscle, it becomes much easier to do something rather than doing nothing and just, kind of like, analyzing it in forensics and being, like, “Oh, what an interesting attack story,” right? So, that’s step one, is just start taking these different response actions. And then step two is more long-term, and it’s that you have to embrace the cloud-native way of life, right? Like this immutable, ephemeral, distributed religion that we’ve been selling, it actually works really well if you, like, go all-in on the religion. I sound like a real cult leader [laugh]. Like, “If you just go all in, it’s going to be great.” But it’s true, right? 
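The risk-tiered response idea described above (always okay, sometimes okay, never okay to do automatically) can be sketched in a few lines of Python. This is a hedged illustration, not a description of any vendor's product: the detection names, action names, tier assignments, and confidence threshold are all hypothetical, and anything not explicitly listed falls back to a human decision.

```python
from enum import Enum


class Tier(Enum):
    ALWAYS = "always"        # safe to execute automatically
    SOMETIMES = "sometimes"  # automatic only when confidence is high
    NEVER = "never"          # always requires a human decision


# Hypothetical mapping of (detection, action) pairs to tiers.
RESPONSE_TIERS = {
    ("crypto_miner", "kill_process"): Tier.ALWAYS,
    ("suspicious_assume_role", "deny_source_ip"): Tier.SOMETIMES,
    ("suspicious_assume_role", "delete_role"): Tier.NEVER,
}


def decide(detection, action, confidence, auto_threshold=0.9):
    """Return "auto" to run the action now, or "escalate" to page a human."""
    # Unknown pairs default to NEVER so that new actions are not automated by accident.
    tier = RESPONSE_TIERS.get((detection, action), Tier.NEVER)
    if tier is Tier.ALWAYS:
        return "auto"
    if tier is Tier.SOMETIMES and confidence >= auto_threshold:
        return "auto"
    return "escalate"
```

Defaulting unknown pairs to the most conservative tier is a design choice in this sketch; it keeps the table as an explicit allow-list of what the automation may do without a person in the loop.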
So, if your workloads are immutable—that means they cannot change as they’re running—then when you see them drifting from their original configuration, like, you know, that is bad. So, you can immediately know that it’s safe to take an auto-respon—well, it’s safe, relatively safe, to take an auto-response action to kill that workload because you are, like, a hundred percent certain it is not doing the right things, right? And then furthermore, if all of your deployments are defined as code, which they should be, then it is approximately—though not entirely—trivial to get that workload back, right? Because you just push a button, and it just generates that same Kubernetes cluster with those same nodes doing all those same things, right? So, in the on-premise world where shooting a server was potentially the, you know, fireable offense because if that server was running something critical, and you couldn’t get it back, you were done. In the cloud, this is much less dangerous because there’s, like, an infinite quantity of servers that you could bring back and hopefully Infrastructure-as-Code and, kind of, Configuration-as-Code in some wonderful registry, version-controlled for you to rely on to rehydrate all that stuff, right? So again, to sort of TL;DR, get used to doing auto-response actions, but do this carefully. Like, define a scope for those actions that makes sense and not just, like, “Something bad happened; burn it all down,” obviously. And then as you become more cloud-native—which sometimes requires refactoring of entire applications—by the way, this could take years—just embrace the joy of Everything-as-Code. Corey: That’s a good way of thinking about it. I just, I wish there were an easier path to get there, for an awful lot of folks who otherwise don’t find a clear way to unlock that. Anna: There is not, unfortunately [laugh]. I mean, again, the upside on that is, like, there are a lot of people that have done it successfully, I have to say. 
I couldn’t have said that to you, like, six, seven years ago when we were just getting started on this journey, but especially for those of you who were just at KubeCon—however long ago… before this airs—you see a pretty robust ecosystem around Kubernetes, around containers, around cloud in general, and so even if you feel like your organization’s behind, there are a lot of folks you can reach out to to learn from, to get some help, to just sort of start joining the masses of cloud-native types. So, it’s not nearly as hopeless as before. And also, one thing I like to say always is, almost every organization is going to have some technical debt and some legacy workload that they can’t convert to the religion of cloud. And so, you’re not going to have a 5/5/5 threat detection SLA on those workloads. Probably. I mean, maybe you can, but probably you’re not, and you may not be able to take auto-response actions, and you may not have all the same benefits available to you, but like, that’s okay. That’s okay. Hopefully, whatever that thing is running is, you know, worth keeping alive, but set this new standard for your new workloads. So, when your team is building a new application, or if they’re refactoring an application for the new world, set the standard on them and don’t, kind of like, torment the legacy folks because it doesn’t necessarily make sense. Like, they’re going to have different SLAs for different workloads. Corey: I really want to thank you for taking the time to speak with me yet again about the stuff you folks are coming out with. If people want to learn more, where’s the best place for them to go? Anna: Thanks, Corey. It’s always a pleasure to be on your show. If you want to learn more about the 5/5/5 benchmark, you should go to sysdig.com/555 (https://sysdig.com/555). Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me today. As always, it’s appreciated. 
Anna Belak, Director at the Office of Cybersecurity Strategy at Sysdig. I’m Cloud Economist Corey Quinn, and this has been a promoted guest episode brought to us by our friends at Sysdig. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that I will read nowhere even approaching within five minutes. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com (https://duckbillgroup.com) to get started.

31m
Jan 04, 2024
The Fundamentals of Building Mission-Driven Technology with Danilo Campos

Danilo Campos, Proprietor of Antigravity, joins @quinnypig on Screaming in the Cloud to discuss his philosophy behind building tools that not only enhance developer experience but also improve the future of our world. Danilo shares his thoughts on how economic factors have influenced tech companies and their strategies for product, open source, and more. He also shares what he thinks is another, better way to approach these strategies, without ignoring the economic element. ABOUT DANILO Danilo Campos wants a world where technology makes us more powerful and expressive versions of ourselves. He worked with GitHub and the White House to deliver coding platforms to public housing residents, supported Glitch.com in its last days as an independent, and developed products for multiple early-stage startups, including Hipmunk. Today Danilo offers freelance developer experience services for devtools firms through Antigravity DX. LINKS REFERENCED: Antigravity DX: https://antigravitydx.com/ Blog: https://redeem-tomorrow.com Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn, and periodically on this show, we like to gaze into the future and try to predict how that’s going to play out. On this episode, I want to start off by instead looking into the past, more specifically my past. Before I started this place, I wound up working at a company called FutureAdvisor, which was a great startup for all of three months before we were bought by a BlackRock. I soon learned what a BlackRock actually was. 
While I was there, I encountered an awful lot of oral tradition around a guy named Danilo, and he—as it turned out—was a contractor who had been brought in to do a fair bit of mobile work. Meet my guest today, Danilo Campos, who is at present, the proprietor of a company called Antigravity DX https://antigravitydx.com/. Thank you so much for joining me, I appreciate it. Danilo: Hey, Corey, it’s good to be here. Corey: It’s weird talking to you, just because you were someone that I knew by reputation, and if I were to take all the things that were laid at your feet after you no longer had been there, it feels like you were there for 20 years. What did you actually do there, and how long were you embedded for? Danilo: I loved the FutureAdvisor guys. I thought they were such a blast to work with. I loved what they were working on. I learned so much about how finance and investing works from FutureAdvisor, and somehow it was only seven months of my life. I’d been introduced to the founders as a freelance iOS developer at the time—this was 2014—and a guy I had worked with at Hipmunk actually put me in touch with these guys, and we connected. And they needed to get started doing mobile. They’d never done any mobile stuff, they didn’t have anyone on staff who did mobile stuff. And by that point, I’d shipped I think, must have been half a dozen iOS native apps, and so I knew this stuff pretty well. I understood the workflows, I understood the path to getting from idea to shipped product, and they just wanted occasional help. How do we wireframe this? How do we plan the product that way? How do we structure this thing? And so, it started off as this just, kind of, occasional troubleshooting consulting thing. And I think about August 2014. They call me in for a meeting, they said, “Hey, we’re stuck. We don’t know how to get this thing off the ground. 
Could you help us get this project moving so that we actually ship it?” And so, I just came and embedded for seven months, and by the end of it, I was just running the entire iOS engineering team. We had a designer working with us. We had, I think it was four folks who were building the product. We had QA. It was a whole team to get this thing out the door. And we got it out the door after seven months of really working at it. And like I said, it was a blast. I love those folks. Corey: I have to be clear, when I say that I encountered a lot of what you had done. It was not negative. This was not one of those startups where there’s a glorious tradition of assassinating the character out of everyone who has left the company—or at least Git repos—because they’re not there to defend themselves anymore. There were times where decisions that you had made were highlighted as, “We needed to be doing things more like this.” There were times it was, “Oh, we can’t do that because of how you wound up building this other thing.” And it was weird because it felt like you were the hand of some ancient deity, just moving things back and forth in your infinite wisdom of the ancients. It was unknowable, and we had to accept it as gospel, whether we liked it or not, at different times. In practice, I now know this was honestly just the outgrowth of a rapidly expanding culture where you’ve got to go from a team of five people to the team of 50 and keep everyone rowing in the same direction, ideally. But it was a really interesting social dynamic that I got to observe as a result, and I’m just tickled pink to be able to talk to you now. What are you doing these days? Danilo: Thank you for the context, by the way, because you know, I move on, as you do in a contract capacity, and you hope things work out. Corey: Yeah. To be clear, it was never a context of, “There’s the bastard. Get him.” Like, that is not the perspective we are coming at this from at all. Danilo: Yeah, yeah, yeah. 
No, and it’s hard because it was a very strange, alien codebase compared to the rest of the company. I get how it ended up in that spot. These days, I am a freelance developer experience consultant, and I spent a year-and-a-half at Glitch.com. And developer experience was always something that I really cared about. I did some work at GitHub that was about getting people—specifically teenagers living in public housing—into computing and the internet, and I’d had to do a bunch of DX work to make that happen because I had an afternoon to get people from zero to writing code. And that is not a straightforward situation, especially in a low-income housing environment, for example, right? So, I cared about this stuff a lot. And then I spent a year-and-a-half at Glitch.com, and it was like getting a graduate degree on everything about the leverage for creating outcomes in developer tools. And I just, I felt like I was carrying some gift from the Gods. I just, I felt the need to get this out to the wider world, and so that’s what I do with Antigravity. Corey: When I got to catch up with you in person for the first time at the excellent and highly recommended Monktoberfest conference— Danilo: Excellent. Corey: —that the folks over at RedMonk put on every year, it was interesting, in that you and I got to talking very rapidly, not about technology as such, but about culture and the industry and values and the rest. It was a wonderfully refreshing conversation that I don’t normally get to have so soon after meeting someone. I think that one of the more interesting aspects of our relatively wide-ranging conversation in a surprisingly brief period of time focused, first off, among the idea of developer tools and what so many of them seem to get wrong. I know that we basically dove into discussing about our violently agreeing opinions around the state of developer experience, for example. What are the hills you’re willing to die on in that space? 
Danilo: I think that computing generally exists to amplify and multiply our power. Computing exists to let us do things that we could not do with the simple, frail flesh that we’re born with, right? Computers augment our ambitions because they can do things with infinite iteration. And so, if you can come up with something that you can bottle in the form of an algorithm that repeats infinitely, you can have incredible impact on the world. And so, I think that there’s a responsibility to find ways to make that power something that is easy to hand to other people and let them pick up and run with. And so, developer tools, to me, has this almost sacred connotation because what you’re doing is handing people the fire of the Gods and saying, “Whatever you can come up with, whatever your imagination allows you to do with these tools, they can repeat infinitely and make whatever change you want—for good or for ill—in the world.” And that’s very special to me. I think we’ve gotten bored of it because it’s just, you know, it’s a 50-year-old business at this point. But I think there’s still a lot of magic to it, and the more we see the magic, the more magical outcomes we can coax out of everyday people who become better developers. Corey: From my perspective, one of the reasons I care so much about developer experience is that the failure mode of getting it wrong means that the person trying to understand the monstrosity you’ve built feels like they’re somehow not smart, or they’re just not getting it in some key and fundamental way. And that’s not true. It’s that you, for whatever reason, what you have built is not easily understandable to them where they are. I go back to what I first heard in 2012, at a talk that Logstash creator, Jordan Sissel wound up saying, where his entire thesis was that if a user has a bad time, it’s a bug. Danilo: Yeah. Corey: And I thought that that was just a wonderfully prescient statement that I wanted to sign onto wholeheartedly. 
[That was 00:09:08] my first exposure to it. I know that’s not the entirety of developer experience by a long shot, but it’s the one where I think you lose the most mind share when you get it wrong. Danilo: Well, and I’m glad that you bring that up because I think that kind of defines the spectrum of the emotional experience of interacting with developer tools. On one end of the spectrum, you’ve got, “I feel so stupid. This has made me feel worse about myself. This has given me less of a sense of confidence in myself than I had when I started.” And at the other end of the spectrum, the other extreme is, “I cannot believe I am this cool. I cannot believe that my imagination has been made manifest in this way that now exists in the world and can go out and touch other people and make their lives better.” Those are the two, kind of, extremes of the subjective emotional experience that can come from developer tools. And so, I think that there is a business imperative that really pushes us toward the extreme of making people feel awesome. I think about this in the context of Iron Man, you’ve seen Iron Man, yeah? Corey: Oh, yes. Danilo: All right. So, the Iron Man suit is the perfect metaphor for a developer tool that is working correctly for you, right? Because on its own, the suit is not very interesting, and on his own, Tony Stark is not all that powerful, but you combine the suit and the person, and suddenly extraordinary emergent outcomes come out. The ambition of the human is amplified, and he feels so [BLEEP] cool. And I think that’s what we’re looking to do with developer tools is that we want to take a person, amplify their range, give them a range of motion that lets them soar into the clouds and do whatever they need to do up there so that when they come back down, they feel transformed. They feel like more than what they started. Corey: I would agree with that. 
There’s a sense of whimsy and wonder as I look through my career trajectory, going from a sysadmin role, where there was a pretty constant and hard to beat ratio in most shops—and the ratio [unintelligible 00:11:29] varied—but number of admins to the number of servers. And now with the magic of cloud being what it is, it’s a, “Well, how many admins does it take to run X number of servers?” Like, “Well, as an [admin done 00:11:39] right, I can manage all of them because that’s how programming languages work.” And that is a mystical and powerful thing. But lately, it seems like there’s been some weird changes in the world of developer tooling. Cynically, I’ve said a couple of times that giving a toss about the developer experience was in fact a zero interest rate phenomenon. Like, when you’re basically having to fend off casual offers of 400 grand a year from big tech, how do you hire and retain people at a company that has one of those old, tiny profit-generating business models and compete with them? And a lot of times, developer experience was part of how you did that. I don’t know that I necessarily believe that that is as tied to that cynical worldview as I might pretend on the internet, but I don’t know—I do wonder if it’s a factor because it seems like we’ve seen a definite change in the way that developer tools are approaching their community of users and customers. Danilo: Well, my immediate reflex is to open up the kind of systems theory box and look at what’s inside of that. Because I think that what we are experiencing, if we use the interest rate lens, is a period of time where everyone is a little bit worried that the good times are over for good. And I feel the sense of this in a lot of places. I think developer experience is a pretty good avatar to try this on with because I definitely also perceive it in that sphere. During the heyday of 0% interest rates, everything was about how much totalizing growth can you achieve? 
And from a developer tools perspective, all right, well, we need to make it so that the tools, kind of, grow themselves, so let’s invest a lot in developer experience so that people very quickly get onboarded, without us having to hold their hand, without us having to conduct a sales call, let’s get them to the point where they can quickly understand—because the documentation is so good and the artifacts are so good—exactly how to use these tools to maximum effect. Let’s get them to a point where it’s very easy for them to share the results of their work so that other people see the party and really want to join in. And so, all kinds of effort and energy and capital was being invested in this kind of growth strategy. And now I think that people are, again, a little bit afraid that the good times are over, and so we see this really sales-driven culture of growth, where it’s like, all right, well, for this company to succeed, we have to really make sure that we’re going and closing these big sales, and if individual developers can’t figure out how the hell this works, well, that’s their problem, and we’re not going to worry about it. And we’ve talked about this: this fear of the good times being over drives people, I think, to all kinds of bad behavior. The rug-pulling that we’ve seen in open-source licensing where somebody’s like, “All right, I’ve taken a bunch from this community, and now I’m going to keep it, and I’m not going to give anything back.” This is the behavior of people who are afraid that the good times are behind them. I don’t have the luxury of being that pessimistic about the future, and I don’t think our industry can afford it either. [midroll 00:15:03] Corey: The rules changing late in the game is something that has always upset me. It feels inherently unfair, and it’s weird because you can have these companies say that, “Look, we’ve never done anything like that. Why wouldn’t you trust us?” Right up until the point where they do. 
Reddit is a great example, where for years, they had a great API—ish—that could do things that their crap-ass mobile client natively couldn’t. And Apollo was how I interacted with Reddit constantly. I was a huge Reddit user. I was simultaneously, at one point, moderator of the legal advice subreddit and the personal finance subreddit. I was passionate about that stuff, and it was great. And then they wound up effectively killing all third-party clients that don’t bend the knee, and well, why am I going to spend my time donating content and energy and time to a for-profit company that gets very jealous when other people find ways to leverage their platform in ways that they don’t personally find themselves able to do. Screw ‘em. I haven’t been back on Reddit since. It’s just a “Fool me once, shame on me” story. Twitter did the exact same thing. I built a threading Twitter client simultaneously deployed to 20 AWS regions, until they decided they didn’t want people creating content through their APIs and killed the whole thing with no notice. Great. Now, they’re—I got an email asking me to come back. Go to hell. I tried that once. You’ve eviscerated people’s businesses and the rest. And you see it with license changes as well. But it all comes down to the same thing, from my perspective, which is an after-the-fact changing of the rules. And by moving the goalposts like that, I wonder what guarantees a startup or a project that doesn’t intend to do those things can offer to its community. Because, look, HashiCorp made its decision to change the licensing for Terraform. Good for them. They’re entitled to do that. I’m not suggesting, in any way shape or form, that they have violated any legal term. And I don’t even know they’re necessarily doing anything that doesn’t make sense from their point of view. 
And the only people I really see that upset about it are licensing purists—which I no longer am for a variety of reasons—people who work at HashiCorp, obviously, and their direct competitors who are not sympathetic in that particular place. But as a counterpoint, if they wind up building a new open-source project, of course, I’m not going to contribute. I mean, that’s a decision I get to make. And I don’t know how you square that circle because otherwise, if that continues, no one will be able to have a sense of safety around contributing to anything open-source unless they’re pleased to wind up doing volunteer work for a one-day unicorn. Danilo: So, I really appreciate the economical survey of the landscape that you just provided because I think that captures it really well. The Reddit case in particular breaks my heart. I will go to my grave absolutely loving Steve Huffman. Steve Huffman gave me my first break as a paid developer and product designer, and he was an enormous pain in the ass to work with, and I loved every minute of it. Like, he’s just an interesting, if volatile, character. And I see that volatility playing out with Reddit in the incredible hostility that they were conveying around being held to account for these changes. And I have a lot of sympathy for that crew because they’ve built all this value, they kind of missed the euphoria boat in terms of, you know, getting the best price for an IPO, for example, and they’ve got to figure out, all right, how do we scrape together value from what we’ve got within the constraints that we have? How do we build a fence around the value that we’ve got and put a tollbooth in front of it so that the public markets are excited about this and give us our best bang for the buck? That’s Steve Huffman’s job. That’s his crew’s job. I understand the pressures and I respect that. 
And I think that the way they went about it this year was short-sighted because what it does is it undervalues everybody who isn’t in the boardroom, making decisions with them. I think what we have to understand is that when we build software, Metcalfe’s law applies to developer tools just as much as any other network here. And so, the people who are stakeholders, who are participants, who are constituents of your community, are load-bearing members of the value chain that you are putting together, and so when you just cut them out, you might be nicking an artery that bleeds out very, very, very slowly. And the sentiment that you just expressed here about how your experience of Reddit was soured, I mean you’re the enthusiast type, right? Like, who wants to sign up for the drama of flame wars and moderation except if you really just love it? And so, what they were able to do was take people who, for years, absolutely loved it, and just drain away their love and enthusiasm for it. And the thing is, over time, that harms the long-term value that you are trying to actually protect. When we live in a world where computers can do all of this stuff infinitely, when they will provide us with extraordinary scale, when information can be copied and distributed at near-zero marginal cost, what we’re doing is setting up chains of incentives to get people to do stuff, essentially, for free. You were unpaid labor doing that moderation, and the reason that you did it for free was because it was fun, was because it spoke to something inside of you that really mattered, and you wanted to provide for a community of other people who also cared about these topics. And that fun was taken away from you. So, there’s a bunch of this stuff that doesn’t fit into a spreadsheet, and if we make decisions exclusively on what fits into a spreadsheet, we’re going to turn around someday and find that we have cut off some of the most valuable parts of what makes this industry great. Corey: I agree. 
I feel like companies have a—they launch, and they want the benefits of having an open-source community, but as they grow and get to a point of success and becoming self-sustaining, it’s harder to see those benefits because at that point, it just feels like it’s all downside: you are basically giving what you built away to your direct competitors, you are seeing significant value scattered throughout the ecosystem that you are capturing a very small portion of, and it becomes frustrating—especially in historical environments—where you have the sense of—back when you built the company years ago, it’s well, obviously we’d be the best place to host and run this because no one’s going to run this as well as the people who built it. And then cloud companies, with their operational excellence, come in and put the lie to that, in many cases [laugh]. It’s like, oh dear. Not like that. And I understand, truly, the frustration and the pain and the fear that drives companies in that position. And I don’t have a better answer, which is my big problem because I’m just sitting here saying, “You’re doing it wrong. Don’t do it like that.” “Okay, well, what should they do instead?” “No, I just want to be angry. I’m not here to offer solutions.” And I feel for them. I do. I have a lot of empathy for everyone involved in this conversation. It just sucks, but we need a better outcome than the current state, or we’re going to not see the same open innovation. Even these days, when I build things, by default, I don’t build in the open, not because I’m worried about competitive threats, but because I don’t want to deal with people complaining to me about things that I’ve built and don’t want to think about this week. Danilo: I think that we’re living through the hangover of—I mean, if you looked at the crypto craze as an example of this hangover, right—here we were with the sky the limit. We can sell monkey pictures for extraordinary amounts of money and there’s nothing behind it. 
We went from euphoria to fear in the space of a handful of quarters. And so, that has put all of us, even the most optimistic, in a place where we feel our backs are against the walls. But I think the responsibility we have is, again, computing fundamentally changes the economics of so many categories of labor, and it changes the economics of information generally. And so, we can do a bunch of stuff that doesn’t cost that much over the long-term, relative to the value it creates. But it only works if we have a really clear thesis of the value we’re creating. If we don’t value the contributions of a community, if we don’t value the emergent outcomes that arise from building something that’s very expressive, that then lets outsiders show up and do things that we never predicted, if we’re not building strategies that look at this value as something that is precious instead of something to be cut off and captured, then I think that we just continue to spiral down the drain of paranoia, and greed, and fear instead of doing things that actually create long-term sustainable growth for our business. Corey: I really wish that there were easier, direct paths. Like on some level, too, it’s—I feel like this is part of the problem, that every company views going public as its ultimate goal. Danilo: Yeah. Corey: At least that’s what it feels like. Like The Duckbill Group. If we ever go public, my God, I will have been so far gone from this company long before then, just because at that point, you have given control over to people who are not aligned, in many cases, with the values that you founded the company with. Like, one of the things I love about being a small business is that I don’t need to necessarily think the next quarter’s earnings. I can think longer-term. “Okay, in two or three years, what do I want to be doing?” Or five or ten. I’m not forced into this narrow, short-sighted treadmill where I have to continually show infinite growth in all areas at all times. 
That doesn’t sound healthy. Danilo: I agree, and I think that this is a place where I can give you a lot of hope because I look at a handful of economic tailwinds that are really going to make it possible to build businesses in a different way than was practical before. If we look at the last cycle, one of the absolute game changers was open-source. So, you showed up and there was already a web server written for you, and there was already a database written for you, and so you would just pull these things off the shelf instead of having to hire a team that would build your web server from scratch, that would build your database from scratch. And so, that changed the economics of how companies could be made, and that created an entire cycle of new technology growth. And if we look for an analogy of that kind of labor savings for the next technology cycle, we’re going to see things like cloud-based serverless services, right? So like, now you don’t need to even administer a Linux server. You don’t need to know how the server works under the hood. You pay one company for an API that gives you a database, and they manage the stuff. So, I’m thinking of companies like Neon, or PlanetScale, right? You give them cash, they give you a database, they worry about it, they do all of the on-call stuff, you don’t have to think about. So, this makes it even cheaper to build things of higher complexity because you are outsourcing much of the management of that complexity to other firms. And I think that that pattern is going to change the overall costs of starting and scaling and maintaining any sort of web-based product. And so, that’s number one. And then number two, is that when we look at stuff like large language models, the stuff that you can do with ChatGPT in terms of figuring out how to solve a broad array of problems that maybe you don’t have a lot of domain expertise in, I think that means that we’re going to see smaller teams get even further than we expect. 
And so, the net result of these trends is going to be, you don’t need to take vast amounts of venture funding in order to get to a company that serves a large number of people at a meaningful scale, with meaningful returns for the principals involved, and then they don’t have to go all the way down to the IPO route. They don’t have to figure out some sort of mega-scale unicorn exit; they can just build companies that work, that solve customer problems, keep it close, and then you don’t have the totalizing endless need for growth. I think we’re going to see a lot more of that this cycle. Corey: I sure hope you’re right. I think that there’s been a clear trend toward panic, or at least if not panic, then at least looking at current conditions and assuming that they’ll persist forever. We just saw ten years of an unprecedented bull run, where people tended to assume that interest rates would be forever low, growth was always going to be double-digit at least, and there was no need to think about anything that would ever argue against those things. For the first few years of my consulting company, it was a devil of a problem trying to convince people to care about their AWS bills because frankly, when money is free, there is no reason for someone to. They are being irrational if they do. Now, of course, that’s a very different story, but at the time, I felt for a while like I was the one who was nuts. Danilo: So, the interest rate conditions are always going to make people behave a certain way. That’s why they exist, right? We have monetary policy designed to influence business behavior. And if we look at that zoom, then we say, “All right, look, this stuff is all cyclical. We know there’s going to be good times, we know there’s going to be lean times, but at the end of the day, we care about building stuff.” Right? I don’t spend a lot of time with the sort of venture capitalist set who’s really obsessed with building, but I really love building. 
I just, I can’t stop building things. It is what I was put on this planet to do, and I think that there are so many people who feel exactly the same way. And so, regardless of the larger interest-rate phenomenon, we have to find a path where we can just build the stuff that we need to build. Build it for our reasons, for the right reasons, not because we just want to cash out. Although, you know, getting paid is great. I don’t begrudge anyone that. Corey: You can’t eat aspirations, as it turns out. Danilo: That’s right, right? We’ve got to worry about the economics, and that’s reasonable. But at the end of the day, making things happen through technology is its own mission and its own reward, regardless of what some sort of venture fund needs to make return happen. So, I think that we are going to get past this moment of slump and return to the fundamentals of we need to build technology because building technology makes us feel good and creates impact in the world that we absolutely need. And those are the fundamentals of this business. Corey: I agree with you wholeheartedly. I think that I’ve been around too many cycles—this is a polite way of saying I’m old—and you learn when that happens that everything that feels so immediate and urgent in the moment, in the broad sweep of things, so rarely is. Not everything can be life or death because you’ll die lots of times. Danilo: Yeah. Corey: I really want to thank you for taking the time to speak with me. If people want to learn more, where’s the best place for them to find you? Danilo: If you want to engage me for my thinking and strategy around humanist technology tools growth, you should find me at antigravitydx.com https://antigravitydx.com. And if you want to read more about what I think about, I maintain a blog at redeem-tomorrow.com https://redeem-tomorrow.com, and you can learn all about my thinking about the last cycle, and the coming one as well. 
Corey: And I will absolutely include a link to that in the [show notes 00:31:52]. Thank you so much for taking the time to speak with me. I appreciate it. Danilo: It’s a pleasure, Corey. Thank you for having me. Really great to chat. Corey: Danilo Campos, proprietor at Antigravity DX. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment taking care within that comment to link to a particular section of the FutureAdvisor code repo. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com to get started.

33m
Jan 02, 2024
How Vercel is Improving the Developer Experience on the Front End with Guillermo Rauch

Guillermo Rauch, Founder and CEO of Vercel, joins Corey on Screaming in the Cloud to discuss how he decided to focus on building a front-end tool that is fast, reliable, and focuses on the developer experience. Guillermo explains how he discovered that JavaScript was the language that set online offerings apart, and also reveals the advice he gives to founders on how to build an effective landing page. Corey and Guillermo discuss the effects of generative AI on developer experience, and Guillermo explains why Vercel had a higher standard for accuracy when rolling out their new AI product for developers, v0. ABOUT GUILLERMO Guillermo Rauch is Founder and CEO of Vercel, where he leads the company’s mission to enable developers to create at the moment of inspiration. Prior to founding Vercel, Guillermo co-founded LearnBoost and Cloudup where he served the company as CTO through its acquisition by Automattic in 2013. Originally from Argentina, Guillermo has been a developer since the age of ten and is passionate about contributing to the open source community. He has created a number of JavaScript projects including socket.io, Mongoose.js, Now, and Next.js. LINKS REFERENCED: Vercel: https://vercel.com/ v0.dev: https://v0.dev Personal website: https://rauchg.com Personal twitter: https://twitter.com/rauchg Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. 
I don’t talk a lot about front-end on this show, primarily because I am very bad at front-end, and in long-standing tech tradition, if I’m not good at something, apparently I’m legally obligated to be dismissive of it and not give it any attention. Strangely enough, I spent the last week beating on some front-end projects, and now I’m not just dismissive, I’m angry about it. Here to basically suffer the outpouring of frustration and confusion is Guillermo Rauch, founder and CEO of Vercel https://vercel.com/, but also the creator of Next.js. Guillermo, thank you for joining me. Guillermo: Great to be here. Thanks for setting me up with that awesome intro. Corey: It’s true, if I were talking to someone who looked at what I’ve done, and for some Godforsaken reason, they wanted to follow in my footsteps, well, that path has been closed, so learning a bunch of Perl early on and translating it all to bad bash scripts and the rest, and then maybe picking up Python isn’t really the way that I would advise someone getting started today. The de facto lingua franca of the internet is JavaScript, whether we like it or not, and I would strongly suggest that be someone’s first language despite the fact that I’m bad at it, I don’t understand it, and therefore it makes me angry. Guillermo: Yeah, it’s so funny because it sounds like my story. And my personal journey was, when I was a kid, I had a—I knew I wanted to hack around with computers, and reverse engineer them, and improve them, and just create my own things, and I had these options for what programming language I could go with. And I tried it all: PHP, Perl, [Mod PHP 00:02:12], [Mod Perl 00:02:13], Apache, LAMP, cgi-bin folders, all the whole nine yards. And regardless of what back-end technology I used, I encountered this striking fact, which was… the thing that can make your product really stand out in a web browser is typically involving JavaScript in some fashion. 
So, when Google came out with suggestions as you would type in a search box, my young kid Argentinian brain blew up. I was like, “Holy crap, they can suggest, they can read my mind, and they can render suggestions without a full page refresh? What is that magic?” And then more products like that came out. Google Docs, Gmail Chat, Facebook’s real time newsfeed, all the great things on the internet seemed to have this common point of, there’s this layer of interactivity, real-time data, customization, personalization, and it seems uniquely enabled by the front-end. So, I just went all in. I taught myself how to code, I taught myself—I became a front-end engineering expert, I joined some of the early projects that shaped the ecosystem. Like, there was this library called MooTools, and a lot of folks might not have heard that name. It’s in the annals of JavaScript history. And later on, you know, what I realized is, what if front-end can actually be the starting point of how you develop the best applications, right, rather than this thing that people, like, reluctantly frown upon, like yourself. I mention that as an opportunity rather than a diss because when you create a great front-end experience, now the data has proven you run a better business, you run a more dynamic business, probably are running an AI-powered business, like, all of the AI products on the planet today are using this technology to stream text in front of your eyes in real time and do all this awesome things. So yeah, I became obsessed with front-end, and I founded this company Vercel, which is a front-end cloud. So, you come here to basically build the best products. Now, you don’t have to build the back-end, so you can use back-ends that are off the shelf, you can connect to your existing back-ends, and we piggyback on the world’s best infrastructure to make this possible, but we offer developers a very streamlined path to create these awesome products on the internet. 
Corey: I have to say that I have been impressed when I’ve used Vercel for a number of projects. And what impresses me is less the infrastructure powering it, less the look at how performant it is and all the stuff that most people talk about, but as mentioned, I am not good at front-end or frankly programming at all. And so, many products in this space fall into the very pernicious trap of, “Oh, well, everyone who’s using this is at least this tall on the board of how smart you are to get on the amusement park ride.” So, I feel that when I’m coming at this from a—someone who is not a stranger to computers but is definitely new to this entire ecosystem, everything just made sense in a way that remarkably few products can pull off. I don’t know if you would call that user experience, developer experience, or what the terminology you bias for there is, but it is a transformational difference. Guillermo: Thank you. I think it’s a combination of things. So, developer experience has definitely always been a focus for me. I was that weird person that obsessed about the CLI parameters of the tool, and the output of the tool, and just like how it feels for the engineer. I did combine that with—and I think this is where Vercel really stands out—I did combine that with a world-class infrastructure bit because what I realized after creating lots and lots of very popular open-source projects—like, one is called Socket.io, and another one called Mongoose—DX, or developer experience, in the absence of an enticing carrot for the business doesn’t work. Maybe it has some short term adoption, maybe it has raving fans on Twitter or X [laugh], but at the end of the day, you have to deliver something that’s tangible to the end-user and to the business. So I think Vercel focusing on the front-end has found a magical combination there of I can make developers’ lives easier. 
Being a developer myself, I just tremendously empathize with that, but it can also make more profit for the business. When they make your website faster and render more dynamic data that serve as better recommendations for a product on e-commerce or in a marketing channel, I can help you roll out more experiments, then they make your business better, and I think that’s one of the magical combinations there. The other thing, frankly, is that we’d started doing fewer things. So, when you come to Vercel, you typically come with a framework that you’ve already chosen. It could be Next.js, it could be Vue, [unintelligible 00:07:18], there’s 35-plus frameworks. But we basically told the market, you have to use one of these developer tools in order to guide your development. And what companies were doing before—I mean, this almost seems obvious in retrospect, that we would optimize for certain patterns and certain tools—but what the market was doing before was rolling out your own frameworks. Like, every company was, basically—React, for example, is a very popular way of building interfaces, and our framework actually is built on top of React. But when I would go and talk to all these principal engineers at all these companies, they were saying, oh yeah, “We’re creating our own framework. We’re creating our own tools.” And I think that to me now feels almost like a zero interest rate phenomenon. Like, what business do you have in creating frameworks, tools, and bespoke infra when you’re really in the business of creating delightful experiences for your customers? Corey: What I think is lost on a lot of folks is that if you are trying to learn something new and use a tool, and the developer experience is bad, the takeaway—at least for me and a lot of people that I talk to—is not, “Oh, this tool has terrible ergonomics. 
That’s its problem.” Instead, the takeaway is very much, “Oh, I’m dumb because I don’t understand this thing.” And I know intellectually that I am not usually the dumbest person in the world when it comes to a particular tool or technology, but I keep forgetting that on a visceral level. It’s, “I just wish I was smart enough to understand that.” No, I don’t. I wish it was presented in a way that was more understandable and the documentation was better. When you’re just starting out and building something in your spare time, the infrastructure cost is basically nothing, but your time is the expensive part in it. So, if you have to spend three hours to track down something just because it wasn’t clearly explained, the burden of adopting that tool is challenging. I would argue that one of the reasons that AWS sees some of the success that it does is not necessarily because it’s great so much as because everyone knows how it breaks. That’s important. I’m not saying their infrastructure isn’t world-class—please, don’t come at me in the comments on this one—but I am saying that we know where its sharp edges are, and that means that we’re more comfortable building with it. But the idea of learning a brand-new cloud with different sharp edges in other areas? That’s terrifying. I’d rather stick with the devil I know. Guillermo: Exactly. I just think that you’re not going to be able to make a difference for customers in 2023 by creating another bespoke cloud that is general purpose, it doesn’t really optimize around anything, and you have to learn all the sharp edges from scratch. I think we saw that with the rise of cloud-native companies like Stripe and Twilio where they were going after these amazingly huge markets like financial infrastructure or communications infrastructure, but the angle was, “Here’s this awesome developer experience.” And that’s what we’re doing with Vercel for the front-end and for building products, right? 
There has to be an opinionated developer experience that guides you to success. And I agree with you that there’s really, these days in the developer communities, zero tolerance for sharp edges, and we’ve spent a lot of time in—even documentation, like, it used to be that your startup would make or break it by whether you had great documentation. I think in the age of frameworks, I would even dare say that documentation, of course, is extremely important, but if I can have the tool itself guide you to success, at that point, you’re not even reading documentation. We’re now seeing this with AI and, like, generative AI for code. At Vercel, we’re investing in generative AI for user interfaces. Do you actually need to read documentation at that point? So, I think we’re optimizing for the absolute minimal amount of friction required to be successful with these platforms. Corey: I think that there’s a truth in that of meeting customers in where they are. Your documentation can be spectacular, but people don’t generally read the encyclopedia for fun either. And the idea of that is that—at least ideally—I should not need to go diving into the documentation, and so many tools get this wrong, where, “Oh, I want to set up a new project,” and it bombards you with 50 questions, and each one of these feels pretty… momentous. Like, what one-way door am I passing through that I don’t realize the consequences of until I’m 12 hours into this thing, and then have to backtrack significantly. I like, personally, things that bias for having a golden path, but also make it easy to both deviate from it, as well as get back onto it. Because there’s more than one way to do it is sort of the old Perl motto. That is true in spades in anything approaching the JavaScript universe. Guillermo: Yeah. I have a lot of thoughts on that. On the first point, I completely agree that the golden path of the product cannot be documentation-mediated. 
One of the things that I’ve become obsessive about—and this is advice that I share with a lot of other startup founders—is, when it comes to your landing page, the primary call to action has to be this golden path to success, like, 2, 3, 4 clicks later, I have to have something tangible. That was our inspiration. And what we made the primary call to action for Vercel is deploy now. Start now. Get it out now. Ship it now. And the way that you test out the platform is by deploying a template. What we do is we create a Git repo for you, it sets up the entire CI/CD pipeline, and then at that point, you already have something working, something in the cloud, you spent zero time reading documentation, and you can start iterating. And even though that might not be the final thing you do in Vercel, I always hear the stories of CTOs that are now deploying Vercel at really large scale, and they always tell me, “I started with your hobby tier, I started with free tier, I deployed a template, I hacked on a product during the weekend.” Now, a lot of our AI examples are very popular in this crowd. And yeah, there’s a golden path that requires zero documentation. Now, you also mentioned that, what about complexity? This is an enterprise-grade platform. What about escape hatches? What about flexibility? And that’s where our platform also shines because we have the entire power of a Turing-complete language, which is JavaScript and TypeScript, to customize every aspect of the platform. And you have a framework that actually answered a lot of the problems that came with serverless solutions in the past, which is that you couldn’t run any of that on your local machine. 
The beauty of Vercel and Next.js is we kind of pioneered this concept that we called ‘Framework Defined Infrastructure.’ You start with the framework, the framework has this awesome property that you can install on your computer, it has a dev command—like, it literally runs on your computer—but then when you push it to the cloud, it now defines the infrastructure. It creates all of the resources that are highly optimal. This creates—basically converts what was a single node system on your computer to a globally distributed system, which is a very complex and difficult engineering challenge, but Vercel completely automates it away. And now for folks that are looking for, like, more advanced solutions, they can now start poking into the outputs of that compilation process and say, “Okay, I can now have an influence or I can reconfigure some aspects of this pipeline.” And of course, if you don’t think about those escape hatches, then the product just ends up being limiting and frustrating, so we had to think really hard about meeting both ends of the spectrum. Corey: In my own experimentations early on with Chat-Gippity—which is how I insist on pronouncing ChatGPT—a lot of what I found was that it was a lot more productive for me to, instead of asking just the question and getting the answer was, write a Python script to— Guillermo: Yes. Corey: Query this API to get me that answer. Because often it would be wrong. Sometimes very convincingly wrong, and I can at least examine it in various ways and make changes to it and iterate forward, whereas when everything is just a black box, that gets very hard to do. The idea of building something that can be iterated on is key. Guillermo: I love that. The way that Vercel actually first introduced itself to the world was this idea of immutable deployments and immutable infrastructure. 
And immutable sounds like a horrible word because I want to mutate things, but it was inspired by this idea of functional programming where, like, each iteration to the state, each data change, can be tracked. So, every time you deploy in Vercel, you get this unique URL that represents a point-in-time infrastructure deployment. You can go back in time, you can revert, you can use this as a way of collaborating with other engineers in your team, so you can send these hyperlinks around to your front-end projects. And it gives you a lot of confidence. Now, you can iterate knowing that before things go out, there’s a lot of scrutiny, there’s a lot of QA, there’s a lot of testing processes that you can kick off against this serverless infrastructure that was created for each deployment. The conclusion for us so far has been that our role in the world is to increase iteration velocity. So, iteration speed is the faster horse of the cloud, right? Like, instead of getting a car, you get a faster horse. When you say, “Okay, I made the build pipeline 10% faster,” or, “I brought the TLS termination 10% closer to the visitor, and, like, I have more [pops 00:17:10],” things like that. That to me, is the speed. You can do those things, and they’re awesome, but if you don’t have a direction—which is velocity—then you don’t know what you’re building next. You don’t know if your customers are happy. You don’t know if you’re delivering value. So, we built an entire platform that optimizes around, what should you ship next? What is the friction involved in getting your next iteration out? Is launching an experiment on your homepage, for example, is that a costly endeavor? Does it take you weeks? Does it take you months? One of the initial inspirations for just starting Vercel and making deployments really easy was, how difficult is it for the average company to change in their footer of their website is this copyright 2022? And you have—it's a new year. 
You have to bump it to copyright 2023. How long do you think it takes that engineer to, A, run the stack locally, so they can actually see the change; deploy it, but deploy to what we call the preview environment, so they can grab that URL and send [it to 00:18:15], Corey, and say, “Corey, does it look good? I updated [laugh] I updated the year in the footer.” And then you tell me, “Looks good, let’s ship it to production.” Or you tell me, “No, no, no, it’s risky. Let’s divide it into two cohorts: 50% of traffic gets 2022, 50% of traffic gets 2023.” Obviously, this is a joke, but consider the implications of how difficult it is in the average organization to actually do this thing. Corey: Oh, I find things like that all the time, especially on microservices that I built to handle some of my internal workflows here, and I haven’t touched in two or three years. And okay, now it’s time for me to update them to reflect some minor change. And first, I wind up in the screaming node warnings and I have to update things so that they actually work in a reasonable way. And, on some level, making a one-line change can take half a day. Now, in the real world, when people are working on these apps day-in and day-out, it gets a lot easier to roll those changes in over time, but coming back to something unmaintained, that becomes a project the longer you let it sit. Part of me wishes that there were easier ways around it, but there are trade-offs in almost any decision you make. If you’re building something from the beginning of, well, I want to be able to automatically update the copyright year, you can even borderline make that something that automatically happens based upon the global time, whereas when you’re trying to retrofit it afterwards, yeah, it becomes a project. Guillermo: Yeah, and now think, that’s just a simple example of changing a string. That might be difficult for a product engineer in any organization. 
Or it may be slow, or it may be not as streamlined, or maybe it works really well for the first project that that company created. What about every incremental project thereafter? So, now I said—let’s stop talking about a string, right? Let’s think about an e-commerce website where what we hear from our customers is, on average, like, 10% of revenue flows through the homepage. Now, I have to change a primary component that renders on the hero of the page, and I have to collaborate with every department in the organization. I have to collaborate with the design team, I have to collaborate with marketing, I have to collaborate with the business owners to track the analytics appropriately. So, what is the cost of every incremental experiment that you want to put in production? The other thing that’s particularly interesting about front-end as it relates to cloud infrastructure is, scaling up front-end is a very difficult thing. What ends up happening is most front-ends are actually static websites. They’re cached at the edge—or they’re literally statically generated—and then they push all of the dynamism to the client side. So, you end up with this spaghetti of script tags on the client, you end up accumulating a lot of tech debt in the [shipping 00:20:56] huge bundles of JavaScript to the client to try to recover some dynamism, to try and run these experiments. So, everyone is in this, kind of, mess of, yes, maybe we can experiment, but we kind of offloaded the rendering work to the client. That in turn makes me—basically, I’m making the website slower for the visitor. I’m making them do the rendering work. And I’m trying to sell them something. I’m trying to speed up some processes. It’s my responsibility to make it fast. 
So, what we ended up finding out is that yes, the cloud moved this forward a lot in terms of having these awesome building blocks, these awesome infrastructure primitives, but both in the developer experience, just changing something about your web product and also the end-user experiences, that web product renders really fast, those things really didn’t happen with this first chapter of the cloud. And I think we’re entering a new generation of higher-level clouds like Vercel that are optimizing for these things. Corey: I think that there’s a historical focus on things that have not happened before. And that was painful and terrible, so we’re not going to be focusing on what’s happening in the future, we’re going to build a process or a framework or something that winds up preventing that thing that hurt us from hurting us again. Now, that’s great in moderation, but at some point—we see this at large companies from time-to-time—where you have so much process that is ossified scar tissue, basically, that it becomes almost impossible to get things done. Because oh, I want to make that—for example, that one-line change to a copyright date, well, here’s the 5000 ways deploys have screwed us before, so we need to have three humans sign off on the one-line change, and a bunch of other stuff before it ever sees the light of day. Now, I’m exaggerating slightly, just as you are, but that feels like it acts as a serious brake on innovation. On the exact opposite side, where we see massive acceleration has been around the world of generative AI. Yes, it is massively hyped in a bunch of ways. I don’t think it is going to be a defined way that changes the nature of humanity the way that some of these people are going after, but it’s also clearly more than a parlor trick. Guillermo: I’m kind of in that camp. So, like you, I’ve been writing code for many years. I’m pretty astonished by the AI’s ability to enhance my output. 
And of course, now I’m not writing code full time, so there is a sense of, okay because I don’t have time, because I’m doing a million things, any minute I have seems like AI has just made it so much more worthwhile and I can squeeze so much more productivity out of it. But one of the areas that I’m really excited about is this idea of generative UI, which is not just autocompleting code in a text editor, but is the idea that you can use natural language to describe an interface and have the AI generate that interface. So, Vercel created this product called v0—you can check it out at v0.dev https://v0.dev—where to me, it’s really astonishing that you can get these incredibly high quality user interfaces, and basically all you have to do is input [laugh] a few English words. I have this personal experience of, I’ve been learning JavaScript and perfecting all my knowledge around it for, like, 20 or so years. I created Next.js. And Next.js itself powers a lot of these AI products. Like the front-end of ChatGPT is built on Next.js. And I used v0 to create… to basically recreate my blog. Like, I created rauchg.com https://rauchg.com, I deployed it on Vercel, but every pixel of that UI, I handcrafted. And as we were working on v0, I said okay, “I’m going to challenge myself to put myself back in the shoes of, like, I’m going to redesign this and I’m going to start over with just human language.” Not only did I arrive to the right look and feel of what I wanted to get, the code that it produced was better than I would have written by hand. Concretely, it was more accessible. So, there were areas of the UI where, like, some icons were rendered where I had not filled in those gaps. I just didn’t know how to do that. The AI did. So, I really believe that AI will transform our lives as [laugh] programmers, at least I think, in many other areas in very profound ways. 
Corey: This is very similar to a project that I’ve been embarked on for the last few days where I described the app I wanted into Chat-Gippity and followed the instructions, and first, it wound up sending me down a rabbit hole of the wrong framework version that had been deprecated, and whatnot, and then I brought it all into VS Code where Jif-Ub Copilot, it kept switching back and forth between actively helpful, and ooh, the response matches publicly available code, so I’m not going to tell you the answer, despite the fact that feature has never been enabled on my account. So yeah, of course, it matches publicly available code. This is quite literally the React tutorial starter project. And it became incredibly frustrating, but it also would keep generating things in bursts, so my code is not at all legible or well organized or consistent for that matter. But it’s still better than anything I’d be able to write myself. I’m looking forward to using v0 or something like it to see how that stacks up for some of my ridiculous generation ideas for these things. Guillermo: Yeah, you touched on a very important point is, the code has to work. The code has to be shippable. I think a lot of AI products have gotten by by giving you an approximation of the result, right? Like, they hallucinate sometimes, they get something wrong. It’s still very helpful because sometimes it’s sending you in the right direction. But for us, the bar is that these things have to produce code that’s useful, and that you can ship, and that you can iterate on. So, going back to that idea of iteration velocity, we call it v0 because we wanted it to be the first version. We still very much believe there are humans in the loop and folks will be iterating a lot on the initial draft that this thing is giving you, but it’s so much better than starting with an empty code editor, [laugh] right? 
Like, and this applies, by the way to, like, not just new projects, but I always talk about, like, our customers have a few really important landing pages, key pages, maybe it’s the product detail page in e-commerce, maybe it’s your homepage and, like, your key product pages for a marketing website. Maybe it’s where—and the checkout, for example, extremely important. But then there’s a lot of incremental UIs that you have to add every single day. The banner for [laugh] accepting cookies or not, the consent management dialog. There’s a lot of things that the worst case scenario is that you offload them again to some third-party script, to some iframe of sorts because you really don’t have the bandwidth, time, or resources to build it yourself. And then you sacrifice speed, you sacrifice brand fidelity. And again because we’re the front-end cloud, we’re obsessed with your ability to ship UI that’s native to your product, that is streamlined, that works really well. So, I think AI is going to have a significant effect there where a lot of things where you were sending someone to some other website because you just didn’t have the bandwidth to create that UI, you can now own the experience end to end. Corey: That is no small thing. A last question I have, before we wind up calling this episode is, there was a period of time—I don’t know if we’re still in it or not—where it felt like every time I got up to get a cup of coffee and came back, there would be three JavaScript frameworks that launched during that interim. So, Next.js was at one point one of those when someone got up to get a cup of coffee. But that’s shown a staying power that is, frankly, remarkable. Why? I don’t know enough about the ecosystem to have an opinion on that, but I noticed when things stand out, and Next does. Guillermo: Yeah, I think it’s a number of factors. Number one, we, as an industry I think, we coalesced, and we found the right engine to build our car. And that engine became React. 
Most folks building UI today are choosing React or a similar engine, but React has really become the gold standard for a lot of engineers. Now, what ended up happening next is that people realized I want a car. I want the full product. I need to drive. I don’t want to assemble this freaking car every single time I have a new project. And Next.js filled a very important gap in the world where what you were looking for was not a library; what you were looking for is a framework that has opinions, but those opinions are very in line with how the web is supposed to work. We took a page from, basically, the beginnings of the web. We make a lot of jokes that in many ways, our inspiration was PHP, where server rendering is the default, where it’s very expressive, it’s very easy to reach for data. It just works for a lot of people. Again, that’s the old [stack 00:30:03] in the olden days. And so, it obviously didn’t quite work, but the inspiration was, can we make something that is streamlined for creating web interfaces at scale? At scale. And to your point, there’s also a sense of, like, maybe it doesn’t make sense anymore to build all this infrastructure from scratch every single time I start a project. So, Next filled in that gap. The other thing we did really well, I think, is that we gave people a universal model for how to use not just the server side, but also the client side strategically. So, I’ll give you an example. When you go to ChatGPT, a lot of things on the screen are server rendered, but when you start doing interactions as a user, that requires for something like you’d say, “Hey DALL-E, generate an image.” That stuff requires a lot of optimistic UI. It requires features that are more like what a mobile native application can do. 
So, we can give folks the best of both worlds: the speed, interactivity, and fluidity of a native app, but we had those, sort of, fundamentals of how a website should work that even Perl and PHP had gotten right, once upon a time. So, I think we found that right blend of utility and flexibility, and folks love it, and I think, yeah, we’re excited to continue to help steward this project as a standard for building on the web. Corey: I really want to thank you for taking the time to talk about a lot of the genesis of this stuff and how you view it, which I think gives us a pretty decent idea of how you’re going to approach the evolution of what you’ve built. If people want to learn more, where’s the best place for them to find you? Guillermo: So, head to vercel.com https://vercel.com to learn about our platform. You can check out v0.dev https://v0.dev, which we’ll be opening broadly to the public soon, if you want to get started with this idea of generative UI. And myself, I’m always tweeting on X, twitter.com https://twitter.com/rauchg or x.com/rauchg https://x.com/rauchg to find me. Corey: One of these days we’ll be able to kick that habit, I hope [laugh]. Guillermo: [laugh]. Yeah. Corey: Thank you so much for being so generous with your time. I appreciate it. Guillermo: Thank you. Corey: Guillermo Rauch, founder and CEO of Vercel, and creator of Next.js. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that will be almost impossible for you to submit because that podcast platform did not pay attention to user experience. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. 
The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com to get started.
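The footer example from the conversation (a copyright year that follows the clock, and a 50/50 split of traffic between two variants) can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not how Vercel or Next.js implement anything: the names `copyrightNotice`, `assignCohort`, and `fnv1a` are hypothetical, and the FNV-1a hash is simply one convenient way to bucket a visitor ID deterministically.

```typescript
// Hypothetical helpers for illustration only; none of these names come from
// Vercel, Next.js, or the episode.

/** Builds the footer text from the current date, so the year never needs a manual bump. */
function copyrightNotice(owner: string, now: Date = new Date()): string {
  return `Copyright ${now.getUTCFullYear()} ${owner}`;
}

/** 32-bit FNV-1a hash of a string, returned as an unsigned integer. */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

type Cohort = "control" | "experiment";

/**
 * Deterministically assigns a visitor to a cohort. The same visitor ID always
 * lands in the same cohort, and roughly `experimentShare` of IDs see the experiment.
 */
function assignCohort(visitorId: string, experimentShare: number = 0.5): Cohort {
  const bucket = fnv1a(visitorId) / 0x100000000; // map the hash into [0, 1)
  return bucket < experimentShare ? "experiment" : "control";
}
```

Hashing the visitor ID, rather than calling a random number generator on every request, keeps a given visitor on the same variant across page loads, which is usually what an experiment needs.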
The de facto lingua franca of the internet is JavaScript, whether we like it or not, and I would strongly suggest that be someone’s first language despite the fact that I’m bad at it, I don’t understand it, and therefore it makes me angry. Guillermo: Yeah, it’s so funny because it sounds like my story. And my personal journey was, when I was a kid, I had a—I knew I wanted to hack around with computers, and reverse engineer them, and improve them, and just create my own things, and I had these options for what programming language I could go with. And I tried it all: PHP, Perl, [Mod PHP 00:02:12], [Mod Perl 00:02:13], Apache, LAMP, cgi-bin folders, all the whole nine yards. And regardless of what back-end technology I used, I encountered this striking fact, which was… the thing that can make your product really stand out in a web browser is typically involving JavaScript in some fashion. So, when Google came out with suggestions as you would type in a search box, my young kid Argentinian brain blew up. I was like, “Holy crap, they can suggest, they can read my mind, and they can render suggestions without a full page refresh? What is that magic?” And then more products like that came out. Google Docs, Gmail Chat, Facebook’s real time newsfeed, all the great things on the internet seemed to have this common point of, there’s this layer of interactivity, real-time data, customization, personalization, and it seems uniquely enabled by the front-end. So, I just went all in. I taught myself how to code, I taught myself—I became a front-end engineering expert, I joined some of the early projects that shaped the ecosystem. Like, there was this library called MooTools, and a lot of folks might not have heard that name. It’s in the annals of JavaScript history. 
And later on, you know, what I realized is, what if front-end can actually be the starting point of how you develop the best applications, right, rather than this thing that people, like, reluctantly frown upon, like yourself. I mention that as an opportunity rather than a dis because when you create a great front-end experience, now the data has proven you run a better business, you run a more dynamic business, probably are running an AI-powered business, like, all of the AI products on the planet today are using this technology to stream text in front of your eyes in real time and do all this awesome things. So yeah, I became obsessed with front-end, and I founded this company Vercel, which is a front-end cloud. So, you come here to basically build the best products. Now, you don’t have to build the back-end, so you can use back-ends that are off the shelf, you can connect to your existing back-ends, and we piggyback on the world’s best infrastructure to make this possible, but we offer developers a very streamlined path to create these awesome products on the internet. Corey: I have to say that I have been impressed when I’ve used Vercel for a number of projects. And what impresses me is less the infrastructure powering it, less the look at how performance it is and all the stuff that most people talk about, but as mentioned, I am not good at front-end or frankly programming at all. And so, many products in this space fall into the very pernicious trap of, “Oh, well, everyone who’s using this is at least this tall on the board of how smart you are to get on the amusement park ride.” So, I feel that when I’m coming at this from a—someone who is not a stranger to computers but is definitely new to this entire ecosystem, everything just made sense in a way that remarkably few products can pull off. I don’t know if you would call that user experience, developer experience, or what the terminology you bias for there is, but it is a transformational difference. 
Guillermo: Thank you. I think it’s a combination of things. So, developer experience has definitely always been a focus for me. I was that weird person that obsessed about the CLI parameters of the tool, and the output of the tool, and just like how it feels for the engineer. I did combine that with—and I think this is where Vercel really stands out—I did combine that with a world-class infrastructure bit because what I realized after creating lots and lots of very popular open-source projects—like, one is called Socket.io, and another one called Mongoose—DX, or developer experience, in the absence of an enticing carrot for the business doesn’t work. Maybe it has some short term adoption, maybe it has raving fans on Twitter or X [laugh], but at the end of the day, you have to deliver something that’s tangible to the end-user and to the business. [unintelligible 00:06:33] Vercel focusing on the front-end has found a magical combination there of I can make the developer lives easier. Being a developer myself, I just tremendously empathize with that, but it can also make more profit for the business. When they make your website faster and render more dynamic data that serve as better recommendations for a product on e-commerce or in a marketing channel, I can help you roll out more experiments, then they make your business better, and I think that’s one of the magical combinations there. The other thing, frankly, is that we’d started doing fewer things. So, when you come to Vercel, you typically come with a framework that you’ve already chosen. It could be Next.js, it could be Vue, [unintelligible 00:07:18], there’s 35-plus frameworks. But we basically told the market, you have to use one of these developer tools in order to guide your development. 
And what companies were doing before—I mean, this almost seems obvious in retrospect, that we would optimize for certain patterns and certain tools—but what the market was doing before was rolling out your own frameworks. Like, every company was, basically—React, for example, is a very popular way of building interfaces, and our framework actually is built on top of React. But when I would go to and talk to all these principal engineers at all these companies, they were saying, oh yeah, “We’re creating our own framework. We’re creating our own tools.” And I think that to me now feels almost like a zero interest rate phenomenon. Like, what business do you have in creating frameworks, tools, and bespoke infra when you’re really in the business of creating delightful experiences for your customers? Corey: What I think is lost on a lot of folks is that if you are trying to learn something new and use a tool, and the developer experience is bad, the takeaway—at least for me and a lot of people that I talk to is not, “Oh, this tool has terrible ergonomics. That’s its problem.” Instead, the takeaway is very much, “Oh, I’m dumb because I don’t understand this thing.” And I know intellectually that I am not usually the dumbest person in the world when it comes to a particular tool or technology, but I keep forgetting that on a visceral level. It’s, “I just wish I was smart enough to understand that.” No, I don’t. I wish it was presented in a way that was more understandable and the documentation was better. When you’re just starting out and building something in your spare time, the infrastructure cost is basically nothing, but your time is the expensive part in it. So, if you have to spend three hours to track down something just because it wasn’t clearly explained, the burden of adopting that tool is challenging. 
I would argue that one of the reasons that AWS sees some of the success that it does is not necessarily because it’s great so much as because everyone knows how it breaks. That’s important. I’m not saying their infrastructure isn’t world-class—please, don’t come at me in the comments on this one—but I am saying that we know where its sharp edges are, and that means that we’re more comfortable building with it. But the idea of learning a brand-new cloud with different sharp edges in other areas. That's terrifying. I’d rather stick with the devil I know. Guillermo: Exactly. I just think that you’re not going to be able to make a difference for customers in 2023 by creating another bespoke cloud that is general purpose, it doesn’t really optimize around anything, and you have to learn all the sharp edges from scratch. I think we saw that with the rise of cloud-native companies like Stripe and Twilio where they were going after these amazingly huge markets like financial infrastructure or communications infrastructure, but the angle was, “Here’s this awesome developer experience.” And that’s what we’re doing with Vercel for the front-end and for building products, right? There has to be an opinionated developer experience that guides you to success. And I agree with you that there’s really, these days in the developer communities, zero tolerance for sharp edges, and we’ve spent a lot of time in—even documentation, like, it used to be that your startup would make or break it by whether you had great documentation. I think in the age of frameworks, I would even dare say that documentation, of course, is extremely important, but if I can have the tool itself guide you to success, at that point, you’re not even reading documentation. We’re now seeing this with AI and, like, generative AI for code. At Vercel, we’re investing in generative AI for user interfaces. Do you actually need to read documentation at that point? 
So, I think we’re optimizing for the absolute minimal amount of friction required to be successful with these platforms. Corey: I think that there’s a truth in that of meeting customers in where they are. Your documentation can be spectacular, but people don’t generally read the encyclopedia for fun either. And the idea of that is that—at least ideally—I should not need to go diving into the documentation, and so many tools get this wrong, where, “Oh, I want to set up a new project,” and it bombards you with 50 questions, and each one of these feels pretty… momentous. Like, what one-way door am I passing through that I don’t realize the consequences of until I’m 12 hours into this thing, and then have to backtrack significantly. I like, personally, things that bias for having a golden path, but also make it easy to both deviate from it, as well as get back onto it. Because there’s more than one way to do it is sort of the old Perl motto. That is true in spades in anything approaching the JavaScript universe. Guillermo: Yeah. I have a lot of thoughts on that. On the first point, I completely agree that the golden path of the product cannot be documentation-mediated. One of the things that I’ve become obsessive about—and this is an advice that I share with a lot of other startup founders is, when it comes to your landing page, the primary call to action has to be this golden path to success, like, 2, 3, 4 clicks later, I have to have something tangible. That was our inspiration. And when we made it the primary call to action for Vercel is deploy now. Start now. Get it out now. Ship it now. And the way that you test out the platform is by deploying a template. What do we do is we create a Git repo for you, it sets up the entire CI/CD pipeline, and then at that point, you already have something working, something in the cloud, you spent zero time reading documentation, and you can start iterating. 
And even though that might not be the final thing you do in Vercel, I always hear the stories of CTOs that are now deploying Vercel at really large scale, and they always tell me, “I started with your hobby tier, I started with free tier, I deployed a template, I hacked on a product during the weekend.” Now, a lot of our AI examples are very popular in this crowd. And yeah, there’s a golden path that requires zero documentation. Now, you also mentioned that, what about complexity? This is an enterprise-grade platform. What about escape hatches? What about flexibility? And that’s where our platform also shines because we have the entire power of a Turing-complete language, which is JavaScript and TypeScript, to customize every aspect of the platform. And you have a framework that actually answered a lot of the problems that came with serverless solutions in the past, which is that you couldn’t run any of that on your local machine. The beauty of Vercel and Next.js is we kind of pioneered this concept that we called ‘Framework Defined Infrastructure.’ You start with the framework, the framework has this awesome property that you can install on your computer, it has a dev command—like, it literally runs on your computer—but then when you push it to the cloud, it now defines the infrastructure. It creates all of the resources that are highly optimal. This creates—basically converts what was a single node system on your computer to a globally distributed system, which is a very complex and difficult engineering challenge, but Vercel completely automates it away. 
And now for folks that are looking for, like, more advanced solutions, they can now start poking into the outputs of that compilation process and say, “Okay, I can now have an influence or I can reconfigure some aspects of this pipeline.” And of course, if you don’t think about those escape hatches, then the product just ends up being limiting and frustrating, so we had to think really hard about meeting both ends of the spectrum. Corey: In my own experimentations early on with Chat-Gippity—which is how I insist on pronouncing ChatGPT—a lot of what I found was that it was a lot more productive for me to, instead of asking just the question and getting the answer was, write a Python script to— Guillermo: Yes. Corey: Query this API to get me that answer. Because often it would be wrong. Sometimes very convincingly wrong, and I can at least examine it in various ways and make changes to it and iterate forward, whereas when everything is just a black box, that gets very hard to do. The idea of building something that can be iterated on is key. Guillermo: I love that. The way that Vercel actually first introduced itself to the world was this idea of immutable deployments and immutable infrastructure. And immutable sounds like a horrible word because I want to mutate things, but it was inspired by this idea of functional programming where, like, each iteration to the state, each data change, can be tracked. So, every time you deploy in Vercel, you get this unique URL that represents a point-in-time infrastructure deployment. You can go back in time, you can revert, you can use this as a way of collaborating with other engineers in your team, so you can send these hyperlinks around to your front-end projects. And it gives you a lot of confidence. 
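Editor's sketch, not part of the conversation: the immutable deployment idea Guillermo describes can be modeled as an append-only log in which every deploy gets its own URL and "production" is only a pointer to one entry. The TypeScript below is a minimal illustration under that assumption. `DeploymentLog`, `deploy`, `promote`, and the `example.invalid` URLs are hypothetical names made up for this note; they are not Vercel's API.

```typescript
// Hypothetical model of immutable deployments; not Vercel's actual API.
interface Deployment {
  readonly id: string; // unique per deploy
  readonly url: string; // point-in-time URL for this exact deploy
  readonly files: Readonly<Record<string, string>>;
}

class DeploymentLog {
  private readonly history: Deployment[] = [];
  private productionId: string | null = null;

  // Every deploy appends a new frozen record; nothing is edited in place.
  deploy(files: Record<string, string>): Deployment {
    const id = `dpl-${this.history.length + 1}`;
    const deployment: Deployment = Object.freeze({
      id,
      url: `https://${id}.example.invalid`,
      files: Object.freeze({ ...files }),
    });
    this.history.push(deployment);
    return deployment;
  }

  // Releasing and reverting are the same operation: move the pointer.
  promote(id: string): void {
    if (!this.history.some((d) => d.id === id)) {
      throw new Error(`unknown deployment ${id}`);
    }
    this.productionId = id;
  }

  // The deployment currently served as production, if any.
  production(): Deployment | undefined {
    for (const d of this.history) {
      if (d.id === this.productionId) return d;
    }
    return undefined;
  }
}
```

Because a revert is just `promote` called with an older id, going "back in time" never requires rebuilding or modifying anything, which is the property the conversation attributes to immutability.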
Now, you can iterate knowing that before things go out, there’s a lot of scrutiny, there’s a lot of QA, there’s a lot of testing processes that you can kick off against this serverless infrastructure that was created for each deployment. The conclusion for us so far has been that our role in the world is to increase iteration velocity. So, iteration speed is the faster horse of the cloud, right? Like, instead of getting a car, you get a faster horse. When you say, “Okay, I made the build pipeline 10% faster,” or, “I brought the TLS termination 10% closer to the visitor, and, like, I have more [pops 00:17:10],” things like that. That to me, is the speed. You can do those things, and they’re awesome, but if you don’t have a direction—which is velocity—then you don’t know what you’re building next. You don’t know if your customers are happy. You don’t know if you’re delivering value. So, we built an entire platform that optimizes around, what should you ship next? What is the friction involved in getting your next iteration out? Is launching an experiment on your homepage, for example, is that a costly endeavor? Does it take you weeks? Does it take you months? One of the initial inspirations for just starting Vercel and making deployments really easy was, how difficult is it for the average company to change in their footer of their website is this copyright 2022? And you have—it's a new year. You have to bump it to copyright 2023. How long do you think it takes that engineer to, A, run the stack locally, so they can actually see the change; deploy it, but deploy to what we call the preview environment, so they can grab that URL and send [it to 00:18:15], Corey, and say, “Corey, does it look good? I updated [laugh] I updated the year in the footer.” And then you tell me, “Looks good, let’s ship it to production.” Or you tell me, “No, no, no, it’s risky. 
Let’s divide it into two cohorts: 50% of traffic gets 2022, 50% of traffic gets 2023.” Obviously, this is a joke, but consider the implications of how difficult it is in the average organization to actually do this thing. [midroll 00:18:41] Corey: Oh, I find things like that all the time, especially on microservices that I built to handle some of my internal workflows here, and I haven’t touched in two or three years. And okay, now it’s time for me to update them to reflect some minor change. And first, I wind up in the screaming Node warnings and I have to update things so that they actually work in a reasonable way. And, on some level, making a one-line change can take half a day. Now, in the real world, when people are working on these apps day-in and day-out, it gets a lot easier to roll those changes in over time, but coming back to something unmaintained, that becomes a project the longer you let it sit. Part of me wishes that there were easier ways around it, but there are trade-offs in almost any decision you make. If you’re building something from the beginning of, well, I want to be able to automatically update the copyright year, you can even borderline make that something that automatically happens based upon the global time, whereas when you’re trying to retrofit it afterwards, yeah, it becomes a project. Guillermo: Yeah, and now think, that’s just a simple example of changing a string. That might be difficult for a product engineer in any organization. Or it may be slow, or it may be not as streamlined, or maybe it works really well for the first project that that company created. What about every incremental project thereafter? So, now I said—let’s stop talking about a string, right? Let’s think about an e-commerce website where, from what we hear from our customers, on average, like, 10% of revenue flows through the homepage. 
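Editor's sketch, not part of the conversation: the two ideas in this exchange, a footer year derived from the clock and a 50/50 traffic split, are both small enough to show in a few lines of TypeScript. This is an illustration under stated assumptions, not how any particular platform does it. The function names, the `visitorId` parameter, and the choice of a djb2-style string hash are all hypothetical and picked only to keep the example self-contained.

```typescript
// Hypothetical helpers for illustration only.

// Derive the footer year from the clock so nobody has to bump it by hand.
function copyrightNotice(owner: string, now: Date = new Date()): string {
  return `© ${now.getUTCFullYear()} ${owner}`;
}

// djb2-style hash (hash * 33 + charCode), kept in unsigned 32-bit range.
function hashString(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = (((hash << 5) + hash) + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

type Cohort = "control" | "variant";

// Deterministic assignment: the same visitor always lands in the same cohort,
// and roughly variantPercent out of every 100 hash buckets see the variant.
function assignCohort(visitorId: string, variantPercent = 50): Cohort {
  return hashString(visitorId) % 100 < variantPercent ? "variant" : "control";
}
```

Hashing the visitor id, rather than picking at random on each request, keeps a returning visitor from flipping between the two versions of the page mid-experiment.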
Now, I have to change a primary component that renders on the hero of the page, and I have to collaborate with every department in the organization. I have to collaborate with the design team, I have to collaborate with marketing, I have to collaborate with the business owners to track the analytics appropriately. So, what is the cost of every incremental experiment that you want to put in production? The other thing that’s particularly interesting about front-end as it relates to cloud infrastructure is, scaling up front-end is a very difficult thing. What ends up happening is most front-ends are actually static websites. They’re cached at the edge—or they’re literally statically generated—and then they push all of the dynamism to the client side. So, you end up with this spaghetti of script tags on the client, you end up accumulating a lot of tech debt in the [shipping 00:20:56] huge bundles of JavaScript to the client to try to recover some dynamism, to try and run these experiments. So, everyone is in this, kind of, mess of the yes, maybe we can experiment, but we kind of offloaded the rendering work to the client. That in turn makes me—basically, I’m making the website slower for the visitor. I’m making them do the rendering work. And I’m trying to sell them something. I’m trying to speed up some processes. It’s my responsibility to make it fast. So, what we ended up finding out is that yes, the cloud moved this forward a lot in terms of having these awesome building blocks, these awesome infrastructure primitives, but both in the developer experience, just changing something about your web product and also the end-user experiences, that web product renders really fast, those things really didn’t happen with this first chapter of the cloud. And I think we’re entering a new generation of higher-level clouds like Vercel that are optimizing for these things. Corey: I think that there’s a historical focus on things that have not happened before. 
And that was painful and terrible, so we’re not going to be focusing on what’s happening in the future, we’re going to build a process or a framework or something that winds up preventing that thing that hurt us from hurting us again. Now, that’s great in moderation, but at some point—we see this at large companies from time-to-time—where you have so much process that is ossified scar tissue, basically, that it becomes almost impossible to get things done. Because oh, I want to make that—for example, that one-line change to a copyright date, well, here’s the 5000 ways deploys have screwed us before, so we need to have three humans sign off on the one-line change, and a bunch of other stuff before it ever sees the light of day. Now, I’m exaggerating slightly, just as you are, but that feels like it acts as a serious brake on innovation. On the exact opposite side, where we see massive acceleration has been around the world of generative AI. Yes, it is massively hyped in a bunch of ways. I don’t think it is going to be a defined way that changes the nature of humanity the way that some of these people are going after, but it’s also clearly more than a parlor trick. Guillermo: I’m kind of in that camp. So, like you, I’ve been writing code for many years. I’m pretty astonished by the AI’s ability to enhance my output. And of course, now I’m not writing code full time, so there is a sense of, okay because I don’t have time, because I’m doing a million things, any minute I have seems like AI has just made it so much more worthwhile and I can squeeze so much more productivity out of it. But one of the areas that I’m really excited about is this idea of generative UI, which is not just autocompleting code in a text editor, but is the idea that you can use natural language to describe an interface and have the AI generate that interface. 
So, Vercel created this product called v0—you can check it out at v0.dev (https://v0.dev)—where to me, it’s really astonishing that you can get these incredibly high quality user interfaces, and basically all you have to do is input [laugh] a few English words. I have this personal experience of, I’ve been learning JavaScript and perfecting all my knowledge around it for, like, 20 or so years. I created Next.js. And Next.js itself powers a lot of these AI products. Like the front-end of ChatGPT is built on Next.js. And I used v0 to create… to basically recreate my blog. Like, I created rauchg.com (https://rauchg.com), I deployed it on Vercel, but every pixel of that UI, I handcrafted. And as we were working on v0, I said okay, “I’m going to challenge myself to put myself back in the shoes of, like, I’m going to redesign this and I’m going to start over with just human language.” Not only did I arrive at the right look and feel of what I wanted to get, the code that it produced was better than I would have written by hand. Concretely, it was more accessible. So, there were areas of the UI where, like, some icons were rendered where I had not filled in those gaps. I just didn’t know how to do that. The AI did. So, I really believe that AI will transform our lives as [laugh] programmers, at least I think, in many other areas in very profound ways. Corey: This is very similar to a project that I’ve been embarked on for the last few days where I described the app I wanted into Chat-Gippity and followed the instructions, and first, it wound up sending me down a rabbit hole of the wrong framework version that had been deprecated, and whatnot, and then I brought it all into VS Code where Jif-Ub Copilot, it kept switching back and forth between actively helpful, and ooh, the response matches publicly available code, so I’m not going to tell you the answer, despite the fact that feature has never been enabled on my account. 
So yeah, of course, it matches publicly available code. This is quite literally the React tutorial starter project. And it became incredibly frustrating, but it also would keep generating things in bursts, so my code is not at all legible or well organized or consistent for that matter. But it’s still better than anything I’d be able to write myself. I’m looking forward to using v0 or something like it to see how that stacks up for some of my ridiculous generation ideas for these things. Guillermo: Yeah, you touched on a very important point is, the code has to work. The code has to be shippable. I think a lot of AI products have gotten by by giving you an approximation of the result, right? Like, they hallucinate sometimes, they get something wrong. It’s still very helpful because sometimes it’s sending you the right direction. But for us, the bar is that these things have to produce code that’s useful, and that you can ship, and that you can iterate on. So, going back to that idea of iteration velocity, we call it v0 because we wanted it to be the first version. We still very much believe there is humans in the loop and folks will be iterating a lot on the initial draft that this thing is giving you, but it’s so much better than starting with an empty code editor, [laugh] right? Like, and this applies, by the way to, like, not just new projects, but I always talk about, like, our customers have a few really important landing pages, key pages, maybe it’s the product detail page in e-commerce, maybe it’s your homepage and, like, your key product pages for a marketing website. Maybe it’s where—and the checkout, for example, extremely important. But then there’s a lot of incremental UIs that you have to add every single day. The banner for [laugh] accepting cookies or not, the consent management dialog. 
There’s a lot of things that the worst case scenario is that you offload them again to some third-party script, to some iframe of sorts because you really don’t have the bandwidth, time, or resources to build it yourself. And then you sacrifice speed, you sacrifice brand fidelity. And again because we’re the front-end cloud, we’re obsessed with your ability to ship UI that’s native to your product, that is streamlined, that works really well. So, I think AI is going to have a significant effect there where a lot of things where you were sending someone to some other website because you just didn’t have the bandwidth to create that UI, you can now own the experience end to end. Corey: That is no small thing. A last question I have, before we wind up calling this episode is, there was a period of time—I don’t know if we’re still in it or not—where it felt like every time I got up to get a cup of coffee and came back, there would be three JavaScript frameworks that launched during that interim. So, Next.js was at one point one of those when someone got up to get a cup of coffee. But that’s shown a staying power that is, frankly, remarkable. Why? I don’t know enough about the ecosystem to have an opinion on that, but I noticed when things stand out, and Next does. Guillermo: Yeah, I think it’s a number of factors. Number one, we, as an industry I think, we coalesced, and we found the right engine to build our car. And that engine became React. Most folks building UI today are choosing React or a similar engine, but React has really become the gold standard for a lot of engineers. Now, what ended up happening next is that people realized I want a car. I want the full product. I need to drive. I don’t want to assemble this freaking car every single time I have a new project. 
And Next.js filled a very important gap in the world where what you were looking for was not a library; what you were looking for is a framework that has opinions, but those opinions are very in line with how the web is supposed to work. We took a page from, basically, the beginnings of the web. We make a lot of jokes that in many ways, our inspiration was PHP, where server rendering is the default, where it’s very expressive, it’s very easy to reach for data. It just works for a lot of people. Again, that’s the old [stack 00:30:03] in the olden days. And so, it obviously didn’t quite work, but the inspiration was, can we make something that is streamlined for creating web interfaces at scale? At scale. And to your point, there’s also a sense of, like, maybe it doesn’t make sense anymore to build all this infrastructure from scratch every single time I start a project. So, Next filled in that gap. The other thing we did really well, I think, is that we gave people a universal model for how to use not just the server side, but also the client side strategically. So, I’ll give you an example. When you go to ChatGPT, a lot of things on the screen are server rendered, but when you start doing interactions as a user, that requires for something like you’d say, “Hey DALL-E, generate an image.” That stuff requires a lot of optimistic UI. It requires features that are more like what a mobile native application can do. So, we can give folks the best of both worlds: the speed, interactivity, and fluidity of a native app, but we had those, sort of, fundamentals of how a website should work that even Perl and PHP had gotten right, once upon a time. So, I think we found that right blend of utility and flexibility, and folks love it, and I think, yeah, we’re excited to continue to help steward this project as a standard for building on the web. 
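Editor's sketch, not part of the conversation: "optimistic UI" generally means showing the result of a user action immediately and reconciling once the server answers. The TypeScript below is a framework-free illustration of that pattern, assuming a simple chat-style list. `OptimisticList`, `ChatMessage`, and the method names are hypothetical and are not taken from Next.js, React, or ChatGPT.

```typescript
// Hypothetical optimistic-update state holder; names are illustrative only.
interface ChatMessage {
  id: string;
  text: string;
  pending: boolean; // true until the server has confirmed the message
}

class OptimisticList {
  private messages: ChatMessage[] = [];

  // Show the message right away, flagged as pending, before any response.
  add(id: string, text: string): void {
    this.messages = [...this.messages, { id, text, pending: true }];
  }

  // The server accepted the change: clear the pending flag.
  confirm(id: string): void {
    this.messages = this.messages.map((m) =>
      m.id === id ? { ...m, pending: false } : m
    );
  }

  // The server rejected the change: remove the optimistic entry again.
  rollback(id: string): void {
    this.messages = this.messages.filter((m) => m.id !== id);
  }

  // Read-only view of the current state for rendering.
  snapshot(): readonly ChatMessage[] {
    return this.messages;
  }
}
```

In a real interface, `add` would run when the user submits, and `confirm` or `rollback` would run when the request settles; the rendering layer can style pending entries differently so the user sees progress without waiting on the network.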
Corey: I really want to thank you for taking the time to talk about a lot of the genesis of this stuff and how you view it, which I think gives us a pretty decent idea of how you’re going to approach the evolution of what you’ve built. If people want to learn more, where’s the best place for them to find you? Guillermo: So, head to vercel.com (https://vercel.com) to learn about our platform. You can check out v0.dev (https://v0.dev), which we’ll be opening broadly to the public soon, if you want to get started with this idea of generative UI. And myself, I’m always tweeting on X, twitter.com (https://twitter.com/rauchg) or x.com/rauchg (https://x.com/rauchg) to find me. Corey: One of these days we’ll be able to kick that habit, I hope [laugh]. Guillermo: [laugh]. Yeah. Corey: Thank you so much for being so generous with your time. I appreciate it. Guillermo: Thank you. Corey: Guillermo Rauch, founder and CEO of Vercel, and creator of Next.js. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that will be almost impossible for you to submit because that podcast platform did not pay attention to user experience. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com (https://duckbillgroup.com) to get started.

33m
Dec 21, 2023
How Tailscale Builds for Users of All Tiers with Maya Kaczorowski

Maya Kaczorowski, Chief Product Officer at Tailscale, joins Corey on Screaming in the Cloud to discuss what sets the Tailscale product approach apart, for users of their free tier all the way to enterprise. Maya shares insight on how she evaluates feature requests, and how Tailscale’s unique architecture sets them apart from competitors. Maya and Corey discuss the importance of transparency when building trust in security, as well as Tailscale’s approach to new feature roll-outs and change management. ABOUT MAYA Maya is the Chief Product Officer at Tailscale, providing secure networking for the long tail. She was most recently at GitHub in software supply chain security, and previously at Google working on container security, encryption at rest and encryption key management. Prior to Google, she was an Engagement Manager at McKinsey & Company, working in IT security for large enterprises. Maya completed her Master's in mathematics focusing on cryptography and game theory. She is bilingual in English and French. Outside of work, Maya is passionate about ice cream, puzzling, running, and reading nonfiction.

LINKS REFERENCED:
Tailscale: https://tailscale.com/
Tailscale features:
Tailscale on AWS Marketplace: https://aws.amazon.com/marketplace/pp/prodview-nd5zazsgvu6e6

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn, and I am joined today on this promoted guest episode by my friends over at Tailscale (https://tailscale.com/). They have long been one of my favorite products just because it has dramatically changed the way that I interact with computers, which really should be enough to terrify anyone. 
My guest today is Maya Kaczorowski, Chief Product Officer at Tailscale. Maya, thanks for joining me. Maya: Thank you so much for having me. Corey: I have to say originally, I was a little surprised to—“Really? You’re the CPO? I really thought I would have remembered that from the last time we hung out in person.” So, congratulations on the promotion. Maya: Thank you so much. Yeah, it’s exciting. Corey: Being a product person is probably a great place to start with this because we’ve had a number of conversations, here and otherwise, around what Tailscale is and why it’s awesome. I don’t necessarily know that beating the drum of why it’s so awesome is going to be covering new ground, but I’m sure we’re going to come up for that during the conversation. Instead, I’d like to start by talking to you about just what a product person does in the context of building something that is incredibly central not just to critical path, but also has massive security ramifications as well, when positioning something that you’re building for the enterprise. It’s a very hard confluence of problems, and there are days I am astonished that enterprises can get things done based purely upon so much of the mitigation of what has to happen. Tell me about that. How do you even function given the tremendous vulnerability of the attack surface you’re protecting? Maya: Yeah, I don’t know if you—I feel like you’re talking about the product, but also the sales cycle of talking [laugh] and working with enterprise customers. Corey: The product, the sales cycle, the marketing aspects of it, and— Maya: All of it. Corey: —it all ties together. It’s different facets of frankly, the same problem. Maya: Yeah. I think that ultimately, this is about really understanding who the customer that is buying the product is. And I really mean that, like, buying the product, right? Because, like, look at something like Tailscale. 
We’re typically used by engineers, or infrastructure teams in an organization, but the buyer might be the VP of Engineering, but it might be the CISO, or the CTO, or whatever, and they’re going to have a set of requirements that’s going to be very different from what the end-user has as a set of requirements, so even if you have something like bottom-up adoption, in our case, like, understanding and making sure we’re checking all the boxes that somebody needs to actually bring us to work. Enterprises are incredibly demanding, and to your point, have long checklists of what they need as part of an RFP or that kind of thing. I find that some of the strictest requirements tend to be in security. So like, how—to your point—if we’re such a critical part of your network, how are you sure that we’re always available, or how are you sure that if we’re compromised, you’re not compromised, and providing a lot of, like, assurances and controls around making sure that that’s not the case. Corey: I think that there’s a challenge in that what enterprise means to different people can be wildly divergent. I originally came from the school of obnoxious engineering where oh, as an engineer, whenever I say something is enterprise grade, that’s not a compliment. That means it’s going to be slow and moribund. But that is a natural consequence of a company’s growth after achieving success, where okay, now we have actual obligations to customers and risk mitigation that needs to be addressed. And how do you wind up doing that without completely hobbling yourself when it comes to accelerating feature velocity? It’s a very delicate balancing act. Maya: Yeah, for sure. And I think you need to balance, to your point, kind of creating demand for the product—like, it’s actually solving the problem that the customer has—versus checking boxes. Like, I think about them as features, or you know, feature requests versus feature blockers or deal blockers or adoption blockers. 
So, somebody wants to, say, connect to an AWS VPC, but then the person who has to make sure that that’s actually rolled out properly also wants audit logs and SSH session recording and RBAC-based controls and lots of other things before they’re comfortable deploying that in their environment. And I’m not even talking about the list of, you know, legal, kind of, TOS requirements that they would have for that kind of situation. I think there’s a couple of things that you need to do to even signal that you’re in that space. One of the things that I was—I was talking to a friend of mine the other day how it feels like five years ago, like, nobody had SOC 2 reports, or very few startups had SOC 2 reports. And it’s probably because of the advent of some of these other companies in this space, but like, now you can kind of throw a dart, and you’ll hit five startups that have SOC 2 reports, and the amount that you need to show that you’re ready to sell to these companies has changed. Corey: I think that there’s a definite broadening of the use case. And I’ve been trying to avoid it, but let’s go diving right into it. I used to view Tailscale as, oh it’s a VPN. The end. Then it became something more where it effectively became the mesh overlay where all of the various things that I have that speak Tailscale—which is frankly, a disturbing number of things that I’d previously considered to be appliances—all talk to one another over a dedicated network, and as a result, can do really neat things where I don’t have to spend hours on end configuring weird firewall rules. It’s more secure, it’s a lot simpler, and it seems like every time I get that understanding down, you folks do something that causes me to yet again reevaluate where you stand. Most recently, I was doing something horrifying in front-end work, and in VS Code the Tailscale extension popped up. “Oh, it looks like you’re running a local development server. 
Would you like to use Tailscale Funnel to make it available to the internet?” And my response to that is, “Good lord, no, I’m ashamed of it, but thanks for asking.” Every time I think I get it, I have to reevaluate where it stands in the ecosystem. What is Tailscale now? I feel like I should get the official description of what you are. Maya: Well, I sure hope I’m not the official description. I think the closest is a little bit of what you’re saying: a mesh overlay network for your infrastructure, or a programmable network that lets you mesh together your users and services and services and services, no matter where they are, including across different infrastructure providers and, to your point, on a long list of devices you might have running. People are running Tailscale on self-driving cars, on robots, on satellites, on elevators, but they’re also running Tailscale on Linux running in AWS or a MacBook they have sitting under their desk or whatever it happens to be. The phrase that I like to use for that is, like, infrastructure agnostic. We’re just a building block. Your infrastructure can be whatever infrastructure you want. You can have the cheapest GPUs from this cloud, or you can use the Android phone to train the model that you have sitting on your desk. We just help you connect all that stuff together so you can build your own cloud whatever way you want. To your point, that’s not really a VPN [laugh]. The word VPN doesn’t quite do it justice. For the remote access to prod use case, so like a user, specifically, like, a developer infra team to a production network, that probably looks the most like a zero-trust solution, but we kind of blur a lot of the lines there for what we can do. Corey: Yeah, just looking at it, at the moment, I have a bunch of Raspberries Pi, perhaps, hanging out on my tailnet. 
I have currently 14 machines on there, I have my NAS downstairs, I have a couple of EC2 instances, a Google Cloud instance, somewhere, I finally shut down my old Oracle Cloud instance, my pfSense box speaks it natively. I have a Thinkst Canary hanging out on there to detect if anything starts going ridiculously weird, my phone, my iPad, and a few other things here and there. And they all just talk seamlessly over the same network. I can identify them via either IP address, if I’m old, or via DNS if I want to introduce problems that will surprise me at one point or another down the road. I mean, I even have an exit node I share with my brother’s Tailscale account for reasons that most people would not expect, namely that he is an American who lives abroad. So, many weird services like banks or whatnot, “Oh, you can’t log in to check your bank unless you’re coming from US IP space.” He clicks a button, boom, now he doesn’t get yelled at to check his own accounts. Which is probably not the primary use case you’d slap on your website, but it’s one of those solving everyday things in somewhat weird ways. Maya: Oh, yeah. I worked at a bank maybe ten years ago, and they would block—this little bank on the east coast of the US—they would block connections from Hawaii because why would any of your customers ever be in Hawaii? And it was like, people travel and maybe you’re— Corey: How can you be in Hawaii? You don’t have a passport. Maya: [laugh]. People travel. They still need to do banking. Like, it doesn’t change, yeah. The internet, we’ve built a lot of weird controls that are IP-based, that don’t really make any sense, that aren’t reflective. And like, that’s true for individuals—like you’re describing, people who travel and need to bank or whatever they need to do when they travel—and for corporations, right? 
Like the old concept—this is all back to the zero trust stuff—but like, the old concept that you were trusted just because you had an IP address that was in the corp IP range is just not true anymore, right? Somebody can walk into your office and connect to the Wi-Fi and a legitimate employee can be doing their job from home or from Starbucks, right? Those are acceptable ways to work nowadays. Corey: One other thing that I wanted to talk about is, I know that in previous discussions with you folks—sometimes on the podcast, sometimes when I more or less corner someone at Tailscale at your developer conference—one of the things that you folks talk about is Tailscale SSH, which is effectively a drop-in replacement for the SSH binary on systems. Full disclosure, I don’t use it, mostly because I’m grumpy and I’m old. I also like having some form of separation of duties where you’re the network that ties it all together, but something else winds up acting as that authentication step. That said, if I were that interesting that someone wanted to come after me, there are easier ways to get in, so I’m mostly just doing this because I’m persnickety. Are you seeing significant adoption of Tailscale SSH? Maya: I think there’s a couple of features that are missing in Tailscale SSH for it to be as adopted by people like you. The main one that I would say is—so right now if you use Tailscale SSH, it runs a binary on the host, you can use your Tailscale credentials, and your Tailscale private key, effectively, to SSH something else. So, you don’t have to manage a separate set of SSH keys or certs or whatever it is you want to do to manage that in your network. Your identity provider identity is tied to Tailscale, and then when you connect to that device, we still need to have an identity on the host itself, like in Unix. Right now, that’s not tied to Tailscale. You can adopt an identity of something else that’s already on the host, but it’s not, like, corey@machine. 
And I think that’s the number one request that we’re getting for Tailscale SSH, to be able to actually generate or tie to the individual users on the host for an identity that comes from, like, Google, or GitHub, or Okta, or something like that. I’m not hearing a lot of feedback on the security concerns that you’re expressing. I think part of that is that we’ve done a lot of work around security in general so that you feel like if Tailscale were to be compromised, your network wouldn’t need to be compromised. So, Tailscale itself is end-to-end encrypted using WireGuard. We only see your public keys; the private keys remain on the device. So, in some sense the, like, quote-unquote, “Worst” that we could do would be to add a node to your network and then start to generate traffic from that or, like, mess with the configuration of your network. These are questions that have come up. In terms of adding nodes to your network, we have a feature called tailnet lock that effectively lets you sign and verify that all the nodes on your network are supposed to be there. One of the other concerns that I’ve heard come up is, like, what if the binary was compromised. We develop in open-source so you can see that that’s the case, but like, you know, there’s certainly more stuff we could be doing there to prevent, for example, like a software supply chain security attack. Yeah. Corey: Yeah, but you also have taken significant architectural steps to ensure that you are not placed in a position of undue trust around a lot of these things. Most recently, you raised a Series B, that was $100 million, and the fact that you have not gone bankrupt in the year since that happened tells me that you are very clearly not routing all customer traffic through you folks, at least on one of the major cloud providers. And in fact, a little bit of playing a-slap-and-tickle with Wireshark affirm this, that the nodes talk to each other; they do not route their traffic through you folks, by design. 
So one, great for the budget, I have respect for that data transfer pattern, but also it means that you are not in the position of being a global observer in a way that can be, in many cases, exploited. Maya: I think that’s absolutely correct. So, it was 18 months ago or so that we raised our Series B. When you use Tailscale, your traffic connects peer-to-peer directly between nodes on your network. And that has a couple of nice properties, some of what you just described, which is that we don’t see your traffic. I mean, one, because it’s end-to-end encrypted, but even if we could capture it, and then—we’re not in the way of capturing it, let alone decrypting it. Another nice property it has is just, like, latency, right? If your user is in the UK, and they’re trying to access something in Scotland, it’s not, you know, hair-pinning, bouncing all the way to the West Coast or something like that. It doesn’t have to go through one of our servers to get there. Another nice property that comes with that is availability. So, if our network goes down, if our control plane goes down, you’re temporarily not able to add nodes or change your configuration, but everything in your network can still connect to each other, so you’re not dependent on us being online in order for your network to work. And this is actually coming up more and more in customer conversations where that’s a differentiator for us versus a competitor. Different competitors, also. There’s a customer case study on our website about somebody who was POC’ing us with a different option, and literally during the POC, the competitor had an outage, unfortunately for them, and we didn’t, and they sort of looked at our model, our deployment model and went, “Huh, this really matters to us.” And not having an outage on our network with this solution seems like a better option. Corey: Yeah, when the network is down, the computers all turn into basically space heaters. Maya: [laugh]. 
Yeah, as long as they’re not down because, I guess, unplugged or something. But yeah, [laugh] I completely agree. Yeah. But I think there’s a couple of these kinds of, like, enterprise things that people are—we’re starting to do a better job of explaining and meeting customers where they are, but it’s also that people are realizing this actually does matter when you’re deploying something at this scale that’s such a key part of your network. So, we talked a bit about availability, we talked a bit about things like latency. On the security side, there’s a lot that we’ve done around, like I said, tailnet lock or that type of thing, but it’s like some of the basic security features. Like, when I joined Tailscale, probably the first thing I shipped in some sense as a PM was a change log. Here’s the change log of everything that we’re shipping as part of these releases so that you can have confidence that we’re telling you what’s going on in your network, when new features are coming out, and you can trust us to be part of your network, to be part of your infrastructure. Corey: I do want to further call out that you have a—how should I frame this—an atypically active security notification page. Maya: [laugh]. Corey: And I think it is easy to misconstrue that as, “Look at how terrifyingly insecure this is.” Having read through it, I would argue that it is not that you are surprisingly insecure, but rather that you are extraordinarily transparent about things that are relatively minor issues. And yes, they should get fixed, but, “Oh, that could be a problem if six other things happen to fall into place just the right way.” These are not security issues of the type, “Yeah, so it turns out that what we thought was encrypting actually wasn’t and we’re just expensive telnet.” No, there’s none of that going on. It’s all been relatively esoteric stuff, but you also address it very quickly. 
And that is odd, as someone who has watched too many enterprise-facing companies respond to third-party vulnerability reports with rather than fixing the problem, more or less trying to get them not to talk about it, or if they do, to talk about it only using approved language. I don’t see any signs of that with what you’ve done there. Was that a challenging internal struggle for you to pull off? Maya: I think internally, it was recognizing that security was such an important part of our value proposition that we had to be transparent. But once we kind of got past that initial hump, we’ve been extremely transparent, as you say. We think we can build trust through transparency, and that’s the most important thing in how we respond to security incidents. But code is going to have bugs. It’s going to have security bugs. There’s nothing you can do to prevent that from happening. What matters is how you—and like, you should. Like, you should try to catch them early in the development process and, you know, shift left and all that kind of stuff, but some things are always going to happen [laugh] and what matters in that case is how you respond to them. And having another, you know, an app update that just says “Bug fixes” doesn’t help you figure out whether or not you should actually update, it doesn’t actually help you trust us. And so, being as public and as transparent as possible about what’s actually happening, and when we respond to security issues and how we respond to security issues is really, really important to us. We have a policy that talks about when we will publish a bulletin. You can subscribe to our bulletins. We’ll proactively email anyone who has a security contact on file, or alternatively, another contact that we have if you haven’t provided us a security contact when you’re subject to an issue. I think by far and large, like, Tailscale has more security bulletins just because we’re transparent about them. 
It’s like, we probably have as many bugs as anybody else does. We’re just lucky that people report them to us because they see us react to them so quickly, and then we’re able to fix them, right? It’s a net positive for everyone involved. Corey: It’s one of those hard problems to solve for across the board, just because I’ve seen companies in the past get more or less brutalized by the tech press when they have been overly transparent. I remember that there was a Reuters article years ago about Slack, for example, because they would pull up their status history and say, “Oh, look at all of these issues here. You folks can’t keep your website up.” But no, a lot of it was like, “Oh, file uploads for a small subset of our users is causing a problem,” and so on and so forth. These relatively minor issues that, in aggregate, are very hard to represent when you’re using traffic light signaling. So, then you see people effectively going full-on AWS status page where there’s a significant outage lasting over a day, last month, and what you see, if you go really looking for it, is this yellow thing buried in this absolute sea of green lights, even though that was one of the more disruptive things to have happened this year. So, it’s a consistent and constant balance, and I really have a lot of empathy no matter where you wind up landing on that. Maya: Yeah, I think that’s—you’re saying it’s sort of about transparency or being able to find the right information. I completely agree. And it’s also about building trust, right? If we set expectations as to how we will respond to these things then we consistently respond to them, people believe that we’re going to keep doing that. And that is almost more important than, like, committing to doing that, if that makes any sense. I remember having a conversation many years ago with an eng manager I worked with, and we were debating what the SLO for a particular service should be. And he sort of made an interesting point. 
He’s like, “It doesn’t really matter what the SLO is. It matters what you actually do because then people are going to start expecting [laugh] what you actually do.” So, being able to point at this and say, “Yes, here’s what we say and here’s what we actually do in practice,” I think builds so much more trust in how we respond to these kinds of things and how seriously we take security. I think one of the other things that came out of the security work is we realized—and I think you talked to Avery, the CEO of Tailscale on a prior podcast about some of this stuff—but we realized that platforms are broken, and we don’t have a great way of pushing automatic updates on a lot of platforms, right? You know, if you’re using the macOS store, or the Android Play Store, or iOS or whatever, you can automatically update your client when there is a security issue. On other platforms, you’re kind of stuck. And so, as a result of us wanting to make sure that the fleet is as updated as possible, we’ve actually built an auto-update feature that’s available on all of our major clients now, so people can opt in to getting those updates as quickly as needed when there is a security issue. We want to expose people to as little risk as possible. Corey: I am not a Tailscale customer. And that bugs me because until I cross that chasm into transferring $1 every month from my bank account to yours, I’m just a whiny freeloader in many respects, which is not at all how you folks have ever made me feel; I want to be very clear on that. But I believe in paying for the services that empower me to do my job more effectively, and Tailscale absolutely qualifies. Maya: Yeah, understood, I think that you still provide value to us in ways that aren’t your data, but then in ways that help our business. One of them is that people like you tend to bring Tailscale to work. 
They tend to have a good experience at home connecting to their Synology, helping their brother connect to his bank account, whatever it happens to be, and they go, “Oh.” Something kind of clicks, and then they see a problem at work that looks very similar, and then they bring it to work. That is our primary path of adoption. We are a bottom-up adoption, you know, product-led growth product [laugh]. So, we have a blog post called “How Our Free Plan Stays Free” that covers some of that. I think the second thing that I don’t want to undersell that a user like you also does is, you have a problem, you hit an issue, and you write into support, and you find something that nobody else has found yet [laugh]. Corey: I am very good at doing that entirely by accident. Maya: [laugh]. But that helps us because that means that we see a problem that needs to get fixed, and we can catch it way sooner than before it’s deployed, you know, at scale, at a large bank, and you know, it’s a critical, kind of, somebody’s getting paged kind of issue, right? We have a couple of bugs like that where we need, you know, we need a couple of repros from a couple different people in a couple different situations before we can really figure out what’s going on. And having a wide user base who is happy to talk to us really helps us. Corey: I would say it goes beyond that, too. I have—I see things in the world of Tailscale that started off as features that I requested. One of the more recent ones is, it is annoying to me to see on the Tailscale machines list everything I have joined to the tailnet with that silly little up arrow next to it of, “Oh, time to go back and update Tailscale to the latest,” because that usually comes with decent benefits. Great, I have to go through iteratively, or use Ansible, or something like that. Well, now there’s a Tailscale update option where it will keep itself current on supported operating systems. 
For some unknown reason, you apparently can’t self-update the application on iOS or macOS. Can’t imagine why. But those things tend to self-update based upon how the OS works due to all the sandboxing challenges. The only challenge I’ve got now is a few things that are, more or less, embedded devices that are packaged by the maintainer of that embedded system, where I’m beholden to them. Only until I get annoyed enough to start building a CI/CD system to replace their package. Maya: I can’t wait till you build that CI/CD system. That’ll be fun. Corey: “We wrote this code last night. Straight to the bank with it.” Yeah, that sounds awesome. Maya: [laugh] You’d get a couple of term sheets for that, I’m sure. Corey: There are. I am curious, looping back to the start of our conversation, we talked about enterprise security requirements, but how do you address enterprise change management? I find that that’s something an awful lot of companies get dreadfully wrong. Most recently and most noisily on my part is Slack, a service for which I paid thousands of dollars a year, decided to roll out a UI redesign that, more or less, got in the way of a tremendous number of customers and there was no way to stop it or revert it. And that made me a lot less likely to build critical-flow business processes that depended upon Slack behaving a certain way. Just, “Oh, we decided to change everything in the user interface today just for funsies.” If Microsoft pulled that with Excel, by lunchtime they’d have reverted it because an entire universe of business users would have marched on Redmond to burn them out otherwise. That carries significant cost for businesses. Yet I still see Tailscale shipping features just as fast as you ever have. How do you square that circle? Maya: Yeah. 
I think there’s two different kinds of change management really, which is, like—because if you think about it, it’s like, an enterprise needs a way to roll out a product or a feature internally and then separately, we need a way to roll out new things to customers, right? And so, I think on the Tailscale side, we have a change log that tells you about everything that’s changing, including new features, and including changes to the client. We update that religiously. Like, it’s a big deal, if something doesn’t make it the day that it’s supposed to make it. We get very kind of concerned internally about that. A couple of things that were—that are in that space, right, we just talked about auto-updates to make it really easy for you to maintain what’s actually rolled out in your infrastructure, but more importantly, for us to push changes with a new client release. Like, for example, in the case of a security incident, we want to be able to publish a version and get it rolled out to the fleet as quickly as possible. Some of the things that we don’t have here, but although I hear requests for is the ability to, like, gradually roll out features to a customer. So like, “Can we change the configuration for 10% of our network and see if anything breaks before rolling back, right before rolling forward.” That’s a very traditional kind of infra change management thing, but not something I’ve ever seen in, sort of, the networking security space to this degree, and something that I’m hearing a lot of customers ask for. In terms of other, like, internal controls that a customer might have, we have a feature called ACL Tests. So, if you’re going to change the configuration of who can access what in your network, you can actually write tests. Like, your permission file is written in HuJSON and you can write a set of things like, Corey should be able to access prod. 
Corey should not be able to access test, or whatever it happens to be—actually, let’s flip those around—and when you have a policy change that doesn’t pass those tests, you actually get told right away so you’re not rolling that out and accidentally breaking a large part of your network. So, we built several things into the product to do it. In terms of how we notify customers, like I said, the primary method that we have right now is something like a change log, as well as, like, security bulletins for security updates. Corey: Yeah, it’s one of the challenges, on some level, of the problem of oh, I’m going to set up a service, and then I’m going to go sail around the world, and when I come back in a year or two—depending on how long I spent stranded on an island somewhere—now I get to figure out what has changed. And to your credit, you have to affirmatively enable all of the features that you have shipped, but you’ve gone from, “Oh, it’s a mesh network where everything can talk to each other,” to, “I can use an exit node from that thing. Oh, now I can seamlessly transfer files from one node to another with Taildrop,” to, “Oh, Tailscale Funnel. Now, I can expose my horrifying developer environment to the internet.” I used that one year to give a talk at a conference, just because why not? Maya: [crosstalk 00:27:35]. Corey: Everything evolves to become [unintelligible 00:27:37] email on Microsoft Outlook, or tries to be Microsoft Excel? Oh, no, no. I want you to be building Microsoft PowerPoint for me. And we eventually get there, but that is incredibly powerful functionality, but also terrifying when you think you have a handle on what’s going on in a large-scale environment, and suddenly, oh, there’s a whole new vector we need to think about. Which is why your—the thought and consideration you put into that is so apparent and so, frankly, welcome. 
Maya: Yeah, you actually kind of made a statement there that I completely missed, which is correct, which is, we don’t turn features on by default. They are opt-in features. We will roll out features by default after they’ve kind of baked for an incredibly long period of time and with, like, a lot of fanfare and warning. So, the example that I’ll give is, we have a DNS feature that was probably available for maybe 18 months before we turned it on by default for new tailnets. So didn’t even turn it on for existing folks. It’s called Magic DNS. We don’t want to touch your configuration or your network. We know people will freak out when that happens. Knowing, to your point, that you can leave something for a year and come back, and it’s going to be the same is really important. For everyone, but for an enterprise customer as well. Actually, one other thing to mention there. We have a bunch of really old versions of clients that are running in production, and we want them to keep working, so we try to be as backward compatible as possible. I think the… I think we still have clients from 2019 that are running and connecting to corp that nobody’s updated. And like, it’d be great if they would update them, but like, who knows what situation they’re in and if they can connect to them, and all that kind of stuff, but they still work. And the point is that you can have set it up four years ago, and it should still work, and you should still be able to connect to it, and leave it alone and come back to it in a year from now, and it should still work and [laugh] still connect without anything changing. That’s a very hard guarantee to be able to make. Corey: And yet, somehow you’ve been able to do that, just from the perspective of not—I’ve never yet seen you folks make a security-oriented decision that I’m looking at and rolling my eyes and amazed that you didn’t make the decision the other way. 
There are a lot of companies that while intending very well have done, frankly, very dumb things. I’ve been keeping an eye on you folks for a long time, and I would have caught that in public. I just haven’t seen anything like that. It’s kind of amazing. Last year, I finally took the extraordinary step of disabling SSH access anywhere except the tailnet to a number of my things. It lets my logs fill up a lot less, and you’ve built to that level of utility-like reliability over the series of longtime experimentation. I have yet to regret having Tailscale in the mix, which is, frankly, not something I can say about almost any product. Maya: Yeah. I’m very proud to hear that. And like, maintaining that trust—back to a lot of the conversation about security and reliability and stuff—is incredibly important to us, and we put a lot of effort into it. Corey: I really appreciate your taking the time to talk to me about how things continue to evolve over there. Anything that’s new and exciting that might have gotten missed? Like, what has come out in, I guess, the last six months or so that are relevant to the business and might be useful for people looking to use it themselves? Maya: I was hoping you’re going to ask me what came out in the last, you know, 20 minutes while we were talking, and the answer is probably nothing, but you never know. But [laugh]— Corey: With you folks, I wouldn’t doubt it. Like, “Oh, yeah, by the way, we had to do a brand treatment redo refresh,” or something on the website? Why not? It now uses telepathy just because. Maya: It could, that’d be pretty cool. No, I mean, lots has gone on in the last six months. I think some of the things that might be more interesting to your listeners, we’re now in the AWS Marketplace, so if you want to purchase Tailscale through AWS Marketplace, you can. 
We have a Kubernetes operator that we’ve released, which lets you both ingress and egress from a Kubernetes cluster to things that are elsewhere in the world on other infrastructure, and also access the Kubernetes control plane and the API server via Tailscale. I mentioned auto-updates. You mentioned the VS Code extension. That’s amazing, the fact that you can kind of connect directly from within VS Code to things on your tailnet. That’s a lot of the exciting stuff that we’ve been doing. And there’s boring stuff, you know, like audit log streaming, and that kind of stuff. But it’s good. Corey: Yeah, that stuff is super boring until suddenly, it’s very, very exciting. And those are not generally good days. Maya: [laugh]. Yeah, agreed. It’s important, but boring. But important. Corey: [laugh]. Well, thank you so much for taking the time to talk through all the stuff that you folks are up to. If people want to learn more, where’s the best place for them to go to get started? Maya: tailscale.com https://tailscale.com is the best place to go. You can download Tailscale from there, get access to our documentation, all that kind of stuff. Corey: Yeah, I also just want to highlight that you can buy my attention but never my opinion on things and my opinion on Tailscale remains stratospherically high, so thank you for not making me look like a fool, by like, “Yes. And now we’re pivoting to something horrifying as a business model and your data.” Thank you for not doing exactly that. Maya: Yeah, we’ll keep doing that. No, no blockchains in our future. Corey: [laugh]. Maya Kaczorowski, Chief Product Officer at Tailscale. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. This episode has been brought to us by our friends at Tailscale. 
If you enjoyed this episode, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that will never actually make it back to us because someone screwed up a firewall rule somewhere on their legacy connection. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com to get started.

33m
Dec 19, 2023
Using DevOps to Ignite a Chain Reaction of Productivity and Happiness with Dave Mangot

Dave Mangot, CEO and founder of Mangoteque, joins Corey on Screaming in the Cloud to explain how leveraging DevOps improves the lives of engineers and results in stronger businesses. Dave talks about the importance of exclusively working for private equity firms that act ethically, the key difference between venture capital and private equity, and how conveying issues and ideas to your CEO using language he understands leads to faster results. Corey and Dave discuss why successful businesses are built on two things: infrastructure as code and monitoring. ABOUT DAVE Dave Mangot, author of DevOps Patterns for Private Equity, helps portfolio companies get good at delivering software. He is a leading consultant, author, and speaker as the principal at Mangoteque. A DevOps veteran, Dave has successfully led digital, SRE, and DevOps transformations at companies such as Salesforce, SolarWinds, and Cable & Wireless. He has a proven track record of working with companies to quickly mature their existing culture to improve the speed, frequency, and resilience of their software service delivery. LINKS REFERENCED: Mangoteque: https://www.mangoteque.com DevOps Patterns for Private Equity: https://www.amazon.com/DevOps-Patterns-Private-Equity-organization/dp/B0CHXVDX1K “How to Talk Business: A Short Guide for Tech Leaders”: https://itrevolution.com/articles/how-to-talk-business-a-short-guide-for-tech-leaders/ Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. My guest today is someone that I have known for, well, longer than I’ve been doing this show. Dave Mangot is the founder and CEO at Mangoteque https://www.mangoteque.com. Dave, thank you for joining me. 
Dave: Hey, Corey, it’s great to be here. Nice to see you again. Corey: I have to say, your last name is Mangot and the name of your company is Mangoteque, spelled M-A-N-G-O-T-E-Q-U-E, if I got that correctly, which apparently I did. What an amazing name for a company. How on earth did you name a company so well? Dave: Yeah, I don’t know. I have to think back, a few years ago, I was just getting started in consulting, and I was talking to some friends of mine who were giving me a bunch of advice—because they had been doing consulting for quite some time—about what my rates should be, about all kinds of—you know, which vendors I should work with for my legal advice. And I said, “I’m having a lot of trouble coming up with a name for the company.” And this guy, Corey Quinn, was like, “Hey, I got a name for you.” [laugh]. Corey: I like that story, just because it really goes to show the fine friends of mine over at all of the large cloud services companies—but mostly AWS—that it’s not that hard to name something well. The trick, I think, is just not to do it in committee. Dave: Yeah. And you know, it was a very small committee obviously of, like, three. But yeah, it’s been great. I have a lot of compliments on the name of my company. And I was like, oh, “You know that guy, the QuinnyPig dude?” And they’re like, “Yeah?” “Oh, yeah, it was—that was his idea.” And I liked it. And it works really well for the things that I do. Corey: It seems to. So, talk to me about what it is that you do because back when we first met many, many years ago, you were an SRE manager at a now defunct observability company. This was so long ago, I don’t think that they used the term observability. It was Librato, which, “What do you do?” “We do monitoring,” back when that didn’t sound like some old-timey thing. Like, “Oh, yeah. 
Right, between the blacksmith and the cobbler.” But you’ve evolved significantly since you were doing the mundane, pedestrian tasks of keeping the service up and running. What do you do these days? Dave: Yeah, that was before the observability wars [laugh] [whatever you like 00:02:55] to call it. But over time, that company was owned by SolarWinds and I wound up being responsible for all the SolarWinds cloud company SRE organizations. So, started—ran a global organization there. And they were owned by a couple of private equity firms. And I got to know one of the firms rather well, and then when I left SolarWinds, I started working with private equity firm portfolio companies, especially software investments. And what I like to say is I teach people how to get good at delivering software. Corey: So, you recently wrote a book, and I know this because I make it a point to get a copy of the book—usually by buying it, but you beat me to it by gifting me one—of every guest I have on the show who’s written a book. Sometimes that means I wind up with the eclectic collections of poetry, other times, I wind up with a number of different books around the DevOps and cloud space. And one of these days, I’m going to wind up talking to someone who wound up writing an encyclopedia or something, to where I have to back the truck around. But what I wanted to ask is about your title, of all things. It’s called DevOps Patterns for Private Equity https://www.amazon.com/DevOps-Patterns-Private-Equity-organization/dp/B0CHXVDX1K. And I have to ask, what makes private equity special? Dave: I think as a cloud economist, what you also just told me, is you owe me $17.99 for the book because it was gifted. Corey: Is that how expensive books are these days? My God, I was under the impression once you put the word ‘DevOps’ in the title, that meant you’re above 40 bucks, just as, you know, entrance starting fees here. 
Dave: I think I need to talk to my local cloud economist on how to price things. Yeah, the book is about things that I’ve basically seen at portfolio companies over the years. The thing about, you know, why private equity, I think it would be one question, just because I’ve been involved in the DevOps movement since pretty much the start, when John Willis calls me a DevOps OG, which I think is a compliment. But the thing that I like about working with private equity, and more specifically, private equity portfolio companies is, like I wrote in the book, they’re serious. And serious means that they’re not afraid to make a big investment, they’re not afraid to change things quickly, they’re not afraid to reorganize, or rethink, or whatever because a lot of these private equity firms have, how they describe it as a three to five year investment thesis. So, in three to five years, they want to have some kind of an exit event, which means that they can’t just sit around and talk about things and try it and see what happens— Corey: In the fullness of time, 20 years from now. Yeah, it doesn’t work that well. But let’s back up a little bit here because something that I have noticed over the years is that, especially when it comes to financial institutions, the general level of knowledge is not terrific. For a time, a lot of people were very angry at Goldman Sachs, for example. But okay, fair enough. What does Goldman Sachs do? And the answer was generally incoherent. And again, I am in no way, shape or form, different from people who form angry opinions without having all of the facts. I do that myself three times before breakfast. My last startup was acquired by BlackRock, and I was the one that raised our hand internally, at the 40-person company when that was announced, as everyone was sort of sitting there stunned: “What’s a BlackRock?” Because I had no idea. Well, for the next nine months, I assure you, I found out what a BlackRock is. But what is private equity? 
Because I see a lot of them getting beaten up for destroying companies. Everyone wants to bring up the Toys-R-Us story as a for instance. But I don’t get the sense that that is the full picture. Tell me more. Dave: Yes. So, I’m probably not the best spokesperson for private equity. But— Corey: Because you don’t work for a private equity firm, you only work with them, that makes you a terrific spokesperson because you’re not [in 00:06:53] this position of, “Well, justify what your company does here,” situation, there’s something to be said for objectivity. Dave: So, you know, like I wrote in the book, there are approximately 10,000 private equity firms in the United States. They are not all going to be ethical. That is just not a thing. I choose to work with a specific segment of private equity companies, and these private equity companies want to make a good business. That’s what they’re going for. And you and I, having had worked at many companies in our careers, know that there’s a lot of companies out there that aren’t a good business. You’re like, “Why are we doing this? This doesn’t make any sense. This isn’t a good investment. This”—there’s a lot of things and what I would call the professional level private equity firms, the ones at the top—and not all of them at the top are ethical, don’t get me wrong; I have a blacklist here of companies I won’t work for. I will not say who those companies are. Corey: I am in the same boat. I think that anyone who works in an industry at all and doesn’t have a list of companies that they would not do business with, is, on some level, either haven’t thought it through, hasn’t been in business long enough, or frankly, as long as you’re paying them, everything you can do is a-okay. And you know, I’m not going to sit here and say that those are terrible people, but I never wanted to do that soul-searching. 
I always thought the only way to really figure out where you stand is to figure it out in advance before there’s money on the table. Like, do you want to go do contracting for a defense company? Well no, objectively, I don’t, but that’s a lot harder to say when they’re sitting on the table with $20 million in front of you of, “Do you want to work with a defense company?” Because you can rationalize your way into anything when the stakes are high enough. That’s where I’ve always stood on it. But please, continue. Dave: I’d love to be in that situation to turn down $20 million [laugh]. Corey: Yeah, that’s a hard situation to find yourself in, right? Dave: But regardless, there’s a lot of different kinds of private equity firms. Generally the firms that I work with, they all want—not generally; the ones I work with want to make better companies. I have had operating partners at these companies tell me—because this always comes up with private equity—there’s no way to cut your way to a good company. So, the private equity firms that I work with invest in these companies. Do they sell off unprofitable things? Of course they do. Do they try to streamline some things sometimes so that the company is only focused on X or Y, and then they tuck other companies into it—that’s called a buy and build strategy or a platform strategy—yes. But the purpose of that is to make a better company. The thing that I see a lot of people in our industry—meaning, like, us tech kind of folks—get confused about is what the difference is between venture capital and private equity. And private equity, in general, is the thing that is the kind of financing that follows on after venture capital. So, in venture capital, you are trying to find product-market fit. The venture capitalists are putting all their bets down like they’re in Vegas at re:Invent, and trying to figure out which bet is going to pay off, but they have no expectation that all of the bets are going to pay off. 
With private equity, the companies have product-market fit, they’re profitable. If they’re not profitable, they have a very clear line to profitability. And so, what these private equity firms are trying to do, no matter what the size of the company is, whether it’s a 50-person company or a 5000-person company, they’re trying to get these companies up to another level so that they’re more profitable and more valuable, so that either a larger fish will gobble them up or they’ll go out on the public markets, like onto the stock market, those kinds of things, but they’re trying to make a company that’s more valuable. And so, not everything looks so good [laugh] when you’re looking at it from the outside, not understanding what these people are trying to do. That’s not to say they’re not complete jerks who are in private equity because there are. Corey: Because some parts are missing. Kidding. Kidding. Kidding. Dave: [laugh]. Corey: It’s a nuanced area, and it’s complicated, just from the perspective of… finance is deceptively complicated. It looks simple, on some level, because on some level, you can always participate in finance. I have $10. I want to buy a thing that costs $7. How does that work? But it gets geometrically more complex the further you go. Financial engineering is very much a thing. And it is not at all obvious how those things interplay with different dynamics. One of the private equity outcomes, as you alluded to a few minutes ago, is the idea that they need to be able to rapidly effect change. It becomes a fast turnaround situation, and then have an exit event of some kind. So, the DevOps patterns that you write about are aligned with an idea of being effective, presumably, rather than, well, here’s how you slowly introduce a sweeping cultural mindset shift across the organization. Like, that’s great, but some of us don’t have that kind of runway for what we’re trying to achieve to be able to pull that off. 
So, I’m assuming that a lot of the patterns you talk about are emphasizing rapid results. Dave: Well, I think the best way to describe this, right, is what we’ve talked about is they want to make a better company. And for those of us who have worked in the DevOps movement for all these years, what’s one great way of making a better company? Adopting DevOps principles, right? And so, for me, one of the things I love about my job is I get to go in and make engineers’ lives better. No more working on weekends, no more we’re only going to do deployments at 11 o’clock at night, no more we’re going to batch things up and ship them three or four times a year, which all of us who’ve done DevOps stuff for years know, like, fastest way to have a catastrophe is batch up as many things as possible and release them all at once. So like, for me, I’m going in making engineers’ lives better. When their lives are better, they produce better results because they’re not stressed out, they’re not burned out, they get to spend time with their families, all those kinds of things. When they start producing better results, the executives are happier. The executives can go to the investors and show all the great results they’re getting, so the investors are happier. So, for me, I always say, like, I’m super lucky because I have a job that’s win, win, win. And like, I’m helping them to make a better company, I’m helping them to ship faster, I’m helping them do things in the cloud, I’m helping them get more reliability, which helps them retain customers, all these things. Because we know from the—you know, remember the : highest performers are twice as likely to meet or exceed their organization’s performance goals, and those can be customer retention, revenue, whatever those goals are. And so, I get to go in and help make a better company because I’m making people’s lives better and, kind of, everybody wins. And so, for me, it’s super rewarding. Corey: That’s a good way of framing it. 
I have to ask, since the goal for private equity, as you said, is to create better companies, to effectively fix a bunch of things that, for better or worse, had not been working optimally. Let me ask the big, dumb, naive question here. Isn’t that ostensibly the goal of every company? Now, everyone says it’s their goal, but whether that is their goal or not, I think, is a somewhat separate question. Dave: Yeah. I—that should be the goal of every company, I agree. There are people who read my book and said, “Hey, this stuff applies far beyond private equity.” And I say, “Yeah, it absolutely does.” But there are constraints—[gold rat 00:15:10]—within private equity, about the timing, about the funding, about whatever, to get the thing to another level. And that’s an interesting thing that I’ve seen is I’ve seen private equity companies take a company up to another level, have some kind of exit event, and then buy that company again years later. Which, like, what? Like, how could that be? Corey: I’ve seen that myself. It feels, on some level, like that company goes public, and then goes private, then goes public, then goes private to the same PE firm, and it’s like, are you really a PE company or are you just secretly a giant cat, perpetually on the wrong side of a door somewhere? Dave: But that’s because they will take it to a level, the company does things, things happen out in the market, and then they see another opportunity to grow them again. Where in a regular company—in theory—you’re going to want to just get better all the time, forever. This is the Toyota thesis about continual improvement. Corey: I am curious as far as what you are seeing changing in the market with the current macroeconomic conditions, which is a polite way to say the industry going wonky after ten years of being relatively up and to the right. Dave: Yeah, well, I guess the fun thing is, we have interest rates, we had a pandemic, we had [laugh], like, all this exciting stuff. 
There’s, you know, massive layoffs, [unintelligible 00:16:34] and then all this, kind of like, super churn-y things. I think the fun thing for me is, I went to a private equity conference in San Francisco, I don’t know, a month ago or something like that, and they had all these panelists on stage pontificating about this and that and the other thing, and one of the women said something that I thought was really great, especially for someone like me. She said, “The next five to ten years in private equity are going to be about growth and operational efficiency.” And I was like, “That’s DevOps. That’s awesome.” [laugh]. That really works well for me because, like, we want to have people twice as likely to meet or exceed their organization’s performance goals. That’s growth. And we want operational efficiency, right? Like, stop manually copying files around, start putting stuff in containers, do all these things that enable us to go fast speed and also do that with high quality. So, if the next five to ten years are going to be about growth and operational efficiency, I think it’s a great opportunity for people to take in a lot of these DevOps principles. And so, the being on the podcast, like, I think cloud is a huge part of that. I think that’s a big way to get growth and operational efficiency. Like, how better to be able to scale? How better to be able to Deming’s PDSA cycle, right—Plan, Do, Study, Act—how better to run all these experiments to find out, like, how to get better, how to be more efficient, how to meet our customers’ demands. I think that’s a huge part of it. Corey: That is, I think, a very common sentiment as far as how folks are looking at things from a bigger picture these days. I want to go back as well to something you said earlier that I was joking around at the start of the episode about, “Wow, what an amazing name for the company. 
How did you come up with it?” And you mentioned that you had been asking a bunch of people for advice—or rather, you mentioned you had gotten advice from people. I want to clarify, you were in fact asking. I wasn’t basically the human form of Clippy popping up, “It looks like you’re starting a business. Let me give you unsolicited advice on what you should be doing.” What you’ve done, I think, is a terrific example of the do what I say not what I do type of problem, where you have focused on your positioning on a specific segment of the market: private equity firms and their portfolio companies. If I had been a little bit smarter, I would have done something similar in my own business. I would fix AWS bills for insurance companies in the Pacific Northwest or something like that, where people can hear the type of company they are reflected in the name of what it is that you do. I was just fortunate enough or foolish enough to be noisy enough in order to talk about what I do in a way that I was able to overcome that. But targeting the way that you have, I think is just so spot on. And it’s clearly working out for you. Dave: I think a Corey Quinn Clippy would be very distracting in [laugh] my Microsoft Word, first of all [laugh]. Second of all— Corey: They’re calling it Copilot now. Dave: [laugh]—there’s this guy Corey and his partner Mike who turned me on to this guy, Jonathan Stark, who has his theory about your business. He calls it, like, elucidating, like, a Rolodex moment. So, if somebody’s talking about X or Y, and they say, “Oh, yeah. You want to talk to Corey about that.” Or, “You want to talk to Mike about that.” And so, for me, working with private equity portfolio companies, that’s a Rolodex moment. When people are like, “I’m at a portfolio company. We just got bought. They’re coming in, and they want to understand what our spend is on the cloud, and this and that. 
Like, I don’t know what I’m supposed to do here.” A lot of times people think of me because I tend to work on those kinds of problems. And so, it doesn’t mean I can’t work on other things, and I definitely do work on other things, I’ve definitely worked with companies that are not owned by private equity, but for me, that’s really a place that I enjoy working, and thankfully, I get Rolodex moments from those things. Corey: That’s the real value that I’ve found. The line I’ve heard is always it’s not just someone at a party popping up and saying, “Oh, yeah, I have that problem.” But, “Oh, my God, you need to talk to this person I know who has that problem.” It’s the introduction moment. In my case at least, it became very hard for me to find people self-identifying as having large AWS bills, just because, yeah, individual learners or small startup founders, for example, might talk about it here and there, but large companies do not tend to complain about that in Twitter because that tends to, you know, get them removed from their roles when they start going down that path. Do you find that it is easier for you to target what you do to people because it’s easier to identify them in public? Because I assure you, someone with a big AWS bill is hard to spot out of a crowd. Dave: Well, I think you need to meet people where they are, I think is probably the best way of saying that. So, if you are—and this isn’t something I need to explain to you, obviously, so this is more for your listeners, but like, if you’re going to talk about, “Hey, I’m looking for companies with large AWS bills,” [pthhh] like that’s, maybe kind of whatever. But if you say, “Hey, I want to improve your margins and your operational efficiencies,” all of a sudden, you’re starting to speak their language, right? And that language is where people start to understand that, “Hey, Corey’s talking about me.” Corey: A large part of how I talk about this was shaped by some of the early conversations I had. 
The way that I think about this stuff and the way that I talk is not necessarily what terms my customers use. Something that I found that absolutely changed my approach was having an investigative journalist—or a former investigative journalist, in this case—interview people I’d worked with to get case studies and testimonials from them. But what she would also do was get the exact phrasing that they use to describe the value that I did, and how they talked about what we’d done. Because that became something that was oh, you’re effectively writing the rough draft of my marketing copy when you do that. Speaking in the language of your customer is so important, and I meet a lot of early-stage startups that haven’t quite unlocked that bit of insight yet. Dave: And I think looking at that from a slightly different perspective is also super important. So, not only speaking the language of your customer, but let’s say you’re not a consultant like me or you. Let’s say you work inside of a company. You need to learn to speak the language of business, right? And this is, like, something I wrote about in the beginning of the book about the guy in San Francisco who got locked up for not giving away the Cisco passwords, and Gavin Newsom had to go to his jail cell and all this other crazy stuff that happened is, technologists often think that the reason that they go to work is to play with technology. The reason we go to work is to enable the business. And—so shameless plug here I—wrote a paper https://itrevolution.com/articles/how-to-talk-business-a-short-guide-for-tech-leaders/ that came out, like, two months ago with IT Revolution—so the people who do , and , and , and all that other stuff, I wrote this paper with, like, Courtney Kissler, and Paul Gaffney, and Scott Nasello, and a whole bunch of amazing technologists, but it’s about speaking the language of business. 
And as technologists, if we want to really contribute and feel like the work that we’re doing is contributing and valuable, you need to start understanding how those other people are talking. So, you and I were just talking about, like, operational efficiencies, and margins, and whatever. What is all that stuff? And figuring that out and being able to have that conversation with your CEO or whoever, those are the things that get people to understand exactly what you’re trying to do, and what you’re doing, and why this thing is so important. I talk to so many engineers that are like, “Ah, I talked to management and they just don’t understand, and [da-dah].” Yeah, they don’t understand because you’re speaking technology language. They don’t want to hear about, like, CNCF compliant this, that, and the—that doesn’t mean anything to them. You need to understand in their lang—talk to them and their language and say like, “Hey, this is why this is good for the business.” And I think that’s a really important thing for people to start to learn. Corey: So, a question that I have, given that you have been doing this stuff, I think, longer than I have, back when cloud wasn’t really a thing, and then it was a thing, but it seemed really irresponsible to do. And then it went through several more iterations to the point where now it’s everywhere. What’s your philosophy of cloud? Dave: So, I’ll go back to something that just came out, the just came out. I follow those things pretty closely. One of the things they talked about in the paper is one of the key differentiators to get your business to have what they call high organizational performance—again, this [laugh] is going back to business talk again—is what they call infrastructure flexibility. And I just don’t think you can get infrastructure flexibility if you’re not in the cloud. Can you do it? Absolutely. You know, back over a decade ago, I built out a bunch of stuff in a data center on what I called cloud principles. 
We could shoot things in the head, get new ones back, we did all kinds of things, we identified SKUs of, like, what kind of classes of machines we had. All that looks like a lot of stuff that you would just do in AWS, right? Like, I know, my C instances are compute. I know my M instances are memory. Like, they’re all just SKUs, right? Corey: Yeah, that changed a little bit now to the point where they have so many different instance families that some of their names look like dumps of their firmware. Dave: [laugh]. That is probably true. But like, this idea that, like, I want to have this infrastructure flexibility isn’t just my idea that it’s going to turn out well. Like, the kind of proves it. And so, for me, like, I go back to some of the principles of the DevOps movement, and like, if you look at the DORA metrics, let’s say you’ve got deployment frequency and lead time for changes. That’s speed: how fast can I do something? And you’ve got time-to-recover, and you’ve got change failure rate. That’s quality: how much can I ship without having problems, and how fast can I recover when I do? And I think this is one of the things I teach to a lot of my clients about moving into the cloud. If you want to be successful, you have to deliver with speed and quality. Speed: Infrastructure as Code, full stop. If I want to be able to go fast, I need to be able to destroy an environment, bring a new environment up, I need to be able to do that in minutes. That’s speed. And then the second requirement, and the only other requirement, is build monitoring in from the start. Everything gets monitored. And that’s quality. Like, if I monitor stuff, I know when I’ve deployed something that’s spiking CPU. If that’s monitored, I know that this thing is costing me a hell of a lot more than other things. I know all this stuff. And I can do capacity planning, I can do whatever the heck I want. But those are the two fundamental things: Infrastructure as Code and monitoring. 
And yes, like you said, I worked at a monitoring or observability company, so perhaps I’m slightly biased, but what I’ve seen is, like, companies that adopt those two principles, and everything else comes from that—so all my Kubernetes stuff and all those other things are not at odds with those principles—those are the people who actually wind up doing really well. And I think those are the people that have infrastructure flexibility, and that enables them to have higher organizational performance. Corey: I think you’re onto something. Like, I still remember the days of having to figure out the number of people who you had in your ops team versus how many servers they could safely and reasonably run. And now that question has little, if any, meaning. If someone asked me, “Okay, so we’re running right now 10,000 instances in our cloud environment. How many admins should it take us to run those?” The correct response is, “How the heck are you running those things?” Like, tell me more because the answer is probably terrifying. Because right now, if you do that correctly, it’s you want to make a change to all of them or some subset of them? You change a parameter somewhere and computers do the heavy lifting. Dave: Yeah, I ran a content delivery network for Cable and Wireless. We had three types of machines. You know, it was like Windows Media Server and some squid-cache thing or whatever. And it didn’t matter how many we had. It’s all the same. Like, if I had 10,000 and I had 50,000, it’s irrelevant. Like, they’re all the same kind of crap. It’s not that hard to manage a bunch of stuff that’s all the same. 
If I have 10,000 servers and each one is a unique, special snowflake because I’m running in what I call a hosted configuration, I have 10,000 customers, therefore I have 10,000 servers, and each of them is completely different than the other, then that’s going to be a hell of a lot harder to manage than 10,000 things that the load balancer is like [bbbrrrp bbbrrrp] [laugh] like, just lay it out. So, it’s sort of a… kind of a nonsense question at this point. Like you’re saying, like, it doesn’t really matter how many. It’s complexity. How much complexity do I have? And as we all say, in the DevOps movement, complexity isn’t free. Which I’ll bet is a large component of how you save companies money with The Duckbill Group. Corey: It goes even beyond that because cloud infrastructure is always less expensive than the people working on it, unless you do something terrifying. Otherwise, everything should be running on EC2 instances. Nothing higher-level built on top of it because if people’s time is free, the cheapest thing you’re going to get is a bunch of instances. The end. That is not really how you should be thinking about this. Dave: [laugh]. I know a lot of private equity firms that would love to find a place where time was free [laugh]. They could make a lot of money. Corey: Yeah. Pretty sure that the biggest—like, “What’s your biggest competitive headwind?” You know [laugh], “Wage laws.” Like it doesn’t work that way. I’m sorry, but it doesn’t [laugh]. I really want to thank you for taking the time to talk to me about what you’re up to, how things are going over in your part of the universe. If people want to learn more, where’s the best place for them to go to find you? Dave: They can go to mangoteque.com https://mangoteque.com. I’ve got all the links to my blog, my mailing list. Definitely, if you’re interested in this intersection of DevOps and private equity, sign up for the mailing list. 
For people who didn’t get Corey’s funky spelling of my last name, it is a play on the fact that it is French and I also work with technology companies. So, it’s M-A-N-G-O-T-E-Q-U-E dot com. If you type that in—Mangoteque—to any search engine, obviously, you will find me. I am not difficult to find on the internet because I’ve been doing this for quite some time. But thank you for having me on the show. It’s always great to catch up with you. I love hearing about what you’re doing. I super appreciate you asking me about the things that I’m working on, and you know, been a big help. Corey: No, it’s deeply fascinating. It’s neat to watch you continue to meet your market in a variety of different ways. Dave Mangot, CEO and founder of Mangoteque, which is excellently named. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this episode, please leave a five-star review on your podcast platform of choice, along with an angry comment almost certainly filled with incoherent screaming because you tuned out just as soon as you heard the words ‘private equity.’ Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com to get started.

34m
Dec 14, 2023
Using SRE to Solve the Obvious Problems with Laura Nolan

Laura Nolan, Principal Software Engineer at Stanza, joins Corey on Screaming in the Cloud to offer insights on how to use SRE to avoid disastrous and lengthy production delays. Laura gives a rich history of her work with SREcon, why her approach to SRE is about first identifying the biggest fire instead of toiling with day-to-day issues, and why the lack of transparency in systems today actually hurts new engineers entering the space. Plus, Laura explains to Corey why she dedicates time to work against companies like Google who are building systems to help the government (inefficiently) select targets during wars and conflicts. ABOUT LAURA Laura Nolan is a software engineer and SRE. She has contributed to several books on SRE, such as the Site Reliability Engineering book, Seeking SRE, and 97 Things Every SRE Should Know. Laura is a Principal Engineer at Stanza, where she is building software to help humans understand and control their production systems. Laura also serves as a member of the USENIX Association board of directors. In her copious spare time after that, she volunteers for the Campaign to Stop Killer Robots, and is half-way through the MSc in Human Factors and Systems Safety at Lund University. She lives in rural Ireland in a small village full of medieval ruins. LINKS REFERENCED: Company Website: https://www.stanza.systems/ Twitter: https://twitter.com/lauralifts LinkedIn: https://www.linkedin.com/in/laura-nolan-bb7429/ Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. My guest today is someone that I have been low-key annoying to come onto this show for years, and finally, I have managed to wear her down. 
Laura Nolan is a Principal Software Engineer over at Stanza https://www.stanza.systems/. At least that’s what you’re up to today, last I’ve heard. Is that right? Laura: That is correct. I’m working at Stanza, and I don’t want to go on and on about my startup, but I’m working with Niall Murphy and Joseph Bironas and Matthew Girard and a bunch of other people who more recently joined us. We are trying to build a load management SaaS service. So, we’re interested in service observability out of the box, knowing if your critical user journeys are good or bad out of the box, being able to prioritize your incoming requests by what’s most critical in terms of visibility to your customers. So, an emerging space. Not in the Gartner Group Magic Circle yet, but I’m sure at some point [laugh]. Corey: It is surreal to me to hear you talk about your day job because for, it feels like, the better part of a decade now, “Laura, Laura… oh, you mean USENIX Laura?” Because you are on the USENIX board of directors, and in my mind, that is what is always short-handed to what you do. It’s, “Oh, right. I guess that isn’t your actual full-time job.” It’s weird. It’s almost like seeing your teacher outside of the elementary school. You just figure that they fold themselves up in the closet there when you’re not paying attention. I don’t know what you do when SREcon is not in process. I assume you just sit there and wait for the next one, right? Laura: Well, no. We’ve run four of them in the last year, so there hasn’t been very much waiting, I’m afraid. Everything got a little bit smooshed up together during the pandemic, so we’ve had a lot of events coming quite close together. But no, I do have a full-time day job. But the work I do with USENIX is just as a volunteer. So, I’m on the board of directors, as you say, and I’m on the steering committee for all of the global SREcon events, and I’ve often served on the program committee as well. 
And I’m sort of there, annoying the chairs to, “Hey, do your thing on time,” very much like an elementary school teacher, as you say. Corey: I’ve been a big fan of USENIX for a while. One of the best interview processes I ever saw was closely aligned with evaluating candidates along with USENIX SAGE levels to figure out what level of seniority are they in different areas. And it was always viewed through the lens of in what types of consulting engagements will the candidate shine within, not the idea of, “Oh, are you good or are you crap? And spoiler, if I’m asking the question, I’m of course defaulting myself to good and you to crap.” Like the terrible bespoke artisanal job interview process that so many companies do. I love how this company had built this out, and I asked them about it, and, “Oh, yeah, it comes—that dates back to the USENIX SAGE things.” That was one of my first encounters with what USENIX actually did. And the more I learned, the more I liked. How long have you been involved with the group? Laura: A relatively short period of time. I think I first got involved with USENIX in around 2015, going to LISA and then going on to SREcon. And it was all by accident, of course. I fell onto the SREcon program committee somehow because I was around. And then because I was still around and doing stuff, I got eventually—you know, got co-opted into chairing and onto the steering committee and so forth. And you know, it’s like everything volunteer. I mean, people who stick around and do stuff tend to be kept around. But USENIX is quite important to me. We have an open access policy, which is something that I would like to see a whole lot more of, you know, we put everything right out there for free as soon as it is ready. And we are constantly plagued by people saying, “Hey, where’s my SREcon video? The conference was like two weeks ago.” And we’re like, “No, no, we’re still processing the videos. 
We’ll be there; they’ll be there.” We’ve had people, like, literally offer to pay extra money to get the videos sooner, but [laugh] we’re, like, we are open access. We are not keeping the videos away from you. We just aren’t ready yet. So, I love the open access policy and I think what I like about it more than anything else is the fact that it’s… we are staunchly non-vendor. We’re non-technology specific and non-vendor. So, it’s not, like, say, AWS re:Invent for example or any of the big cloud vendor conferences. You know, we are picking vendor-neutral content by quality. And as well, as anyone who’s ever sponsored SREcon or any of the other events will also tell you that that does not get you a talk in the conference program. So, the content selection is completely independent, and in fact, we have a complete Chinese wall between the sponsorship organization and the content organization. So, I mean, I really like how we’ve done that. I think, as well, it’s for a long time been one of the family of conferences that our organizations have conferences that has had the best diversity. Not perfect, but certainly better than it was, although very, very unfortunately, I see conference diversity everywhere going down after the pandemic, which is—particularly gender diversity—which is a real shame. Corey: I’ve been a fan of the SREcon conferences for a while before someone—presumably you; I’m not sure—screwed up before the pandemic and apparently thought they were talking about someone else, and I was invited to give a keynote at SREcon in EMEA that I co-presented with John Looney. 
Which was fun because he and I met in person for the first time three hours beforehand, beat together our talk, then showed up an hour beforehand, found there will be no confidence monitor, went away for the next 45 minutes and basically loaded it all into short-term cache and gave a talk that we could not repeat if we had to for a million dollars, just because it was so… you’re throwing the ball to your partner on stage and really hoping they’re going to be able to catch it. And it worked out. It was an anger subtext translator skit for a bit, which was fun. All the things that your manager says but actually means, you know, the fun sort of approach. It was zany, ideally had some useful takeaways to it. But I loved the conference. That was one of the only SREcons that I found myself not surprised to discover was coming to town the next week because for whatever reason, there’s presumably a mailing list that I’m not on somewhere where I get blindsided by, “Oh, yeah, hey, didn’t you know SREcon is coming up?” There’s probably a notice somewhere that I really should be paying attention to, but on the plus side, I get to be delightfully surprised every time. Laura: Indeed. And hopefully, you’ll be delightfully surprised in March 2024. I believe it’s the 18th to the 20th, when SREcon will be coming to town in San Francisco, where you live. Corey: So historically, in addition to, you know, the work with USENIX, which is, again, not your primary occupation most days, you spent over five years at Google, which of course means that you have strong opinions on SRE. I know that that is a bit dated, where the gag was always, it’s only called SRE if it comes from the Mountain View region of California, otherwise it’s just sparkling DevOps. 
But for the initial take of a lot of the SRE stuff was, “Here’s how to work at Google.” It has progressed significantly beyond that to the point where companies who have SRE groups are no longer perceived incorrectly as, “Oh, we just want to be like Google,” or, “We hired a bunch of former Google people.” But you clearly have opinions to this. You’ve contributed to multiple books on SRE, you have spoken on it at length. You have enabled others to speak on it at length, which in many ways, is by far the better contribution. You can only go so far scaling yourself, but scaling other people, that has a much better multiplier on it, which feels almost like something an SRE might observe. Laura: It is indeed something an SRE might observe. And also, you know, good catch because I really felt you were implying there that you didn’t like my book contributions. Oh, the shock. Corey: No. And to be clear, I meant [unintelligible 00:08:13], strictly to speaking. Laura: [laugh]. Corey: Books are also a great one-to-many multiplier because it turns out, you can only shove so many people into a conference hall, but books have this ability to just carry your words beyond the room that you’re in a way that video just doesn’t seem to. Laura: Ah, but open access video that was published on YouTube, like, six weeks ahead [laugh]. That scales. Corey: I wish. People say they want to write a book and I think they’re all lying. I think they want to have written the book. That’s my philosophy on it. I do not understand people who’ve written a book. Like, “So, what are you going to do now?” “I’m going to write another book.” “Okay.” I’m going to smile, not take my eyes off you for a second and back away slowly because I do not understand your philosophy on that. But you’ve worked on multiple books with people. Laura: I actually enjoy writing. I enjoy the process of it because I always learn something when I write. In fact, I learn a lot of things when I write, and I enjoy that crafting. 
I will say I do not enjoy having written things because for me, any achievement once I have achieved it is completely dead. I will never think of it again, and I will think only of my excessively lengthy to-do list, so I clearly have problems here. But nevertheless. It’s exactly the same with programming projects, by the way. But back to SRE; we were talking about SRE. SRE is 20 now. SRE can almost drink alcohol in the US, and that is crazy. Corey: So, 2003 was the founding of it, then. Laura: Yes. Corey: Yay, I can do simple arithmetic in my head, still. I wondered how far my math skills had atrophied. Laura: Yes. Good job. Yes, apparently invented in roughly 2003. So, the—I mean, from what I understand Google’s publishing of the, “20 years of SRE at Google,” they have, in the absence of an actual definite start date, they’ve simply picked Ben Treynor’s start date at Google as the start date of SRE. But nevertheless, [unintelligible 00:09:58] about 20 years old. So, is it all grown up? I mean, I think it’s become heavily commodified. My feeling about SRE is that it’s always been this—I mean, you said it earlier, like, it’s about, you know, how do I scale things? How do I optimize my systems? How do I intervene in systems to solve problems to make them better, to see where we’re going to be in pain in six months, and work to prevent that? That’s kind of SRE work to me is, figure out where the problems are, figure out good ways to intervene and to improve. But there’s a lot of SRE as bureaucracy around at the moment where people are like, “Well, we’re an SRE team, so you know, you will have your SLO Golden Signals, and you will have your Production Readiness Checklists, which will be the things that we say, no matter how different your system is from what we designed this checklist for, and that’s it. We’re doing SRE now. It’s great.” So, I think we miss a lot there. 
My personal way of doing SRE is very much more about thinking, not so much about the day-to-day SLO [excursion-type 00:10:56] things because—not that they’re not important; they are important, but they will always be there. I always tend to spend more time thinking about how do we avoid the risk of, you know, a giant production fire that will take you down for days, or God forbid, more than days, you know? The sort of, big Roblox fire or the time that Meta nearly took down the internet in late-2021, that kind of thing. So, I think that modern SRE misses quite a lot of that. It’s a little bit like… so when BP, when they had the Deepwater Horizon disaster on that very same day, they received an award for minimizing occupational safety risks in their environment. So, you know, [unintelligible 00:11:41] things like people tripping and— Corey: Must have been fun the next day. “Yeah, we’re going to need that back.” Laura: [laugh] people tripping and falling, and you know, hitting themselves with a hammer, they got an award because it was so safe, they had very little of that. And then this thing goes boom. Corey: And now they’ve tried to pivot into an optimization award for efficiency, like, we just decided to flash fry half the sea life in the Gulf at once. Laura: Yes. Extremely efficient. So, you know, I worry that we’re doing SRE a little bit like BP. We’re doing it back before Deepwater Horizon. Corey: I should disclose that I started my technical career as a grumpy old Unix sysadmin—because it’s not like you ever see one of those who’s happy or young; didn’t matter that I was 23 years old, I was grumpy and old—and I have viewed the evolution since then of going from calling myself a sysadmin to a DevOps engineer to an SRE to a platform engineer to whatever we’re calling it this week, I still view it as fundamentally the same job, in the sense that the responsibility has not changed, and that is keep the site or environment up. 
But the tools, the processes and the techniques we apply to it have evolved. Is that accurate? Does it sound like I’m spouting nonsense? You’re far closer to the SRE world than I ever was, but I’m curious to get your take on that perspective. And please feel free to tell me I’m wrong. Laura: No, no. I think you’re completely right. And I think one of the ways that I think it’s shifted, and it’s really interesting, but when you and I were, when we were young, we could see everything that was happening. We were deploying on some sort of Linux box or other sort of Unix box somewhere, most likely, and if we wanted, we could go and see the entire source code of everything that our software was running on. And kids these days, they’re coming up, and they are deploying their stuff on RDS and ECS and, you know, how many layers of abstraction are sitting between them and— Corey: “I run Kubernetes. That means I don’t know where it runs, and neither does anyone else.” It’s great. Laura: Yeah. So, there’s no transparency anymore in what’s happening. So, it’s very easy, you get to a point where sometimes you hit a problem, and you just can’t figure it out because you do not have a way to get into that system and see what’s happening. You know, even at work, we ran into a problem with Amazon-hosted Prometheus. We were like, “This will be great. We’ll just do that.” And we could not get some particular type of remote write operation to work. We just could not. Okay, so we’ll have to do something else. So, one of the many, many things I do when I’m not, you know, trying to run the SREcon conference or do actual work or definitely not write a book, I’m studying at Lund University at the moment. I’m doing this master’s degree in human factors and system safety. And one of the things I’ve realized since doing that program is, in tech, we missed this whole 1980s and 1990s discipline of cognitive systems theory, cognitive systems engineering. This is what people were doing. 
They were like, how can people in the control room in nuclear plants and in the cockpit in the airplane, how can they get along with their systems and build a good mental model of the automation and understand what’s going on? We missed all that. We came of age when safety science was asking questions like how can we stop organizational failures like Challenger and Columbia, where people are just not making the correct decisions? And that was a whole different sort of focus. So, we’ve missed all of this 1980s and 1990s cognitive system stuff. And there’s this really interesting idea there where you can build two types of systems: you can build a prosthesis which does all your interaction with a system for you, and you can see nothing, feel nothing, do nothing, it’s just this black box, or you can have an amplifier, which lets you do more stuff than you could do just by yourself, but lets you still get into the details. And we build mostly prostheses. We do not build amplifiers. We’re hiding all the details; we’re building these very, very opaque abstractions. And I think it’s to the detriment of—I mean, it makes our life harder in a bunch of ways, but I think it also makes life really hard for systems engineers coming up because they just can’t get into the systems as easily anymore unless they’re running them themselves. Corey: I have to confess that I have a certain aversion to aspects of SRE, and I’m feeling echoes of it around a lot of the human factor stuff that’s coming out of that Lund program. And I think I know what it is, and it’s not a problem with either of those things, but rather a problem with me. I have never been a good academic. I have an eighth grade education because school is not really for me. And what I loved about being a systems administrator for years was the fact that it was like solving puzzles every day. I got to do interesting things, I got to chase down problems, and firefight all the time. 
And what SRE has represented is a step away from that to being more methodical, to taking on keeping the site up as a discipline rather than an occupation or a task that you’re working on. And I think that a lot of the human factors stuff plays directly into it. It feels like the field is becoming a lot more academic, which is a luxury we never had, when holy crap, the site is down, we’re going to go out of business if it isn’t back up immediately: panic mode. Laura: I got to confess here, I have three master’s degrees. Three. I have problems, like I said before. I get what you mean. You don’t like when people are speaking in generalizations and sort of being all theoretical rather than looking at the actual messy details that we need to deal with to get things done, right? I know. I know what you mean, I feel it too. And I’ve talked about the human factors stuff and theoretical stuff a fair bit at conferences, and what I always try to do is I always try and illustrate with the details. Because I think it’s very easy to get away from the actual problems and, you know, spend too much time in the models and in the theory. And I like to do both. I will confess, I like to do both. And that means that the luxury I miss out on is mostly sleep. But here we are. Corey: I am curious as far as what you’ve seen as far as the human factors adoption in this space because every company for a while claimed to be focused on blameless postmortems. But then there would be issues that quickly turned into a blame Steve postmortem instead. And it really feels, at least from a certain point of view, that there was a time where it seemed to be gaining traction, but that may have been a zero interest rate phenomenon, as weird as that sounds. Do you think that the idea of human factors being tied to keeping systems running in a computer sense has demonstrated staying power or are you seeing a recession? It could be I’m just looking at headlines too much. Laura: It’s a good question. 
There’s still a lot of people interested in it. There was a conference in Denver last February that was decently well attended for, you know, a first initial conference that was focusing on this issue, and there’s this very vibrant Slack community, the LFI, the Learning from Incidents in Software community. I will say, everything is a little bit stretched at the moment in industry, as you know, with all the layoffs, and a lot of people are just… there’s definitely a feeling that people want to hunker down and do the basics to make sure that they’re not seen as doing useless stuff and on the line for layoffs. But the question is, is this stuff actually useful or not? I mean, I contend that it is. I contend that we can learn from failures, we can learn from what we’re doing day-to-day, and we can do things better. Sometimes you don’t need a lot of learning because what’s the biggest problem is obvious, right [laugh]? You know, in that case, yeah, your focus should just be on solving your big obvious problem, for sure. Corey: If there was a hierarchy of needs here, on some level, okay, step one, is the building— Laura: Yes. Corey: Currently on fire? Maybe solve that before thinking about the longer-term context of what this does to corporate culture. Laura: Yes, absolutely. And I’ve gone into teams before where people are like, “Oh, well, you’re an SRE, so obviously, you wish to immediately introduce SLOs.” And I can look around and go, “Nope. Not the biggest problem right now. Actually, I can see a bunch of things are on fire. We should fix those specific things.” I actually personally think that if you want to go in and start improving reliability in a system, the best thing to do is to start a weekly production meeting if the team doesn’t have that, actually create a dedicated space and time for everyone to be able to get together, discuss what’s been happening, discuss concerns and risks, and get all that stuff out in the open. 
I think that’s very useful, and you don’t need to spend however long it takes to formally sit down and start creating a bunch of SLOs. Because if you’re not dealing with a perfectly spherical web service where you can just use the Golden Signals and if you start getting into any sorts of thinking about data integrity, or backups, or any sorts of asynchronous processing, these sorts of things, they need SLOs that are a lot more interesting than your standard error rate and latency. Error rate and latency gets you so far, but it’s really just very cookie-cutter stuff. But people know what’s wrong with their systems, by and large. They may not know everything that’s wrong with their systems, but they’ll know the big things, for sure. Give them space to talk about it. Corey: Speaking of bigger things and turning into the idea of these things escaping beyond pure tech, you have been doing some rather interesting work in an area that I don’t see a whole lot of people that I talk to communicating about. Specifically, you’re volunteering for the Campaign to Stop Killer Robots, which ten years ago would have made you sound ridiculous, and now it makes you sound like someone who is very rationally and reasonably calling an alarm on something that is on our doorstep. What are you doing over there? Laura: Well, I mean, let’s be real, it sounds ridiculous because it is ridiculous. I mean, who would let a computer fly around in the sky and choose what to shoot at? But it turns out that there are, in fact, a bunch of people who are building systems like that. 
So yeah, I’ve been volunteering with the campaign for about the last five years, since roughly around the time that I left Google, in fact, because I got interested in that around about the time that Google was doing the Project Maven work, which was when Google said, “Hey, wouldn’t it be super cool if we took all of this DoD video footage of drone video footage, and, you know, did a whole bunch of machine-learning analysis on it and figured out where people are going all the time? Maybe we could click on this house and see, like, a whole timeline of people’s comings and goings and which other people they are sort of in a social network with.” So, I kind of said, “Ahh… maybe I don’t want to be involved in that.” And I left Google. And I found out that there was this campaign. And this campaign was largely lawyers and disarmament experts, people of that nature—philosophers—but also a few technologists. And for me, having run computer systems for a large number of years at this point, the idea that you would want to rely on a big distributed system running over some janky network with a bunch of 18-year-old kids running it to actually make good decisions about who should be targeted in a conflict seems outrageous. And I think almost every [laugh] software operations person, or in fact, software engineer that I’ve spoken to, tends to feel the same way. And yet there is this big practical debate about this in international relations circles. But luckily, there has just been a resolution in the UN just in the last day or two as we record this, the first committee has, by a very large majority, voted to try and do something about this. So hopefully, we’ll get some international law. 
The specific interventions that most of us in this field think would be good would be to limit the amount of force that an autonomous weapon, or in fact, an entire set of autonomous weapons in a region would be able to wield because there’s a concern that should there be some bug or problem or a sort of weird factor that triggers these systems to— Corey: It’s an inevitability that there will be. Like, that is not up for debate. Of course it’s going to break. In 2020, the template slide deck that AWS sent out for re:Invent speakers had a bunch of clip art, and one of them was a line art drawing of a ham with a bone in it. So, I wound up taking that image, slapping it on a t-shirt, captioning it “AWS Hambone,” and selling that as a fundraiser for 826 National. Laura: [laugh]. Corey: Now, what happened next is that for a while, anyone who tweeted the phrase “AWS Hambone” would find themselves banned from Twitter for the next 12 hours due to some weird algorithmic thing where it thought that was doxxing or harassment or something. And people on the other side of the issue that you’re talking about are straight-facedly suggesting that we give that algorithm [unintelligible 00:24:32] tool a gun. Laura: Or many guns. Many guns. Corey: I’m sorry, what? Laura: Absolutely. Corey: Yes, or missiles or, heck, let’s build a whole bunch of them and turn them loose with no supervision, just like we do with junior developers. Laura: Exactly. Yes, so many people think this is a great idea, or at least they purport to think this is a great idea, which is not always the same thing. I mean, there’s lots of different vested interests here. Some people who are proponents of this will say, well, actually, we think that this will make targeting more accurate, less civilians will actually die as a result of this. And the question there that you have to ask is—there’s a really good book called by Chamayou, Grégoire Chamayou, and he says that there’s actually three meanings to accuracy. 
So, are you hitting what you’re aiming at is one of it—one thing. And that’s a solved problem in military circles for quite some time. You got, you know, laser targeting, very accurate. Then the other question is, how big is the blast radius? So, that’s just a matter of, you know, how big an explosion are you going to get? That’s not something that autonomy can help with. The only thing that autonomy could even conceivably help with in terms of accuracy is better target selection. So, instead of selecting targets that are not valid targets, selecting more valid targets. But I don’t think there’s any good reason to think that computers can solve that problem. I mean, in fact, if you read stuff that military experts write on this, and I’ve got, you know, lots of academic handbooks on military targeting processes, they will tell you, it’s very hard and there’s a lot of gray areas, a lot of judgment. And that’s exactly what computers are pretty bad at. Although mind you, I’m amused by your Hambone story and I want to ask if AWS Hambone is a database? Corey: Anything is a database, if you hold it wrong. Laura: [laugh]. Corey: It’s fun. I went through a period of time where, just for fun, I would ask people to name an AWS service and I would talk about how you could use it incorrectly as a database. And then someone mentioned, “What about AWS Neptune,” which is their graph database, which absolutely no one understands, and the answer there is, “I give up. It’s impossible to use that thing as a database.” But everything else can be. Like, you know, the tagging system. Great, that has keys and values; it’s a database now. Welcome aboard. And I didn’t say it was a great database, but it is a free one, and it scales to a point. Have fun with it. Laura: All I’ll say is this: you can put labels on anything. Corey: Exactly. Laura: We missed you at the most recent SREcon EMEA. There was a talk about Google’s internal Chubby system and how people started using it as a database. 
And I did summon you in Slack, but you didn’t show up. Corey: No. Sadly, I’ve gotten a bit out of the SRE space. And also, frankly, I’ve gotten out of the community space for a little while, when it comes to conferences. And I have a focused effort at the start of 2024 to start changing that. I am submitting CFPs left and right. My biggest fear is that a conference will accept one of these because a couple of them are aspirational. “Here’s how I built the thing with generative AI,” which spoiler, I have done no such thing yet, but by God, I will by the time I get there. I have something similar around Kubernetes, which I’ve never used in anger, but soon will if someone accepts the right conference talk. This is how I learned Git: I shot my mouth off in a CFP, and I had four months to learn the thing. It was effective, but I wouldn’t say it was the best approach. Laura: [laugh]. You shouldn’t feel bad about lying about having built things in Kubernetes, and with LLMs because everyone has, right? Corey: Exactly. It’ll be true enough by the time I get there. Why not? I’m not submitting for a conference next week. We’re good. Yeah, Future Corey is going to hate me. Laura: Have it build you a database system. Corey: I like that. I really want to thank you for taking the time to speak with me today. If people want to learn more, where’s the best place for them to find you these days? Laura: Ohh, I’m sort of homeless on social media since the whole Twitter implosion, but you can still find me there. I’m @lauralifts https://twitter.com/lauralifts on Twitter and I have the same tag on BlueSky, but haven’t started to use it yet. Yeah, socials are hard at the moment. I’m on LinkedIn https://www.linkedin.com/in/laura-nolan-bb7429/. Please feel free to follow me there if you wish to message me as well. Corey: And we will, of course, put links to that in the [show notes 00:28:31]. Thank you so much for taking the time to speak with me. I appreciate it. 
Laura: Thank you for having me. Corey: Laura Nolan, Principal Software Engineer at Stanza. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that soon—due to me screwing up a database system—will be transmogrified into a CFP submission for an upcoming SREcon. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com to get started.

29m
Dec 12, 2023
Terraform and The Art of Teaching Tech with Ned Bellavance

Ned Bellavance worked in the world of tech for more than a decade before joining the family profession as an educator. He joins Corey on Screaming in the Cloud to discuss his shift from engineer to educator and content creator, the intricacies of Terraform, and how changes in licensing affect the ecosystem. ABOUT NED Ned is an IT professional with more than 20 years of experience in the field. He has been a helpdesk operator, systems administrator, cloud architect, and product manager. In 2019, Ned founded Ned in the Cloud LLC to work as an independent educator, creator, and consultant. In this new role, he develops courses for Pluralsight, runs multiple podcasts, writes books, and creates original content for technology vendors. Ned has been a Microsoft MVP since 2017 and a HashiCorp Ambassador since 2020. Ned has three guiding principles: embrace discomfort, fail often, and be kind. LINKS REFERENCED: Ned in the Cloud: https://nedinthecloud.com/ LinkedIn: https://www.linkedin.com/in/ned-bellavance/ Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. My guest today is Ned Bellavance, who’s the founder and curious human over at Ned in the Cloud https://nedinthecloud.com/. Ned, thank you for joining me. Ned: Yeah, it’s a pleasure to be here, Corey. Corey: So, what is Ned in the Cloud? There are a bunch of easy answers that I feel don’t give the complete story like, “Oh, it’s a YouTube channel,” or, “Oh no, it’s the name that you wound up using because of, I don’t know, easier to spell the URL or something.” Where do you start? Where do you stop? What are you exactly? Ned: What am I? Wow, I didn’t know we were going to get this deep into philosophical territory this early. 
I mean, you got to ease me in with something. But so, Ned in the Cloud is the name of my blog from back in the days when we all started up a blog and hosted on WordPress and had fun. And then I was also at the same time working for a value-added reseller as a consultant, so a lot of what went on my blog was stuff that happened to me in the world of consulting. And you’re always dealing with different levels of brokenness when you go to clients, so you see some interesting things, and I blogged about them. At a certain point, I decided I want to go out and do my own thing, mostly focused on training and education and content creation and I was looking for a company name. And I went through—I had a list of about 40 different names. And I showed them to my wife, and she’s like, “Why don’t you go Ned in the Cloud? Why are you making this more complicated than it needs to be?” And I said, “Well, I’m an engineer. That is my job, by definition, but you’re probably right. I should just go with Ned in the Cloud.” So, Ned in the Cloud now is a company, just me, focused on creating educational content for technical learners on a variety of different platforms. And if I’m delivering educational content, I am a happy human, and if I’m not doing that, I’m probably out running somewhere. Corey: I like that, and I’d like to focus on education first. There are a number of reasons that people will go in that particular direction, but what was it for you? Ned: I think it’s kind of in the heritage of my family. It’s in my blood to a certain degree because my dad is a teacher, my mom is a teacher-turned-librarian, my sister is a teacher, my wife is a teacher, her mother is a teacher. So, there was definitely something in the air, and I think at a certain point, I was the black sheep in the sense that I was the engineer. Look, this guy over here. 
And then I ended up deciding that I really liked training people and learning and teaching, and became a teacher of sorts, and then they all went, “Welcome to the fold.” Corey: It’s fun when you get to talk to people about the things that they’re learning because when someone’s learning something I find that it’s the time when their mind is the most open. I don’t think that that’s something that you get to see nearly as much once someone already, quote-unquote, “Knows a thing,” because once that happens, why would you go back and learn something new? I have always learned the most—even about things that I’ve built myself—by putting it in the hands of users and seeing how they honestly sometimes hold it wrong and make mistakes that don’t make sense to me, but absolutely make sense to them. Learning something—or rather, teaching something—versus building that thing is very much an orthogonal skill set, and I don’t think that there’s enough respect given to that understanding. Ned: It’s an interesting sphere of people who can both build the thing and then teach somebody else to build the thing because you’re right, it’s very different skill sets. Being able to teach means that you have to empathize with the human being that you’re teaching and understand that their perspective is not yours necessarily. And one of the skills that you build up as an instructor is realizing when you’re making a whole bunch of assumptions because you know something really well, and that the person that you’re teaching is not going to have that context, they’re not going to have all those assumptions baked in, so you have to actually explain that stuff out. Some of my instruction has been purely online video courses through, like, Pluralsight; less of a feedback loop there. I have to publish the entire course, and then I start getting feedback, so I really enjoy doing live trainings as well because then I get the questions right away. 
And I always insist, like, if I’m delivering a lecture, and you have a question, please don’t wait for the end. Please interrupt me immediately because you’re going to forget what that question is, you’re going to lose your train of thought, and then you’re not going to ask it. And the whole class benefits when someone asks a question, and I benefit too. I learn how to explain that concept better. So, I really enjoy the live setting, but making the video courses is kind of nice, too. Corey: I learned to speak publicly and give conference talks as a traveling contract trainer for Puppet years ago, and that was an eye-opening experience, just because you don’t really understand something until you’re teaching other people how it works. It’s how I learned Git. I gave a conference talk that explained Git to people, and that was called a forcing function because I had four months to go to learn this thing I did not fully understand and welp, they’re not going to move the conference for me, so I guess I’d better hustle. I wouldn’t necessarily recommend that approach. These days, it seems like you have a, let’s say, disproportionate level of focus on the area of Infrastructure as Code, specifically you seem to be aiming at Terraform. Is that an accurate way of describing it? Ned: That is a very accurate way of describing it. I discovered Terraform while I was doing my consulting back in 2016 era, so this was pretty early on in the product’s lifecycle. But I had been using CloudFormation, and at that time, CloudFormation only supported JSON, which meant it was extra punishing. And being able to describe something more succinctly and also have access to all these functions and loops and variables, I was like, “This is amazing. Where were you a year ago?” And so, I really just jumped in with both feet into Terraform. And at a certain point, I was at a conference, and I went past the Pluralsight booth, and they mentioned that they were looking for instructors. 
And I thought to myself, well, I like talking about things, and I’m pretty excited about this Terraform thing. Why don’t I see if they’re looking for someone to do a Terraform course? And so, I went through their audition process and sure enough, that is exactly what they were looking for. They had no getting started course for Terraform at the time. I published the course in 2017, and it has been in the top 50 courses ever since on Pluralsight. So, that told me that there’s definitely an appetite and maybe this is an area I should focus on a little bit more. Corey: It’s a difficult area to learn. About two months ago, I started using Terraform for the first time in anger in ages. I mean, I first discovered it when I was on my way back from one of those Puppet trainings, and the person next to me was really excited about this thing that we’re about to launch. Turns out that was Mitchell Hashimoto and Armon was sitting next to him on the other side. Why he had a middle seat, I’ll never know. But it was a really fun conversation, just talking about how he saw the world and what he was planning on doing. And a lot of that vision was realized. What I figured out a couple months ago is both that first, I’m sort of sad that Terraform is as bad as it is, but it’s the best option we’ve got because everything else is so much worse. It is omnipresent, though. Effectively, every client I’ve ever dealt with on AWS billing who has a substantial estate is managing it via Terraform. It is the lingua franca of cloud across the board. I just wish it didn’t require as much care and feeding, especially for the getting-started-with-a-boilerplate type of scenario. So, much of what you type feels like it's useless stuff that should be implicit. I understand why it’s not, but it feels that way. It’s hard to learn. Ned: It certainly can be. And you’re right, there’s a certain amount of boilerplate and [sigh] code that you have to write that seems pointless. 
Like, do I have to actually spell this all out? And sometimes the answer is yes, and sometimes the answer is you should use a module for that. Why are you writing this entire VPC configuration out yourself? And that’s the sort of thing that you learn over time is that there are shortcuts, there are ways to make the code simpler and require less care and feeding. But I think ultimately, your infrastructure, just like your software, evolves, changes, has new requirements, and you need to manage it in the same way that you want to manage your software. And I wouldn’t tell a software developer, “Oh, you know, you could just write it once and never go back to it. I’m sure it’s fine.” And by the same token, I wouldn’t tell an infrastructure developer the same thing. Now, of course, people do that and never go back and touch it, and then somebody else inherits that infrastructure and goes, “Oh, God. Where’s the state data?” And no one knows, and then you’re starting from scratch. But hopefully, if you have someone who’s doing it responsibly, they’ll be setting up Terraform in such a way that it is maintainable by somebody else. Corey: I’d sure like to hope so. I have encountered so many horrible examples of code and wondering what malicious person wrote this. And of course, it was me, 6 or 12 months ago. Ned: Always [laugh]. Corey: I get to play architect around a lot of these things. In fact, that’s one of the problems that I’ve had historically with an awful lot of different things that I’ve basically built, called it feature complete, let it sit for a while using the CDK or whatnot, and then oh, I want to make a small change to it. Well, first, I got to spend half a day during the entire line dependency updates and seeing what’s broken and how all of that works. It feels like for better or worse, Terraform is a lot more stable than that, as in, old versions of Terraform code from blog posts from 2016 will still effectively work. Is that accurate? 
I haven’t done enough exploring in that direction to be certain. Ned: The good thing about Terraform is you can pin the version of various things that you’re using. So, if you’re using a particular version of the AWS provider, you can pin it to that specific version, and it won’t automatically upgrade you to the latest and greatest. If you didn’t do that, then you’ll get bit by the update bug, which certainly happens to some folks when they changed the provider from version 3 to version 4 and completely changed how the S3 bucket object was created. A lot of people’s scripts broke that day, so I think that was the time for everyone to learn what the version argument is and how it works. But yeah, as long as you follow that general convention of pinning versions of your modules and of your resource provider, you should be in a pretty stable place when you want to update it. Corey: Well, here’s the $64,000 question for you, then. Does Dependabot on your GitHub repo begin screaming at you as soon as you’ve done that because in one of its dependencies in some particular weird edge cases when they’re dealing with unsanitized, internet-based input could wind up taking up too many system resources, for example? Which is, I guess, in an ideal world, it wouldn’t be an issue, but in practice, my infrastructure team is probably not trying to attack the company from the inside. They have better paths to get there, to be very blunt. Ned: [laugh]. Corey: Turns out giving someone access to a thing just directly is way easier than making them find it. But that’s been one of the frustrating parts where, especially when it encounters things like, I don’t know, corporate security policies of, “Oh, you must clear all of these warnings,” which well-intentioned, poorly executed seems to be the takeaway there. 
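[Editor's sketch] Ned's point about pinning provider versions comes down to how a version constraint is evaluated. What follows is a minimal illustration in Python, not Terraform's own code: it assumes plain numeric MAJOR.MINOR[.PATCH] versions and covers only an exact pin (a bare version or "= X.Y.Z") and the pessimistic "~>" operator, where only the rightmost component named in the constraint may increase. The function names parse_version and satisfies are hypothetical, chosen for this example.

```python
def parse_version(text):
    """Turn a string such as '3.74.1' into the tuple (3, 74, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, constraint):
    """Return True if `version` is allowed by `constraint`.

    Simplified sketch: handles an exact pin ('3.74.0' or '= 3.74.0')
    and the pessimistic operator ('~> 3.0', '~> 3.74.0') only.
    """
    candidate = parse_version(version)
    constraint = constraint.strip()

    if constraint.startswith("~>"):
        base = parse_version(constraint[2:])
        if len(base) < 2:
            raise ValueError("this sketch expects at least MAJOR.MINOR after '~>'")
        # Upper bound: bump the second-to-last component and drop the last,
        # so '~> 3.0' stops before 4 and '~> 3.74.0' stops before 3.75.
        upper = base[:-2] + (base[-2] + 1,)
        return base <= candidate < upper

    # Anything else is treated as an exact pin.
    return candidate == parse_version(constraint.lstrip("="))
```

Under these assumptions, a configuration constrained to "~> 3.0" would keep accepting 3.x releases but would not move to 4.0.0 on its own, which is the kind of jump Ned describes breaking people's S3 code.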
Ned: Yeah, I’ve certainly seen some implementations of tools that do static scanning of Terraform code and will come up with vulnerabilities or violations of best practice, then you have to put exceptions in there. And sometimes it’ll be something like, “You shouldn’t have your S3 bucket public,” which in most cases, you shouldn’t, but then there’s the one team that’s actually publishing a front-facing static website in the S3 bucket, and then they have to get, you know, special permission from on high to ignore that warning. So, a lot of those best practices that are in the scanning tools are there for very good reasons and when you onboard them, you should be ready to see a sea of red in your scan the first time and then look through that and kind of pick through what’s actually real, and we should improve in our code, and what’s something that we can safely ignore because we are intentionally doing it that way. Corey: I feel like there’s an awful lot of… how to put this politely… implicit dependencies that are built into things. I’ll wind up figuring out how to do something by implementing it and that means I will stitch together an awful lot of blog posts, things I found on Stack Overflow, et cetera, just like a senior engineer and also Chat-Gippity will go ahead and do those things. And then the reason—like, someone asks me four years later, “Why is that thing there?” And… “Well, I don’t know, but if I remove it, it might stop working, so…” there was almost a cargo-culting style of, well, it’s always been there. So, is that necessary? Is it not? I’m ashamed by how often I learned something very fundamental in a system that I’ve been using for 20 years—namely, the command line—just by reading the man page for a command that I already, quote-unquote, “Already know how to use perfectly well.” Yeah, there’s a lot of hidden gems buried in those things. Ned: Oh, my goodness, I learned something about the Terraform CLI last week that I wish I’d known two years ago. 
And it’s been there for a long time. It’s like, when you want to validate your code with terraform validate, you can initialize without initializing the back-end, and for those who are steeped in Terraform, that means something and for everybody else, I’m sorry [laugh]. But I discovered that was an option, and I was like, “Ahhh, this is amazing.” But to get back to the sort of dependency problems and understanding your infrastructure better—because I think that’s ultimately what’s happening when you have to describe something using Infrastructure as Code—is you discover how the infrastructure actually works versus how you thought it worked. If you look at how—and I’m going to go into Azure-land here, so try to follow along with me—if you go into Azure-land and you look at how they construct a load balancer, the load balancer is not a single resource. It’s about eight different resources that are all tied together. And AWS has something similar with how you have target groups, and you have the load balancer component and the listener and the health check and all that. Azure has the same thing. There’s no actual load balancer object, per se. There’s a bunch of different components that get slammed together to form that load balancer. When you look in the portal, you don’t see any of that. You just see a load balancer, and you might think this is a very simple resource to configure. When it actually comes time to break it out into code, you realize, oh, this is eight different components, each of which has its own options and arguments that I need to understand. 
So, one of the great things that I have seen a lot of tooling up here around is doing the import of existing infrastructure into Terraform by pointing the tool at a collection of resources—whatever they are—and saying, “Go create the Terraform code that matches that thing.” And it’s not going to be the most elegant code out there, but it will give you a baseline for what all the settings actually are, and other resource types are, and then you can tweak it as needed to add in input variables or remove some arguments that you’re not using. Corey: Yeah, I remember when they first announced the importing of existing state. It’s wow, there’s an awful lot of stuff that it can be aware of that I will absolutely need to control unless I want it to start blowing stuff away every time I run the—[unintelligible 00:15:51] supposedly [unintelligible 00:15:52] thing against it. And that wasn’t a lot of fun. But yeah, this is the common experience of it. I only recently was reminded of the fact that I once knew, and I’d forgotten that a public versus private subnet in AWS is a human-based abstraction, not something that is implicit to the API or the way they envision subnets existing. Kind of nice, but also weird when you have to unlearn things that you’ve thought you’d learned. Ned: That’s a really interesting example of we think of them as very different things, and when we draw nice architecture diagrams there—these are the private subnets and these are the public ones. And when you actually go to create one using Terraform—or really another tool—there’s no box that says ‘private’ or ‘make this public.’ It’s just what does your route table look like? Are you sending that traffic out the internet gateway or are you sending it to some sort of NAT device? And how does traffic come back into that subnet? That’s it. That’s what makes it private versus public versus a database subnet versus any other subnet type you want to logically assign within AWS. Corey: Yeah. 
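[Editor's sketch] The public-versus-private distinction Ned describes can be written down as a small rule over a route table. This is an illustrative Python sketch only: the route dictionaries and their keys ("destination", "target"), the labels returned, and the function name classify_subnet are all assumptions made for this example, not an AWS API. It relies on the convention that internet gateway IDs begin with "igw-" and NAT gateway IDs with "nat-", and it ignores other NAT devices such as NAT instances.

```python
def classify_subnet(routes):
    """Label a subnet by where its default route sends traffic.

    `routes` is a list of dicts with hypothetical keys:
      'destination' - a CIDR block such as '0.0.0.0/0'
      'target'      - the ID of whatever receives that traffic
    """
    for route in routes:
        if route["destination"] != "0.0.0.0/0":
            continue  # only the default route decides the label
        target = route["target"]
        if target.startswith("igw-"):
            return "public"   # default route goes to an internet gateway
        if target.startswith("nat-"):
            return "private"  # outbound only, via a NAT gateway
    return "isolated"         # no recognized default route out of the VPC
```

Nothing on the subnet itself changes between the three cases; swapping the target of the default route is what moves a subnet from one label to another.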
It’s kind of fun when that stuff hits. Ned: [laugh]. Corey: I am curious, as you look across the ecosystem, do you still see that learning Terraform is a primary pain point for, I guess, the modern era of cloud engineer, or has that sunk below the surface level of awareness in some ways? Ned: I think it’s taken as a given to a certain degree that if you’re a cloud engineer or an aspiring cloud engineer today, one of the things you’re going to learn is Infrastructure as Code, and that Infrastructure as Code is probably going to be Terraform. You can still learn—there’s a bunch of other tools out there; I’m not going to pretend like Terraform is the end-all be-all, right? We’ve got—if you want to use a general purpose programming language, you have something like Pulumi out there that will allow you to do that. If you want to use one of the cloud-native tools, you’ve got something like CloudFormation or Azure has Bicep. Please don’t use ARM templates because they hurt. They’re still JSON only, so at least CloudFormation added YAML support in there. And while I don’t really like YAML, at least it’s not 10,000 lines of code to spin up, like, two domain controllers in a subnet. Corey: I personally wind up resolving the dichotomy between oh, should we go with JSON or should we go with YAML by picking the third option everyone hates more. That’s why I’m a staunch advocate for XML. Ned: [laugh]. I was going to say XML. Yeah oh, as someone who dealt with SOAP stuff for a while, yeah, XML was particularly painful, so I’m not sad that went away. JSON for me, I work with it better, but YAML is more readable. So, it’s like it’s, pick your poison on that. But yeah, there’s a ton of infrastructure tools out there. They all have basically the same concepts behind them, the same core concepts because they’re all deploying the same thing at the end of the day and there’s only so many ways you can express that concept. 
So, once you learn one—say you learned CloudFormation first—then Terraform is not as big of a leap. You’re still declaring stuff within a file and then having it go and make those things exist. It’s just nuances between the implementation of Terraform versus CloudFormation versus Bicep. Corey: I wish that there were more straightforward abstractions, but I think that as soon as you get those, that inherently limits what you’re able to do, so I don’t know how you square that circle. Ned: That’s been a real difficult thing is, people want some sort of universal cloud or infrastructure language and abstraction. I just want a virtual machine. I don’t care what kind of platform I’m on. Just give me a VM. But then you end up very much caring [laugh] what kind of VM, what operating system, what the underlying hardware is when you get to a certain level. So, there are some workloads where you’re like, I just needed to run somewhere in a container and I really don’t care about any of the underlying stuff. And that’s great. That’s what Platform as a Service is for. If that’s your end goal, go use that. But if you’re actually standing up infrastructure for any sort of enterprise company, then you need an abstraction that gives you access to all the underlying bits when you want them. So, if I want to specify different placement groups about my VM, I need access to that setting to create a placement group. And if I have this high-level of abstraction of a virtual machine, it doesn’t know what a placement group is, and now I’m stuck at that level of abstraction instead of getting down to the guts, or I’m going into the portal or the CLI and modifying it outside of the tool that I’m supposed to be using. Corey: I want to change gears slightly here. One thing that has really been roiling some very particular people with very specific perspectives has been the BSL license change that Terraform has wound up rolling out. 
So far, the people that I’ve heard who have the strongest opinions on it tend to fall into one of three categories: either they work at HashiCorp (fair enough), they work at one of HashiCorp’s direct competitors (which yeah, okay, sure), or they tend to be—how to put this delicately—open-source evangelists, of which I freely admit I used to be one and then had other challenges I needed to chase down in other ways. So, I’m curious as to where you, who are not really on the vendor side of this at all, how do you see it shaking out? Ned: Well, I mean, just for some context, essentially what HashiCorp decided to do was to change the licensing from the Mozilla Public License to BSL for, I think eight of their products and Terraform was amongst those. And really, this sort of tells you where people are. The only one that anybody really made any noise about was Terraform. There’s plenty of people that use Vault, but I didn’t see a big brouhaha over the fact that Vault changed its licensing. It’s really just about Terraform. Which tells you how important it is to the ecosystem. And if I look at the folks that are making the most noise about it, it’s like you said, they basically fall into one of two camps: it’s the open-source code purists who believe everything should be licensed in completely open-source ways, or at least if you start out with an open-source license, you can’t convert to something else later. And then there is a smaller subset of folks who work for HashiCorp competitors, and they really don’t like the idea of having to pay HashiCorp a regular fee for what used to be ostensibly free to them to use. And so, what they ended up doing was creating a fork of Terraform, just before the licensing change happened and that fork of Terraform was originally called OpenTF, and they had an OpenTF manifesto. And I don’t know about you, when I see the word ‘manifesto,’ I back away slowly and try not to make any sudden moves. 
Corey: You really get the sense there’s going to be a body count tied to this. And people are like, “What about the Agile Manifesto?” “Yeah, what about it?” Ned: [laugh]. Yeah, I’m just—when I see ‘manifesto,’ I get a little bit nervous because either someone is so incredibly passionate about something that they’ve kind of gone off the deep end a little bit, or they’re being somewhat duplicitous, and they have ulterior motives, let’s say. Now, I’m not trying to cast aspersions on anybody. I can’t read anybody’s mind and tell you exactly what their intention was behind it. I just know that the manifesto reads a little bit like an open-source purist and a little bit like someone having a temper tantrum, and vacillating between the two. But cooler heads prevailed a little bit, and now they have changed the name to OpenTofu, and it has been accepted by the Linux Foundation as a project. So, it’s now a member of the Linux Foundation, with all the gravitas that that comes with. And some people at HashiCorp aren’t necessarily happy about the Linux Foundation choosing to pull that in. Corey: Yeah, I saw a whole screed, effectively, that their CEO wound up brain-dumping on that frankly, from a messaging perspective, he would have been better served as not to say anything at all, to be very honest with you. Ned: Yeah, that was a bit of a yikes moment for me. Corey: It’s very rare that you will listen yourself into trouble as opposed to opening your mouth and getting yourself into trouble. Ned: Exactly. Corey: You wouldn’t think I would be one of those—of all people who would have made that observation, you wouldn’t think I would be on that list, yet here I am. Ned: Yeah. And I don’t think either side is entirely blameless. I understand the motivations behind HashiCorp wanting to make the change. I mean, they’re a publicly traded company now and ostensibly that means that they should be making some amount of money for their investors, so they do have to bear that in mind. 
I don’t necessarily think that changing the licensing of Terraform is the way to make that money. I think in the long-term, it’s not going—it may not hurt them a lot, but I don’t think it’s going to help them out a lot, and it’s tainted the goodwill of the community to a certain degree. On the other hand, I don’t entirely trust what the other businesses are saying as well in their stead. So, there’s nobody in this that comes out a hundred percent clean [laugh] on the whole process. Corey: Yeah, I feel like, to be direct, the direct competitors to HashiCorp along its various axes are not the best actors necessarily to complain about what is their largest competitor no longer giving them access to continue to compete against them with their own product. I understand the nuances there, but it also doesn’t feel like they are the best ambassadors for that. I also definitely understand where HashiCorp is coming from where, why are we investing all this time, energy, and effort for people to basically take revenue away from us? But there’s also the bigger problem, which is, by and large, compared to how many sites are running Terraform and the revenues that HashiCorp puts up for it, they’re clearly failing to capture the value they have delivered in a massive way. But counterpoint, if they hadn’t been open-source for their life until this point, would they have ever captured that market share? Probably not. Ned: Yeah, I think ultimately, the biggest competitor to their paid offering of Terraform is their free version of Terraform. 
It literally has enough bells and whistles already included and plenty of options for automating those things and solving the problems that their enterprise product solves that their biggest problem is not other competitors in the Terraform landscape; it’s the, “Well, we already have something, and it’s good enough.” And I’m not sure how you sell to that person, that’s why I’m not in marketing, but I think their biggest competitor is the people who already have a solution and are like, “Why do I need to pay for your thing when my thing works well enough?” Corey: That’s part of the strange thing that I’m seeing as I look across this entire landscape: it feels like this is not something that is directly going to impact almost anyone out there who’s just using this stuff, either the open-source version or as a paying customer of any of these things, but it is going to kick up a bunch of dust. And speaking of poor messaging, HashiCorp is not really killing it this quarter, where the initial announcement led to so many questions that were unclear, such as—like, they fixed this later in the frequently asked questions list, but okay, “I’m using Terraform right now and that’s fine. I’m building something else completely different. Am I going to lose my access to Terraform if you decide to launch a feature that does what my company does?” And after a couple of days, they put up an indemnity against that. Okay, fine. Like, when Mongo did this, there was a similar type of dynamic that was emerging, but a lot fewer people are writing their own database engine to then sell onward to customers that are provisioning infrastructure on behalf of their customers. And where the boundaries lay for who was considered a direct Terraform competitor was unclear. I’m still not convinced that it is clear enough to bet the business on for a lot of these folks. It comes down to say what you mean instead of hedging; you’re not helping your cause any. 
Ned: Yeah, I think out of the different products that they have, some are very clear-cut. Like, Vault is a server that runs as a service, and so that’s very clear what that product is and where the lines of delineation are around Vault. If I go stand up a bunch of Vault servers and offer them as a service, then that is clearly a competitor. But if I have an automation pipeline service and people can technically automate Terraform deployments with my service, even if that’s not the core thing that I’m looking to do, am I now a competitor? Like, it’s such a fuzzy line because Terraform isn’t an application, it’s not a server that runs somewhere, it’s a CLI tool and a programming language. So yeah, those lines are very, very fuzzy. And I… like I said, it would be better if they say what they meant, as opposed to sort of the mealy-mouthed language that they ended up using and the need to publish multiple revisions of that FAQ to clarify their position on very specific niche use cases. Corey: Yeah, I’m not trying to be difficult or insulting or anything like that. These are hard problems that everyone involved is wrestling with. It just felt a little off, and I think the messaging did them no favors when that wound up hitting. And now, everyone is sort of trying to read the tea leaves and figure out what does this mean because in isolation, it doesn’t mean anything. It is a forward-looking thing. Whatever it is you’re doing today, no changes are needed for you, until the next version comes out, in which case, okay, now do we incorporate the new thing or don’t we? Today, to my understanding, whether I’m running Terraform or OpenTofu entirely comes down to which binary am I invoking to do the apply? There is no difference of which I am aware. That will, of course, change, but today, I don’t have to think about that. Ned: Right. 
OpenTofu is a literal fork of Terraform, and they haven’t really added much in the way of features, so it should be completely compatible with Terraform. The two will diverge in the future as new features get added to each one. But yeah, for folks who are using it today, they might just decide to stay on the version pre-fork and stay on that for years. I think HashiCorp has pledged 18 months of support for any minor version of Terraform, so you’ve got at least a year-and-a-half to decide. And we were kind of talking before the recording, 99% of people using Terraform do not care about this. It does not impact their daily workflow. Corey: No. I don’t see customers caring at all. And also, “Oh, we’re only going to use the pre-fork version of Terraform,” they’re like, “Thanks for the air cover because we haven’t updated any of that stuff in five years, so tha”— Ned: [laugh]. Corey: “Oh yeah, we’re doing it out of license concern. That’s it. That’s the reason we haven’t done anything recent with it.” Because once it’s working, changes are scary. Ned: Yeah. Corey: Terraform is one of those scary things, right next to databases, that if I make a change that I don’t fully understand—and no one understands everything, as we’ve covered—then this could really ruin my week. So, I’m going to be very cautious around that. Ned: Yeah, if metrics are to be believed across the automation platforms, once an infrastructure rollout happens with a particular version of Terraform, that version does not get updated. For years. So, I have it on good authority that there’s still Terraform version 0.10 and 0.11 running on these automation platforms for really old builds where people are too scared to upgrade to, like, post 0.12 where everything changed in the language. I believe that. People don’t want to change it, especially if it’s working. And so, for most people, this licensing change doesn’t matter. 
And all the constant back and forth and bickering just makes people feel a little nervous, and it might end up pushing people away from Terraform as a platform entirely, as opposed to picking a side. Corey: Yeah, and I think that that is probably the fair way to view it at this point where right now—please, friends at HashiCorp and HashiCorp competitors don’t yell at me for this—it’s basically a nerd slap-fight at the moment. Ned: [laugh]. Corey: And one of the big reasons that I also stay out of these debates almost entirely is that I married a corporate attorney who used to be a litigator and I get frustrated whenever it comes down to license arguments because you see suddenly a bunch of engineers who get to cosplay as lawyers, and reading the comments is infuriating once you realize how a little bit of this stuff works, which I’ve had 15 years of osmotic learning on this stuff. Whenever I want to upset my wife, I just read some of these comments aloud and then our dinner conversation becomes screaming. It’s wonderful. Ned: Bad legal takes? Yeah, before— Corey: Exactly. Ned: Before my father became a social studies teacher, he was a lawyer for 20 years, and so I got to absorb some of the thought process of the lawyer. And yeah, I read some of these takes, and I’m like, “That doesn’t sound right. I don’t think that would hold up in any court of law.” Though a lot of the open-source licensing I don’t think has been tested in any sort of court of law. It’s just kind of like, “Well, we hope this stands up,” but nobody really has the money to check. Corey: Yeah. This is the problem with these open-source licenses as well. Very few have ever been tested in any meaningful way because I don’t know about you, but I don’t have a few million dollars in legal fees lying around to prove the point. Ned: Yeah. Corey: So, it’s one of those we think this is sustainable, and Lord knows the number of companies that have taken reliances on these licenses, they’re probably right. 
I’m certainly not going to disprove the fact—please don’t sue me—but yeah, this is one of those things that we’re sort of assuming is the case, even if it’s potentially not. I really want to thank you for taking the time to discuss how it is you view these things and talk about what it is you’re up to. If people want to learn more, where’s the best place for them to find you? Ned: Honestly, just go to my website. It’s https://nedinthecloud.com. And you can also find me on LinkedIn (https://www.linkedin.com/in/ned-bellavance/). I don’t really go for Twitter anymore. Corey: I envy you. I wish I could wean myself off of it. But we will, of course, include a link to that in the show notes. Thank you so much for being so generous with your time. It’s appreciated. Ned: It’s been a pleasure. Thanks, Corey. Corey: Ned Bellavance, founder and curious human at Ned in the Cloud. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that I will then fork under a different license and claim as my own. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit https://duckbillgroup.com to get started.

35m
Dec 07, 2023
Creating Value in Incident Management with Robert Ross

Robert Ross, CEO and Co-Founder at FireHydrant, joins Corey on Screaming in the Cloud to discuss how being an on-call engineer fighting incidents inspired him to start his own company. Robert explains how FireHydrant does more than just notify engineers of an incident, but also helps them to be able to effectively put out the fire. Robert tells the story of how he “accidentally” started a company as a result of a particularly critical late-night incident, and why his end goal at FireHydrant has been and will continue to be solving the problem, not simply choosing an exit strategy. Corey and Robert also discuss the value and pricing models of other incident-reporting solutions and Robert shares why he feels surprised that nobody else has taken the same approach FireHydrant has. ABOUT ROBERT Robert Ross is a recovering on-call engineer, and the CEO and co-founder at FireHydrant. As the co-founder of FireHydrant, Robert plays a central role in optimizing incident response and ensuring software system reliability for customers. Prior to founding FireHydrant, Robert contributed his expertise to renowned companies like Namely and Digital Ocean. LINKS REFERENCED: FireHydrant: https://firehydrant.com/ Twitter: https://twitter.com/bobbytables Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Developers are responsible for more than ever these days. Not just the code they write, but also the containers and cloud infrastructure their apps run on. And a big part of that responsibility is app security — from code to cloud. That’s where Snyk comes in. 
Snyk is a frictionless security platform that meets teams where they are, automating application security controls across their existing tools, workflows, and the AWS application stack — including seamless integrations with AWS CodePipeline, Amazon EKS, Amazon Inspector and several others. I’m a customer myself. Deploy on AWS. Secure with Snyk. Learn more at snyk.co/scream. That’s S-N-Y-K-dot-C-O/scream. Corey: Welcome to Screaming in the Cloud, I’m Corey Quinn. And this featured guest episode is brought to us by our friends at FireHydrant (https://firehydrant.com/) and for better or worse, they’ve also brought us their CEO and co-founder, Robert Ross, better known online as Bobby Tables. Robert, thank you for joining us. Robert: Super happy to be here. Thanks for having me. Corey: Now, this is the problem that I tend to have when I’ve been tracking companies for a while, where you were one of the only people that I knew of at FireHydrant. And you kind of still are, so it’s easy for me to imagine that, oh, it’s basically your own side project that turned into a real job, sort of, side hustle that’s basically you and maybe a virtual assistant or someone. I have it on good authority—and it was also signaled by your Series B—that there might be more than just you over there now. Robert: Yes, that’s true. There’s a little over 60 people now at the company, which is a little mind-boggling for me, starting from side projects, building this in Starbucks to actually having people using the thing and being on payroll. So, a little bit of a crazy thing for me. But yes, over 60. Corey: So, I have to ask, what is it you folks do? When you say ‘fire hydrant,’ the first thing that I think of is when I was a kid getting yelled at by the firefighter for messing around with something I probably shouldn’t have been messing around with. 
Robert: So, it’s actually very similar where I started it because I was messing around with software in ways I probably shouldn’t have and needed a fire hydrant to help put out all the fires that I was fighting as an on-call engineer. So, the name kind of comes from what do you need when you’re putting out a fire? A fire hydrant. So, what we do is we help people respond to incidents really quickly, manage them from ring to retro. So, the moment you declare an incident, we’ll do all the timeline tracking and eventually help you create a retrospective at the very end. And it’s been a labor of love because all of that was really painful for me as an engineer. Corey: One of the things that I used to believe was that every company did something like this—and maybe they do, maybe they don’t—I’m noticing these days an increasing number of public companies will never admit to an incident that very clearly ruined things for their customers. I’m not sure if they’re going to talk privately to customers under NDAs and whatnot, but it feels like we’re leaving an era where it was an expectation that when you had a big issue, you would do an entire public postmortem explaining what had happened. Is that just because I’m not paying attention to the right folks anymore, or are you seeing a downturn in that? Robert: I think that people are skittish of talking about how much reliability they—or issues they may have because we’re having this weird moment where people want to open more incidents like the engineers actually want to say we have more incidents and officially declare those, and in the past, we had these, like, shadow incidents that we weren’t officially going to say it was an incident, but was a pretty big deal, but we’re not going to have a retro on it so it’s like it didn’t happen. And kind of splitting the line between what’s a SEV1, when should we actually talk about this publicly, I think companies are still trying to figure that out. 
And then I think there’s also opposing forces. We talk to folks and it’s, you know, public relations will sometimes get involved. My general advice is, like, you should be probably talking about it no matter what. That’s how you build trust. Trust, with incidents, is lost in buckets and gained back in drops, so you should be more public about it. And I think my favorite example is a major CDN had a major incident and it took down, like, the UK government website. And folks can probably figure out who I’m talking about, but their stock went up the next day. You would think that a major incident taking down a large portion of the internet would cause your stock to go down. Not the case. They were on it like crazy, they communicated about it like crazy, and lo and behold, you know, people were actually pretty okay with it as far as they could be at the end of the day. Corey: The honest thing that really struck me about that was I didn’t realize that CDN that you’re referencing was as broadly deployed as it was. Amazon.com took some downtime as a result of this. Robert: Yeah. Corey: It’s, “Oh, wow. If they’re in that many places, I should be taking them more seriously,” was my takeaway. And again, I don’t tend to shame folks for incidents because as soon as you do that, they stop talking about them. They still have them, but then we all lose the ability to learn from them. I couldn’t help but notice that the week that we’re recording this, there was an incident report put out by AWS for a Lambda service event in Northern Virginia. It happened back in June, we’re recording this late in October. So, it took them a little bit of time to wind up getting it out the door, but it’s very thorough, very interesting as far as what it talks about as far as their own approach to things. Because otherwise, I have to say, it is easy as a spectator slash frustrated customer to assume the absolute worst. 
Like, you’re sitting around there and like, “Well, we have a 15-minute SLA on this, so I’m going to sit around for 12 minutes and finish my game of solitaire before I answer the phone.” No, it does not work that way. People are scrambling behind the scenes because as systems get more complicated, understanding the interdependencies of your own system becomes monstrous. I still remember some of the very early production engineering jobs that I had where—to what you said a few minutes ago—oh, yeah, we’ll just open an incident for every alert that goes off. Then we dropped a [core switch 00:05:47] and Nagios sent something like 8000 messages inside of two minutes. And we would still, 15 years later, not be done working through that incident backlog had we done such a thing. All of this stuff gets way harder than you would expect as soon as your application or environment becomes somewhat complicated. And that happens before you realize it. Robert: Yeah, much faster. I think that, in my experience, there’s a moment that happens for companies where maybe it’s the number of customers you have, number of servers you’re running in production, that you have this, like, “Oh, we’re running a big workload right now in a very complex system that impacts people’s lives, frankly.” And the moment that companies realize that is when you start to see, like, oh, process change, you build it, you own it, now we have an SRE team. Like, there’s this catalyst that happens in all of these companies that triggers this. And it’s—I don’t know, from my perspective, it’s coming at a faster rate than people probably realize. Corey: From my perspective, I have to ask you this question, and my apologies in advance if it’s one of those irreverent ones, but do you consider yourself to be an observability company? Robert: Oh, great question. No. No, actually. We think that we are the baton handoff between an observability tool and our platform. 
So, for example, we think that that’s a good way to kind of, you know, as they say, monitor the system, give reports on that system, and we are the tool that based on that monitor may be going off, you need to do something about it. So, for example, I think of it as like a smoke detector in some cases. Like, in our world, like that’s—the smoke detector is the thing that’s kind of watching the system and if something’s wrong, it’s going to tell you. But at that point, it doesn’t really do anything that’s going to help you in the next phase, which is managing the incident, calling 911, driving to the scene of the fire, whatever analogies you want to use. But I think the value-add for the observability tools and what they’re delivering for businesses is different than ours, but we touch each other, like, very much so. Corey: Managing an incident when something happens and diagnosing what is the actual root cause of it, so to speak—quote-unquote, “Root cause.” I know people have very strong opinions on— Robert: Yeah, say the word [laugh]. Corey: —that phrase—exactly—it just doesn’t sound that hard. It is not that complicated. It’s, more or less, a bunch of engineers who don’t know what they’re actually doing, and why are they running around chasing this stuff down is often the philosophy of a lot of folks who have never been in the trenches dealing with these incidents themselves. I know this because before I was exposed to scale, that’s what I thought and then, oh, this is way harder than you would believe. Now, for better or worse, an awful lot of your customers and the executives at those customers did, for some strange reason, not come up through production engineering as the thing that they’ve done. 
They are executives, so it feels like it would be a challenging conversation to have with them, but one thing that you’ve got in your back pocket, which I always love talking to folks about, is before this, you were an engineer and then you became a CEO of a reasonably-sized company. That is a very difficult transition. Tell me about it. Robert: Yeah. Yeah, so a little of that background. I mean, I started writing code—I’ve been writing code for two-thirds of my life. So, I’m 32 now; I’m relatively young. And my first job out of high school—skipping college entirely—was writing code. I was 18, I was working in a web dev shop, I was making good enough money and I said, you know what? I don’t want to go to college. That sounds—I’m making money. Why would I go to college? And I think it was a good decision because I got to be able—I was right kind of in the centerpiece of when a lot of really cool software things were happening. Like, DevOps was becoming a really cool term and we were seeing the cloud kind of emerge at this time and become much more popular. And it was a good opportunity to see all this confluence of technology and people and processes emerge into what is, kind of like, the base plate for a lot of how we build software today, starting in 2008 and 2009. And because I was an on-call engineer during a lot of that, and building the systems as well, that I was on call for, it meant that I had a front-row seat to being an engineer that was building things that was then breaking, and then literally merging on GitHub and then five minutes later [laugh], seeing my phone light up with an alert from our alerting tool. Like, I got to feel the entire process. And I think that that was nice because eventually one day, I snapped. And it was after a major incident, I snapped and I said, “There’s no tool that helps me during this incident. 
There’s no tool that kind of helps me run a process for me.” Because the only thing I care about in the middle of the night is going back to bed. I don’t have any other priority [laugh] at 2 a.m. So, I wanted to solve the problem of getting to the fire faster and extinguishing it by automating as much as I possibly could. The process that was given to me in an outdated Confluence page or Google Doc, whatever it was, I wanted to automate that part so I could do the thing that I was good at as an engineer: put out the fire, take some notes, and then go back to bed, and then do a retrospective sometime next day or in that week. And it was a good way to kind of feel the problem, try to build a solution for it, tweak a little bit, and then it kind of became a company. I joke and I say on accident, actually. Corey: I’ll never forget one of the first big, hairy incidents that I had to deal with in 2009, where my coworker had just finished migrating the production environment over to LDAP on a Thursday afternoon and then stepped out for a three-day weekend, and half an hour later, everything started exploding because LDAP will do that. And I only had the vaguest idea of how LDAP worked at all. This was a year into my first Linux admin job; I’d been a Unix admin before that. And I suddenly have the literal CEO of the company breathing down my neck behind me trying to figure out what’s going on and I have no freaking idea of myself. And it was… feels like there’s got to be a better way to handle these things. We got through. We wound up getting it back online, no one lost their job over it, but it was definitely a touch-and-go series of hours there. And that was a painful thing. And you and I went in very different directions based upon experiences like that. 
I took a few more jobs where I had even worse on-call schedules than I would have believed possible until I started this place, which very intentionally is centered around a business problem that only exists during business hours. There is no 2 a.m. AWS billing emergency. There might be a security issue masquerading as one of those, but you don’t need to reach me out of business hours because anything that is a billing problem will be solved in Seattle’s timeline over a period of weeks. You leaned into it and decided, oh, I’m going to start a company to fix all of this. And okay, on some level, some wit that used to work here, wound up once remarking that when an SRE doesn’t have a better idea, they start a monitoring company. Robert: [laugh]. Corey: And, on some level, there’s some validity to it because this is the problem that I know, and I want to fix it. But you’ve differentiated yourself in a few key ways. As you said earlier, you’re not an observability company. Good for you. Robert: Yeah. That’s a funny quote. Corey: Pete Cheslock. He has a certain way with words. Robert: Yeah [laugh]. I think that when we started the company, it was—we kind of accidentally secured funding five years ago. And it was because this genuinely was something I just, I bought a laptop for because I wanted to own the IP. I always made sure I was on a different network, if I was going to work on the company and the tool. And I was just writing code because I just wanted to solve the problem. And then some crazy situation happened where, like, an investor somehow found FireHydrant because they were like, “Oh, this SRE thing is a big space and incidents is a big part of it.” And we got to talking and they were like, “Hey, we think what you’re building is valuable and we think you should build a company here.” And I was—like, you know, the Jim Carrey movie, ? Like, that was kind of me in that moment. I was like, “Sure.” And here we are five years later. 
But I think the way that we approached the problem was let’s just solve our own problem and let’s just build a company that we want to work at. And you know, I had two co-founders join me in late 2018 and that’s what we told ourselves. We said, like, “Let’s build a company that we want to work for, that solves problems that we have had, that we care about solving.” And I think it’s worked out, you know? We work with amazing companies that use our tool—much to their chagrin [laugh]—multiple times a day. It’s kind of a problem when you build an incident response tool: it’s a good thing for us when people are using it, but a bad thing for them. Corey: I have to ask, of all of the different angles to approach this from, you went with incident management as opposed to focusing on something that is more purely technical. And I don’t say that in any way that is intended to be sounding insulting, but it’s easier from an engineering mind—having been one myself—to come up with, “Here’s how I make one computer talk to this other computer when the following event happens.” That’s a much easier problem by orders of magnitude than here’s how I corral the humans interacting with that computer’s failure to talk to another computer in just the right way. How did you get onto this path? Robert: Yeah. The problem that we were trying to solve was the getting the right people in the room problem. We think that building services that people own is the right way to build applications that are reliable and stable and easier to iterate on. Put the right people that build that software, give them, like, the skin in the game of also being on call. 
And what that meant for us is that we could build a tool that allowed people to do that a lot easier, allowing people to corral the right people by saying, “This service is broken, which powers this functionality, which means that these are the people that should get involved in this incident as fast as possible.” And the way we approached that is we just built up part of our functionality called Runbooks, where you can say, “When this happens, do this.” And it’s catered for incidents. So, there’s other tools out there, you can kind of think of as, like, we’re a workflow tool, like Zapier, or just things that, like, fire webhooks at services you build and that ends up being your incident process. But for us, we wanted to make it, like, a really easy way that a project manager could help define the process in our tool. And when you click the button and say, “Declare Incident: LDAP is Broken,” and I have a CEO standing behind me, our tool just would corral the people for you. It was kind of like a bat signal in the air, where it was like, “Hey, there’s this issue. I’ve run all the other process. I just need you to arrive and help solve this problem.” And we think of it as, like, how can FireHydrant be a mech suit for the team that owns incidents and is responsible for resolving them? Corey: There are few easier ways to make a product sound absolutely ridiculous than to try and pitch it to a problem that it is not designed to scale to. What is the ‘you must be at least this tall to ride’ envisioning for FireHydrant? How large slash complex of an organization do you need to be before this starts to make sense? Because I promise, as one person with a single website that gets no hits, that is probably not the best place for— Robert: Probably not. Corey: To imagine your ideal user persona. Robert: Well, I’m sure you get way more hits than that. Come on [laugh]. Corey: It depends on how controversial I’m being in a given week. Robert: Yeah [laugh]. 
Corey: Also, I have several ridiculous, nonsense apps out there, but honestly, those are for fun. I don’t charge people for them, so they can deal with my downtime till I get around to it. That’s the way it works. Robert: Or, like, spite-visiting your website. No it’s—for us, we think that the ‘must be this tall’ is when do you have, like, sufficiently complicated incidents? We tell folks, like, if you’re a ten-person shop and you have incidents, you know, just use our free tier. Like, you need something that opens a Slack channel? Fine. Use our free tier or build something that hits the Slack API [unintelligible 00:18:18] channel. That’s fine. But when you start to have a lot of people in the room and multiple pieces of functionality that can break and multiple people on call, that’s when you probably need to start to invest in incident management. Because it is a return on investment, but there is, like, a minimum amount of incidents and process challenges that you need to have before that return on investment actually, I would say, comes to fruition. Because if you do think of, like, an incident that takes downtime, or you know, you’re a retail company and you go down for, let’s say, ten minutes, and your number of sales per hour is X, it’s actually relatively simple for that type of company to understand, okay, this is how much impact we would need to have from an incident management tool for it to be valuable. And that waterline is actually way—it’s way lower than I think a lot of people realize, but like you said, you know, if you have a few hundred visitors a day, it’s probably not worth it. And I’ll be honest there, you can use our free tier. That’s fine. Corey: Which makes sense. It’s challenging to wind up sizing things appropriately. Whenever I look at a pricing page, there are two things that I look for. 
And incidentally, when I pull up someone’s website, I first make a beeline for pricing because that is the best way I found for a lot of the marketing nonsense words to drop away and to get down to brass tacks. And the two things I want are free tier or zero-dollar trial that I can get started with right now because often it’s two in the morning and I’m trying to see if this might solve a problem that I’m having. And I also look for the enterprise tier ‘contact us’ because there are big companies that do not do anything that is not custom nor do they know how to sign a check that doesn’t have two commas in it. And whatever is between those two, okay, that’s good to look at to figure out what dimensions I’m expected to grow on and how to think about it, but those are the two tent poles. And you’ve got that, but pricing is always going to be a dark art. What I’ve been seeing across the industry, if we put it under the broad realm of things that watch your site and alert you and help manage those things, is that there are an increasing number of, I guess what I want to call component vendors, where you’ll wind up bolting a couple dozen of these things together into an observability pipeline-style thing, and each component seems to be getting extortionately expensive. Most of the wake-up-in-the-middle-of-the-night services that will page you—and there are a number of them out there—at a spot check of these, they all cost more per month per user than Slack, the thing that most of us end up living within. This stuff gets fiendishly expensive, fiendishly quickly, and at some point, you’re looking at this going, “The outage is cheaper than avoiding the outage through all of these things. What are we doing here?” What’s going on in the industry, other than ‘money printing machine stopped going brrr’ in quite the same way? 
Robert: Yeah, I think that for alerting specifically, this is a big part of, like, the journey that we wanted to have in FireHydrant was like, we also want to help folks with the alerting piece. So, I’ll focus on that, which is, I think that the industry around notifying people for incidents—texts, call, push notifications, emails, there’s a bunch of different ways to do it—I think where it gets really crazy expensive is in this per-seat model that most of them seem to have landed on. And we’re per-seat for, like, the core platform of FireHydrant—so you know, before people spite-visit FireHydrant, look at our pricing pitch—but we’re per-seat there because the value there is, like, we’re the full platform for the service catalog retrospectives, Runbooks, like, there’s a whole other component of FireHydrant—status pages—but when it comes to alerting, like, in my opinion, that should be active user for a few reasons. I think that if you’re going to have people responding to incidents and the value from us is making sure they get to that incident very quickly because we wake them up in the middle of the night, we text them, we call them, we make their Hue lights turn red, whatever it is, then that’s, like, the value that we’re delivering at that moment in time, so that’s how we should probably invoice you. And I think that what’s happened is that the pricing for these companies, they haven’t innovated on the product in a way that allows them to package that any differently. So, what’s happened, I think, is that the packaging of these products has been almost restrictive in the way that they could change their pricing models because there’s nothing much more to package on. It’s like, cool there’s an alerting aspect to this, but that’s what people want to buy those tools for. They want to buy the tool so it wakes them up. But that tool is getting more expensive. There was even a price increase announced today for a big one [laugh] that I’ve been publicly critical of. 
That is crazy expensive for a tool that texts you and calls you. And what peo—what’s going on now is people are looking, they’re looking at the pricing sheet for Twilio and going, “What the heck is going on?” Like, I—to send a text on Twilio in the United States is fractions of a penny and here we are paying $40 a user for that person to receive six texts that month because of a webhook that hit an HTTP server and, like, it’s supposed to call that person? That’s kind of a crazy model if you think about it. Like, engineers are kind of going, “Wait a minute. What’s up here?” Like, and when engineers start thinking, “I could build this on a weekend,” like, something’s wrong, like, with that model. And I think that people are starting to think that way. Corey: Well engineers, to be fair, will think that about an awful lot of stuff. Robert: Anything. Yeah, they [laugh]— Corey: I’ve heard it said about Dropbox, Facebook, the internet— Robert: Oh, Dropbox is such a good one. Corey: BGP. Yeah okay, great. Let me know how that works out for you. Robert: What was that Dropbox comment on years ago? Like, “Just set up NFS and host it that way and it’s easy.” Right? Corey: Or rsync. Yeah— Robert: Yeah, it was rsync. Corey: What are you going to make with that? Like, who’s going to buy that? Like, basically everyone for at least a time. Robert: And whether or not the engineers are right, I think is a different point. Corey: It’s the condescending dismissal of everything that isn’t writing the code that really galls, on some level. Robert: But I think when engineers are thinking about, like, “I could build this on a weekend,” like, that’s a moment that you have an opportunity to provide the value in an innovative, maybe consolidated way. We want to be a tool that’s your incident management ring to retro, right? 
You get paged in the middle of the night, we’re going to wake you up, and when you open up your laptop, groggy-eyed, and like, you’re about to start fighting this fire, FireHydrant’s already done a lot of work. That’s what we think is, like, the right model to do this. And candidly, I have no idea why the other alerting tools in this space haven’t done this. I’ve said that and people tend to nod in agreement and say like, “Yeah, it’s been—it’s kind of crazy how they haven’t approached this problem yet.” And… I don’t know, I want to solve that problem for folks. Corey: So, one thing that I have to ask about, which you’ve been teasing on the internet for a little bit now, is something called Signals, where you are expanding your product into the component that wakes people up in the middle of the night, which in isolation, fine, great, awesome. But there was a company whose sole stated purpose was to wake people up in the middle of the night, and then once they started doing some business things such as, oh I don’t know, going public, they needed to expand beyond that to do a whole bunch of other things. But as a customer, no, no, no, you are the thing that wakes me up in the middle of the night. I don’t want you to sprawl and grow into everything else because if you’re going to have to pick a vendor that claims to do everything, well, I’ll just stay with AWS because they already do that and it’s one less throat to choke. What is that pressure that is driving companies that are spectacular at the one thing to expand into things that frankly, they don’t have the chops to pull off? And why is this not you doing the same thing? Robert: Oh, man. The end of that question is such a good one and I like that. I’m not an economist. I’m not—like, that’s… I don’t know if I have a great comment on, like, why are people expanding into things that they don’t know how to do. It seems to be, like, a common thing across the industry at a certain point— Corey: Particularly generative AI. 
“Oh, we’ve been experts in this for a long time.” “Yeah, I’m not that great at dodgeball, but you also don’t see me mouthing off about how I’ve been great at it and doing it for 30 years, either.” Robert: Yeah. I mean, there were a couple of ads during football games I watched. I’m like, “What is this AI thing that you just, like, tacked on the letter X to the end of your product line and now all of a sudden, it’s AI?” I have plenty of rants that are good for a cocktail at some point, but as for us, I mean, we knew that we wanted to do alerting a long time ago, but it does have complications. Like, the problem with alerting is that it does have to be able to take a brutal punch to the face the moment that AWS us-east-2 goes down. Because at that moment in time, a lot of webhooks are coming your way to wake somebody up, right, for thousands of different companies. So, you do have to be able to take a very, very sufficient amount of volume instantaneously. So, that was one thing that kind of stopped us. In 2019 even, we wrote a product document about building an alerting tool and we kind of paused. And then we got really deep into incident management, and the thing that makes us feel very qualified now is that people are actually already integrating their alerting tools into FireHydrant today. This is a very common thing. In fact, most people are paying for a FireHydrant and an alerting tool. So, you can imagine that gets a little expensive when you have both. So, we said, well, let’s help folks consolidate, let’s help folks have a modern version of alerting, and let’s build on top of something we’ve been doing very well already, which is incident management. And we ended up calling it Signals because we think that we should be able to receive a lot of signals in, do something correct with them, and then put a signal out and then transfer you into incident management. And yeah, we’re excited for it actually. It’s been really cool to see it come together. 
Corey: There’s something to be said for keeping it in a certain area of expertise. And people find it very strange when they reach out to my business partner and me asking, okay, so are you going to expand into Google Cloud or Azure or—increasingly, lately—Datadog—which has become a Fortune 500 board-level expense concern, which is kind of wild to me, but here we are—and asking if we’re going to focus on that, and our answer is no because it’s very… well, not very, but it is relatively easy to be the subject matter expert in a very specific, expensive, painful problem, but as soon as you start expanding that, your messaging loses focus and it doesn’t take long—since we do view this as an inherent architectural problem—where we’re saying, “We’re the best cloud engineers and cloud architects in the world,” and then we’re competing against basically everyone out there. And it costs more money a year for Accenture or Deloitte’s marketing budget than we’ll ever earn as a company in our entire lifetime, just because we are not externally boosted, we’re not putting hundreds of people into the field. It’s a lifestyle business that solves an expensive, painful problem for our customers. And that focus lends clarity. I don’t like the current market pressure toward expansion and consolidation at the cost of everything, including it seems, customer trust. Robert: Yeah. That’s a good point. I mean, I agree. I mean, when you see a company—and it’s almost getting hard to think about what a company does based on their name as well. Like, names don’t even mean anything for companies anymore. Like Datadog has expanded into a whole lot of things beyond data and if you think about some of the alerting tools out there that have names of, like, old devices that used to attach to our hips, that’s just a different company name than what represents what they do. And I think for us, like, incidents, that’s what we care about. That’s what I know. 
I know how to help people manage incidents. I built software that broke—sometimes I was an arsonist—sometimes I was a firefighter, it really depends, but that’s the thing that we’re going to be good at and we’re just going to keep building in that sphere. Corey: I think that there’s a tipping point that starts to become pretty clear when companies focus away from innovating and growing and serving customers into revenue protection mode. And I think this is a cyclical force that is very hard to resist. But I can tell even having conversations like this with folks, when the way that a company goes about setting up one of these conversations with me, you came by yourself, not with a squadron of PR people, not with a whole giant list of talking points you wanted to go to, just, “Let’s talk about this stuff. I’m interested in it.” As a company grows, that becomes more and more uncommon. Often, I’ll see it at companies a third the size of yours, just because there’s so much fear around everything we say must be spoken in such a way that it could never be taken in a negative way against us. That’s not the failure mode. The failure mode is that no one listens to you or cares what you have to say. At some point, yeah, I get the shift, but damned if it doesn’t always feel like it’s depressing. Robert: Yeah. This is such great questions because I think that the way I think about it is, I care about the problem and if we solve the problem and we solve it well and people agree with us on our solution being a good way to solve that problem, then the revenue, like, happens because of that. I’ve gotten asked from, like, from VCs and customers, like, “What’s your end goal with FireHydrant as the CEO of the company?” And what they’re really asking is, like, “Do you want to IPO or be acquired?” That’s always a question every single time. 
And my answer is, maybe, I don’t know, philosophical, but it’s, I think if we solve the problem, like, one of those will happen, but that’s not the end goal. Because if I aim at that, we’re going to come up short. It’s like how they tell you to throw a ball, right? Like they don’t say, aim at the glove. They say, like, aim behind the person. And that’s what we want to do. We just want to aim at solving a problem and then the revenue will come. You have to be smart about it, right? It’s not a field of dreams, like, if you build it, like, revenue arrives, but—so you do have to be conscious of the business and the operations and the model that you work within, but it should all be in service of building something that’s valuable. Corey: I really want to thank you for taking the time to speak with me. If people want to learn more, where should they go to find you, other than, you know, to their most recent incident page? Robert: [laugh]. No, thanks for having me. So, to learn more about me, I mean, you can find me on Twitter https://twitter.com/bobbytables on—or X. What do we call it now? Corey: I call it Twitter because I don’t believe in deadnaming except when it’s companies. Robert: Yeah [laugh]. twitter.com/bobbytables https://twitter.com/bobbytables if you want to find me there. If you want to learn more about FireHydrant and what we’re doing to help folks with incidents and incident response and all the fun things in there, it’s firehydrant.com https://firehydrant.com or firehydrant.io https://firehydrant.io, but we’ll redirect you to dot com. Corey: And we will, of course, put a link to all of that in the [show notes 00:33:10]. Thank you so much for taking the time to speak with me. It’s deeply appreciated. Robert: Thank you for having me. Corey: Robert Ross, CEO and co-founder of FireHydrant. This featured guest episode has been brought to us by our friends at FireHydrant, and I’m Corey Quinn. 
If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that will never see the light of day because that crappy platform you’re using is having an incident that they absolutely do not know how to manage effectively. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com https://duckbillgroup.com to get started.

35m
Dec 05, 2023
How MongoDB is Paving The Way for Frictionless Innovation with Peder Ulander

Peder Ulander, Chief Marketing & Strategy Officer at MongoDB, joins Corey on Screaming in the Cloud to discuss how MongoDB is paving the way for innovation. Corey and Peder discuss how Peder made the decision to go from working at Amazon to MongoDB, and Peder explains how MongoDB is seeking to differentiate itself by making it easier for developers to innovate without friction. Peder also describes why he feels databases are more ubiquitous than people realize, and what it truly takes to win the hearts and minds of developers. ABOUT PEDER Peder Ulander, the maestro of marketing mayhem at MongoDB, juggles strategies like a tech wizard on caffeine. As the Chief Marketing & Strategy Officer, he battles buzzwords, slays jargon dragons, and tends to developers with a wink. From pioneering Amazon's cloud heyday as Director of Enterprise and Developer Solutions Marketing to leading the brand behind cloud.com's insurgency, Peder's built a legacy as the swashbuckler of software, leaving a trail of market disruptions one vibrant outfit at a time. Peder is the Scarlett Johansson of tech marketing — always looking forward, always picking the edgy roles that drive what's next in technology. LINKS REFERENCED: MongoDB: https://mongodb.com Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. This promoted guest episode of Screaming in the Cloud is brought to us by my friends and yours at MongoDB https://mongodb.com, and into my veritable verbal grist mill, they have sent Peder Ulander, their Chief Marketing Officer. Peder, an absolute pleasure to talk to you again. Peder: Always good to see you, Corey. Thanks for having me. 
Corey: So, once upon a time, you worked in marketing over at AWS, and then you transitioned off to Mongo to, again, work in marketing. Imagine that. Almost like there’s a narrative arc to your career. A lot of things change when you change companies, but before we dive into things, I just want to call out that you’re a bit of an aberration in that every single person that I have spoken to who has worked within your org has nothing but good things to say about you, which means you are incredibly effective at silencing dissent. Good work. Peder: Or it just shows that I’m a good marketer and make sure that we paint the right picture that the world needs to see. Corey: Exactly. “Do you have any proof of you being a great person to work for?” “No, just word of mouth,” and everyone, “Ah, that’s how marketing works.” Peder: Exactly. See, I’m glad you picked up somewhere. Corey: So, let’s dive into that a little bit. Why would you leave AWS to go work at Mongo? Again, my usual snark and sarcasm would come up with a half dozen different answers, each more offensive than the last. Let’s be serious for a second. At AWS, there’s an incredibly powerful engine that drives so much stuff, and the breadth is enormous. MongoDB, despite an increasingly broad catalog of offerings, is nowhere near that level of just universal applicability. Your product strategy is not a Post-It note with the word ‘yes’ written on it. There are things that you do across the board, but they all revolve around databases. Peder: Yeah. So, going back prior to MongoDB, I think you know, at AWS, I was across a number of different things, from the developer ecosystem, to the enterprise transformation, to the open-source work, et cetera, et cetera. 
And being privy to how customers were adopting technology to change their business or change the experiences that they were delivering to their customers or increase the value of the applications that they built, you know, there was a common thread of something that fundamentally needed to change. And I like to go back to just the evolution of tech in that sense. We could talk about going from physical on-prem systems to now we’re distributed in the cloud. You could talk about application constructs that started as big fat monolithic apps that moved to virtual, then microservices, and now functions. Or you think about networking, we’ve gone from fixed wire line, to network edge, and cellular, and what have you. All of the tech stack has changed with the exception of one layer, and that’s the data layer. And I think for the last 20 years, what’s been in place has worked okay, but we’re now meeting this new level of scale, this new level of reach, where the old systems are not what’s going to be what the new systems are built on, or the new experiences are built on. And as I was approached by MongoDB, I kind of sat back and said, “You know, I’m super happy at AWS. I love the learning, I love the people, I love the space I was in, but if I were to put my crystal ball together”—here’s a Bezos statement of looking around corners—“The data space is probably one of the biggest spaces ripe for disruption and opportunity, and I think Mongo is in an incredible position to go take advantage of that.” Corey: I mean, there’s an easy number of jokes to make about AmazonBasics MongoDB, which is my disparaging name for their DocumentDB first-party offering. And for a time, it really felt like AWS’s perspective toward its partners was one of outright hostility, if not antagonism. But that narrative no longer holds true in 2023. There’s been a definite shift. 
And to be direct, part of the reason that I believe that is the things you have said both personally and professionally in your role as CMO of Mongo that have caused me to reevaluate this because despite all of your faults—a counted list of which I can provide you after the show— Peder: [laugh]. Corey: You do not say things that you do not believe to be true. Peder: Correct. Corey: So, something has changed. What is it? Peder: So, I think there’s an element of coopetition, right? So, I would go as far as to say the media loved to sensationalize—actually even the venture community—loved to sensationalize the screen scraping stripping of open-source communities that Amazon represented a number of years ago. The reality was their intent was pretty simple. They built an incredibly amazing IT stack, and they wanted to run whatever applications and software were important to their customers. And when you think about that, the majority of systems today, people want to run open-source because it removes friction, it removes cost, it enables them to go do cool new things, and be on the bleeding edge of technology. And Amazon did their best to work with the top open-source projects in the world to make it available to their customers. Now, for the commercial vendors that are leaning into this space, that obviously does present itself as a threat, right? And we’ve seen that along a number of the cohorts of whether you want to call it single-vendor open-source or companies that have a heavy, vested interest in seeing the success of their enterprise stack match the success of the open-source stack. And that’s, I think, where media, analysts, venture, all kind of jumped on the bandwagon of not really, kind of, painting that bigger picture for the future. I think today when I look at Amazon—and candidly, it’ll be any of the hyperscalers; they all have a clone of our database—it’s an entry point. 
They’re running just the raw open-source operational database capabilities that we have in our community edition and making that available to customers. We believe there’s a bigger value in going beyond just that database and introducing, you know, anything from the distributed zones to what we do around vector search to what we do around stream processing, and encryption and all of these advanced features and capabilities that enable our customers to scale rapidly on our platform. And the dependency on delivering that is with the hyperscalers, so that’s where that coopetition comes in, and that becomes really important for us when we’re casting our web to engage with some of the world’s largest customers out there. But interestingly enough, we become a big drag of services for an AWS or any of the other hyperscalers out there, meaning that for every dollar that goes to a MongoDB, there’s, you know, three, five, ten dollars that goes to these hyperscalers. And so, they’re very active in working with us to ensure that, you know, we have fair and competing offers in the marketplace, that they’re promoting us through their own marketplace as well as their own channels, and that we’re working together to further the success of our customers. Corey: When you take a look at the exciting things that are happening at the data layer—because you mentioned that we haven’t really seen significant innovation in that space for a while—one of the things that I see happening is with the rise of Generative AI, which requires very special math that can only be handled by very special types of computers. 
I’m seeing at least a temporary inversion in what has traditionally been thought of as data gravity, whereas it’s easier to move compute close to the data, but in this case, since the compute only lives in the, um, sparkling us-east-1 regions of Virginia, otherwise, it’s just generic, sparkling expensive computers, great, you have to effectively move the mountain to Mohammed, so to speak. So, in that context, what else is happening that is driving innovation in the data space right now? Peder: Yeah, yeah. I love your analogy of, move the mountain to Mohammed because that’s actually how we look at the opportunity in the whole Generative AI movement. There are a lot of tools and capabilities out there, whether we’re looking at code generation tools, LLM modeling vendors, some of the other vector database companies that are out there, and they’re all built on the premise of, bring your data to my tool. And I actually think that’s a flawed strategy. I think that these are things that are going to be features in core application databases or operational databases, and it’s going to be dependent on the reach and breadth of that database, and the integrations with all of these AI tools that will define the victor going forward. And I think that’s been a big core part of our platform. When we look at Atlas—111 availability zones across all three hyperscalers with a single, unified, you know, interface—we’re actually able to have the customers keep their operational data where it’s most important to them and then apply the tools of the hyperscalers or the partners where it makes the most sense without moving the data, right? So, you don’t actually have to move the mountain to Mohammed. We’re literally building an experience where those that are running on MongoDB and have been running on MongoDB can gain advantage of these new tools and capabilities instantly, without having to change anything in their architectures or how they’re building their applications. 
Corey: There was a somewhat over-excited… I guess, over-focus in the space of vector databases because whatever those are—which involves math, and I am in no way, shape, or form smart enough to grasp the nuances thereof, but everyone assures me that it’s necessary for Generative AI and machine learning and yadda, yadda, yadda. So, when in doubt, when I’m confronted by things I don’t fully understand, I turn to people who do. And the almost universal consensus that I have picked up from people who track databases for a living—as opposed to my own role of inappropriately using everything in the world except databases as a database—is that vector is very much a feature, not a core database type. Peder: Correct. The best way to think about it—I mean, databases in general, they’re dealing with structured and unstructured data, and generally, especially when you’re doing searches or relevance, you’re limited to the fact that those things in the rows and the columns or in the documents is text, right? And the reality is, there’s a whole host of information that can be found in metadata, in images, in sounds, in all of these other sources that were stored as individual files but unsearchable. Vector, vectorization, and vector embeddings actually enable you to take things far beyond the text and numbers that you traditionally were searching against and actually apply more, kind of, intelligence to it, or apply sounds or apply sme—you know, you can vectorize smells to some extent. And what that does is it actually creates a more pleasing slash relevant experience for how you’re actually building the engagements with your customers. Now, I’ll make it a little more simple because that was trying to define vectors, which as you know, is not the easiest thing. 
But imagine being able to vectorize—let’s say I’m a car company—we’re actually working with a car company on this—and you’re able to store all of the audio files of cars that are showing certain diagnostic issues—the putters and the spurts and the pings and the pangs—and you can actually now isolate these sounds and apply them directly to the problem and resolution for the mechanics that are working on them. Using all of this stuff together, now you actually have a faster time to resolution. You don’t want mechanics knowing the mechanics of vectors in that sense, right, so you build an application that abstracts all of that complexity. You don’t require them to go through PDFs of data and find all of the options for fixing this stuff. The relevance comes back and says, “Yes, we’ve seen that sound 20 times across this vehicle. Here’s how you fix it.” Right? And that cuts significant amount of time, cost, efficiency, and complexity for those auto mechanics. That is such a big push forward, I think, from a technology perspective, on what the true promise of some of these new capabilities are, and why I get excited about what we’re doing with vector and how we’re enabling our customers to, you know, kind of recreate experiences in a way that are more human, more relevant. Corey: Now, I have to say that of course you’re going to say nice things about your capabilities where vector is concerned. You would be failing in your job if you did not. So, I feel like I can safely discount every positive thing that you say about Mongo’s positioning in the vector space and instead turn to, you know, third parties with no formalized relationship with you. Yesterday, Retool’s State of AI report came across my desk. I am a very happy Retool customer. They’ve been a periodic sponsor, from time-to-time, of my ridiculous nonsense, which is neither here nor there, but I want to disclaim the relationship. 
And they had a Gartner Magic Quadrant equivalent that on one axis had Net Promoter Score—NPS, which is one of your people’s kinds of things—and the other was popularity. And Mongo was so far up and to the right that it was almost hilarious compared to every other entrant in the space. That is a positioning that I do not believe it is possible to market your way into directly. This is something that people who are actually doing these things have to use the product, and it has to stand up. Mongo is clearly effective at doing this in a way that other entrants aren’t. Why? Peder: Yeah, that’s a good question. I think a big part of that goes back to the earlier statement I made that vector databases or vector technology, it’s a feature, it’s not a separate thing, right? And when I think about all of the new entrants, they’re creating a new model where now you have to move your data out of your operational database and into their tool to get an answer and then push back in. The complexity, the integrations, the capabilities, it just slows everything down, right? And I think when you look at MongoDB’s approach to take this developer data platform vision of getting all of the core tools that developers need to build compelling applications with from a data perspective, integrating it into one seamless experience, we’re able to basically bring classic operational database capabilities, classic text search type capabilities, embed the vector search capabilities as well, it actually creates a richer platform and experience without all of that complexity that’s associated with bolt-on sidecar Gen AI tool or vector database. Corey: I would say that that’s one of those things that, again, can only really be credibly proven by what the market actually does, as opposed to, you know, lip-sticking the heck out of a pig and hoping that people don’t dig too deeply into what you’re saying. It’s definitely something we’re seeing adoption of. 
Peder: Yeah, I mean, this kind of goes to some of the stuff, you know, you pointed out, the Retool thing. This is not something you can market your way into. This is something that, you know, users are going to dictate the winners in this space, the developers, they’re going to dictate the winners in the space. And so, what do you have to do to win the hearts and minds of developers, you have to make the tech extremely approachable, it’s got to be scalable to meet their needs, not a lot of friction involved in learning these new capabilities and applying it to all of the stuff that has come before. All of these things put together, really focusing on that developer experience, I mean, that goes to the core of the MongoDB ethos. I mean, this is who we were when we started the company so long ago, and it’s continued to drive the innovation that we do in the platform. And I think this is just yet again, another example of focusing on developer needs, making it super engaging and useful, removing the friction, and enabling them to just go create new things. That’s what makes it so fun. And so when, you know, as a marketer, and I get the Retool chart across my desk, we haven’t been pitching them, we haven’t been marketing to them, we haven’t tried to influence this stuff, so knowing that this is a true, unbiased audience, actually is pretty cool to see. To your point, it was surprising how far up and to the right that we sat, given, you know, where we were in just—we launched this thing… six months ago? We launched it in June. The amount of customers that have signed up, are using it, and engaged with us on moving forward has been absolutely amazing. Corey: I think that there has been so much that gets lost in the noise of marketing. 
My approach has always been to cut through so much of it—that I think AWS has always done very well with—is—almost at their detriment these days—but if you get on stage, you can say whatever you want about your company’s product, and I will, naturally and lovingly, make fun of whatever it is that you say. But when you have a customer coming on stage and saying, “This is how we are using the thing that they have built to solve a very specific business problem that was causing us pain,” then I shut up, and I listen because it’s very hard to wind up dismissing that without being an outright jerk about things. I think the failure mode of that is, taken too far, you lose the ability to tell your own story in a coherent way, and it becomes a crutch that becomes very hard to get rid of. But the proof is really in the pudding. For me, like, the old jokes about—in the early teens—where MongoDB would periodically lose data as configured by default. Like, “MongoDB. It’s Snapchat for databases.” Hilarious joke at the time, but it really has worn thin. That’s like being angry about what Microsoft did in 2005 and 2006. It’s like, “Yeah, okay, you have a point, but it is also ancient history, and at some point you need to get with the modern era, get with the program.” And I think that seeing the success and breadth of MongoDB that I do—you are in virtually every customer that I talk to, in some way, shape, or form—and seeing what it is that they’re doing with you folks, it is clear that you are not a passing fad, that you are not going away anytime soon. Peder: Right. Corey: And even with building things in my spare time and following various tutorials of dubious credibility from various parts of the internet—as those things tend to go—MongoDB is very often a default go-to reference when someone needs a database for which a SQLite file won’t do. Peder: Right. 
It’s fascinating to see the evolution of MongoDB, and today we’re lucky to track 45,000-plus customers on our platform doing absolutely incredible things. But I think the biggest—to your point—the biggest proof is in the pudding when you get these customers to stand up on stage and talk about it. And even just recently, through our .local series, some of the customers that we’ve been highlighting are doing some amazing things using MongoDB in extremely business-critical situations. My favorite was, I was out doing our .local in Hong Kong, where Cathay Pacific got up on stage, and they talked a little bit about their flight folder. Now, if you remember going through the airport, you always see the captains come through, and they had those two big boxes of paperwork before they got onto the plane. Not only was that killing the environment with all the trees that got cut down for it, it was cumbersome, complex, and added a lot of time and friction with regards to flight operations. Now, take that from a single flight over all of the fleet that’s happening across the world. We were able to work with Cathay Pacific to digitize their entire flight folder, all of their documentation, removing the need for cutting down trees and minimizing their carbon footprint, but at the same time, actually delivering a solution where if it goes down, it grounds the entire fleet of the airline. So, imagine that. That’s so business-critical, mission-critical, has to be there, reliable, resilient, available for the pilots, or it shuts down the business. Seeing that growth and that transformation while also seeing the environmental benefit for what they have achieved, to me, that makes me proud to work here. Similarly, we have companies like Ford, another big brand-name company here in the States, where their entire connected car experience and how they’re basically operationalizing the connection between the car and their home base, this is all being done using MongoDB as well. 
So, as they think of these new ideas, recognizing that things are going to be either out at the edges or at a level of scale that you can’t just bring it back into classic rows and columns, that’s actually where we’re so well-suited to grow our footprint. And, you know, I remember back to when I was at Sun—Sun Microsystems. I don’t know if anybody remembers that company. That was an old one. But at one point, it was Jonathan that said, “Everything of value connects to the network.” Right? Those things that are connecting to the network also need applications, they need data, they need all of these services. And the further out they go, the more you need a database that basically scales to meet them where they are, versus trying to get them to come back to where your database happens to sit. And in order to do that, that’s where you break the mold. That’s where—I mean, that kind of goes into the core ethos of why we built this company to begin with. The original founders were not here to build a database; they were building a consumer app that needed to scale to the edges of the earth. They recognized that databases didn’t solve for that, so they built MongoDB. That’s actually thinking ahead. Everything connecting to the network, everything being distributed, everything basically scaling out to all the citizens of the planet fundamentally needs a new data layer, and that’s where I think we’ve come in and succeeded exceptionally well. Corey: I would agree. Another example I like to come up with, and it’s fun that the one that leaps to the top of my mind is not one of the ones that you mentioned, but HSBC—the massive bank—very publicly a few years ago, wound up consolidating, I think it was 46 relational databases onto MongoDB. And the jokes at the time wrote themselves, but let’s be serious for a second. 
Despite the jokes that we all love to tell, they are a bank, a massive bank, and they don’t play fast-and-loose or slap-and-tickle with transactional integrity or their data stores for these things. Because there’s a definite belief across the banking sector—and I know this having worked in it myself for years—that if at some point, you have the ATMs spitting out the wrong account balances, people will begin rioting in the streets. I don’t know if that’s strictly accurate or hyperbole, but it’s going to cause massive amounts of chaos if it happens. So, that is something that absolutely cannot happen. The fact that they’re willing to engage with you folks and your technology and be public about it at that scale, that’s really all you need to know from a, “Is this serious technology or clown shoes technology?” Peder: [laugh]. Well, taking that comment, now let’s exponentially increase that. You know, if I sit back, and I look at my customer base, financial services is actually one of our biggest verticals as a business. And you mentioned HSBC. We had Wells Fargo on the stage last year at our world event. Nine out of the top ten world’s banks are using MongoDB in some of their applications, some at the scale of HSBC, some are still just getting started. And it all comes down to the fact that we have proven ourselves, we are aligned to mission-critical business environments. And I think when it comes down to banks, especially that transactional side, you know, building in the capabilities to be able to have high frequency transactions in the banking world is a hard thing to go do, and we’ve been able to prove it with some of the largest banks on the planet. 
Corey: I also want to give you credit—although it might be that I’m giving you credit for a slow release process; I hope not—but when I visit mongodb.com, it still talks up front that you are—and I want to quote here—oh, good lord, it changes every time I load the page—but it talks about, “Build faster, build smarter,” on this particular version of the load. It talks about the data platform. You have not effectively decided to pivot everything you say in public to tie directly into the Generative AI hype bubble that we are currently experiencing. You have a bunch of different use cases, and you’re not suddenly describing what you do in Gen AI terms that make it impossible to understand just what the company-slash-product-slash-services actually do. Peder: Right. Corey: So, I want to congratulate you on that. Peder: Appreciate that, right? Look, it comes down to the core basics. We are a developer data platform. We bring together all of the capabilities, tools, and functions that developers need when building apps as it pertains to their data functions or data layer, right? And that’s why this integrated approach of taking our operational database and building in search, or stream processing, or vector search, all of the things that we’re bringing to the platform enable developers to move faster. And what that says is, we’re great for all use cases out there, not just Gen AI use cases. We’re great for all use cases where customers are building applications to change the way that they’re engaging with the customers. Corey: And what I like about this is that you’re clearly integrating this stuff under the hood. You are talking to people who are building fascinating stuff, you’re building things yourself, but you’re not wrapping yourself in the mantle of, “This is exactly what we do because it’s trendy right now.” And I appreciate that. 
It’s still intelligible, and I wouldn’t think that I had to congratulate someone on, “Wow, you build marketing that a human being can extract meaning from. That’s amazing.” But in 2023, the closing days thereof, it very much is. Peder: Yep, yep. And it speaks a lot to the technology that we’ve built because, you know, on one side—it reminds me a lot of the early days of cloud where everything was kind of cloud-washed for a bit, we’re seeing a little bit of that in the hype cycle that we have right now—sticking to our guns and making sure that we are building a technology platform that enables developers to move quickly, that removing the friction from the developer lifecycle as it pertains to the data layer, that’s where the success is, right? We have to stay on top of all of the trends, we have to make sure that we’re enabling Gen AI, we have to make sure that we’re integrating with the Amazon Bedrocks and the CodeWhisperers of the world, right, to go push this stuff forward. But to the point we made earlier, those are capabilities and features of a platform where the higher-level order is to really empower our customers to develop innovative, disruptive, or market-leading technologies for how they engage with their customers. Corey: Yeah. And it’s neat to be able to see that you are empowering companies to do that without feeling the need to basically claim their achievements as your own, which is an honest-to-God hard thing to do, especially as you become a platform company because increasingly, you are the plumbing that makes a lot of the flashy, interesting stuff possible. It’s imperative, you can’t have those things without the underlying infrastructure, but it’s hard to talk about that infrastructure, too. Peder: You know, it’s funny, I’m sure all of my colleagues would hate me for saying this, but the wheel doesn’t turn without the ball bearing. Somebody still has to build the ball bearing in order for that sucker to move, right? And that’s the thing. 
This is the infrastructure, this is the heart of everything that businesses need to build applications. And one of the—you know, another kind of snide comment I’ve made to some of my colleagues here is, if you think about every market-leading app, in fact, let’s go to the biggest experiences you and I use on a daily basis, I’m pretty sure you’re booking travel online, you’re searching for stuff on Google, you’re buying stuff through Amazon, you’re renting a house through Airbnb, and you’re listening to your music through Spotify. What are those? Those are databases with a search engine. Corey: The world is full of CRUD applications. These are, effectively, simply pretty front-ends to a database. And as much as we’d like to pretend otherwise, that’s very much the reality of it. And we want that to be the case. Different modes of interaction, different requirements around them, but yeah, that is what so much of the world is. And I think to ignore that is to honestly blind yourself to a bunch of very key realities here. Peder: That kind of goes back to the original vision for when I came here. It’s like, look, everything of value for us, everything that I engage with, is—to your point—it’s a database with a great experience on top of it. Now, let’s start to layer in this whole Gen AI push, right, what’s going on there. We’re talking about increased relevance in search, we’re talking about new ways of thinking about sourcing information. We’ve even seen that with some of the latest ChatGPT stuff that developers are using that to get code snippets and figure out how to solve things within their platform. The era of the classic search engine is in the middle of a complete change, and the opportunity, I think, that I see as this moves forward is that there is no incumbent. There isn’t somebody who owns this space, so we’re just at the beginning of what probably will be the next Googles, Airbnbs, and Ubers of the world for the next generation. 
And that’s really exciting to see. Corey: I’m right there with you. One of the interesting founding stories at Google is that they wound up calling typical storage vendors for what they needed, got basically ‘screw on out of here, kids,’ pricing, so they shrugged, and because they had no real choice to get enterprise-quality hardware, they built a bunch of highly redundant systems on top of basically a bunch of decommissioned crap boxes from the university they were able to more or less get for free or damn near it, and that led to a whole innovation in technology. One of the glorious things about cloud that I think goes under-sold is that I can build a ridiculous application tonight for maybe, what, 27 cents IT infrastructure spend, and if it doesn’t work, I round up to a dollar, it’ll probably get waived because it’ll cost more to process the credit card transaction than take my 27 cents. Conversely, if it works, I’m already building with quote-unquote, “Enterprise-grade” components. I don’t need to do a massive uplift. I can keep going. And that is no small thing. Peder: No, it’s not. When you step back, every single one of those stories was about abstracting that complexity to the end-user. In Google’s case, they built their own systems. You or I probably didn’t know that they were screwing these things together and soldering them in the back room in the middle of the night. Similarly, when Amazon got started, that was about taking something that was only accessible to a few thousand and now making it accessible to a few million with the costs of 27 cents to build an app. You removed the risk, you removed the friction from enabling a developer to be able to build. 
That next wave—and this is why I think the things we’re doing around Gen AI, and our vector search capabilities, and literally how we’re building our developer data platform is about removing that friction and limits and enabling developers to just come in and, you know, effectively do what they do best, which is innovate, versus all of the other things. You know, in the Google world, it’s no longer racking and stacking. In the cloud world, it’s no longer managing and integrating all the systems. Well, in the data world, it’s about making sure that all of those integrations are ready to go and at your fingertips, and you just focus on what you do well, which is creating those new experiences for customers. Corey: So, we’re recording this a little bit beforehand, but not by much. You are going to be at re:Invent this year—as am I—for eight nights— Peder: Yes. Corey: Because for me at least, it is crappy cloud Hanukkah, and I’ve got to deal with that. What have you got coming up? What do you plan to announce? Anything fun, exciting, or are you just there basically, to see how many badges you can actually scan in one day? Peder: Yeah [laugh]. Well, you know, it’s shaping up to be quite an incredible week, there’s no question. We’ll see what brings to town. As you know, re:Invent is a huge event for us. We do a lot within that ecosystem, a lot of the customers that are up on stage talking about the cool things they’re doing with AWS, they’re also MongoDB customers. So, we go all out. I think you and I spoke before about our position there with SugarCane right on the show floor, I think we’ve managed to secure you a Friends of Peder all-access pass to SugarCane. So, I look forward to seeing you there, Corey. Corey: Proving my old thesis of, it really is who you know. And thank you for your generosity, please continue. Peder: [laugh]. So, we will be there in full force. 
We have a number of different innovation talks, we have a bunch of community-related events, working with developers, helping them understand how we play in the space. We’re also doing a bunch of hands-on labs and design reviews that help customers basically build better, and build faster, build smarter—to your point earlier on some of the marketing you’re getting off of our website. But we’re also doing a number of announcements. I think first off, it was actually this last week, we made the announcement of our integrations with Amazon—or—yeah, Amazon CodeWhisperer. So, their code generation tool for developers has now been fully trained on MongoDB so that you can take advantage of some of these code generation tools with MongoDB Atlas on AWS. Similarly, there’s been a lot of noise around what Amazon is doing with Bedrock and the ability to automate certain tasks and things for developers. We are going to be announcing our integrations with Agents for Amazon Bedrock being supported inside of MongoDB Atlas, so we’re excited to see that, kind of, move forward. And then ultimately, we’re really there to celebrate our customers and connect them so that they can share what they’re doing with many peers and others in the space to give them that inspiration that you so eloquently talked about, which is, don’t market your stuff; let your customers tell what they’re able to do with your stuff, and that’ll set you up for success in the future. Corey: I’m looking forward to seeing what you announce in conjunction with what AWS announces, and the interplay between those two. As always, I’m going to basically ignore 90% of what both companies say and talk instead to customers, and, “What are you doing with it?” Because that’s the only way to get truth out of it. And, frankly, I’ve been paying increasing amounts of attention to MongoDB over the past few years, just because of what people I trust who are actually good at databases have to say about you folks. 
Like, my friends at RedMonk always like to say—I’ve stolen the line from them—“You can buy my attention, but not my opinion.” Peder: A hundred percent. Corey: You’ve earned the opinion that you have, at this point. Thank you for your sponsorship; it doesn’t hurt, but again, you don’t get to buy endorsements. I like what you’re doing. Please keep going. Peder: No, I appreciate that, Corey. You’ve always been supportive, and definitely appreciate the opportunity to come on again. And I’ll just push back to that Friends of Peder. There’s, you know, also a little bit of ulterior motive there. It’s not just who you know, but it’s [crosstalk 00:34:39]— Corey: It’s also validating that you have friends. I get it. I get it. Peder: Oh yeah, I know, right? And I don’t have many, but I have a few. But the interesting thing there is we’re going to be able to connect you with a number of the customers doing some of these cool things on top of MongoDB Atlas. Corey: I look forward to it. Thank you so much for your time. Peder Ulander, Chief Marketing Officer at MongoDB. I’m Cloud Economist Corey Quinn and this has been a promoted guest episode of Screaming in the Cloud, brought to us by our friends at Mongo. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review in your podcast platform of choice, along with an angry, insulting comment that I will ignore because you basically wrapped it so tightly in Generative AI messaging that I don’t know what the hell your point is supposed to be. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.

36m
Nov 30, 2023