Posts


  • The llm-d team released a new post focusing on intelligent inference serving and on how LLM inference differs from stateless web requests. Worth a read.

    Looking a little deeper into what the right architecture for AI workloads in my homelab should be, I came across llm-d. It was launched by CoreWeave, Google, IBM Research, NVIDIA, and Red Hat. Their statement really resonates:

    The objective of llm-d is to create a well-lit path for anyone to adopt the leading distributed inference optimizations within their existing deployment framework - Kubernetes.

    llm-d's building blocks are vLLM as the inference engine, Kubernetes as the core platform, and an inference gateway that provides intelligent scheduling built for LLM-style workloads. I would highly recommend spending a bit more time reading through their announcement, which explains very nicely the differences between typical workloads and LLM workloads.

    Even though the focus is on deploying large-scale inference on Kubernetes with large models (e.g. Llama-70B+, not Llama-8B) and longer input sequence lengths (e.g. 10k ISL | 1k OSL, not 200 ISL | 200 OSL), mostly tested on 8 or 16 NVIDIA H200 GPUs, there are parts like Intelligent Inference Scheduling that have been tested and run on a single GPU.

    # 09:31 PM / AI, Linux, LLM

  • Measuring GPU Passthrough Overhead Using vfio-pci on AI Linux. I’m currently at the stage of assessing the impact of the hypervisor on GPU performance. Does it make a difference whether I run AI workloads on Kubernetes or on a hypervisor? It looks like the performance impact of using GPU passthrough is negligible.

    Abylay Ospan, who is one of the kernel maintainers:

    The performance impact of GPU passthrough via vfio-pci in AI Linux (Sbnb Linux) is impressively low, averaging around 1-2% across a range of LLM models. This makes it a highly viable option for running accelerated inference inside virtual machines, enabling isolation and flexibility without compromising performance.
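
    For context, binding a GPU to vfio-pci is mostly host configuration. The sketch below is a generic example, not taken from the article or from Sbnb Linux: the PCI vendor:device ID (10de:2331) and slot (01:00.0) are hypothetical placeholders for your own GPU, and the IOMMU must be enabled on the kernel command line (intel_iommu=on or amd_iommu=on).

    ```shell
    # Find the GPU's vendor:device ID (the ID below is a placeholder)
    lspci -nn | grep -i nvidia
    # e.g. "01:00.0 3D controller [0302]: NVIDIA ... [10de:2331]"

    # Option A: claim the device at boot via the kernel command line (GRUB):
    #   intel_iommu=on iommu=pt vfio-pci.ids=10de:2331

    # Option B: claim it via modprobe config, making vfio-pci load
    # before the NVIDIA driver can grab the card
    cat <<'EOF' | sudo tee /etc/modprobe.d/vfio.conf
    options vfio-pci ids=10de:2331
    softdep nvidia pre: vfio-pci
    EOF

    # Rebuild the initramfs, reboot, then verify vfio-pci owns the device
    lspci -nnk -s 01:00.0 | grep 'Kernel driver in use'
    # expected: Kernel driver in use: vfio-pci
    ```

    The VM then gets the whole GPU via VFIO, which is why the measured overhead stays in the low single digits.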

    # 09:31 PM / AI, Linux, Homelab

  • I’m rebuilding this site. Since I have big plans over the following months, building and playing more with on-prem AI infrastructure, I feel it would be great to give my site a little refresh. I have decided that I want to own my content - posts, shorter posts, etc. - so I will be posting mostly here, using Bsky, Twitter and LinkedIn to broadcast this content. In a few months this site will be migrated from AWS to my own server. I have tried to give it a much cleaner and simpler look. It should be easier to maintain and hopefully easier for you to read.

    I have changed the layout of the website so now I can write shorter posts and publish them quickly here without thinking about titles etc. Let’s see how this will go.

    # 09:31 PM / Blog
  • AI’s Security Crisis: Why Your Assistant Might Betray You. I signed up for Corey’s mailing list probably 7 years ago and have read almost every issue since. Especially at the beginning, it was a great way to learn more about AWS. Just by accident I ran into his podcast with Simon Willison. I have followed Simon on Bsky for a while, and he is in my opinion currently the most inspiring persona in the AI space. He talks about AI inference pricing and its environmental impact, open source, and blogging. I like the bit where he explains how he talks to his AI assistant when he goes for a walk. I thought I was the only one doing that and feeling uncomfortable about it. Highly recommend it.

    # 09:31 PM / AI, Podcast

  • My new home AI server - Part 2 - MOBO, CPU, Memory, Storage etc.

    In this post, part 2 of the series in which I’m building my home server, I will focus on the rest of the components. The previous part was about the GPU; this time I will cover the MOBO, CPU, memory, storage, etc. I actually spent much more time on this than I expected. Maybe it will help someone in the future. At least it helped me give my thoughts a bit more structure.

    [....2017 words]
    # 09:31 PM / AI, Linux, Homelab

  • My new home AI server - Part 1 - Intro, requirements and selecting the right GPU

    So far there have been two big transitions in the networking space during my lifetime: the internet and the cloud. I was lucky enough to be part of the internet transition at the end of the '90s and the beginning of the '00s, building the community internet provider Bubakov.net, and oh how much I learnt building it! We used to take PCs, install Linux on them and run them as servers, routers and switches. We were learning how to do routing, deploy BGP, or deploy IPv6. It was an absolute blast.

    Over a decade later, in the mid-2010s, I had lots of fun at Google and Natilik building cloud infrastructure and my cloud skills. I got plenty of hands-on experience as an engineer and architect, and over the last few years as a cloud practice leader, and I do believe we are now in the next big transition in the networking space, caused by AI.

    I’m a simple person. If I really want to understand something, I have to build it, break it, touch it, so I feel the only way to really understand how to build AI infrastructure is to start learning to build it with my own hands. So I have decided to build my own server, probably for the first time since 2005. I couldn’t be more excited about it! When I see what has been happening in the semiconductor space over the last 5 years, it feels like there is no better time to build your own server than now.

    [....2472 words]
    # 09:31 PM / AI, Linux, Homelab

  • AI storage - MinIO AIStor and Pure Storage FlashBlade //EXA

    In this article I will not be talking about your typical enterprise. I will be talking about the scale of OpenAI, Meta, Tesla and other large-scale AI companies. I was planning to release this right after the MinIO presentation, but I didn’t, because things changed a week later when Pure Storage announced FlashBlade //EXA, which removed some of the limitations. So I thought I would rewrite this to talk a little more about the challenges MinIO and FlashBlade //EXA are trying to solve, and zoom in a little on each product. I wasn’t able to test the performance of either solution, so even though performance is very important when it comes to AI storage, I will focus purely on features and architecture, and when it comes to performance I will work with some assumptions. I’m totally aware there are more products out there than MinIO and FlashBlade, such as WEKA, but in this article I will focus only on these two.

    [....2193 words]
    # 09:31 PM / MinIO, Pure Storage, Cloud Field Day

  • The Race for Ultra Ethernet in Data Centers by Juniper Networks

    This is my first post from the analyst week to which I was invited by The Futurum Group for Tech Field Day. Juniper Networks was our first day, and this post will focus on Juniper’s vision and approach to AI Datacenter Networking.

    There are more and more opinions that AI is at an inflection point. It seems that Generative AI has caused the public to wake up to AI technology, even though AI/ML has been around for at least the last 10 years. Whether this is true or not, I will leave for you to decide. What’s undoubtedly true is that this has started a race to build new infrastructure like we haven’t seen since the ’90s, and like in the ’90s, everything is about more and MORE bandwidth. Do you think 400G is enough? Forget it. 800G is now the minimum, and 1.6T is around the corner. And still, the network is the bottleneck.

    [....823 words]
    # 09:31 PM / Juniper, Cloud Field Day

  • Dell Technologies World 2024 - Day 1

    Dell Technologies is celebrating its 40th anniversary this year, and I’ve been invited by Dell Technologies and Arrow Electronics to Las Vegas to attend their annual Dell Technologies World 2024. Michael Dell immediately announced that this year’s Dell Technologies World conference is the “AI Edition,” which seemed to be the anticipated theme. He began his keynote by declaring that we are moving from the “age of computation” to the “age of cognition,” a statement I found quite profound. After experiencing the keynotes, it’s clear to me that few vendors are as well-prepared for this shift as Dell is.

    Dell’s strategy involves AI Factories, a collaboration with NVIDIA announced over 12 months ago under the project name Helix, which had its official launch at GTC 2024 a few months back. This Dell-validated architecture introduces end-to-end AI infrastructure for enterprises. The Dell AI Factory architecture is unique because it’s comprehensive, encompassing compute, networking, storage, and, more importantly, endpoints. Currently, I believe no other vendor offers such an architecture; perhaps HPE/Juniper might in the future, but integrating those two portfolios will require time.


    [....657 words]
    # 09:31 PM / Dell Technologies, Conference

  • KubeCon 2024 Day 1: A Decade of Kubernetes


    As Kubernetes enters its second decade, the first day of KubeCon 2024 was marked by a series of great discussions and presentations.

    The opening remarks were delivered by a panel of experts including Priyanka Sharma, Timothée Lacroix, co-founder of Mistral AI, Paige Bailey from Google Gen AI, and Jeffrey Morgan, the creator of Ollama. They discussed the gap between Development and AI Research, similar to the previous gap between Development and Operations. Machine learning teams often overlook containers, a point underscored by Paige Bailey who noted that Google’s AI infrastructure engineering team is struggling to keep up with the demand for training larger models.

    [....345 words]
    # 09:31 PM / KubeCon, Conference

  • What did I Learn About AI in the Last 12 Months?


    Have you ever wondered what the future of AI will look like? How will it change our lives, our work, our society? If you are like me, you are probably fascinated by the rapid developments and innovations in this field. In this post, I want to share with you some of my thoughts and observations on the current state and trends of AI, based on my personal experience and research.

    The last 18 months have been absolutely fascinating. We have witnessed the release of ChatGPT 3.5, a groundbreaking natural language generation model that can produce coherent and diverse texts on almost any topic. This model has sparked a lot of interest and excitement among both casual and technical users. For example, some people use ChatGPT to prepare a meal plan for the whole week, while others use it to generate code, poetry, or music. ChatGPT 3.5 has also inspired an avalanche of new open source tools and models, as well as new vendors offering various AI solutions and services. At Natilik, especially in the last 6 months, we have been trying to get our heads around the use cases for products like Co-Pilot, which is probably the first widely used ChatGPT-style product. We have also been exploring the infrastructure side of AI, such as open source LLMs, RAG, storage, network infrastructure, Azure OpenAI, etc.

    [....1060 words]
    # 09:31 PM / AI, LLM

  • Building AI Data platforms with WEKA


    In the realm of AI and Machine Learning (AI/ML) as well as image processing, the necessity for high-performance storage cannot be stressed enough. This involves high I/O capabilities with minimal latency, ensuring that operations are swift and seamless.

    Traditionally, many setups, whether in cloud environments like AWS, Azure, GCP, or Oracle, or on-premises, have been plagued by a primary challenge: storage that struggles to feed compute adequately. This issue is further intensified when you are, for instance, an AI-focused company building LLMs. These companies make significant investments in GPUs in the public cloud, but often storage bottlenecks, characterized by prolonged load/write times, prevent them from achieving optimal performance.

    To address these challenges, a more streamlined AI data pipeline is crucial. A typical workflow might look something like this: Data Ingestion => Metadata lookups => High I/O operations => Data storage. The ultimate goal? To feed GPUs faster, potentially reducing the number of GPUs required, or to do more with the same number of GPUs.

    [....486 words]
    # 09:31 PM / WEKA, CFD

  • Multi-Cloud Networking Solution by Prosimo


    There were lots of good sessions and good conversations yesterday with vendors like Juniper and AMD, but one company really stood out for me personally, and that was Prosimo.io.

    I have been following Prosimo.io for the last few years, for multiple reasons. I have always been a big believer in multi-cloud, and the team that founded Prosimo.io is full of impressive people, who for instance co-founded Viptela (acquired by Cisco) or were senior personas behind the Cisco ACI BU, plus there are the VCs behind this startup. That combination of factors very often means a company is going to be interesting. And sure it is!

    [....551 words]
    # 09:31 PM / Prosimo, CFD

  • Cloud Tech Field Day - Day 0

    Before Cloud Tech Field Day kicks off tomorrow, I decided to make the most of my Day 0 in Silicon Valley by meeting with some of Natilik’s key multi-cloud vendors. The day was packed with insightful conversations, and my first meeting was breakfast with the newest vendor in our portfolio, Spectro Cloud, with their CTO Saad Malik and Tenry Fu, Spectro Cloud’s CEO.


    We talked about the inherent challenges faced by startups in securing funding. Saad, with his background as a co-founder, shared valuable insights into the tenacity required to navigate these hurdles. One of the most interesting parts of our conversation revolved around the ongoing debate between ARM and RISC-V. This isn’t just a technological competition; it’s a geopolitical shift too. We discussed how this rivalry is shaping the tech industry and even impacting national interests.

    [....435 words]
    # 09:31 PM / Spectro, PureStorage, CFD

  • Quick summary on Pure//Accelerate 2023


    Before I leave Pure//Accelerate I wanted to quickly put together a few thoughts on what I saw this week in Las Vegas. This is a quick dump done in an hour, so apologies for any typos. You could say the only takeaway from Pure Accelerate should be that flash is displacing HDD, which after many years of repeating the same message got a little bit... flat, but let’s not be cynical, because if you look a little closer there are some exciting innovations happening across the Pure Storage portfolio which will shape it for many years to come.

    [....875 words]
    # 09:31 PM / PureStorage, Conference

  • CCW Estimate API Part 2 - Acquire Estimate

    If you have finished reading the previous post, I must congratulate you. I know it was really boring, and yes, it must have been difficult to understand at times, but you made it, so now let’s do the fun part: programming and testing it. For this series I will be using Python, but I know the engineers I’m currently talking to are using Go for this. I’m hoping these posts will help other engineers, and we will see this done in other languages!

    [....1632 words]
    # 09:31 PM / API, CCW, DevNet

  • Intro into CCW Estimate API

    I work in pre-sales so working with CCW takes up a big part of my working day (joy!) and as a network automation enthusiast I’ve decided to automate the heck out of CCW. After a few people asked me to help them with CCW Estimate APIs, I decided to document the process so it can, hopefully, help others.

    [....1339 words]
    # 09:31 PM / API, CCW, DevNet

  • Traffic flows in Kubernetes Flannel

    I’m currently working on a Kubernetes project with Flannel, so I put together a very short post about Kubernetes Flannel and how it works from a design and traffic-flow perspective. It will not explain how to set up Flannel; I may do that in a dedicated post.

    [....819 words]
    # 09:31 PM / Kubernetes, Flannel, CNI

  • Deep Learning with TensorFlow

    Before I start talking about TensorFlow and Deep Learning, let me give you a disclaimer: I’m no expert in Deep Learning or TensorFlow, but I love exploring new areas where I have zero knowledge, and TensorFlow was a great opportunity. This is a VERY complex topic, so I feel a little bad that I had to compress so much information into such a short post, but maybe I will write more posts about ML/AI and TensorFlow in the future.

    What do we want to build?

    • Car plate detection
    • Car plate recognition
    • Reading of car plate
    • Sending data to a database
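
    The four steps above can be sketched end to end in Python. This is a minimal illustration only, not the post's actual TensorFlow code: here detection is swapped for OpenCV's bundled Haar cascade and the plate reading for pytesseract, and `car.jpg` / `plates.db` are hypothetical file names.

    ```python
    import re
    import sqlite3


    def normalize_plate(raw: str) -> str:
        """Clean raw OCR output: drop everything but letters/digits, uppercase."""
        return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


    def store_plate(conn: sqlite3.Connection, plate: str) -> None:
        """Step 4: send the recognized plate to a database."""
        conn.execute(
            "CREATE TABLE IF NOT EXISTS plates "
            "(plate TEXT, seen_at TEXT DEFAULT CURRENT_TIMESTAMP)")
        conn.execute("INSERT INTO plates (plate) VALUES (?)", (plate,))


    def process_image(image_path: str, db_path: str = "plates.db") -> None:
        """Steps 1-3: detect plates in an image, OCR them, persist results."""
        # Assumed third-party dependencies (not from the original post):
        # pip install opencv-python pytesseract  (plus the tesseract binary)
        import cv2
        import pytesseract

        img = cv2.imread(image_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_russian_plate_number.xml")
        conn = sqlite3.connect(db_path)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 4):
            text = pytesseract.image_to_string(gray[y:y + h, x:x + w])
            plate = normalize_plate(text)
            if plate:
                store_plate(conn, plate)
        conn.commit()
        conn.close()


    # Usage (needs an actual image on disk):
    # process_image("car.jpg")
    ```

    The post trains TensorFlow models for the detection and recognition stages instead; the sketch only shows how the pipeline stages hand data to each other.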
    [....3583 words]
    # 09:31 PM / AI, TensorFlow, DevNet

  • Pure Accelerate 2019 conference

    We are living in a world where 86% of enterprise business are saying they will migrate all of their services into the cloud, but at the same time, 80% of the same enterprise businesses continue to spend on their on-prem infrastructure. What does this really mean? From my experience, I have found that businesses see the benefits of making the journey to the cloud, but for a multitude of reasons (skills, application requirements, bandwidth) they still maintain their on prem DCs. That’s why we have seen the emergence and growth of the term ‘hybrid-cloud architecture’, the best of both worlds.

    [....1447 words]
    # 09:31 PM / PureStorage, Conference

  • Intro into Cisco ACI and Terraform

    Recently I have started to explore Terraform as a replacement for Ansible. Even though there are some great use cases for Ansible to manage your infrastructure as code, over time I have found there are certain limitations. Luckily, I have also found that most of those limitations can be addressed by Terraform.

    [....1681 words]
    # 09:31 PM / ACI, Terraform

  • Running Home Assistant on Kubernetes cluster

    I have been using Home Assistant for a while to control my home devices. I had been running it on one of my Raspberry Pis, but with the cadence of new releases it became a bit annoying to upgrade Home Assistant every few weeks, occasionally having to roll back. To address this, I decided to run Home Assistant in Docker, and since I was already running a Kubernetes cluster at home, I decided to run it on the cluster.

    [....1033 words]
    # 09:31 PM / Kubernetes, Home Assistant

  • My Tech Field Days 2019 in San Francisco

    What-a-week-this-was!  

    It all started around a month ago, when I was invited to speak at Cisco DevNet Create 2019 in San Francisco. It was such a privilege to be asked, to stand shoulder-to-shoulder with the top creative developers and visionary programmers. I am lucky to work for a company that also saw the value in me taking part and sent me off with their blessing. Everyone knows I am a well-versed and keen traveller, and I hop across the pond quite often, but this was my first trip of 2019 and opening up my own ‘Tech Field Days Blog’. I was keen to maximise every minute I could in sunny California - the home to some of the world’s leading tech companies and many of our trusted partners. Bring on the trip!

    [....998 words]
    # 09:31 PM / DevNet

  • CCIE SP - IS-IS Notes

    Routing domain
    • Network in which all routers run IS-IS routing protocol
    IS-IS areas
    • Network domain can be segmented => areas
    • Defined as stubs
    • All routers in the area must be configured the same way - either CLNP or IP
    • Level-1 routers route inter-area traffic via the closest Level-2 router
    [....2650 words]
    # 09:31 PM / CCIE, ISIS