Name: Instructions, Custom Agents, Prompts, Skills - Oh My!
Uploaded: 2026-03-26T19:16:09.615Z
Duration: 59 min 32 s
Description: Instructions, Custom Agents, Prompts, Skills - Oh My!

Transcript for "Instructions, Custom Agents, Prompts, Skills - Oh My!": Hello, everybody. Where are people dialing in from? I'm Harold. I work on VS Code and GitHub Copilot. So it's exciting to talk about the things I actually work on and share the best practices we've been seeing from talking to customers, talking to developers, and talking to people on Reddit and on X. It's been fast moving. So we're gonna talk about agent skills today. Maybe as an icebreaker, while people are answering where they come from: what is a favorite skill, plugin, or MCP server, kind of in context engineering, that you've discovered lately, seen demoed, or use yourself? I'd love to hear. Share in the chat. And then let's get the show started while you share your wisdom. I work on VS Code, so I'm gonna demo a lot of VS Code, but let's go into what we're gonna cover. There we go. So it's agents, prompts, skills, plugins, oh my. And I'm missing MCP from the list. So there's a lot of stuff coming. And the gist of it is that all these customizations matter: this context engineering that we're doing with these primitives, and that you need to do. Hopefully, I can give you a very dense overview of how to do it and what all these are, because we cannot really make the model smarter all the time. You ask the model a question, it will do the research, it will give you an answer, and every one of these loops stands on its own. There's memory on the GitHub repository side. There's context building that the platforms do to make the model's answers smarter over time. But the rest of it is really this context engineering that you have to do, that your team has to do, to give the agent the context it needs and the tools it needs to actually make the changes you expect it to make, and then teach it when to use them and how to collaborate with you using these tools. And there's a bunch of stuff.
If you look at this, it's one of my repos, and we just shipped this really nice customization UI that gives you a nice overview. All of these are coming together for the workspace you're working on, including ones of my own, like this explainer sites one that I like for creating really nice visuals, and they're all bundled together to build a smarter experience that's customized for the agent and for everything else. K. Those teams as well. One second. I think this, and then closing that. K. Back to the show. So we know about those primitives. You've probably heard of many of them before, and you're wondering how one compares to the other. I'm gonna show quickly how they actually look in a real code base. On the one side, we have instructions. Most people now use AGENTS.md, which is actually a standard, and it's included with every session, which means you can go overboard adding too much text to these files, and then it's always in the context window of the agent. And that context window comes at a premium. It impacts performance. It impacts the quality of the agentic experience. So we need to break these down a bit more. The other section is file-based instructions. I'm gonna show this later on as well, but they're basically pattern matching based on what you're actually working on, so they allow you to be more specific in some areas. Slash commands, or prompts, allow you to kinda one-shot workflows. Custom agents you can actually select in VS Code and switch over to, like, test or development, or you can run them as a sub agent. They allow you to build these very constrained, specific workflows, and we've seen a lot of success with people building those. And then since the beginning of the year, we have the agent skills spec, and skills are somewhat in between prompts and custom agents, and complementary in how they bring in reusable capabilities and workflows.
And then we have hooks as the latest addition, on top of skills and the rest. They allow you to be very specific about when things trigger and add determinism to the agent flow. We're gonna have some examples of that coming up as well. And then MCP, the favorite, now over a year old, which allows you to connect more of your external services. As a bonus, what many people have probably heard by now is that the latest hype is plugins. They're not necessarily adding a new primitive that the agent is aware of, but they allow you to bundle these nicely together for your domain expertise and for easier sharing. So let's take a look at how this looks in an actual project. So this is VS Code, and it's not just VS Code that I have open (let me make it bigger), it's also actually the VS Code repo. One of the demos I like to share is how we actually work as a team using these instructions, because you can open up github.com/microsoft/vscode and browse along with me to see how we've been combining these customizations. To kick things off, I'm gonna open the chat customization UI that we just showed on one slide. So these are all the agents, skills, instructions, prompts, hooks, MCP servers, and plugins that I'm using right now within the VS Code repository. In the agents, my favorite ones are these two. So agents-wise, we have a data agent that can query our telemetry right within VS Code. So basically, where the event is defined, I can kick off this agent, and it looks like this. I can go over here, go into the data agent, and now I can ask questions about a specific event I have. And why does this work? Because the data agent has access to the Azure MCP, which is used to access our data and run queries. It has a role, an objective, and a workflow.
So that's one. We have skills for accessibility testing, adding a new policy, IBM or Azure Pipelines: a lot of the inner loop, but also the outer loop and how we deploy. Our instructions, let's point those out first. We have the API versioning one, for example. That's our first sample: it has these applyTo patterns, and that makes it file based. So anytime the agent is editing these files, these instructions will be pulled in and describe the specific area in a bit more detail and how it all works. We have a whole bunch of prompts in the workspace, and I have a whole bunch more prompts in my own setup, and I will show these in a moment as well. We have hooks and MCP servers and plugins. So that's the whole shebang. If you're in VS Code, we're actually just rolling out this chat customization view, so you might not be seeing it yet. I'm on Insiders, which is our nightly build, but we're now releasing weekly. So anytime you update within the next week, you should see chat customizations if you wanna check it out. K. Going back to what our instructions are, because I already showed them. Let me show you one instructions file I missed, which is the Copilot instructions file. That's in .github/copilot-instructions.md. This is the one that's always on, and it's very succinct and just a few pages long. It covers what this project is, for the agent as it works on anything. It needs to understand that it's working on Visual Studio Code, so it gives the general outline of where things are and some of the core architecture bits. That's what we believe the agent needs to know every time it does anything in this code base, and those are the always-on instructions. Then what I'm showing here on the right, again, is the one with applyTo. So you wanna focus on instructions.
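To make the difference between always-on and file-based instructions concrete, here is a minimal Python sketch of the selection step. It is an illustration only, not how VS Code implements it: the `Instruction` type, the instruction names, and the patterns are all hypothetical, and Python's `fnmatchcase` stands in for whatever glob semantics the real applyTo matching uses.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Instruction:
    name: str
    apply_to: Optional[str]  # glob pattern; None means "always on"
    text: str


# Hypothetical instruction set; names and patterns are invented for illustration.
INSTRUCTIONS = [
    Instruction("project-overview", None, "General outline and core architecture."),
    Instruction("api-versioning", "src/api/*", "Rules for changing versioned API types."),
    Instruction("tests", "*.test.ts", "Use the existing test helpers."),
]


def instructions_for(edited_path: str) -> list:
    """Return the names of the instructions that apply to the file being edited."""
    selected = []
    for inst in INSTRUCTIONS:
        # fnmatchcase's "*" also matches "/", so "src/api/*" covers nested files.
        if inst.apply_to is None or fnmatchcase(edited_path, inst.apply_to):
            selected.append(inst.name)
    return selected


if __name__ == "__main__":
    for path in ["src/api/v2/users.ts", "src/editor/view.test.ts", "README.md"]:
        print(path, "->", instructions_for(path))
```

The always-on entry is paid for on every request, while the pattern-scoped entries only cost context when a matching file is in play, which is the argument for keeping the always-on file short.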
The best way to start onboarding agents, context engineering agents for your code base, is creating these instructions and starting with things that the agent gets wrong. So, basically, where does the LLM's training data mismatch how you actually work? You can see this when the agent makes mistakes. It's when the agent doesn't use the right testing framework, maybe because it didn't look at the right file or it used an old reference that it found somewhere else. So it's really about being specific: finding mistakes, and then fixing those in instructions. Then also sometimes explaining why. If it reaches for Moment.js and there's a reason not to, then tell it, actually, you should use date-fns instead, and bring in some examples as well. Code examples can be a short code sample, but they can also be a link to a file that actually follows the gold standard of how you wanna have code written. Sometimes people go overboard, so I think it's always good to ask the agent, cut it by half. Cut everything that's obvious if you just look at the code, so you don't need to document how everything works in your code base if an agent can just look at most files and understand what's happening. So you wanna focus on areas that the agent gets wrong. That's the starting point. Next up is skills. I already showed some skills. Just to give an example of what skills look like, there's actually a public repo that I use in the workshop. Actually, today, I'm gonna give the workshop again. But to give you an idea, here's a skill that I really like, that's very exemplary of what skills can be. This is a front-end design skill, and it has two things. One, it has a name, and that's just to identify it. Actually, the folder has to have the same name. The more important part is the description. By default, your agent will only see the description.
Basically, it has a list: these are the skills available, these are the names, and these are their descriptions, which means the description really has to cover when to load up the rest of the skill, and that's called progressive disclosure. So by default, the agent will only see the description, which tells it, load this when. And in this case, it's "use this skill when the user asks to design and build web components." What's powerful with skills: in this case, the skill only has one file. Looking at VS Code, I do wonder if we have a skill that has more. Policy: one file. One file. So the basic skill is one file. And in this case, the skill comes with a script. That's the cool part of what skills can do versus instructions. Skills are, a, progressively disclosed: you have a description of when to use it. Then once it's loaded, the full file will be in context, and that gives the agent the overview. And then on demand, it can access the rest of the files in that folder, in this case, a script. So you can combine more references and have them progressively disclosed. You save context window if that initial SKILL.md is very small and targeted, and then it'll have, like, oh, if you need a script, here's the script as well if you need to monitor or build. So that's the benefit. There's more here. So all small skills. That's where you start, and then you try to break out anything that's more optional into its own file or a script. So what are skills for versus instructions? Instructions are great at describing the code base you're currently looking at. So you wanna mostly focus them on this specific code base and on describing how the code works. In my case, we already saw we have, like, one for CI analysis. So these are all specific capabilities, the domain-specific tasks that the team did in the past and that we wanna do again.
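Progressive disclosure as described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the agent skills spec or any harness's loader: `Skill`, `SkillRegistry`, and the example skill content are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str  # always visible to the agent
    body: str         # full skill file text, loaded only once the skill is picked
    resources: dict = field(default_factory=dict)  # extra files, read on demand


class SkillRegistry:
    def __init__(self, skills):
        self._skills = {s.name: s for s in skills}

    def catalog(self) -> str:
        """What sits in context by default: one short line per skill."""
        return "\n".join(f"{s.name}: {s.description}" for s in self._skills.values())

    def load(self, name: str) -> str:
        """Second level: the full skill file enters context."""
        return self._skills[name].body

    def read_resource(self, name: str, path: str) -> str:
        """Third level: a bundled script or reference, fetched only if needed."""
        return self._skills[name].resources[path]


# Hypothetical skill modeled on the front-end design example from the talk.
registry = SkillRegistry([
    Skill(
        name="frontend-design",
        description="Use when the user asks to design and build web components.",
        body="# Frontend design\n" + "Detailed guidance line.\n" * 50,
        resources={"scripts/build.py": "print('build')"},
    )
])

if __name__ == "__main__":
    print(len(registry.catalog()), "chars always in context")
    print(len(registry.load("frontend-design")), "chars once the skill is loaded")
```

The point of the three levels is that the default cost is only the catalog line, so the description has to carry the "load this when" signal on its own.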
So let me show you an example of this one. To get going, I gotta run the app. I gotta open my demo, which is a Bingo Mixer app. And now I'm gonna ask it: dogfood, or, critically review the app as a dogfooder and provide feedback. So what does it do now? The agent actually has access to the browser. So I'm gonna give it access to the browser. It can run anyways. So I'm giving it the app. Gonna let it run. And in its tools, down here under configure tools, we see there's access to the browser. So it's gonna open the browser and just do amazing things. We're gonna run on Sonnet medium to get the faster response. So that's right now the no-skill-needed version, and it's gonna run. It's just opening the browser. It's connecting. It's captured a screenshot. We have this nice carousel now where we can look at the screenshots, and it's gonna use the app. So it's cool now. The agent's gonna use everything it has in the browser and actually click around and use the application. So it checked off the first square. I'm gonna do more triggering. Oh, that's not that one. So yep. So we're gonna do that. Let's go back to skills, let it finish its thing, and then see what it thinks about our little application. So now I did this dogfooding once, and now I actually wanna automate that. Dogfooding, if you haven't heard the term, is basically the idea of you using your own product, acting like a user, and bringing your taste and your opinion with empathy for who the user would be. We as the VS Code team dogfood VS Code because we use the product daily to work on VS Code. Giving the agent a dogfooding mindset is a great way to trigger it into a more critical, user-centric way of testing the app, versus just telling it to do a UI review where it might find accessibility issues, sizing problems, bugs.
But with dogfooding, you expect it to find more taste and opinion in the application as well. And then once you have a skill, because skills just sit in context and are always available, you can actually compose them. So let's go back; it finished. Oh, we got a beautiful one. Let's make it big. If you haven't used it yet, that's our maximized chat view, so you can see all your sessions going on or have a lot more space for reading. You can see it took a bunch of screenshots. It finished the bingo game and went back to the original point. So, critical: identical boards every game. No way to unmark a square. That seems like a bug. You're stuck. Back button exits and wipes all progress. I think the last review I did was, like, the design is so boring, it's like filling out a form. Which you wouldn't get from a UI review. There you would get, oh, it's a working UI. Wonderful. But what we can do now: this is a cool review, and I wanna do it again. I wanna fix stuff and do it again. So now I can actually say, create a skill for dogfooding. Because it did dogfooding. It did the flow. It did the analysis. It used the browser. It gave me a response. I do like the format, the polished design, the good breakdown, and I wanna do it again. And now I can just bring it in as a skill because the agent understands how to create skills. I get that out of the box. We actually have a skill for it, so it's very meta. We have a skill called agent customizations, which has all the information on how skills work. It's reading that, and then it will create the skill for me. And once I have it, it will put it into the project as well. If it has any clarifying questions, it will ask those too. That's what I see most people do to create skills. They use the agent. They give it more context, and they work with the agent to create the skill. I see very few people handwrite their skills.
So you see: open the browser. It understands the workflow, generates the feedback, kinda analyzes the conversation we just had, and puts it back. Okay. It's great. Once it creates a skill, I guess it restarts. So there's a draft now, and that's kinda the outline of what it will put in the skill: open the app, walk through the user journey, trigger key states, read the source code just to check, and then findings. So pretty cool. And that's composable now. I can now, for example, ask it to review the code, which could be one skill, and then also do dogfooding. So combining multiple skills and chaining them through natural language can also work. And they're very efficient to load if you follow the best practices. You wanna keep your descriptions short because they're always in the context for the agent, and you wanna make sure they are spot on for when you think this skill is needed. Then you wanna bundle these helper scripts. In this case, you saw a TypeScript script that's actually being executed. You could let the agent just write scripts, but sometimes you wanna use a specific script. The agent is great at writing its own scripts, but bringing your own adds a safeguard. I mentioned progressive disclosure. That initial SKILL.md will always be loaded. Sometimes it might be loaded by default or by accident, like you saw my agent loading the front-end design skill, probably looking at it just for reference, but it wouldn't need to. It's not a front-end design question I asked. But because of the front end, it still loaded the SKILL.md. So you wanna make sure you break these out and reference the files. Just point to other files as needed to keep that initial SKILL.md small. I think it's important as you design these skills to also test them across different harnesses, like Codex, Claude Code, and Gemini. You don't wanna over-specify a skill for one harness and its specific tool names.
As you think about scripts, think about the more deterministic behavior that you wanna capture in them. Think about how the agent will actually look at the results from the scripts. So can the scripts generate more context for the agent? And then also self-document: in the SKILL.md, actually tell the agent when and how to use the script. That's oftentimes a question I get: oh, there's a script in the skill. Is it just run? Does loading the skill just run the script? No. It's the agent that will use the script as needed, so you need to be specific about when and how it uses it. K. So in our case, we now have a new file. It created a dogfooding skill. And if you look at it, once accepted, we have "You are a sharp, opinionated dogfooder." So it gives the persona. The description is "critically review a running web app." So that's kinda what it does. And then there's a "use when asked to dogfood." So it gives it a few keywords and tells it what will happen if this is being invoked. That's all the agent sees initially, and then it will load the rest. We could maybe break this down more, but all of this context is needed all the time, so we don't necessarily need to. Also, when you run this in VS Code, you actually get some example prompts, like the one I did, but you also get some other ideas. Like, maybe there's a fix-from-dogfooding skill that we wanna do to prioritize a plan from it. So there's a bunch of really helpful things that come with these creation flows. Okay. Next up. Just as a reference: it's not just about workflows. You can also give API references. We've just shown product verification. Think about data fetching, like what I showed for Kusto data. Business automation: can you recap? Can you fetch the latest meeting notes using MCPs like WorkIQ?
So think about how you compose these to automate more of your workflows. Prompts. One prompt I have in this repo, just to show an example, is a setup prompt. I should show that in the UI. So my prompt in this repo is a very specific one. I want this setup prompt to only run when I invoke it. I don't want the agent randomly trying to build and run the workspace and set it up. So this is a new-user tool to get ready with this repository quickly. And it's cool because the agent actually checks dependencies and does linting and testing and all that, but it's a workflow I do wanna have control over, because now I can go into the chat and do slash setup and get that running. The other thing I can do with my prompts: if I have the README open, you can actually feature some of them there as well. So in your README, you can just tell people to run a specific prompt. I don't want them to ask the agent in natural language, please set up this repo, which could work, but I wanna have more guidance in there. Custom agents are the next primitive, and I think the most resonating way to think of one is as a persona. One of the big custom agents we have is plan mode. If you're not using plan mode, you're using AI wrong, plainly. Always work with the AI and collaborate on the plan. In our case, you can just click on plan mode and actually read it. Which tools does it have access to? It has a subset of agents, and it will do a handoff in the end once it's done. So these are all primitives in agents. They can define tools and constrain down what is actually happening in the agent, which doesn't work in skills. And I can actually pick them at the top level. The other agent that's in my repo is a test-driven development agent, and this is a combination, actually. This is more of an orchestrator.
I have the TDD agent, test-driven development, which has three phases: red, green, and refactor. And there are these settings as well where I can say, this TDD agent shouldn't actually be called by my agent or just be used. I wanna pick it. It's a user-selected agent. But then it has agents in its workforce, kind of. The red agent, TDD red, is not user invokable, so it doesn't show up in this list down here. But it's available to the TDD agent because the TDD agent defines it in its list. So, basically, what the TDD agent will do is invoke TDD red to write failing tests, TDD green to write a minimal implementation and run the tests, and then run TDD refactor if more work is needed. So it fans out to these other agents, and they have specific tools and a specific style of working: very minimal, only implementation, do not touch the tests when you're green, because you're supposed to make the tests pass. That way you can compartmentalize, make these composable in a way, and give them limited access as well. What's important as well is that they can run as sub agents. What I just showed with TDD red and TDD green is that they actually run in their own context window. So TDD will say, now that the tests were created, please create the implementation. But then TDD red and green will each run in their own context window, which means, let me go back to one of my prior sessions. That's archived. So we see it here. Let me make this bigger. So TDD red wrote failing tests. TDD red just looks like a tool call in that flow. But what happened with TDD red is that it actually got a full prompt on what it's supposed to write and what to implement. It got a plan from the TDD supervisor agent. And then once it was done, it returned a response, and that's all a sub agent is. It gets an input, it does its work, and it returns the output.
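The sub agent pattern described above (an input goes in, the work happens in a separate loop, only the response comes back) can be sketched as follows. This is a toy model with scripted steps instead of a real model: `run_subagent`, `orchestrate`, the tool names, and the file name are all hypothetical, and the tool allow-list stands in for the tool restrictions a custom agent can declare.

```python
def run_subagent(prompt, tools, allowed_tools, steps):
    """Run scripted tool calls in a private transcript; return (summary, transcript)."""
    transcript = [f"prompt: {prompt}"]  # lives only inside the sub agent
    for tool_name, arg in steps:
        if tool_name not in allowed_tools:
            # The sub agent is constrained to its declared tools.
            transcript.append(f"denied: {tool_name}")
            continue
        transcript.append(f"{tool_name} -> {tools[tool_name](arg)}")
    summary = f"finished; last step: {transcript[-1]}"
    return summary, transcript


def orchestrate(task):
    """Parent loop: delegates to a 'red' sub agent and keeps only its summary."""
    parent_context = [f"user: {task}"]
    tools = {
        "write_file": lambda path: f"wrote {path}",
        "run_tests": lambda _: "2 failed",
        "web_search": lambda query: "results",
    }
    summary, child_transcript = run_subagent(
        prompt=f"Write failing tests for: {task}",
        tools=tools,
        allowed_tools={"write_file", "run_tests"},
        steps=[
            ("write_file", "test_bingo.py"),
            ("web_search", "bingo rules"),  # not allowed for this sub agent
            ("run_tests", ""),
        ],
    )
    # Only the summary crosses back; the child's tool calls never enter this list.
    parent_context.append(f"tdd-red: {summary}")
    return parent_context, child_transcript


if __name__ == "__main__":
    parent, child = orchestrate("unmark a square")
    print(len(parent), "entries in parent context;", len(child), "in the sub agent")
```

In a real harness the sub agent's loop can run for many tool calls, so returning a single response is where the context and token savings come from.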
And when we say isolated context, it means all of these tool calls happen in their own agentic loop, and all the TDD agent got back is this response, which is a really powerful way to reduce context and save tokens, especially with longer agentic loops. And then my PowerPoint crashed. K. Bouncing back to PowerPoint. Yes, I'll also share the, yeah, this is the repo. So if you wanna open it up: Versus codeagentlabs.coms. Otherwise, somebody will share it in the chat, hopefully. Meanwhile, my slideshow's back. Nope. No. Didn't want to do a slideshow. I did have it configured to show in a window so I can demo. There we go. And start. Okay. Wonderful. We're back. On custom agents: as agents, users switch to them, like the plan agent example or the TDD agent. As sub agents, the orchestrator invokes them; it could be my TDD agent, could be my main orchestrator, and that allows you to isolate context. Next up is MCP. MCP would be its own webinar, and we could run it on a monthly basis and still learn something new. I don't wanna go deeply into it. I work on MCP as well. I talk a lot about MCP. I love talking about MCP. So if you wanna learn more about MCP, ask for another webinar. But the gist is, it's an open-source standard for connecting AI applications to external systems. And there are many questions around skills versus MCP, skills versus CLI, and MCP versus CLI tools. So overall, let's do the comparisons that people have been asking me about. If you compare agents versus skills, you see that in my agent for TDD, I constrained its tools down to writing code and running tests. I don't want it to search the web or check the GitHub issues on something. I just want to reduce it down to one specific workflow, and that customization allows you to do that. Then you can also have this custom system prompt, and that allows you to make it really sticky.
So in a skill, the file is loaded, and then it's part of the context. And then the agent might load other skills, and they might be conflicting. In an agent definition, it just knows that agent's system prompt. That makes it really sticky. Even over a longer session, the agent prompt will stay top of mind, versus a skill, which is more of a complementary knowledge and context bit. So that's one comparison. If you want highly deterministic behavior, or more deterministic, go for agents. If it's more workflow and guidance, it's skills. Then the bigger question I get is skills versus MCP. You will see that there's a lot of hype around skills just shipping CLIs, but then eventually, even for the Playwright skill, you still need to install the Playwright CLI. So suddenly, you have to manage both: a skill, and a CLI that is installed from somewhere else. Meanwhile, you don't necessarily wanna have a skill for Atlassian. You just wanna connect Atlassian by logging into Atlassian, and then the MCP handles all the authentication. So usually when you have a stateful environment, where, like, it opens a browser and does stuff, that's easier with MCPs, as MCP is very contextual and stateful in its spec design. It also has built-in OAuth and all the goodies and the security you need for well-done authentication, without having secrets sitting around in your agent to be used by the CLI. Skills, meanwhile: if you have CLIs, if you have build tools, if you have test runners, skills can make great use of those and allow you to formulate workflows, and they can be combined. I was gonna show this off in a bit, but I gotta show it off now. I guess it fits better. Figma, for example, if you wanna check them out: we did a livestream with them two weeks ago on our YouTube channel for VS Code.
They have this Figma plugin, and it actually combines skills for different Figma workflows, but underneath it uses the MCP that comes from Figma as well. So the MCP is on their domain, over HTTP. It updates. They have full control over it. So you see the benefits of workflows plus tooling from the MCP, and authentication on top. Okay. Hooks. That's where I see the fewest hands up from people having tried it. But it's also a really powerful system. It's just a lot harder to manage because there's more determinism and scripting behind it, so it's not your usual writing of some natural language. We see them oftentimes used by customers who wanna prevent behavior that they have seen from the agent, and they always wanna prevent it. A hook can run before any tool is executed. It can then look at the tool call that the agent wants to execute and can block it. So that's one way. That's the deterministic behavior. There's no skill that says, please don't delete files, that always holds, because the agent can occasionally ignore that. It might work 80% of the time, but the other 20% it's still like, oh, I really need to delete that, and there's a good reason, so I'm gonna still do it. With hooks, that doesn't happen. If the hook prevents any use of rm, the hook will always prevent use of rm. If the agent finds another way to delete files, then your hook should cover more aspects of that. So we see enforcement, and then the second use is code quality. The example I have here in the code base is the TDD green agent, and that's actually an interesting form of composability. I have a hook in my custom agent now. This hook is a stop hook, and stop hooks are executed when the agent stops. In this case, when your sub agent would return something, that hook runs before the sub agent returns, and it runs this script. There's one for Windows and one for shell. And, basically, it will run the tests.
Run tests. That's run-tests. So, npm test. And then if the tests fail, it will return "the tests are failing, fix the implementation until all the tests are passing." So you give it this feedback because it tries to stop. You tell it, nope, there are tests to be done. That can lead to a loop if it always happens, so maybe you wanna limit how many times you do that, in case the agent is unsupervised and just goes into a loop. But it allows you to not have the agent finish and say, all my tests are passing at 7%. There are only two tests that I didn't touch that are somehow red, but that wasn't me. That can happen with agents. I see it happen. It has probably happened to you at some point as well. That hook prevents it. Hooks are also useful to check context, because hooks can return more context. We have one, for example, in our VS Code docs for when the agent edits a file without reading our documentation guidelines. We actually tell it, yeah, you updated that file, but you didn't read the guidelines. Now read the guidelines and make sure that all your edits pass the validation and approvals we have talked about. So, yeah, we have session start, so you can fire a hook when the user starts asking a question to, for example, get more context or more resources based on the question asked. Pre-tool-use is probably the most common: running a hook before any tool runs, or after. Maybe you wanna look at the response and do some validation on that or some formatting. You can control compacting and how sub agents work, and eventually there's that stop hook as well. So that's the current state. It's all still very early in the hook world, as there's no spec behind it. Right now, everybody's just trying to match what's happening in the ecosystem, but it's definitely a very powerful system for various use cases. So lastly, we have plugins, which I already mentioned as a bundling mechanism.
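The two hook patterns described above, blocking a tool call before it runs and refusing to let the agent stop while tests fail, can be sketched in Python. Since there is no shared hook spec, everything here is an assumption for illustration: the function names, the `"terminal"` tool name, and the `decision`/`reason` result shape are hypothetical and do not match any particular harness. The retry limit reflects the caution in the talk about unsupervised loops.

```python
import shlex
import subprocess

BLOCKED_COMMANDS = {"rm", "rmdir"}  # hypothetical policy


def pre_tool_use(tool_name: str, tool_input: dict) -> dict:
    """Deny terminal commands that mention a blocked program; allow the rest."""
    if tool_name == "terminal":
        try:
            words = shlex.split(tool_input.get("command", ""))
        except ValueError:
            # Unparseable command line: fail closed.
            return {"decision": "deny", "reason": "could not parse command"}
        # Checking every token also catches "ls && rm -rf x", at the cost of
        # false positives such as "echo rm".
        hits = [w for w in words if w in BLOCKED_COMMANDS]
        if hits:
            return {"decision": "deny", "reason": f"{hits[0]} is blocked by policy"}
    return {"decision": "allow"}


def stop_hook(run_tests, attempts_so_far: int, max_attempts: int = 3) -> dict:
    """Block the agent from stopping while tests fail, up to a retry limit."""
    if attempts_so_far >= max_attempts:
        return {"decision": "allow", "reason": "retry limit reached"}
    passed, output = run_tests()
    if passed:
        return {"decision": "allow"}
    return {
        "decision": "block",
        "reason": "Tests are failing; fix the implementation until they pass.\n" + output,
    }


def npm_test():
    """Real test runner for a Node project; returns (passed, combined output)."""
    proc = subprocess.run(["npm", "test"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


if __name__ == "__main__":
    print(pre_tool_use("terminal", {"command": "rm -rf build"}))
    print(stop_hook(lambda: (False, "1 failing"), attempts_so_far=0))
```

A token check like this is deliberately simple; as the talk notes, if the agent finds another way to delete files, the hook has to grow to cover it.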
We already looked at the Figma plugin, for example. The other example I have is Azure skills. Azure skills has several skills in this one plugin. They basically reflected all of the Azure systems in different skills, to then break each down into more references, for example for Azure AI. There are a lot more references in here, and they break down how things work. So it's very progressive-disclosure heavy, but it's one plugin with multiple skills to ship that developer experience for Azure to agents. They're easy to distribute as Git repos, and I'll show that. So this is basically the Azure skills repo. There's a plugin.json that just defines how it works. And I could now go into VS Code, go here, say install plugin from source, and get the Azure skills plugin. That's how easy they are to install, because they're just a GitHub repository. The other way is the marketplace. If you haven't seen the marketplace yet, you can just go and pick them out here. So bundling customizations, that's what it's all about. It's all just bundles, and the ease of install is the key benefit of these plugins. So if I go into my extensions, I can now actually do agent plugins. I'm fast on this, but you have the agent plugins here. It's actually powered by the github/awesome-copilot repository. So if you're new to customizations and you wanna try one out, there's WorkIQ: if you're on Microsoft Teams, that's a great way to connect all your Teams messages and your Outlook and everything else to your GitHub Copilot. Obviously, there's a whole bunch of other stuff to explore. And then you can also customize those marketplaces. So if I go into my settings, under marketplaces, I see I have both Copilot plugin sources already configured, and I can add my own.
And for example, the one we have as a team is the VS Code TeamKit, and what we see as a pattern for adoption is having your own internal curated marketplaces that allow you to customize on a team basis or even on an organizational basis. Again, it's just a marketplace: a JSON that describes which plugins are available, and I can just point VS Code to this being a marketplace, and it will discover all these plugins and allow me to install them. So that's the power of plugins. It's discoverability and ease of installation. I already pointed out Azure skills. You also see it combining MCPs with workflows. Same with Figma: it provides access to Figma through the MCP and then provides workflows on top of that. So that's the power of plugins. You could ship one skill that describes how to use Figma and maybe uses the API, but through the MCP, they can really context-optimize and context-engineer everything for an agent. So that's the benefit here of MCP as well. Again, you have all these things now running in your agent, but how do you really understand what's happening? And that's what we've been focusing on in VS Code specifically: how can we help you understand what's happening as you use agents? So let's go in this last doc footer review I did. You can always go in and see what it read and see what it did, but there's still a lot of clicking around. We already had these logs and a view on the side so you can see the logs. We're very open about what our system prompt is and what we do; you already could see all of that. And now we have made it a lot nicer. So this is the view for humans to understand what's happening. So this is the app review log, and I can go in and actually see what was loaded. I can see this as a flowchart. So I can see which agent got loaded, which hooks got loaded, and what the details are here.
So these are loaded. These are disabled, because Claude hooks are disabled, which is the setting we have if you wanna use your Claude Code hooks. So you see all the tool calls, you see responses, full details. It's a lot of context. Like, why? Yes, sometimes you wanna see all that, and you wanna click around, and it looks beautiful. You might also just look at the list, the logs, in different ways. But you also just might ask the agent to troubleshoot. Why did my... oops. Not that. So we expose the rich information I just showed you and the diagnostics to the agent as well, so you can use it to troubleshoot with the agent. Because the agent is amazing at just reading its own logs and understanding why things are happening. Sometimes you can ask the agent in the conversation, why didn't you look at my skill? And then it tells you, oh, yeah, that description is totally on point. I should have looked at it. I'm sorry. But this troubleshoot really allows you to go a bit deeper and understand when either things are really broken and not being used at all, or you see strange behavior and you try to fix it. But you can also use it to create more customizations on top of what the agent learned or made mistakes on. So try that now. It's in Insiders already. I think there are some settings. We're just rolling it out to stable, but that's the debug logs, if you struggle with customizations or you just wanna get a better understanding of how the agent works. There you go. The root cause is that it's blocked. So it's not in my settings. How to fix it: we can fix it here. So awesome. I could've done it myself. I could've looked at my settings. I could've Googled and everything else, but the agent can now answer these questions right away. Three things I recommend you look at if you wanna really scale customizations.
The first one, and I'm actually one of the contributors, is AgentRC. So that's a command line tool slash VS Code extension slash CI tool to initialize a repository for agents by creating customizations based on the code that's there. But then it can also run evals with these instructions. So it comes up with typical developer tasks and then sees how they perform with and without the customizations in the repo. Which I think helps because, a, people struggle with what to put in customizations. And then, b, even once they have customizations, they can drift over time as your code base updates, and that's a problem. You have to stay on top of that. So AgentRC allows you to do that maintenance and check whether the quality of your instructions holds up over time. The other one is microsoft/apm. They're all open source, nicely licensed, and easy to try out. So this is the agent package manager. If you think of the plugins I just showed, they do have a version in them, but there's no actual versioning system in the plugin spec. There's no spec yet. So there's no versioning, no dependencies. So you still have a lot of wild growth around these kind of package ecosystems right now. A lot of organic growth, people are excited, but it's not easy to manage at scale. So this is one system that allows you to have well-defined dependencies and versioning, and something we hopefully roll out to more of GitHub Copilot as an ecosystem. So it's being adopted in more areas and becoming a kind of default, especially for people who wanna have more control and treat agentic dependencies with more care. And lastly, this is actually pretty new: Waza. Can't click it. Difficult. So Waza addresses another problem which you probably already had. Maybe there's a question already in the chat. How do I know my skills are good? I just wrote them. Maybe you wrote them with the agent.
Maybe you wrote them yourself. Maybe you installed them from somewhere, but you don't know if they're good. They might not trigger often enough. They might not actually do what they say they do. So Waza is a CLI for evaluating agent skills. So full-on evals: it will create test cases. It will run model judging. It will run benchmarks and compare results. It has different specific patterns it's looking for and can grade as well, so grading is involved. So if you wanna really scale up your skill game and have skills work across multiple users who just use them on a day-to-day basis, who don't wanna care about it, and you just wanna ship great skills, Waza is something I recommend trying out. So these are three things if you wanna go deep into skills and really take it that big, which I do recommend. So I already showed some examples. So how do you roll this out and do it internally? Best practices I've seen companies being successful with: one, kinda where it mostly starts, is community places. So give people a place to share what they come up with. Low bar, doesn't have to be high quality. As soon as you give it a bar and you give reviews, people will struggle. Like, I'm not sure if my skill is good enough. So you wanna have this organic bottom-up growth and just give people a place to share their learnings. There might be some really good gems in there that you wanna then uplevel later on. Then teams are creative as well. What teams create for themselves doesn't necessarily work for other teams. A lot of people come up with their own agentic workflows, so teams need a place to share those. And lastly, I think the gold standard is once you have things you believe everybody needs access to, and that's the one with the high bar. I see a lot of people struggle with that because they have this organic growth, and they're like, oh, we just ship the community sandbox to everybody, but they're not the quality.
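The "with and without the customizations" comparison mentioned for the eval tools can be pictured with a small Python sketch. This is not how AgentRC or Waza are implemented; the AgentRun callable, the compare_pass_rates function, and the task strings are hypothetical, and a real eval would involve actual agent runs plus grading or model judging rather than a boolean.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (task prompt, customizations enabled?) -> did the task pass?
AgentRun = Callable[[str, bool], bool]


def compare_pass_rates(tasks: List[str], run_agent: AgentRun) -> Dict[str, float]:
    """Run every task twice (with and without customizations) and compare pass rates."""
    if not tasks:
        raise ValueError("need at least one task to evaluate")
    passed_with = sum(1 for task in tasks if run_agent(task, True))
    passed_without = sum(1 for task in tasks if run_agent(task, False))
    total = len(tasks)
    return {
        "with": passed_with / total,
        "without": passed_without / total,
        # Positive delta suggests the customizations help on these tasks.
        "delta": (passed_with - passed_without) / total,
    }


def fake_agent(task: str, with_customizations: bool) -> bool:
    """Stand-in for a real agent run: easy tasks always pass, others need customizations."""
    return with_customizations or task.startswith("easy:")


if __name__ == "__main__":
    sample_tasks = ["easy: rename variable", "add endpoint", "fix flaky test", "easy: bump dep"]
    print(compare_pass_rates(sample_tasks, fake_agent))
```

Re-running a comparison like this as the code base changes is one way to notice the drift problem: if the delta shrinks or turns negative, the instructions probably need an update.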
They're not evaled. Right? You don't know what people do, how they update them over time. But that organizational repository allows you to bring in your specific PR reviews, your security guidelines, the things you feel everybody needs. Also, an important thing is scaling plugins and customizations beyond coding. I already showed a few examples of how you can do your CI creation, how you make slides, how you write a spec, how you could prototype. We do a lot of triage with AI as well. We moved from monthly releases to weekly releases, powered a lot by the customizations that we put into our repo, not just for coding. So I'd recommend you expand your mind beyond that. So before going to questions, homework for you: if you haven't already, go into your repository, run the /init command, and have it generate instructions. That's kinda step one. If you run this in VS Code, it will also recommend you some follow-up actions for skills and agents, which can be really interesting just to explore the different kind of primitives and see what the agent comes up with. It's really important that you don't just create them once. What the agent gives you is a good first draft, and they will probably change over time as the agent makes mistakes. You wanna update those and tweak them and make sure the agent doesn't make those mistakes again. So really make sure you iterate and have a mindset that these aren't set in stone. And check out awesome Copilot for inspiration. That's a great place where the community shares and where many of our Microsoft teams are putting their stuff. So it's a great place to get an idea of what's possible with all these primitives. End of slideshow. I'll share, and I'm ready for questions. Alright. Thank you so much, Harold. That was extremely informative, extremely dense. A lot of amazing engagement in the chat and in the q and a.
Thank you everyone for chiming in with your thoughts. Quite a few sentiments that I wanna kinda summarize and see if we can get your thoughts on, Harold. Let me quickly pull up my notes. So, yeah, first question is how should a developer be thinking about maximizing their context window? You mentioned that there are certain trade-offs to be made, and you don't necessarily want to be as verbose as possible. And then are there any tools available to see how much of the context window is being used? Yes. So in VS Code, in the lower right, there's a little circle once you start chatting, and that shows the context window. Actually, we're shipping a few things where I noticed myself, I don't care about the context window as much as the past recommendation suggested. So just to kinda change perspective a bit: past perspective was, if you have a new task, create a new chat, which is still a good pattern. But another suggestion was, if you hit the end of the context window and you're on a task, you might actually wanna move it over into a new chat, which I think you don't need to do as much anymore. That's because we're storing more checkpoints along the chat where the agent can actually access its past context and thinking, so we're not losing as much during compaction because it's still available. And we're also, for example, saving the plan. If you use plan mode, that plan is now stored in session memory, which puts it top of mind for the agent as it works. So I have the agent working on massive plans for hours in the same session, and it keeps compacting. It keeps cleaning up its context, but it still stays on track and just nails every single requirement because it's still in that session memory. So that's probably the one area. The other area with context is, I think, the concern about too many MCPs. Or too many skills.
There is still a concern with too many skills. We have to land some work there, but too many MCPs is no longer a problem. What we do, basically, is reduce the number of tools available to the agent, and then the agent can enable more tools as it needs them. So it really changes that concern as well. But, otherwise, yeah, new task, new chat. If you notice yourself using the same chat the whole time, that's a bad pattern. But otherwise, if you have one task, stay in the same window and keep going at it. Very cool. Yeah. It sounds like in an ideal world, VS Code and Copilot will do all the work under the hood to make sure that you don't have to worry about solving that problem. There were a lot of questions around multi-repo projects and workspaces with nested projects, and how Copilot will treat the skills and plugins and custom instructions in each of those environments, depending on the current folder you're in or the current repos you're working with. How do you recommend tackling these scenarios where you're working across projects? So one, kudos to everybody. If you use multiple repos, there are multi-root workspaces. You can just add multiple folders into one workspace, so that's the way to do it, and that's what I showed also. That's how I use VS Code to work on VS Code, because I have the chat repository and the VS Code repository. And when I work in them, I can make PRs from the same VS Code workspace, basically, to both places. So that's the way to use those. It'll automatically pick up the skills in each of them. All the skills from both repositories are combined. I think there is some work we did based on feedback that the instructions from one folder were also applied to the other folders. So we now give the agent more specific context around that, but most of that should be automatic.
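The idea from earlier in this answer, starting the agent with a reduced tool set and letting it enable more tools as it needs them, can be sketched in Python as below. This is only an illustration of the general pattern, not how VS Code or Copilot implement it; the ToolRegistry class, its method names, and the example group and tool names are all hypothetical.

```python
from typing import Dict, List, Set


class ToolRegistry:
    """Hypothetical registry: tools are grouped, and only enabled groups reach the model."""

    def __init__(self, groups: Dict[str, List[str]], default_groups: Set[str]) -> None:
        unknown = default_groups - set(groups)
        if unknown:
            raise KeyError(f"unknown default groups: {sorted(unknown)}")
        self._groups = groups
        self._enabled = set(default_groups)

    def active_tools(self) -> List[str]:
        """Tool names currently exposed to the agent, in a stable order."""
        return sorted(tool for g in self._enabled for tool in self._groups[g])

    def available_groups(self) -> List[str]:
        """Groups the agent could still enable; only these short names cost context."""
        return sorted(set(self._groups) - self._enabled)

    def enable(self, group: str) -> List[str]:
        """Enable one more group on demand and return the new active tool list."""
        if group not in self._groups:
            raise KeyError(f"unknown tool group: {group}")
        self._enabled.add(group)
        return self.active_tools()


if __name__ == "__main__":
    registry = ToolRegistry(
        {
            "core": ["read_file", "edit_file"],
            "figma": ["get_design"],
            "azure": ["deploy", "list_resources"],
        },
        default_groups={"core"},
    )
    print(registry.active_tools())      # only the core tools at first
    print(registry.available_groups())  # groups the agent may ask for
    print(registry.enable("figma"))     # agent enables a group when the task needs it
```

The design point is that the model only pays context for the short list of group names up front, and the full tool descriptions are loaded when a group is actually enabled.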
There's definitely some bleed-over from one repo into the other, because the agent, when you open the agent, isn't aware of which folder you're in. So just be specific about what you're working on. That should help if you see the wrong scope from one repo conflicting with another. That's an interesting point. We could kinda see what more we can do there. Got it. And I think we recently launched the chat customizations view in VS Code in preview, and I think there was a growing pain point of all these different file locations for all these different agents, subagents, custom instructions, and skills. And that view you showed several times looks like a very cool way to get a better user experience in managing all of these different context tools. Could you talk about that feature, how to best leverage it, and what it was designed to fix? Yeah. So we just tweeted about it, I think, so our advocacy team is talking more about it. I think we have the X post as well. But the idea is basically that my previous demos were just clicking through different folders and trying to find all my customizations to show them to people, and that's now that view. Especially since there's a lot more sources we now grab customizations from. For example, custom agents could come from the repository, could be my own, could come from my organization. So there's a GitHub-managed system for organizations so they can kinda sideload the custom agents for all of their engineers, which is great to roll out the standards of how you work in an organization. But then you wonder, where does this come from? And how does this work with the other thing? So having kinda that one-pane view of that, that's what we're going for. Plus all the kinda learning flow as well, guiding you more. Like, do you wanna create something new? Do you have instructions?
So we wanna make this also a place for people to figure out what to do next and quickly enable and disable things, and eventually find recommendations later. People already ask, how can I find those things when I use it? So that's also a great place where we can bring people the right content that they might need as they work on stuff. Very cool. And I just shared the chat customizations post on the VS Code release notes. So, definitely, the VS Code changelog is an amazing resource for keeping up with all these recent ships. Another question, kinda a little bit broader. Do you have any thoughts about where this space is going? I guess that could take you in many directions. There were some questions around MCP and whether it has the same momentum it had last year. What do you think you'll be working on in three months? Yeah. So people are solving kind of the same problem if you look at the big picture. Skills as a standard didn't define how you get these skills, and then people also found that they wanna bundle more of the same thing. That's where plugins came in. So there's a lot of evolution. And plugins, as I already mentioned, don't have versioning and dependencies. I think we just need to mature the primitives we have in better places. Like, we're now shipping better enterprise allowlist policies for MCPs as well. So it's always about how you, as a developer, can use them locally and just easily create them, and that's where we have these create-skill flows, and maybe providing you in VS Code also with a better way to control quality. You wanna have this inner loop for creating skills and understanding, that's a good skill. Like, I wrote a great description; it always nails when the skill gets used. Right now that's you typing up 10 different queries and trying to fix that.
So we need a better inner loop experience there. But also at scale: can you have the same thing in CI/CD? So the tools I linked, like Waza, are great examples of what we see within Microsoft, where teams create skills and they struggle with, is it better now? Like, oh, we need to really rework this. Like, I don't know, how do I know it's better? So there's a lot of that tooling. We're actually working on some in-product prompt support as well, where we can point out, these points in your prompt are contradictory; you should fix that. So that's where it's going from our perspective inside the editor. But, otherwise, yeah, with MCP I still see traction. A lot of services expose their APIs and everything via MCPs. There's a growing ecosystem of trusted places that you can get the MCPs from. I think the initial hype of everybody creating local MCPs is a little bit gone, because now we have CLIs and skills for that. So there's definitely a little bit of consolidation happening around where MCPs versus skills versus CLIs work. Or how do you combine them, right, as well? And is that plugins? Do people think about plugins as a way to combine different features, or as a way to essentially manage something in a marketplace so that it's easy to reuse? How should people think about plugins specifically? Yeah. I think it's the latter, the ease of reuse. So I think that's where the ecosystem right now is more kind of wild growth of plugins. People are just really excited to bundle all kinds of things together that they found useful for themselves. So as I mentioned, have this community growth, give it a place in your team, for yourself.
Just put stuff somewhere, but then figure out what is the right reusable, composable plugin that you can share across an org. I think that's the ongoing challenge, especially since you don't have an easy way to do dependencies. But plugins are definitely the way to go to think about multiple skills together, so you can still reduce surfaces and not overload everything. Related, for example, in the MCP spec there's an open extension to provide skills within MCPs. So as an MCP author, you could think, oh, I wanna add workflows on top of this MCP, so I need to ship a plugin. But Figma could just say, these skills are all coming from the MCP server as well. So you don't need the plugin. You just add the MCP, and it has everything you need. So there's some ongoing exploration of what the best patterns are. So that's, yeah, that's everything in the AI space. It's being explored. Amazing. Well, thank you so much, everyone, for attending. Thank you, Harold, for sharing your knowledge. I did share a link. So the Keeping Up with Copilot series is back next month with effectively measuring Copilot impact and ROI, as well as making AI a developer team sport after that. So I shared the series link, so be sure to register for those sessions. You can also share this link with anyone who missed the session, and they'll be able to view it on demand. So thank you again, everyone. Be sure to check out that series link, and have a good day. Thanks, everybody.