An Introvert’s Guide to Pair Programming (Spoiler…It’s AI)

Last week, I was invited to speak at South Florida Tech Hub's DevDive event. We talked about AI, specifically large language models, and how developers can integrate them into their workflows. If any of you were there, thank you for attending! All the post-talk conversations were a lot of fun.

In keeping with the open-source community spirit, I've decided to share my unedited script below, offering insight into how I structured the talk and managed time to maintain a smooth flow. Originally, I thought an hour might be challenging to fill, but day-of I found myself with a lot more to say than the time would allow.

I’ll be posting a video on YouTube soon going through the Ollama and Continue install process. Subscribe to my channel to get notified when it’s live. For now, I’ve added all my recommended resources just below, and I hope that makes finding what you need easier.

Resources

Tools

News and Communities

Recommended Models

These are the models I'm currently running day to day on my 2021 M1 MacBook Pro with 16GB of RAM.

Chatting

mistral-nemo:12b (`ollama pull mistral-nemo`)

Bar none, my current favorite model is one [built in collaboration with NVIDIA](https://mistral.ai/news/mistral-nemo/). Mistral NeMo is rock-solid at reasoning, coding, and chatting. And even more importantly, it's great at following explicit instructions, which makes it great for all-around day-to-day work. I throw text formatting, reasoning, and basic coding tasks at it, and it handles everything like a champ.

llama-3.1:8b (`ollama pull llama3.1`)

From Meta, Llama 3.1 405B benchmarks near GPT-4 levels, although you most likely won't be able to run that model locally. Step down to the 8B model and you'll still get a great model to chat and work through problems on. I've noticed it's slower than Mistral NeMo, but sometimes it understands complex problems better.

Coding

granite-code:8b (`ollama pull granite-code:8b`)

A lightweight coding model from IBM trained on 116 programming languages. While it seems to fall apart at more complex tasks, it's fast at fixing and ideating on existing codebases. I've debugged and tweaked multiple projects with just granite-code:8b at my side.

deepseek-coder-v2:16b (`ollama pull deepseek-coder-v2`)

This Mixture-of-Experts (MoE) model trained on 338 programming languages can most likely handle any dev project you're working on. It's my favorite coding model, but I rarely run it since it slows my computer to a crawl. Still, if I'm deep in a problem and need some support, this is the model I'll call on after granite-code.

Autocomplete Models

These models are strong replacements for GitHub's Copilot in VS Code. While not as creative with their answers, they can easily autocomplete whatever code you're about to write, before you write it.

I prefer `granite-code:3b`, but `starcoder2:3b` seems to be a faster model in my day-to-day usage.

Where to Find Me


Script

6:00PM - Intro

Thank you to South Florida TechHub for putting this on. Thank you to Rebecca Bakels for thinking of me and inviting me here tonight. And thank *you* for showing up tonight, taking a chance on me, and helping to foster the developer community we have here in South Florida.

Who am I? Greg Barbosa

Why am I qualified to be here? I've recently sunset my conversion rate optimization agency and am now working full-time as the Director of Innovation and Systems at Avulux, a brand selling clinically proven lenses that filter light to help people living with migraines.

Before that, I was a writer and the product manager at the 9to5Mac network of sites. Before that, an iOS QA/test engineer working on MDM products directly with customers like RIM (BlackBerry).

What are we talking about? Today we'll be exploring what it looks like to use generative AI, specifically large language models (LLMs), in your software development workflow. We'll kick off the conversation by going over the current ethical and societal implications of these systems, and how we can still engage with them locally and privately if we so choose.

When/where can I start using this type of AI? Today! There's a good chance you can run at least one AI model on your computer today, and we'll cover that.

Why AI? Our current stage of AI is the worst it'll ever be, and the best it has ever been. Thanks in large part to the scientists, mathematicians, and engineers at every level involved in making this generation of AI, it's one that many of us can participate in and benefit from.

6:10PM - Ethical and Moral Implications

Creative art theft

It's no secret that OpenAI and Anthropic are actively using publicly accessible data to train their LLMs. It's also no secret they've financially benefited from taking others' creative work, repackaging it into a tool, and making millions.

publicly accessible data != public data

And while these models are trained to never explicitly recreate someone's work, the fact that they have to put guardrails in for that already means they have the ability to do so.

If these models were built off of people's work, then those people should be compensated. That's where open LLMs come into play. These large language models are made publicly available for others to download for free, inspect, and use. In some cases, these models are even licensed for commercial use.

Having free access to these models is great, but none of that absolves these companies from creative theft.

Privacy

Most interactions made through ChatGPT's and Claude's websites are used to further train the in-house models. (There are enterprise options out there to avoid this, though.) This also means that your requests are running through their servers. If you're doing sensitive work, you might prefer to do it in a way where no one else even has the possibility of seeing what you're doing.

And that's the best part about these LLMs. Once downloaded, these models are available offline. No internet access needed. And because they're *local* models, they never send or receive web data unless you specifically tell them to. This means your conversations and work stay local and away from prying eyes.

For my day job, I fed a local model internal work docs, and now it can quickly scope out copy and layout design for our product pages. And all of this is offline.

Environmental Impact

The environmental impact of AI has yet to be fully realized. The infrastructure of compute and cooling needed to handle these systems is still in its early stages. UC Riverside's report states that for roughly every 10–50 GPT-3 responses, 500ml (about one water bottle) of water is consumed.

This is one of the biggest reasons I moved from private to open LLMs. If I had even a small chance of reducing my overall energy usage, why not try it out?

Small tangent

If the AI conversation on data theft and privacy concerns you, I implore you to also look into ad-blocking and web browsing privacy basics. For example, did you know that a few months ago Chrome enabled a feature that turns it into a top-of-funnel ad tracking platform?

Further Conversations

  • What happens to jobs when AI takes over?

  • Can you truly defeat *bias* in AI?

  • How do we define truth to an AI?

  • Dead Internet Theory

  • Will AI produce forgetfulness?

  • Who's to blame when the AI really goofs?

  • Environmental impact

  • How to avoid weaponization?

  • How to define "weaponization"?

  • Economic impact

6:20PM - Where do I start?

To get started using LLMs locally, we'll be using three tools: Ollama, VS Code, and the VS Code extension Continue.dev.

Ollama

Ollama makes downloading and running large language models significantly easier. It's also the piece that will eventually connect to VS Code.
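Under the hood, Ollama runs a local server with a small REST API (on port 11434 by default), and that API is what editor tools talk to. As a minimal sketch of how approachable it is, assuming you've already pulled a model like mistral-nemo, here's how you could ask it a question from Python:

```python
# A minimal sketch: ask a locally running Ollama model a question via its
# REST API. Assumes `ollama pull mistral-nemo` has already been run.
import json
import urllib.request

payload = json.dumps({
    "model": "mistral-nemo",
    "prompt": "Explain what a Mixture-of-Experts model is in two sentences.",
    "stream": False,  # return one JSON object instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Everything here stays on your machine; the "web request" never leaves localhost.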

Models

We'll be using Ollama to download models. Think of models as the result of different people's improvements and takes on what an LLM can do. For example, there are some models that can "see" what's in an image, and some models that are really good at coding work specifically.

VS Code

VS Code is Microsoft's code editor, free for all systems.

Continue.dev

Continue.dev is the VS Code extension that connects the large language models we download through Ollama to VS Code. Not only are we going to be able to chat with our code, but it'll give us the chance to integrate docs, write git commit messages, debug issues, and so much more. When used together, they may just improve your development workflow.

Why Continue.dev?

If you've been on X lately you've probably seen the major hype around Cursor. Cursor is an AI code editor based on VS Code. In the early days, Cursor allowed you to use local models for free, but eventually pulled that feature.

Continue.dev lets you use any LLM you want, whether it's local or something like Claude and ChatGPT. You can configure API endpoints, API keys, and models to your liking.

And although we're talking about Continue, there are other apps out there that work very similarly. And a lot of them support Ollama too!

But I'm biased here. I started the year as a die-hard Cursor fan, but found my way to Continue and have been a fan ever since.

6:30PM - Get started using LLMs

Grabbing an LLM

Coding: `ollama pull granite-code:3b-instruct-128k-q4_K_M`

Why this model? Great for today's demo. It's 2.1GB total, has a large context window, and strikes a good balance between accuracy and performance on a local machine.

Chat: `ollama pull gemma2:2b-instruct-q4_K_M`

Why this model? Gemma 2 is a 1.7GB model that's great at chatting. It doesn't have an extensive knowledge base and seems to hit language barriers quickly, but it's great for back-and-forth chatting, especially with context.

On models

I've specifically chosen these models because they are small and have a higher likelihood of running on more machines. From experience, multiplying your system's VRAM by 80% will tell you the largest model size you can safely run. For example, if your computer has 16GB of VRAM, then the largest model you might want to try to download is 12.8GB.
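That rule of thumb is simple enough to jot down in a couple of lines of Python:

```python
# Rule-of-thumb sketch: the largest model file size (in GB) you can
# comfortably run is roughly 80% of your VRAM, leaving headroom for
# context and overhead.
def max_model_size_gb(vram_gb: float, headroom: float = 0.8) -> float:
    return vram_gb * headroom

print(max_model_size_gb(16))  # 12.8 -> a 16GB machine tops out around 12.8GB
```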

At the end of the talk I'll share some other models I use and what I use them for. Some of these models are about 10GB, but there are definitely even larger ones out there.

The larger the model's file size, usually the better it is, but the more processing power you need to run it.

Configuring this LLM with VS Code

(in the virtual machine)

  • Install VS Code

  • Install Continue.dev in VS Code

    • Configure Continue.dev to use the Ollama autodetect feature (a quick sanity check for this step follows the list)

    • Drag Continue.dev to the right-hand side

  • Share some shortcuts
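On that sanity check for the autodetect step: before pointing Continue.dev at Ollama, I like to confirm the Ollama server is actually running and see which models autodetect will find. This isn't part of the official install steps, just a habit of mine. Ollama serves a small REST API on localhost:11434, and a few lines of Python can query it:

```python
# Sanity-check sketch: confirm Ollama's local server is up and list the
# models it has available (the same set Continue.dev's autodetect sees).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.loads(resp.read())["models"]

for model in models:
    print(model["name"])  # e.g. granite-code:3b-instruct-128k-q4_K_M
```

If this errors out with a connection refused, Ollama isn't running yet.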

6:40PM - How to use LLMs with your code using Continue.dev

What LLMs won't do

In the same way you have code completion, docs, and reference materials, LLMs are just another tool in your toolset. They are not a panacea for your development woes. They won't magically make you a better developer, but they sure as hell can help you speed that along.

Context

With LLMs, context is king. With Continue.dev, you can combine your project-specific code question, reference your entire codebase, point to official docs, and have your LLM give you an even better, more qualified answer.

Instead of copying and pasting code from a browser window into VS Code, you're having a context-aware conversation right where you code. I found this helps me stay in longer flow states too.

When using Continue and similar tools, start by giving your LLM context. I like to initiate conversations by describing my project, outlining my current task or goal, and then detailing the project's tech stack. This context drastically improves the LLM's ability to give me actionable answers.

It's the difference between getting a response in JavaScript or TypeScript.
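For example, a kickoff message might look like this (a made-up project, just to show the shape):

I'm building a web dashboard for tracking shipments. The stack is Next.js 14, TypeScript, and Tailwind. My current goal is to add client-side validation to the signup form. Keep all answers in TypeScript.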

Roles

LLMs are phenomenal at what they do, and they do it even better when you treat them as having a specific role. Keep your questions aligned with that role, and communicate explicitly.

In my experience, you can treat LLMs like junior developers, senior developers, and/or pair programmers.

Junior developers

Without much context, you can often point the LLM at your code and have it update some functionality. It may not be the cleanest or most performant code, but it'll get things done quickly enough to validate functionality.

I recently used this method to debug an open source terminal plugin. The plugin lets you interact with LLMs, but based on my tests, I could tell the system prompt needed some tweaks. My only obstacle was the Bash script codebase, something I know very little about.

With the LLM, I was able to add logging into the app, stylize the UI in a way I like, and improve overall error handling.

Senior developers

These are the types of requests that go beyond simple code adjustments into writing out small applications. With the right context, goal setting, and clarity, the LLM can build out a small app alongside you.

I recently did this when I needed to run multiple whois lookups on a spreadsheet of IP addresses for a list of DNS records. I knew a short program hitting an API endpoint would make this work easier, so I asked the LLM to do it.

Original query:

Write Python that:
pings an IP address to confirm it's responding
After receiving a response it runs a 'whois' on that IP
Prints out in a Markdown table the ip address, the ping response time, and the name of who owns the address

Within two minutes I had a working app and was able to validate 25+ DNS records. In this case, it didn't matter if the code was performant or perfectly written, as long as it got the job done.
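For the curious, here's a minimal sketch of what that script can look like. This is my reconstruction, not the LLM's exact output, and it assumes your machine has the system `ping` and `whois` commands available (macOS/Linux style flags):

```python
# Sketch: ping each IP, and for responders, run whois and print a
# Markdown table of IP address, ping time, and owner.
import re
import subprocess

ips = ["8.8.8.8", "1.1.1.1"]  # placeholder list; mine came from a spreadsheet

print("| IP Address | Ping Time | Owner |")
print("| --- | --- | --- |")

for ip in ips:
    # One ping to confirm the address responds, capturing round-trip time.
    ping = subprocess.run(["ping", "-c", "1", ip], capture_output=True, text=True)
    if ping.returncode != 0:
        continue  # skip addresses that don't respond
    match = re.search(r"time=([\d.]+)", ping.stdout)
    ping_time = f"{match.group(1)} ms" if match else "n/a"

    # whois the responding address and grab the first OrgName-style line.
    whois = subprocess.run(["whois", ip], capture_output=True, text=True)
    owner_match = re.search(r"(?:OrgName|org-name|owner):\s*(.+)", whois.stdout)
    owner = owner_match.group(1).strip() if owner_match else "unknown"

    print(f"| {ip} | {ping_time} | {owner} |")
```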

Remember to verify the code that the LLM gives you. Make sure you can read it well enough to at least understand where to debug next. I once spent 2.5 hours debugging an LLM's JavaScript code because of random race conditions. It ended up being as simple as changing a variable from `const` to `let`.

Pair programmer

And my favorite way of using LLMs with code is as a pair programmer. It's rubber ducky programming for the 21st century. Together we ideate projects and goals, scope out workloads, and even write snippets of code. In these conversations the goal is to work through a problem, quickly understand the workloads of certain tasks, and MVP small functionality towards the bigger project.

Oftentimes I'll ask it to push back with counterpoints to help expand my thinking.

This method of development has helped me think in new ways, work through problems faster, and pick up the vocab I need to go on and continue learning.

Chat

The most basic way to work with your code and an LLM is the chat box built into Continue.dev. You can hop in there and start asking your LLM questions immediately. The power comes in when you use '@' to give your chat context.

Typing @ allows you to select from a list of context providers to feed into your question. So if you want to chat about a specific file, a folder of files, docs, or Google searches, you can do it here!

Fixing bugs

For example, I wanted to prototype a bug fix for the Zed code editor's new Ollama integration. It had a hardcoded context length instead of using each model's specific length. I'd never seen or written Rust before, but I opened the project in VS Code and asked Continue "@codebase where in my codebase would I find where the ollama context length is being set".

After a couple of manual searches and a few back and forths with the LLM, I found the bug's location. After a couple more messages, I had a working fix that could query all of Ollama's models and automatically adjust the context length.

Note: I didn't submit this fix to the official repo. I wasn't confident enough in my or the LLM's Rust coding capabilities to trust that it would pass testing.

Using official docs

With the ability to reference docs, your LLMs get even more powerful. While learning Next.js 14 I'd often reference the docs to have the LLM explain something in different ways for me.

It's like a mini-search engine that connects your code to official docs.

6:50PM - What else can I do with LLMs?

We've just scratched the surface on what's possible with these systems.

What kind of models exist? There are vision models made specifically for identifying content in an image. Models that are trained on large codebases to become solid coder LLMs. There are models specifically designed to maintain conversational flow and awareness that some D&D dungeon masters seem to like. Just yesterday an incredible text-to-speech model was released.

Now that you have Ollama installed, you can start using loads of third-party apps and plugins that work directly with your preferred LLM.

Page Assist for Chrome lets you chat directly with a page's contents.

Ollama for Raycast lets you send pre-made and custom prompts to your LLM to get things done quickly. I have a custom prompt that takes my text, fixes up mistakes, keeps it in my voice, and makes it more concise. I use this multiple times a day with a keyboard shortcut.

Open WebUI lets you basically create your own version of ChatGPT with a local LLM, but gives you access to extra tools and functions made by the community. You can even define custom models that have pre-defined prompts with a specific knowledge base. I created a small version of this to help me quickly ideate landing pages for new products.

And there are so many more. Check Ollama's Community Integrations section for more!
