Data Science & DevTools: GitHub Copilot
Renee Noble
Posted on March 20, 2024
Welcome to our third post of Developer Tools Week as we continue our learning journey into Data Science! Today, let's talk about the GitHub Copilot extension and how we can use it for focused learning inline, or in chat mode.
What We'll Learn
- What is GitHub Copilot?
- What can GitHub Copilot do?
- GitHub Copilot For Data Science
- Assignment: Visualize Data With GitHub Copilot
- Resources: Explore the 2024: Data Science Day Collection
Bookmark and revisit these for more information:
Resource | Description |
---|---|
Collection 1️⃣ | Skill up on Data Science Tools & Techniques |
Collection 2️⃣ | Skill up on Responsible AI Principles & Tooling |
Collection 3️⃣ | Build Generative AI Apps End-to-End with Azure AI |
You've created a consistent and reproducible environment with GitHub Codespaces. You've got that instrumented with Jupyter Notebooks for interactive coding and shareable insights. And you've setup a Visual Studio Code profile with Data Science extensions and settings for a more productive developer experience. You've even walked through the basic Visual Studio Code data science tutorial.
What can you do next to enhance your developer experience and improve your productivity?. How about a tool that helps you stay focused and on-task while providing AI-assisted code completion and content suggestion capabilities relevant to that task context? Let's talk about GitHub Copilot!
1 | What is GitHub Copilot?
GitHub Copilot is an AI coding assistant that can improve your productivity with proactive suggestions (via auto-completion) and reactive responses (via interactive chat). It is installed as a Visual Studio Code extension, bringing AI-assisted development capabilities to your editor including:
- Code generation. It can author code, tests, comments, and documentation to suit your request.
- Auto completion. It can predict what your code needs next, based on current context.
- Code refactoring. It can help you simplify code or fix issues with context-aware suggestions.
GitHub Copilot uses large language models (LLM) that are trained on large datasets of publicly available code on GitHub, and optimized for code-generation tasks. You can use the capability in two ways:
Passive Auto-Completion. As you write code or content, the AI predicts valid suggestions in real-time inline, in your editor. Simply accept valid suggestions, and customize them further if needed. For instance, the image shows how the AI suggests the next import in your Jupyter Notebook code cell - simply tab to accept the suggestion or esc to ignore it.
Interactive User Chat. Open a chat window (sidebar) or dialog (inline) to start an interactive question-answer session. You can accept/discard responses (inline) or copy/paste them from window. For example - I can now ask for more details about the suggested import in a chat window. Note how it uses the file as a context reference.
Think of GitHub Copilot as an assistive technology rather than an automation tool in your workflow. Since the technology uses an LLM, it may have inaccurate or outdated information - so make sure you validate responses before you accept them. Using this technology requires an active subscription but there is a 30-day free trial to help you evaluate the capability for your own needs. The Data Science Profile Template for Visual Studio Code adds this extension to your profile by default if used.
2 | What can GitHub Copilot do?
As we saw earlier, the AI can provide proactive suggestions (auto-completion) or reactive responses (user chat) using your current context (file or workspace) to ensure that responses are relevant to current tasks. In addition to these capabilities, the GitHub Copilot extension comes with built-in slash commands that are optimized for common tasks. Simply open the inline chat dialog (editor) or the dedicated chat window (side) and type /
to see the available commands as shown:
Chat Commands | Inline Commands |
---|---|
The table below provides a short summary of what the commands do. Note that commands that can impact the currently-open file (e.g., add or modify code) can be used in both inline (dialog) and chat (window) modes. Other commands that are more general work only in the chat (window) pane.
Command | Description | Usage |
---|---|---|
/doc | Add comments for code using right syntax | Inline and Chat |
/explain | Get code explanations in natural language | Inline and Chat |
/fix | Fix problems in the specified code | Inline and Chat |
/tests | Create unit tests for the selected code | Inline and Chat |
/newNotebook | Chat only | |
3 | GitHub Copilot For Data Science
So how can you use GitHub Copilot to enhance your Python Data Analysis and Visualization learning journey? Let's check out some simple examples using the code repo for this workshop.
3.1 Use /doc
to add comments
In this exercise, open the 3-copilot/02-cricketNotebook.ipynb
notebook in your Visual Studio Code editor and let's explore using the /doc
command to add comments to the first code cell. Not only does this encourage good documentation practices, it also helps you gain insight into what the code does if you are not the author (e.g., was auto-generated or written by someone else)
Inline Example | Chat Example |
---|---|
Note how the inline version provides a response that is formatted for inclusion into the code cell (with Accept/Discard) options) while the chat version provides more of an explanation of that code that could be used in a relevant user guide or tutorial. Hint: Try using the /explain
command in chat mode instead to see how it compares to the response above.
Another useful feature in chat mode is the follow-up question suggested (at the bottom, in blue). These suggested prompts are a great way to build our intuition by following suggested questions based on the previous query, but still relevant to the context of our current task. Contrast this with "googling" for answers or "searching Stack Overflow" where you need to context-switch (to use a new site) and may end up in rabbit holes that keep you distracted and away from your original task.
3.2 Use /tests
to write a unit test for the selected code
-
Let's try this out to see what happens. Select the same code cell above, then try using the
/tests
command inline. -
It gives you the option to "View In Chat" so let's take that route to see what tests are suggested. We see two basic tests (check for existence of valid and non-existent files) plus some suggestions (check for empty files or poorly-formatted ones).
You will get a prompt to "Create" tests which will try to create a new tests notebook, with some issues. Instead, try copying the test code over into a new Python cell and run it, to verify it works.
There are more complex tests to write - but this showcases the streamlined developer experience for common tasks in VS Code.
3.3 Use /newNotebook
to create notebooks
You can create a new Jupyter notebook with pre-written code and documentation for a stated goal. Let's try this out by asking GitHub Copilot to get us started with a notebook for visualizing the cricket data in the file we looked at before. This will create it as an untitled notebook with the outline specified in that response.
Save that to a file like 01-newNotebook.ipynb
and try running it with no modifications. Note how there are errors because of the filename mismatch. (Note: I deleted sections of the saved file for simplicity - so the outline is truncated).
We can stay in the flow by asking the AI to fix the issue itself.
Then run it to see the fix resolved, and output generated as required.
You can now try asking the AI to write new code or content sections for you to explore specific visualizations or seaborn syntax and capabilities. Check out the 01-seaborn-datawrangler-ipl.ipynb
file as an example of what is possible.
4 | Exercise: Try it yourself!
In the above exercises, we created a notebook using a sample data file, and walked through steps to get us to a basic data analysis and visualization result. Try replicating this with a dataset of your own (e.g., try the TED Talks dataset found under the 1-data/kaggle/
folder).
Want to learn where you can find more open datasets to explore? Watch out for the next post in this series, for some useful resources.
Bookmark and revisit these for more information:
Resource | Description |
---|---|
Collection 1️⃣ | Skill up on Data Science Tools & Techniques |
Collection 2️⃣ | Skill up on Responsible AI Principles & Tooling |
Collection 3️⃣ | Build Generative AI Apps End-to-End with Azure AI |
Posted on March 20, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.