This week my colleague Margriet Groenendijk joined me to explore some crime data and see what biases, if any, we could find in it.

A 1080p version of this video is on Cinnamon

This is the recording from my ML for Everyone show that broadcasts every Tuesday at 2pm UK time on the IBM Developer Twitch channel.

We didn't really know what we were looking for, but wanted to see whether some of the commonly heard views about bias in policing were visible in the data. This is a hugely political and emotive area at the moment, so much so that IBM has just launched Call for Code for Racial Justice to encourage tech projects to combat racism.

The data we used was taken from the UK police open data website: https://data.police.uk/

Margriet has a Python notebook that we used for the session:

IBMDeveloperUK / Data-Science-Lunch-and-Learn


We were specifically looking at "Stop and Search" data reported by the police force in the area I live in, Avon and Somerset Police.

The first thing we found is that the data itself can be confusing. For example, ethnicity is recorded twice: once as 'self reported' and once as 'officer reported'. Any difference between the two could itself be significant.
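One way to see how the two ethnicity fields relate is to cross-tabulate them. The following is only a minimal sketch: it assumes the column names `Self-defined ethnicity` and `Officer-defined ethnicity` (check the header of your own download from data.police.uk), and the rows are made up purely for illustration, not real records.

```python
import io

import pandas as pd

# Made-up sample rows for illustration only; not real police data.
# Column names are an assumption about the stop and search CSV header.
sample_csv = io.StringIO(
    "Self-defined ethnicity,Officer-defined ethnicity\n"
    "White - English/Welsh/Scottish/Northern Irish/British,White\n"
    "Black/African/Caribbean/Black British - African,Black\n"
    "Other ethnic group - Not stated,White\n"
    "Asian/Asian British - Pakistani,Asian\n"
    ",Black\n"
)
stops = pd.read_csv(sample_csv)

# Count how many stops have no self-defined ethnicity recorded at all.
missing_self_defined = stops["Self-defined ethnicity"].isna().sum()

# Cross-tabulate the two fields; missing values are labelled explicitly
# so they appear as their own row rather than being dropped.
comparison = pd.crosstab(
    stops["Self-defined ethnicity"].fillna("(not recorded)"),
    stops["Officer-defined ethnicity"],
)
print(comparison)
print("Stops with no self-defined ethnicity:", missing_self_defined)
```

With a real file, replacing `sample_csv` with the path to the downloaded CSV should be the only change needed, provided the column names match.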

There are many different ways this data could be interpreted, and we'd need a lot more knowledge of the specific terms used in the reporting to draw any rigorous conclusions. But we still wanted to see what we could see.

One specific area we chose was the 'outcomes' of a stop and search versus the officer reported race, i.e. if you suspected a bias in policing, you might expect one race to be over-represented among stops in which no action was subsequently taken.
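A comparison like this can be sketched in pandas roughly as follows. This is not the notebook's actual code: the column names `Officer-defined ethnicity` and `Outcome`, and the outcome label `A no further action disposal`, are assumptions about the CSV (the exact label may differ in your download), and the rows are invented so that every group has the same rate.

```python
import pandas as pd

# Made-up rows for illustration only; not real police data.
stops = pd.DataFrame(
    {
        "Officer-defined ethnicity": [
            "White", "White", "White", "White",
            "Black", "Black",
            "Asian", "Asian",
        ],
        "Outcome": [
            "A no further action disposal",
            "Arrest",
            "A no further action disposal",
            "Community resolution",
            "A no further action disposal",
            "Arrest",
            "Arrest",
            "A no further action disposal",
        ],
    }
)

# Assumed label for "no action"; check the values in your own download.
NO_ACTION = "A no further action disposal"

# Flag each stop, then average the flag per group: the mean of a boolean
# column is the proportion of True values.
stops["no_action"] = stops["Outcome"] == NO_ACTION
no_action_rate = stops.groupby("Officer-defined ethnicity")["no_action"].mean()
print(no_action_rate)

# The same idea as a full table: share of each outcome within each group.
outcome_share = pd.crosstab(
    stops["Officer-defined ethnicity"], stops["Outcome"], normalize="index"
)
print(outcome_share)
```

Using `normalize="index"` makes each row sum to 1, so groups with very different numbers of stops can be compared on the same scale.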

We found that if you are classed as Black or Asian, the probability of the outcome being no action was 25%, versus 30% for white people stopped. Of course we have to be aware of correlation versus causation here, as there could be two plausible explanations for these numbers:

That BAME people stopped are 'let off' with no action more often.
That police officers are more likely to stop a BAME person for no offence.

So this was just a very superficial look at the data, but it hopefully shows how you can use Python notebooks and the pandas library to explore and visualise data.

If you want to learn more, then please drop by the IBM Developer Europe Twitch stream on Tuesdays from 2-3pm UK time, or have a look at the Data Science Lunch and Learn series we run on the IBM Developer Europe Crowdcast channel.

Blog

Exploring Bias in Crime Data

Matt Hamilton