Data Quality Assessment of Shootings in the United States

Following today’s class, I started exploring a dataset that covers various aspects of shootings that occurred in the USA.

My focus was on understanding the data, identifying discrepancies, and addressing missing values in order to prepare the dataset for further analysis.

Missing Values : 

The dataset we are working with contains several columns with missing values, including “threat_type,” “City,” “County,” “Latitude & Longitude,” “Age,” and “Race.” These gaps in the data can significantly impact the accuracy and reliability of any analyses or models we wish to build.
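A quick way to see how large these gaps are is to count the missing values per column. The snippet below is a minimal sketch using pandas; the sample rows are made up for illustration, and only the column names come from the dataset described above.

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the dataset above.
df = pd.DataFrame({
    "threat_type": ["attack", None, "shoot", None],
    "City": ["Boston", "Austin", None, "Denver"],
    "Age": [34, None, 27, 41],
    "Race": ["W", "B", None, None],
})

# Count of missing values and share of rows affected, per column.
missing_count = df.isna().sum()
missing_pct = (df.isna().mean() * 100).round(1)

missing_report = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(missing_report.sort_values("missing", ascending=False))
```

Looking at the percentage alongside the raw count helps decide whether a column can be imputed or whether the affected rows are too numerous to simply drop.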

Duplicate Records :

One of the specific issues we encountered was the presence of duplicate records in the “armed_with” column. Identifying and addressing these duplicates is essential to avoid skewing our analysis or modeling results.
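As a rough sketch of how this could be checked with pandas, the example below does two things: it flags rows that are exact copies of an earlier row, and it normalises the “armed_with” labels, since repeated values in a categorical column are expected and the real problem is often the same category spelled in different ways. The sample rows and the "name" and "date" columns are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows; "name" and "date" are assumed column names.
df = pd.DataFrame({
    "name": ["A", "B", "B", "C"],
    "date": ["2023-01-01", "2023-01-02", "2023-01-02", "2023-01-03"],
    "armed_with": ["gun", "knife", "knife", "Gun "],
})

# Rows that exactly repeat an earlier row (the first occurrence is kept).
duplicate_rows = df[df.duplicated(keep="first")]
deduplicated = df.drop_duplicates().reset_index(drop=True)

# Normalise case and whitespace so "gun" and "Gun " count as one category.
armed_clean = df["armed_with"].str.strip().str.lower()
category_counts = armed_clean.value_counts()

print(duplicate_rows)
print(category_counts)
```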

Discrepancies :

  • Gender – Where gender information is missing for individuals involved as suspects or victims, the lack of identifiable names and details raises a notable concern about the credibility of the documented shooting.
  • County – The dataset has gaps in the county attribute even where the corresponding city is provided. This raises the question of whether the missing county should be inferred from the available city data.
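One possible way to approach the county question is to reuse the county from other rows that share the same city. The sketch below assumes a "state" column is available to tell apart cities with the same name; that column and the sample rows are hypothetical. It also assumes each city maps to a single county, which is not always true, so the inferred values should be treated as a first pass rather than ground truth.

```python
import pandas as pd

# Hypothetical sample rows; "state" is an assumed column name.
df = pd.DataFrame({
    "City": ["Boston", "Boston", "Austin", "Austin", "Denver"],
    "state": ["MA", "MA", "TX", "TX", "CO"],
    "County": ["Suffolk", None, "Travis", None, None],
})

# For each (City, state) group, take the first known county, if any.
inferred = df.groupby(["City", "state"])["County"].transform("first")

# Fill only the gaps; existing county values are left untouched.
df["County"] = df["County"].fillna(inferred)

print(df)
```

Cities that never appear with a county in the dataset (Denver in this sample) stay missing and would need an external lookup.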

As I continue, I will further explore techniques to clean and prepare the data, ensuring that it is reliable and fit for analysis.
