This project implements a distributed data processing pipeline using Apache Spark and Hadoop HDFS to analyze competitive programming problems and solutions from the DeepMind CodeContests dataset. The ...