Hello, I'm Afonso and I'm 22 years old. I'm in the second year of a master's degree in Software Engineering at the University of Minho, currently finishing my thesis. I'm a very outgoing, friendly, fun, and selfless person. I love playing football and going to the gym, listening to music, playing the guitar, and being in contact with nature. And when I say contact with nature, I mean it: I love camping, hiking, fishing, and cracking open a cold one while enjoying the view. I like to keep things simple and enjoy the little things in life, as I believe that a clear mind and being happy with ourselves are essential for creativity and productivity.
I built a simple, secure password generator in Python, storing the generated passwords in a JSON file, and implemented several features around it.
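The core generate-and-store flow looks roughly like the sketch below; the vault file name and helper names are my own illustration, not the project's actual code, but it uses Python's secrets module, the standard choice for security-sensitive randomness.

```python
import json
import secrets
import string
from pathlib import Path

VAULT = Path("passwords.json")  # assumed storage file

def generate_password(length: int = 16) -> str:
    # secrets (not random) is the right module for security-sensitive values
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

def save_password(label: str, password: str) -> None:
    # Load existing entries (if any), add the new one, and write back.
    entries = json.loads(VAULT.read_text()) if VAULT.exists() else {}
    entries[label] = password
    VAULT.write_text(json.dumps(entries, indent=2))

save_password("example-site", generate_password())
```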
I have done several data analysis projects using Python, with libraries such as Pandas and Matplotlib, as well as Excel. The objective was to analyze and visualize data from different sources and learn how to extract insights from it. In Excel, I created dashboards and reports to present the data in a more understandable way.
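As a minimal sketch of the kind of Pandas/Matplotlib workflow involved (the CSV file and its columns below are placeholders, not the projects' real data):

```python
import pandas as pd
import matplotlib.pyplot as plt

# "sales.csv" with "date" and "revenue" columns is an illustrative placeholder.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Aggregate revenue per month and plot it to spot trends.
monthly = df.groupby(df["date"].dt.to_period("M"))["revenue"].sum()
monthly.plot(kind="bar", title="Monthly revenue")
plt.tight_layout()
plt.show()
```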
I have some products on my wishlist and wanted to keep track of their prices. Instead of checking the prices manually, I decided to automate the process by building a simple web scraper with Python and Beautiful Soup to extract information about my favorite products from the "KuantoKusta" website. The scraper collects data such as product names and prices and stores it in a structured format. This project helped me improve my web scraping skills and learn more about data extraction techniques. I also set up a GitHub Action so the script runs automatically at 08:00 and 13:00 UTC (09:00 and 14:00 in Portugal during daylight saving time).
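The scraping step itself is small; a hedged sketch follows, where the URL and CSS selectors are placeholders, since KuantoKusta's real markup differs and changes over time:

```python
import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.kuantokusta.pt/p/0000000/example-product"  # placeholder URL

resp = requests.get(URL, headers={"User-Agent": "price-tracker/0.1"}, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Selectors are assumptions for illustration; the real page structure differs.
name = soup.select_one("h1").get_text(strip=True)
price = soup.select_one(".price").get_text(strip=True)

print(json.dumps({"name": name, "price": price}, ensure_ascii=False))
```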
Musify is a music player app that I developed using WPF and C#. The app lets users play, pause, skip, restart, and go back through songs from a local music library. This project helped me improve my skills in GUI development, C# programming, and working with audio files.
This project focuses on creating a machine learning model to predict an individual's physical fitness (is_fit) from health data such as age, height, weight, heart rate, and blood pressure. The goal is to develop a model that classifies whether an individual is considered "fit" based on their body metrics. In this project I applied techniques such as data pre-processing and analysis, outlier treatment, modeling and evaluation, and hyperparameter tuning.
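A sketch of that flow with scikit-learn is below; the CSV file, model choice, and parameter grid are illustrative assumptions rather than the project's exact setup, though the column names match the metrics listed above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("fitness.csv")  # assumed file name
X = df[["age", "height", "weight", "heart_rate", "blood_pressure"]]
y = df["is_fit"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale the features, then fit a classifier; the model choice is illustrative.
pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))

# Hyperparameter tuning over a small illustrative grid with cross-validation.
grid = GridSearchCV(
    pipe, {"randomforestclassifier__n_estimators": [100, 300]}, cv=5
)
grid.fit(X_train, y_train)

print("test accuracy:", grid.score(X_test, y_test))
```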
This is a simple machine learning project where I used a binning technique and a Logistic Regression model to predict the quality of red wine from various physicochemical properties.
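The binning idea in a nutshell: the 0-10 quality score is collapsed into a small number of classes before fitting. A minimal sketch, assuming the UCI red wine CSV and a binary "good"/"not good" split at a threshold of my own choosing:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("winequality-red.csv", sep=";")  # UCI dataset uses ';'

# Bin the 0-10 quality score into "good" (>= 7) vs "not good"; the
# threshold here is an assumption for illustration.
df["good"] = (df["quality"] >= 7).astype(int)

X = df.drop(columns=["quality", "good"])
y = df["good"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```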
This is a simple machine learning project where I used K-means and K-medoids clustering to group iris flowers by their sepal and petal measurements and see how well the resulting clusters match the known species.
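A minimal sketch of both approaches on the Iris data; K-medoids is not in scikit-learn itself, so I'm assuming the scikit-learn-extra implementation here:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn_extra.cluster import KMedoids  # requires scikit-learn-extra

# Standardize the four sepal/petal measurements before clustering.
X = StandardScaler().fit_transform(load_iris().data)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
kmedoids = KMedoids(n_clusters=3, random_state=42).fit(X)

print("k-means labels:  ", kmeans.labels_[:10])
print("k-medoids labels:", kmedoids.labels_[:10])
```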
This project implements a basic data pipeline that pulls cryptocurrency data from the CoinGecko API, processes it, and writes the result to a CSV file. The pipeline covers data ingestion, cleaning, transformation, and loading into the final CSV.
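A compact sketch of those steps: /coins/markets is a real CoinGecko endpoint, but the column selection and output file name below are my own choices for illustration.

```python
import pandas as pd
import requests

# Ingest: fetch market data from CoinGecko's public API.
resp = requests.get(
    "https://api.coingecko.com/api/v3/coins/markets",
    params={"vs_currency": "usd", "per_page": 50, "page": 1},
    timeout=10,
)
resp.raise_for_status()

df = pd.DataFrame(resp.json())

# Clean and transform: keep a few columns, drop incomplete rows.
df = df[["id", "symbol", "current_price", "market_cap"]].dropna()
df["symbol"] = df["symbol"].str.upper()

# Load: write the processed data to CSV.
df.to_csv("crypto_processed.csv", index=False)
```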
The "Football Teams Data API" project demonstrates a simple data engineering pipeline, serving football team information through a web application. The purpose of the client-side (the web application) is to simulate data accessibility for other entities, such as data analysts in a company. The backend is a RESTful API built with Flask, which uses Pandas to process the data. The frontend, a single-page application in HTML, CSS, and JavaScript, consumes this API to display the data in an interactive table. Project management was handled with Jira, allowing for effective task tracking. During development, I've used libraries like python-dotenv to manage environment variables and Flask-CORS to solve the CORS issue, a common web security barrier that prevents communication between different domains. Solving this problem was a key learning point in the project.
This project, named "Movies Pipeline," is a complete data engineering solution that processes raw movie and TV show data and exposes the cleaned output through a Flask API. The pipeline uses PySpark to handle data processing tasks. It ingests raw data from JSON files, then uses Spark's native functions like explode to normalize nested data structures. This flattens the list of genre IDs for each movie or show, making it possible to efficiently join the data with separate genre mapping files. The pipeline then groups the data back together and stores the final, processed dataset in an optimized Parquet file format. For the API layer, the project uses Flask and PyArrow. Instead of running the resource-intensive Spark jobs on demand, the Flask API reads the pre-processed Parquet file into memory when it starts. This approach makes the API lightweight and responsive, allowing it to quickly serve data without the overhead of a Spark session. This robust architecture separates the heavy data processing from the fast data serving, creating a scalable and efficient solution.
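As a rough sketch of the processing stage (file paths and schemas below are assumptions, not the project's actual data), the explode-join-regroup flow looks something like this:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("movies-pipeline").getOrCreate()

# Ingest raw JSON; file names and schemas (genre_ids as an array of IDs,
# genres as an id -> name mapping) are assumptions for illustration.
movies = spark.read.json("data/raw/movies.json")
genres = spark.read.json("data/raw/genres.json")

# explode() turns each element of the genre_ids array into its own row,
# flattening the nested structure so it can join with the genre mapping.
exploded = movies.withColumn("genre_id", F.explode("genre_ids"))
joined = exploded.join(genres, on="genre_id", how="left")

# Group back to one row per title, collecting the resolved genre names.
final = joined.groupBy("id", "title").agg(
    F.collect_list("genre_name").alias("genres")
)

# Persist the processed dataset as Parquet for the serving layer.
final.write.mode("overwrite").parquet("data/processed/movies.parquet")
```

And the serving layer, assuming the same output path: the API reads the Parquet output once at startup and answers requests from memory, so no Spark session is needed at serving time.

```python
import pyarrow.parquet as pq
from flask import Flask, jsonify

app = Flask(__name__)

# Load the pre-processed Parquet into memory once, at startup; requests
# never touch Spark, which keeps the API lightweight and responsive.
table = pq.read_table("data/processed/movies.parquet")
movies = table.to_pylist()  # list of dicts, one per title

@app.route("/movies")
def list_movies():
    return jsonify(movies)

if __name__ == "__main__":
    app.run(debug=True)
```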