Bias and Word Embeddings
What I did: I investigated how word embeddings, distributed vector representations of words, can yield problematic inferences and amplify human biases around gender and religion when used as inputs to neural NLP models. I loaded Stanford's GloVe vectors and wrote several Numpy functions that use linear algebra and cosine similarity to evaluate analogies like "man is to woman as king is to queen." To aid the analysis, I queried the WordNet database for senses of the word "person," which made it easier to spot trends and biases in the data. I also visualized the embeddings and their transformations with principal component analysis, and wrote a Medium article about the project that was featured by Towards Data Science!
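A minimal sketch of the analogy evaluation, using tiny made-up vectors in place of real GloVe embeddings (the values below are hypothetical, chosen so the gender offset is consistent across the two word pairs):

```python
import numpy as np

# Toy embeddings standing in for real GloVe vectors (hypothetical values).
embeddings = {
    "man":   np.array([0.8, 0.1, 0.3]),
    "woman": np.array([0.8, 0.9, 0.3]),
    "king":  np.array([0.2, 0.1, 0.9]),
    "queen": np.array([0.2, 0.9, 0.9]),
}

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, vocab):
    """Solve 'a is to b as c is to ?' by finding the word closest to b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    candidates = {w: cosine_similarity(target, vec)
                  for w, vec in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=candidates.get)

print(analogy("man", "woman", "king", embeddings))  # prints "queen"
```

The same vector-offset trick is what surfaces bias: offsets learned from text can pair occupations or traits with one gender more strongly than the other.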
Tools Used: Python, Numpy, Scikit-learn, Matplotlib
What I learned: I covered concepts like analogy evaluation in my natural language processing class, but it was helpful to interact with the data and explore what could lead to the creation of complex analogies. I also learned about applications of PCA and word senses when I was conducting my analysis.
Event: Berkeley Energy and Resources Collaborative (BERC) Hackathon 2017
Team: Pancham Yadav (Berkeley CogSci), Janaki Vivrekar (Berkeley CS + Math), Yajushi Mattegunta (Berkeley EECS), Pranav Bhasin (Berkeley EECS)
What we did: We investigated the relationship between the locations of tweets about climate change and the locations of carbon emission sources in the US, using data from the Twitter API and the Energy Information Administration. We clustered the data points with K-nearest neighbors and superimposed them on a map of the United States.
Role: On the backend, I worked on cleaning the data and used these data to create endpoints for a web server. On the frontend, I created the user interface such that it effectively displayed data from the backend on a map of the US.
What I learned: I witnessed firsthand the importance of designing a data visualization around the audience's needs, and learned how to display data effectively on maps. I also learned more about Pandas and the Twitter API.
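As a rough sketch of the clustering step (shown here with scikit-learn's KMeans rather than the K-nearest-neighbors approach described above; the coordinates are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (latitude, longitude) pairs for climate-change tweets.
tweet_coords = np.array([
    [37.87, -122.27], [37.80, -122.41], [34.05, -118.24],  # West Coast
    [40.71,  -74.01], [40.73,  -73.99], [42.36,  -71.06],  # East Coast
])

# Group the points into regional clusters to overlay on a US map.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tweet_coords)
print(kmeans.labels_)           # cluster id per tweet
print(kmeans.cluster_centers_)  # one centroid per region
```

Each cluster centroid can then be plotted as a marker, sized by cluster membership, over a base map of the US.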
Event: CalHacks 4.0
Team: Jay Wang (Berkeley Bio + CS)
What we did: We built a web app that predicts the lethality of the flu virus when different strains exchange genetic information. We accessed and manipulated genetic and protein data on 18,000 strains of the flu virus, and then used results from a recent research paper to determine if these strains were deadly, displaying our model's outcome in a web app.
Role: I worked on cleaning up the data, scraping the National Institutes of Health's website, and packaging our models into a responsive web application.
Tools Used: Python, Pandas, Scrapy, BeautifulSoup, Numpy, Scikit-learn, Flask, HTML/CSS, Bootstrap
What I learned: This was my first time working with Pandas, so I learned how to use it to manipulate data from a CSV. It was also my first time creating a web app with a major backend, so I learned about best practices for modularizing code. Aside from this, it was fun walking my partner through the process of designing and developing a web application and furthering the latest biological research with computation.
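A minimal sketch of the kind of CSV cleanup described above, with a made-up flu-strain table standing in for the real dataset:

```python
import io
import pandas as pd

# Hypothetical CSV of flu-strain records (not the real NIH data).
raw = io.StringIO(
    "strain,segment,length\n"
    "A/H1N1,PB2, 2280\n"
    "A/H3N2,PB2,\n"      # record with a missing sequence length
    "A/H1N1,HA,1701\n"
)

df = pd.read_csv(raw)
df = df.dropna(subset=["length"])        # drop records missing sequence length
df["length"] = df["length"].astype(int)  # normalize the numeric column
print(df)
```

Once the frame is clean, the rows can be fed into downstream Scikit-learn models and the results rendered by the Flask app.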
What I did: I built a web app that helps users talk to their representatives about the issues they care about. I had read about the effectiveness of calling your representatives and the importance of state legislators in making policy, so I decided to consolidate features like contact info and online profiles for users' members of Congress and state legislators, along with looking up and filtering current legislation, into one website.
Tools Used: Python, Flask, Jinja2, Open States API, Sunlight Congress API (currently deprecated), HTML/CSS, Bootstrap, Heroku
What I learned: I learned how to use RESTful APIs and Python for server-side development, and connected this with my knowledge of web development. I also learned about deploying apps on cloud services such as Heroku and AWS.
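A sketch of what one server-side piece might look like: a Flask endpoint returning legislator data as JSON. The route, ZIP code, and hard-coded lookup are hypothetical stand-ins for live Open States API calls:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical lookup standing in for live Open States API responses.
LEGISLATORS = {"94704": [{"name": "Jane Doe", "chamber": "upper"}]}

@app.route("/legislators/<zip_code>")
def legislators(zip_code):
    # Return the legislators serving a ZIP code as JSON for the frontend.
    return jsonify(LEGISLATORS.get(zip_code, []))

# Exercise the endpoint without running a server.
with app.test_client() as client:
    resp = client.get("/legislators/94704")
    print(resp.get_json())
```

The frontend templates (Jinja2) can then render this JSON into contact cards for each legislator.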
Event: Berkeley Builds 2017
Team: Aaryaman Sen (Berkeley CogSci), Tejal Gala (Berkeley BioE), Aditya Palacharla (Berkeley BioE)
What we did: We designed a mobile app that would improve users' experiences when searching for community resources. We partnered with the nonprofit Health Leads, which seeks to improve the quality of its clients' medical care by helping them seek out resources such as food pantries to maintain a healthy diet. To offer a clean user experience, we decided to host their online services on a mobile platform that would use speech recognition and natural language processing, as well as take each user's history into account when accessing Health Leads' services.
Role: I helped wireframe parts of the app and conduct research.
Tools Used: Sketch, Figma
What I learned: This was my first UI/UX design project, so I learned about human-centered design processes and best practices for designing a clean, intuitive user interface.
Links: Blog Post
Awards: Berkeley Builds Winner
Event: Hack UC Santa Cruz 2017
Team: Yajushi Mattegunta (Berkeley EECS), Ivon Liu (Berkeley EECS), Sukrit Arora (Berkeley EECS), Lawrence Cheng (Berkeley EECS)
What we did: We created a Chrome extension that shows the calorie content of each ingredient in a recipe. We wrote a web scraper to pull the ingredients from a recipe's webpage and used the Google Cloud Natural Language API to recognize which part of each ingredient string names a food and which part specifies its amount. We then queried the USDA Food Composition Databases with the list of foods, matched each query to the most similar entry, and called the databases' API to get the calories in each ingredient, converting these values to the amounts specified in the recipe.
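The matching step, pairing a scraped ingredient name with its closest database entry, might look like this sketch using Python's difflib (the entry names below are hypothetical stand-ins for real USDA results):

```python
import difflib

# Hypothetical USDA entry names; the real ones come from the
# Food Composition Databases API.
usda_entries = [
    "Butter, salted",
    "Flour, wheat, all-purpose",
    "Sugar, granulated",
]

def best_match(food, entries):
    """Pick the database entry most similar to a scraped ingredient name."""
    scored = [(difflib.SequenceMatcher(None, food.lower(), e.lower()).ratio(), e)
              for e in entries]
    return max(scored)[1]  # entry with the highest similarity ratio

print(best_match("all purpose flour", usda_entries))
```

A simple character-level ratio works surprisingly well here because USDA names usually contain the scraped food word verbatim.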
Role: I worked on building the scrapers, calling the USDA's API, and styling the extension.
What I learned: I found that when an API isn't readily available, web scraping is often a viable alternative. I also learned the importance of partitioning tasks when working on a team to deliver a finished product.
Awards: 3rd place in Tech Cares at Hack UC Santa Cruz
Course: CS 61B (Algorithms and Data Structures) Final Project
What I did: I wrote the backend of a web server to display a map of the area around Berkeley. This app supported operations such as scrolling, zooming, and route finding. I constructed a quadtree of image files to be rendered based on a query, and implemented the A* algorithm to find the shortest path between points on the map.
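The route-finding step can be sketched as follows; the graph and coordinates are a tiny hypothetical stand-in for real OpenStreetMap data (the actual project was in Java, so this is an illustrative Python version):

```python
import heapq

def a_star(graph, coords, start, goal):
    """Shortest path with A*, using straight-line distance as the heuristic.

    graph:  node -> list of (neighbor, edge_weight)
    coords: node -> (x, y), used only for the heuristic
    """
    def h(n):
        (x1, y1), (x2, y2) = coords[n], coords[goal]
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

    frontier = [(h(start), 0.0, start, [start])]  # (f, g, node, path)
    best = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nbr, w in graph[node]:
            ng = g + w
            if ng < best.get(nbr, float("inf")):
                best[nbr] = ng
                heapq.heappush(frontier, (ng + h(nbr), ng, nbr, path + [nbr]))
    return None

# Tiny hypothetical road graph: direct A->C costs 4, the detour via B costs 2.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1)], "C": []}
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0)}
print(a_star(graph, coords, "A", "C"))  # prints ['A', 'B', 'C']
```

An admissible heuristic (straight-line distance never overestimates road distance) is what lets A* skip most of the search space while staying optimal.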
Tools Used: Java, Maven, XML, OpenStreetMap data
What I learned: I made key decisions about which data structures to use based on technical specifications and learned about connecting real-world data with structures like graphs.
Gender and Color Perception
Course: CogSci 88 (Data Science and the Mind)
Team: Vincent Ngo (Berkeley CogSci)
What we did: We analyzed data from the World Color Survey, which examined color perception in over 2,500 speakers of 110 languages. The dataset contained the speakers' demographic information, so we examined differences between the responses of male and female speakers. We constructed a prototype model, aggregated each gender's distance from the model, and ran an A/B test on the responses to check for statistical significance, concluding that there is no significant difference between the genders.
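The A/B test can be sketched as a permutation test; the per-speaker distance scores below are made-up stand-ins for the real World Color Survey distances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-speaker distances from the prototype model, by gender.
male   = rng.normal(5.0, 1.0, 200)
female = rng.normal(5.1, 1.0, 200)

observed = abs(male.mean() - female.mean())
combined = np.concatenate([male, female])

# Permutation (A/B) test: shuffle the group labels and see how often a
# difference at least as large as the observed one arises by chance.
diffs = []
for _ in range(5000):
    rng.shuffle(combined)
    diffs.append(abs(combined[:200].mean() - combined[200:].mean()))

p_value = np.mean(np.array(diffs) >= observed)
print(p_value)  # a large p-value means no significant gender difference
```

A high p-value here means the labeled split looks like a random split, which is how we reached the "no significant difference" conclusion.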
Role: I worked on visualizing the data with histograms and scatter plots, and implemented the algorithms for building the model and running the A/B tests.
Tools Used: Python, Numpy, Scipy, Matplotlib
What I learned: I practiced working with large corpora, extracting the important information from them, and creating basic statistical models over the data.