Small Python Projects: Build a News Dataset
One of the easiest projects that you can do in Python is creating a dataset by scraping a particular website. In this project, we will use the PyGoogleNews library to extract Google News elements. We will tune this web scraper to focus on a particular keyword, language and search engine location. Additionally, you will learn how to translate the results with the TextBlob library and run sentiment analysis on the titles.
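To make the three search settings (keyword, language, location) concrete, here is a minimal sketch of how they can be expressed as a Google News RSS search URL. The helper name `build_news_search_url` is hypothetical, and the parameter layout (`q`, `hl`, `gl`, `ceid`) is an assumption about the public feed URL rather than a statement of what PyGoogleNews does internally.

```python
from urllib.parse import urlencode


def build_news_search_url(keyword: str, lang: str = "ja", country: str = "JP") -> str:
    """Hypothetical helper: compose a Google News RSS search URL.

    Assumed parameters: q = keyword, hl = interface language,
    gl = country, ceid = "<country>:<language>".
    """
    params = {
        "q": keyword,
        "hl": lang,
        "gl": country,
        "ceid": f"{country}:{lang}",
    }
    # urlencode percent-escapes non-ASCII keywords and reserved characters
    return "https://news.google.com/rss/search?" + urlencode(params)


url = build_news_search_url("Pokemon", lang="en", country="US")
print(url)
```

Changing `lang` and `country` is what moves the search from, say, English results in the US to Japanese results in Japan, while `keyword` narrows the topic.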
Follow Along with the Video
Let’s Dig into the Code:
# Let's add the libraries
from pygooglenews import GoogleNews
import pandas as pd
import numpy as np
from textblob import TextBlob  # TextBlob is our natural language processing library

# Create the Google News client ('ja' is the language code for Japanese, 'JP' is the country)
gn = GoogleNews(lang='ja', country='JP')

# Build a list of dictionaries so that we keep the title, link and publish date
def get_titles(keyword):
    news = []
    search = gn.search(keyword)
    articles = search['entries']
    for i in articles:
        article = {'title': i.title, 'link': i.link, 'published': i.published}
        news.append(article)
    return news

data = get_titles("ポケットモン")

# Save the results in a data frame so that we can start translating what we have
df = pd.DataFrame(data)

# translate() takes a source language and a target language, for example:
#   blob.translate(from_lang='ja', to='en')
# Note: translate() was deprecated and later removed from TextBlob,
# so this step requires an older release that still ships it.

# Let's create functions that bring back the translation and the sentiment
def translation(text):
    blob = TextBlob(text)
    return str(blob.translate(from_lang='ja', to='en'))

def sentiment(text):
    blob = TextBlob(text)
    return blob.sentiment.polarity  # polarity ranges from -1 (negative) to 1 (positive)

df['translation'] = df['title'].apply(translation)
df['sentiment'] = df['translation'].apply(sentiment)

# Let's turn the polarity score into an actual class
df['Sentiment Class'] = np.where(df['sentiment'] < 0, "negative",
                                 np.where(df['sentiment'] > 0, "positive", "neutral"))

# Let's export the file (writing .xlsx needs an Excel writer such as openpyxl installed)
df.to_excel('output_file.xlsx')
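The nested np.where call above can be hard to read at a glance. As a minimal sketch, the same rule can be written as a small plain-Python function; `classify_polarity` and its `threshold` parameter are hypothetical names introduced here for illustration, and with the default threshold of 0.0 the function applies the same rule as the original code.

```python
def classify_polarity(score: float, threshold: float = 0.0) -> str:
    """Hypothetical helper: map a polarity score in [-1, 1] to a class label.

    Scores below -threshold are "negative", scores above +threshold are
    "positive", and everything in between is "neutral".
    """
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    return "neutral"


# Quick check on a few hand-picked scores (not real article data)
labels = [classify_polarity(s) for s in (-0.4, 0.0, 0.25)]
print(labels)
```

With the data frame from the walkthrough, this could be applied as df['sentiment'].apply(classify_polarity). Raising the threshold slightly is one way to treat near-zero scores as neutral, though the right value depends on your data.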