Beer data collection ideas


naptown1

SHORT VERSION:
Looking for ideas on beer-related data I could scrape/collect.

LONG VERSION:
I am working on building a scraping pipeline with Apache Airflow and Scrapy and may try running it on my Raspberry Pi once I get it up. I need some ideas on the type of data I might collect. In the past, when playing with scraping, I grabbed the top beers from each style along with their reviews and did some analysis and visualizations with Bokeh on that data. I was thinking of taking what I did before, building it out a bit, and adding some data to augment what I can already grab.

Basically, I am a data nerd looking to build something out to work with some new tech.
 
 
Those holes look a little big, not sure how that would work after a TB trip. Maybe cheese cloth instead? You could even squeeze out water for further accuracy.
 
It is the 'what the hell' part. Do I need someone to vouch for me? Sorry, I used to lurk forever and then stopped visiting the site much because of life. We can't all just sit and drink barleywines in our parents' basement playing video games and hitting refresh on TB.
 
Friendly advice: sounds like you're a data scientist. Imo, a DS without project ideas is missing a critical skill, like if a mathematician popped into this forum saying he had just learned Fourier transforms and needed ideas for what he could use them on. Ideally, you'd have a problem in mind and would have learned those tools to solve it.

On to your tools, scraping isn't hard -- I did it with Scrapy alone. You really only need Airflow if you're trying to set up a regular incrementer (which raises the question, why?). So you're probably fine on just using Scrapy, and it's a useful skill. Instead of Airflow, I used a cron job -- which you can dump onto a remote instance. Airflow is the kind of thing I'd use to manage workflows, not a basic task; it's probably overkill.
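For what it's worth, the "regular incrementer" part doesn't need much machinery. Here is a minimal sketch, assuming each scraped post is a dict with an integer `id` and that the scrape itself happens elsewhere (e.g. in Scrapy). The names `load_last_seen`, `collect_new`, and the `last_seen_id` key are made up for illustration.

```python
import json
from pathlib import Path


def load_last_seen(state_path: Path) -> int:
    """Return the highest post id already collected, or 0 on the first run."""
    if state_path.exists():
        return json.loads(state_path.read_text())["last_seen_id"]
    return 0


def collect_new(posts, state_path: Path):
    """Keep only posts newer than the stored high-water mark, then advance it.

    `posts` is assumed to be a list of dicts with an integer "id" key,
    produced by whatever scraper ran before this step.
    """
    last_seen = load_last_seen(state_path)
    new_posts = [p for p in posts if p["id"] > last_seen]
    if new_posts:
        newest = max(p["id"] for p in new_posts)
        state_path.write_text(json.dumps({"last_seen_id": newest}))
    return new_posts
```

A crontab entry along the lines of `0 6 * * * python collect.py` (script name hypothetical) would run this once a day. Airflow starts to pay off once there are several dependent steps around it.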

What interests you about beer? And about this site in particular? Start from there and see if some data questions naturally bubble up. There's no shortage of them.

Here's a simple one: what are the top 10 most liked beers (from posts in DDT) on a daily basis?
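A rough sketch of how that daily top 10 could be computed once the posts are scraped. It assumes each post has already been reduced to a dict with `date`, `beer`, and `likes` keys. Those field names are assumptions, and pulling a beer name out of a free-text post is the hard part that this skips.

```python
from collections import Counter, defaultdict


def top_beers_by_day(posts, n=10):
    """Sum likes per beer for each day and return the n most liked per day.

    `posts` is assumed to be an iterable of dicts with "date", "beer",
    and "likes" keys (hypothetical field names).
    """
    likes = defaultdict(Counter)
    for post in posts:
        likes[post["date"]][post["beer"]] += post["likes"]
    # most_common sorts each day's beers by total likes, highest first
    return {day: counts.most_common(n) for day, counts in likes.items()}
```

The result maps each date to a list of `(beer, total_likes)` pairs, which drops straight into a table or a bar chart.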
 
Close, I am a data engineer. Yeah, it would be better if I had a problem I was wishing to solve, but really I'm looking to play with some stuff, and I learn more if I build something myself than by just doing tutorials.

I have done a bunch of scraping; that is just a means to an end. I would prefer just using an API or something. And the use of Airflow is really to see how it functions on an ongoing basis. In addition, the scraping would be just one part of the workflow.

Thanks for the help, it does have me thinking. I like the idea of starting with the question "what interests you about beer".
 

Ah, if you're an engineer then disregard my first para. Definitely learn your tools. People like me depend on people like you to do our jobs.

Also: plot the trend of various TB phrases over time, with the viz showing the avatar of the person who originated it. That'll use Scrapy, pandas, NLTK, and Bokeh.

E.g.

"Julian, please" would be MarkIntihar?
"**** on ____ desk" would be MordorMongo?

Etc
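The counting side of that phrase idea could start as small as the sketch below. It swaps in plain case-insensitive substring matching instead of NLTK tokenization and uses only the standard library, so treat it as a starting point rather than the pandas/Bokeh version. Posts are assumed to be dicts with `date` (ISO `YYYY-MM-DD` string), `author`, and `text` keys, all hypothetical names.

```python
from collections import Counter


def phrase_trend(posts, phrase):
    """Count posts per month containing `phrase` and find who used it first.

    Returns (counts_by_month, originator). Dates are assumed to be ISO
    strings, so the first 7 characters give "YYYY-MM" and string order
    matches chronological order.
    """
    needle = phrase.lower()
    hits = [p for p in posts if needle in p["text"].lower()]
    per_month = Counter(p["date"][:7] for p in hits)
    # the earliest matching post is treated as the phrase's origin
    originator = min(hits, key=lambda p: p["date"])["author"] if hits else None
    return dict(sorted(per_month.items())), originator
```

The monthly counts can go into a pandas DataFrame for plotting, and the originator's username is the key for looking up an avatar.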
 