kimono is a must-have app/Chrome extension for quickly and easily scraping data from a website, with no code required. Download it here.
What's really nice is that kimono can scrape data from Twitter pretty much hassle-free. The kimono data extractor recognizes patterns in web content, which you can easily refine to get exactly the data you want and leave the rest behind.
The kimono blog is an excellent resource for seeing all the cool stuff other people have used it for, and for getting ideas of your own. I'm going to give you an example of using kimono to scrape a Twitter account (Neil deGrasse Tyson's, to be specific) and then analyze the text of his tweets by visualizing the most common words as a word cloud using Wordle.
First, I just want to say that what takes kimono to the next level is a chat window that starts up automatically when you sign in to your kimono account. A very helpful person is available to answer any questions and help solve any issues you face. I learned this first-hand when I attempted to scrape Twitter and ran into problems. The kind assistant worked with me to find a solution, which turned out to be using the mobile Twitter site, and that is where we'll begin.
So, once you have the kimono plugin installed for Chrome and have navigated to mobile.twitter.com, click the kimono icon next to the address bar. A yellow box will appear describing Auth mode:
'Kimonify'ing a mobile Twitter page requires you to log in, so continue by clicking the pulsating lock icon.
Follow the instructions by clicking the yellow USERNAME circle, followed by the Phone, email or username box. Repeat for PASSWORD and SUBMIT, then click Done.
Enter your Twitter login information and let the kimono magic happen. Now we want to navigate to the Twitter page we are going to scrape. To do this, click the NAVIGATION MODE icon and search for the Twitter account we want (the new Mr Cosmos, Neil deGrasse Tyson). Click NAVIGATION MODE one more time and then select the appropriate Twitter handle (@neiltyson). Now we're at the page we need to begin scraping.
Change the box at the top left which says property1 to something like tweet. Now hover over any of the tweets in the tweet feed until a light yellow box overlays the text, then left-click. You'll notice that the box becomes solid yellow and there is a 1 in the yellow circle at the top left. There should also be light yellow boxes overlaying other tweets, each with x and ✓ on the right. Click ✓ next to another tweet and all the tweets in the feed should turn yellow, with a 30 now showing in the top left yellow circle.
If we were to run the kimono API at this point, it would only give us results for the 30 tweets displayed on the page. But there are many more tweets (over 4,000 for Mr Tyson) that we can extract, and it only takes one more step. Scroll down to the bottom of the page and you will see a Load older Tweets button, which changes the page to the preceding 30 tweets. Click the bluish circle at the top right, which is called PAGINATION, then click the Load older Tweets button and then Done.
Give your API a name, like Neil deGrasse Tyson Tweets, and click Create API. Follow the link provided and now you're ready to Start Crawling (extracting the data from Mr Tyson's Twitter feed). Under the CRAWL SETUP tab, you can see the status of the crawl and change the PAGINATION LIMIT from 1 page to 1,000 pages. The DATA PREVIEW tab lets you see 10 rows at a time, or copy/download the entire dataset. Download the data as a CSV file.
[Analyzing text data is called text mining and can be done using tools like MonkeyLearn. An example of analyzing news headlines using kimono and MonkeyLearn can be found here.]
For our purposes, we're simply going to visualize the most common words from Neil deGrasse Tyson's tweets. Copy the tweet column from the downloaded CSV file and paste it into the text box here to get an image looking something like this:
There may not be much of statistical significance here, but it's a nice insight into the message being shared by one of the most prominent scientists of our time.
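If you'd like to see the numbers behind a word cloud like this, here is a minimal Python sketch that counts the most common words in the tweet column of the downloaded CSV. It rests on a few assumptions that are mine, not kimono's or Wordle's: that the column header is tweet (the name we gave the property earlier), that the file is saved as tweets.csv, and that a short hand-picked stop-word list is good enough to filter out filler words. Wordle's own word filtering will differ, so treat the counts as a rough guide.

```python
import csv
import io
import re
from collections import Counter

# Short, hand-picked stop-word list (an assumption; extend it to taste).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "at", "by", "rt"}


def top_words(csv_file, column="tweet", n=20):
    """Return the n most common words in one column of an open CSV file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        text = (row.get(column) or "").lower()
        text = re.sub(r"https?://\S+", " ", text)  # drop links
        words = re.findall(r"[a-z']+", text)       # keep letters and apostrophes
        counts.update(w for w in words
                      if w not in STOP_WORDS and len(w) > 1)
    return counts.most_common(n)


# Tiny made-up sample standing in for the real download.
sample = io.StringIO('tweet\n"The universe is big"\n'
                     '"The universe is old"\n"Big questions"\n')
print(top_words(sample, n=3))

# With the real file (hypothetical filename):
# with open("tweets.csv", newline="", encoding="utf-8") as f:
#     print(top_words(f))
```

The function takes an open file rather than a filename so it can be tried on a small in-memory sample before pointing it at the full download.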
Thursday, March 26, 2015
Monday, March 9, 2015
The data blogging adventure begins!
"Maybe stories are just data with soul." - Brené Brown
This blog will chronicle my education and experimentation with data journalism. Whether I'm exploring and scraping data from the web, critiquing already-published pieces, or trying (& most likely failing) to master the plethora of tools available to visualize data, I hope to give you an entertaining and interesting look inside the life of an aspiring data journalist.
And as they say here in Britain, Tally-ho!
Oh and if you want to make your own image with personalized text, check out Chisel. That's how I made the photo above (photoshopped mad scientist with my face not included).