¡Hola mis amigos!
I'm learning Spanish, but I'm also annoyed with collaborating on LaTeX papers. That's why I created the GDoc-LaTeXifier, so the syntax stays clear when I collaborate on a paper with a remote friend.
But now we both want to compile a PDF on our own machines, so I wrote a tiny shell script that downloads the paper and runs PDFLaTeX.
The problem is that running the script opens a new terminal window. I've managed to make the window close when the script is done, but on my friend's Mac it doesn't, so he ends up with a ton of open windows.
Enter - the GDoc/LaTeX compiler GUI.
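To give a rough idea of the approach, here is a minimal sketch, not the code from the full post. It assumes the paper has already been downloaded to a file called `paper.tex` (a hypothetical name) and that `pdflatex` is on the PATH. The helper names `pdflatex_command` and `compile_paper` are made up for illustration. The point is that running the compiler through `subprocess` with captured output never opens a terminal window, and the log can be shown inside the Tkinter window instead.

```python
import subprocess


def pdflatex_command(tex_file):
    # -interaction=nonstopmode keeps pdflatex from pausing for input on errors.
    return ["pdflatex", "-interaction=nonstopmode", tex_file]


def compile_paper(tex_file):
    """Run pdflatex with captured output; return (ok, log_text)."""
    try:
        result = subprocess.run(
            pdflatex_command(tex_file),
            capture_output=True,
            text=True,
            errors="replace",  # pdflatex logs are not always valid UTF-8
        )
    except FileNotFoundError:
        return False, "pdflatex was not found on PATH"
    return result.returncode == 0, result.stdout


def main(tex_file="paper.tex"):  # hypothetical file name
    # Imported here so the helpers above also work on a machine without a display.
    import tkinter as tk
    from tkinter.scrolledtext import ScrolledText

    root = tk.Tk()
    root.title("GDoc/LaTeX compiler")
    log = ScrolledText(root, width=80, height=20)

    def on_compile():
        ok, output = compile_paper(tex_file)
        log.delete("1.0", tk.END)
        log.insert(tk.END, output + ("\nDone." if ok else "\nCompilation failed."))

    tk.Button(root, text="Compile PDF", command=on_compile).pack()
    log.pack()
    root.mainloop()


if __name__ == "__main__":
    main()
```

The compile step blocks the GUI while pdflatex runs, which is probably fine for a short paper; a longer build would want a background thread.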
Continue reading "GDoc/LaTeX compilation GUI with Tkinter/Python [w/ code]"
Just sharing a snippet of code. As part of a project I'm doing, I need to analyse the links in the Wikipedia corpus. While using the API is one solution, it doesn't retain the order in which links appear in the page. It also returns links that are not part of the main text, which makes the linkage DB very cluttered.
So, I set out to parse the raw MediaWiki format all Wikipedia articles are written in, to get only the relevant links and in order. I call them contextual because they live inside the text and have context.
Initially I used string matching and other complex string-scraping methods. It was a bust: there are too many edge cases to deal with. That is when I discovered PyParsing, the excellent parsing library. It did the job, and here are the results.
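As a taste of what this looks like, here is a minimal sketch, not the grammar from the full post. It assumes the simple `[[Target]]` and `[[Target|label]]` link forms, and it skips a small, hypothetical set of namespaces (`File`, `Image`, `Category`) that are not part of the running text. Nested links inside image captions, templates and the other edge cases are deliberately left out.

```python
from pyparsing import CharsNotIn, Optional, Suppress

# [[Target]] or [[Target|label]]
target = CharsNotIn("|[]")("target")
label = CharsNotIn("[]")("label")
wiki_link = Suppress("[[") + target + Optional(Suppress("|") + label) + Suppress("]]")

# Assumption: links in these namespaces are not part of the main text.
SKIPPED_NAMESPACES = {"file", "image", "category"}


def contextual_links(wikitext):
    """Return (target, label) pairs in the order they appear in the text."""
    links = []
    for tokens, _start, _end in wiki_link.scanString(wikitext):
        link_target = tokens["target"].strip()
        namespace = link_target.split(":", 1)[0].strip().lower()
        if ":" in link_target and namespace in SKIPPED_NAMESPACES:
            continue
        link_label = tokens.get("label", link_target).strip()
        links.append((link_target, link_label))
    return links


if __name__ == "__main__":
    sample = (
        "'''Alan Turing''' was a pioneer of [[Computer science|computing]] "
        "who worked at [[Bletchley Park]]. [[Category:Mathematicians]]"
    )
    for link in contextual_links(sample):
        print(link)
```

Because `scanString` walks the text from left to right, the links come back in document order, which is exactly what the API does not give you.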
Continue reading "Getting all the links from a MediaWiki format using PyParsing"
I'm a fan of Last.fm online radio, and I have a habit of marking every good song that I hear as a "loved track". Over the years I got quite a list, and so I decided to turn it into my jogging playlist. But for that, I need all the songs downloaded to my computer so I can put them on my mobile. While Last.fm does link to Amazon for downloading all the loved songs for pay, I'm going to walk the fine moral line here and suggest how you can download every song from existing free YouTube videos.
If it really bothers you, think of it as if I had created a YouTube playlist and were now using my data plan to stream the songs off YT itself.
Moral issues resolved, we can move on to the scripting.
Update (4/27/12): youtube-dl.py has moved to https://github.com/rg3/youtube-dl/ and has also gained a very neat --extract-audio option, so you can get the songs as audio right away (it basically does the conversion in a second step).
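Roughly, the two steps could look like the sketch below. This is not the script from the full post. It assumes the Last.fm `user.getlovedtracks` API method with a JSON response shaped like `{"lovedtracks": {"track": [...]}}`, and that youtube-dl's `ytsearch1:` prefix picks the first search hit for each song. The user name, the API key and all function names are placeholders.

```python
import json
import subprocess
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "http://ws.audioscrobbler.com/2.0/"


def loved_tracks_url(user, api_key, limit=200):
    # Step 1: ask Last.fm for the user's loved tracks as JSON.
    params = {
        "method": "user.getlovedtracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
    }
    return API_ROOT + "?" + urlencode(params)


def track_queries(response):
    # Assumed response shape: {"lovedtracks": {"track": [{"name", "artist": {"name"}}]}}
    tracks = response["lovedtracks"]["track"]
    return ["{} - {}".format(t["artist"]["name"], t["name"]) for t in tracks]


def youtube_dl_command(query):
    # Step 2: "ytsearch1:" makes youtube-dl download the first search result.
    return ["youtube-dl", "--extract-audio", "--audio-format", "mp3", "ytsearch1:" + query]


if __name__ == "__main__":
    # Placeholders: fill in your own user name and API key.
    with urlopen(loved_tracks_url("YOUR_USER", "YOUR_API_KEY")) as reply:
        loved = json.load(reply)
    for query in track_queries(loved):
        subprocess.run(youtube_dl_command(query))
```

The first search hit is not always the right recording, so it's worth skimming the downloaded files before they go on the jogging playlist.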
Continue reading "Download all your Last.fm loved tracks in two simple steps"
I believe that every builder-hacker should have their own little Swiss-army-knife server that just does everything they need, but as a webservice. You can basically do anything as a service nowadays: image/audio/video manipulation, mock-cloud data storage, offload heavy computation, and so on.
Tornado, the lightweight Python webserver, is perfect for this, and since so many projects these days have Python bindings (see python-tesseract), it should be a breeze to integrate them with minimal work.
Let's see how it's done.
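Here is a minimal sketch of the shape such a service could take. It is not the 10-line version from the full post: instead of the python-tesseract bindings it shells out to the `tesseract` command-line tool, which is assumed to be installed. The `/ocr` route, port 8888 and the helper names are made up for illustration, and the image is assumed to arrive as the raw POST body.

```python
import subprocess
import tempfile

import tornado.ioloop
import tornado.web


def tesseract_command(image_path):
    # Using "stdout" as the output base makes the tesseract CLI print the text.
    return ["tesseract", image_path, "stdout"]


class OcrHandler(tornado.web.RequestHandler):
    def post(self):
        # Assumption: the request body is the raw image file.
        with tempfile.NamedTemporaryFile() as image:
            image.write(self.request.body)
            image.flush()
            result = subprocess.run(
                tesseract_command(image.name), capture_output=True, text=True
            )
        if result.returncode != 0:
            self.set_status(500)
            self.write(result.stderr)
            return
        self.write(result.stdout)


def make_app():
    return tornado.web.Application([(r"/ocr", OcrHandler)])


if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```

With the server running, something like `curl --data-binary @scan.png http://localhost:8888/ocr` should return the recognised text. The OCR call blocks the event loop while it runs, which is fine for a personal Swiss-army-knife server but not for anything with real traffic.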
Continue reading "10 lines-of-code OCR HTTP service with Python, Tesseract and Tornado"