goScrap, LinkedIn scraping

LinkedIn, Facebook, Twitter: consider for a moment all the data we hand over to these social networks. Now subtract the share of people who actually care about the privacy of their data. That still leaves about 90% of people showing up in a simple Google search.

Based on this we can build a service similar to LinkedIn Intro without giving anyone access to all our mail. Here is a proof of concept that scrapes Google search results for LinkedIn or Facebook profiles.

First of all we decide where to look and what we want to obtain, or in this case, who ^^
query = raw_input("Name AND company OR location >> ")
A simple way is to check whether the person appears in our favorite search engine.
Then we go to the content provider that best suits our needs. We want public profiles, such as those on Facebook or LinkedIn.

Next we review the source code of the profile page.
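To give an idea of what reviewing the page source leads to, here is a minimal sketch that pulls the headline and the photo out of markup shaped like the snippet quoted in the script's comments. It is written for Python 3 (so `html.parser` instead of the Python 2 `HTMLParser` module), the class name `ProfileParser` is my own, and the photo URL in the sample is a placeholder, not a real LinkedIn address. Real profile pages will differ, so treat it as an illustration only.

```python
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collects the headline text and the profile photo from a profile page."""

    def __init__(self):
        super().__init__()
        self.headline = None
        self.photo_url = None
        self.photo_alt = None
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "headline-title" in classes:
            self._in_headline = True
        elif tag == "img" and "photo" in classes:
            self.photo_url = attrs.get("src")
            self.photo_alt = attrs.get("alt")

    def handle_endtag(self, tag):
        # Simplification: any closing div ends the headline (no nested divs).
        if tag == "div":
            self._in_headline = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_headline and text and self.headline is None:
            self.headline = text


# Sample markup modelled on the snippet in the script; the src is a placeholder.
sample = """
<div class="headline-title title" style="display: block;">
Sourcing | Talent Acquisition Specialist at Google
</div>
<div class="image zoomable" id="profile-picture" style="display: block;">
<img alt="Alex Lee" class="photo" height="100" src="http://example.com/photo.jpg" width="100" />
</div>
"""

parser = ProfileParser()
parser.feed(sample)
print(parser.photo_alt, "-", parser.headline)
```

The parser only looks at class names, so it keeps working if the inline styles or the element order change, but it will need adjusting as soon as the site renames those classes.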

After some coding, here are the results =) Okay, I know, you first need a name, a company and a place of residence. That information can be derived from an address of the form "erik.ganna@company.com".
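As a rough sketch of that last idea, the function below turns such an address into a query in the "Name AND company" shape the script prompts for. It is Python 3, the name `query_from_email` is hypothetical, and it assumes that the local part holds the person's name and that the first label of the domain is the company, which will not hold for webmail addresses.

```python
import re


def query_from_email(address):
    """Turn 'first.last@company.com' into 'First Last AND Company'."""
    local, _, domain = address.partition("@")
    if not local or not domain:
        raise ValueError("not an email address: %r" % address)
    # Split the local part on common separators to recover the name words.
    words = [w for w in re.split(r"[._\-]+", local) if w]
    name = " ".join(w.capitalize() for w in words)
    # Assumption: the first label of the domain is the company name.
    company = domain.split(".")[0].capitalize()
    return "%s AND %s" % (name, company)


print(query_from_email("erik.ganna@company.com"))  # Erik Ganna AND Company
```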

#alex lee kingdom google

#<div class="headline-title title" style="display: block;">
#Sourcing | Talent Acquisition Specialist at Google
#<div class="image zoomable" id="profile-picture" style="display: block;">
#<img alt="Alex Lee" class="photo" height="100" src="http://m.c.lnkd.licdn.com/mpr/pub/****.jpg" width="100" />

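import urllib2
import urllib
import json
import sys
import re
import random

from HTMLParser import HTMLParser
from urlparse import urlparse
from xml.etree import cElementTree as etree


def getDomain(url):
    # Assumed helper, not shown in the original listing:
    # "http://www.linkedin.com/in/x" -> "linkedin"
    labels = urlparse(url).netloc.split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


# Google AJAX Search API endpoint; the encoded query is appended to it.
url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&"
query = raw_input("Name AND company OR location >> ")
query = urllib.urlencode({'q': query})
response = urllib2.urlopen(url + query).read()
data = json.loads(response)
results = data['responseData']['results']
for result in results:
    url = result['url']
    host = urlparse(url)
    # Keep only LinkedIn public profile pages (/in/... or /pub/...).
    # str.find() returns -1 (truthy) on a miss, so test with "in" instead.
    if getDomain(url) == "linkedin":
        if "/in/" in host.path or "/pub/" in host.path:
            print url  # assumption: the original listing breaks off here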
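The filtering step at the end of the loop can also be tried without touching the network. The sketch below is a Python 3 rendering of that check; `get_domain` and `is_public_profile` are names I made up for the illustration, and the URLs are invented examples. Note that `str.find` returns -1 when nothing is found, and -1 counts as true in a condition, so the sketch tests membership with `in` instead.

```python
from urllib.parse import urlparse

# Path markers of LinkedIn public profiles, as used in the script.
PROFILE_MARKERS = ("/in/", "/pub/")


def get_domain(url):
    """Return the site name of a URL, e.g. 'linkedin' for www.linkedin.com."""
    labels = (urlparse(url).hostname or "").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def is_public_profile(url):
    """True for LinkedIn URLs whose path contains /in/ or /pub/."""
    path = urlparse(url).path
    return get_domain(url) == "linkedin" and any(m in path for m in PROFILE_MARKERS)


# Invented example URLs standing in for search results.
urls = [
    "http://www.linkedin.com/in/alexlee",
    "http://uk.linkedin.com/pub/alex-lee/1/2/3",
    "http://www.linkedin.com/company/google",
    "http://www.facebook.com/in/someone",
]
profiles = [u for u in urls if is_public_profile(u)]
print(profiles)
```

Taking the second-to-last label of the host is a shortcut: it handles country subdomains such as uk.linkedin.com but would misread hosts under two-part suffixes like .co.uk.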
