Datanice – DIY : Use Instagram data to plan your next vacation

DIY : Use Instagram data to plan your next vacation

When I travel, I always want to get the most of the city I'm visiting. One way is to talk to local people and get advices about which spots you shouldn't miss. But I wish I could have the point of view of the past visitors... Which places did they enjoy the most ? Where is the best spot to watch the sunset ? The best selfies you can get ? I also want to have a look of some beaches before choosing one or getting an idea about what some areas looks like... The idea here is to ask Instagram about the nearby popular spots.

After looking into the Instagram API and playing around with it, I came up with the following script.

import os
import json
from collections import Counter
import pandas as pd
from instagram.client import InstagramAPI

INSTAGRAM_ACCESS_TOKEN = ''
INSTAGRAM_CLIENT_ID = ''
INSTAGRAM_CLIENT_SECRET = ''

api = InstagramAPI(access_token=INSTAGRAM_ACCESS_TOKEN, client_id=INSTAGRAM_CLIENT_ID,client_secret=INSTAGRAM_CLIENT_SECRET)

def getNbLikes(listMedia):
    likes =0
    count =0
    for media in listMedia:
        likes = likes + media.like_count
        count = count + 1
    if count > 0:
        return likes/count
    else:
        return 0

def getTags(listMedia):
    tags = []
    for media in listMedia:
        for mediaTag in media.tags:
            tags.append(mediaTag.name)
    return Counter(tags)

def getMedia(locationId):
    medias =  api.location_recent_media(location_id=locationId)
    return medias[0]


bestLocations = [];
latD=48.858844
lonD=2.294351

for x in range(-10, 10):
    for z in range(-10,10):
        print(x,z)
        locations = api.location_search(lat=48.858844+x*0.001, lng=2.294351+z*0.001)
        for location in locations:
            likes = 0
            if not any(d['name'] == location.name for d in bestLocations):
                images = getMedia(location.id)
                likes = getNbLikes(images)
                tags = getTags(images)
                if len(images)>0 :
                    bestLocations.append(dict(name=location.name,latitude=location.point.latitude,longitude=location.point.longitude,likes=likes,tags=tags,id=location.id,nbrImages=len(images)))

finalData = pd.DataFrame.from_dict(bestLocations)
finalData.to_csv('instadata.csv', sep='\t', encoding='utf-8')

We first query for the locations around the coordinates of the location we wish to know more about and then we query for the photos of each location and get the number of likes and the number of pictures for it.

Note: You need to replace the Access Tokens and Client ID with the values you get from Instagram here.

After running the previous script for some time, we get a nice dataset that we can analyze with pandas.

> import pandas as pd
> df = pd.read_csv("instadata.csv",sep='\t')
> df.head(10)

This is the head of the dataFrame displayed in iPython Notebook.

Now let's see which spots have the most likes per picture and which ones have the most pictures

gr = df.groupby('name').sum()

After dropping the useless columns

my_plot = gr.head(30).sort(columns='likes',ascending=False).plot(kind='bar',figsize=[15,5])

This script could be improved with some text mining on the names, to combine the similar results. (You can see that we have multiple results for the Eiffel tower).

The next step is to visualize the nearby pictures.
I put up a little angularJS application where we can select a location and see a list of pictures per location.
I'll put the code online when I have more time.

Please let me know in the comments if you have any improvement ideas !

Posted on: Fri 04 September 2015