  • Administrators
Posted

So this will only appeal to a small subset of folks here.

But often when looking at a Gumtree ad I have to manually go through and collect all the image URLs and copy them here.
 
Gumtree has no API to make requests to and retrieve data from, so instead I've created a very basic script to help retrieve these URLs and put them in BBcode format quickly.
 
How does it work?

 

I've uploaded a file called gumtree.txt; you'll need to rename it to gumtree.py.

 

NB: You'll need to install these libraries:

Requests
http://docs.python-requests.org/en/master/user/install/#install

BeautifulSoup
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Take the following URL:
http://www.gumtree.com.au/s-ad/townsville/cars-vans-utes/1972-datsun-240z-coupe/1135792440
 
Using the command:
python gumtree.py

 

Screen Shot 2017-01-19 at 12.13.13 AM.png

The python script will read the contents of the webpage and return all the images for that specific listing in a nice list compatible with the forum software.
Screen Shot 2017-01-19 at 12.13.25 AM.png
 
I just copy and paste that result below and bob's your tea pot. I'm not sure how much time I'll get, but I may expand this to copy other information such as the ad text, price and location, and to handle URLs from other sites like eBay.
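For anyone curious about the general shape of such a script, here is a minimal sketch, not the contents of gumtree.txt. It assumes Python 3 and uses the standard library's html.parser in place of Requests and BeautifulSoup so it runs without any installs. The names ImageCollector, extract_image_urls, to_bbcode and fetch are made up for illustration, and it collects every img tag on the page, so a real script would still need to narrow that down to the listing's gallery.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in page order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Skip tags without a source and URLs already seen.
        if src and src not in self.urls:
            self.urls.append(src)


def extract_image_urls(html):
    """Return the image URLs found in a page's HTML source."""
    collector = ImageCollector()
    collector.feed(html)
    return collector.urls


def to_bbcode(urls):
    """Wrap each URL in [img] tags, one per line."""
    return "\n".join("[img]{}[/img]".format(u) for u in urls)


def fetch(url):
    """Download a page and return its source as text."""
    # Assumption: a browser-like User-Agent; some sites reject the default.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling to_bbcode(extract_image_urls(fetch(url))) on a listing URL would give the block of [img] lines to paste into a post.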

 

Taking it to the next level: in a similar way to how YouTube links work and automatically insert the video for you, simply copying and pasting a Gumtree, eBay or Bring A Trailer URL would post the ad information here, with images, description etc., with no work on your part.

 

Or even better, capture the images at a specific URL and store them (in case they go missing later, e.g. Craigslist).

[10 listing images, each with the file name $_20.JPG]

gumtree.txt

Posted

Good idea!

 

How long do the images on the Gumtree/eBay servers last though?

 

It could be interesting to expand your idea of having a forum tag and further automate it. The site could automatically find and cache ads (or the relevant bits of ads) for any S30s that go up.

  • Administrators
Posted


In my experience Gumtree and eBay don't really remove the images. So they can last a long time! CarSales is another where the ad is pulled but the images are actively available on the server for a long time afterwards (sometimes years later).

 

But you are correct, a better solution would be to pull the images and ad information and store them locally (like an Internet Archive) for posterity. A tool like this, with stored old ads from various sites, would have made the Early Girl Bingo thread a lot easier to compile, that's for sure!
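Storing images locally could look roughly like the sketch below. This is only an illustration under assumptions: Python 3, the standard library only, and a made-up folder layout of archive root / listing ID / file name. The function names local_path_for and save_image are hypothetical.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def local_path_for(image_url, archive_root, listing_id):
    """Map an image URL to <archive_root>/<listing_id>/<file name>."""
    # urlsplit separates the path from the ?query part, so resize
    # parameters on the URL do not end up in the file name.
    name = os.path.basename(urlsplit(image_url).path)
    return os.path.join(archive_root, listing_id, name)


def save_image(image_url, archive_root, listing_id):
    """Download one image unless a copy is already stored; return its path."""
    path = local_path_for(image_url, archive_root, listing_id)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with urlopen(image_url) as response, open(path, "wb") as out:
            out.write(response.read())
    return path
```

Skipping files that already exist means the script can be re-run on the same listing without downloading everything again.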

 

There is a lot that could be done here; this is just a really basic start. I'm just trying to get a bit better at Python and programming in general. I'm not a brilliant programmer by any stretch of the imagination.

Posted (edited)

Very clever tool Gav - well done!

Dare I ask mate, but how does one access 'archived' photos on the carsales server?

Edited by Lurch ™
  • Administrators
Posted


I haven't spent any time making a script for CarSales yet, but the process should be similar for existing ads.

 

For historical listings, however:

 

It's tricky with CarSales because first you need to know what their URL structure was or is. Going back a few years, their old URLs were different; they have since updated them.

 

If I have the URL of the old listing then sometimes I can use www.archive.org to retrieve a version of the page. I can often inspect the page for references to images etc., and in some cases I can "recover" the images that were used. But the success rate isn't great.
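Checking whether archive.org holds a copy of a page can be scripted with the Wayback Machine availability endpoint. The sketch below is a rough illustration in Python 3 with the standard library only. The function names are made up, and the response layout (archived_snapshots, closest, available, url) is what that endpoint returned as I understand it, so treat it as an assumption and check a live response first.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is digits such as YYYYMMDD if given."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the snapshot URL from a decoded API response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine for the snapshot closest to the timestamp."""
    with urlopen(availability_url(page_url, timestamp)) as response:
        return closest_snapshot(json.load(response))
```

lookup returns None when no snapshot is recorded, which matches the poor success rate mentioned above.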

 

Their image URLs are often encoded in some way that I haven't worked out yet (it could be a date stamp or some other ID).

 

For example this listing.

https://www.carsales.com.au/private/details/Datsun-240Z-1973/SSE-AD-4118194/?Cr=1

 

Where the AD ID is probably: SSE-AD-4118194

 

In that ad is a URL for an image:

 

https://carsales.li.csnstatic.com/carsales/car/private/cp5416041899455671651.jpg?aspect=FitWithinNoPad&height=700&width=1050

 

Where the image ID is cp5416041899455671651.jpg. So far I can't see any pattern between the two.
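Pulling the two IDs out of the URLs so they can be compared side by side could be done as in the sketch below. The two regular expressions are guesses based only on the example URLs in this post, not on any documented CarSales format, and the function names are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: ad IDs look like SSE-AD-4118194 (letters, "-AD-", digits).
AD_ID = re.compile(r"/([A-Z]+-AD-\d+)")
# Assumption: image files look like cp<digits>.jpg at the end of the path.
IMAGE_ID = re.compile(r"/(cp\d+)\.jpg$")


def ad_id(listing_url):
    """Return the ad ID from a listing URL, or None if none is found."""
    match = AD_ID.search(urlsplit(listing_url).path)
    return match.group(1) if match else None


def image_id(image_url):
    """Return the image ID (without .jpg) from an image URL, or None."""
    match = IMAGE_ID.search(urlsplit(image_url).path)
    return match.group(1) if match else None
```

Collecting these pairs for a number of live listings would be one way to look for a relationship between the two numbers.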

 

It's best to capture the information up front (when the listing is live) if possible.

 

For comparison you can see this image was from HS30 00339:

 

http://liveimages.carsales.com.au/carsales/car/private/cp5422458402558076924.jpg?height=700&aspect=FitWithinNoPad&width=1050

 

Which is still live and accessible on CarSales.

 

If you could determine a correlation between Ad ID and the image ID you could probably figure out a way to quickly find images that are still hosted.

 

This isn't directly related, but if you watch the video here (which I recommend), you can see that a brute-force way of finding images could be to cycle through various combinations and permutations. Interestingly, they use a Python script for a lot of this work, applied to airline booking references. You'll never Instagram or post photos of your boarding pass online again!

 

http://www.theverge.com/2017/1/10/14226034/instagram-boarding-pass-security-problem-bad-idea

 

 

One of the reasons I was looking at developing such scripts is to store historical data going forward, so we don't run into these problems later where we can't find images or old listings.

 

There are various sites around the web that try to track old eBay listings, but I've never had much success with them, as they are not specifically oriented toward classic car listings; they cover any kind of listing.

 

I know when looking at Real Estate I often try and find old listings and information on what a property looked like before. In a similar way I think there is a market for finding information on used cars that have been listed in the past.

 

For the same reason I think that the Early Girl Bingo thread is so popular, people are interested in archived listings especially for special or unique cars.

 

But perhaps I'm getting ahead of myself here. Small steps first.

  • Administrators
Posted

Here is a CarSales script so far.

 

Using the command:
python carsales.py

 

Example URL:

https://www.carsales.com.au/private/details/Datsun-240Z-1972/SSE-AD-4425052/?Cr=3

 

Screen Shot 2017-01-19 at 11.00.24 PM.png

Produces the following, but for some reason only the first 6 images get pulled. I think that's because additional images get pulled into the page dynamically as you navigate through the gallery, perhaps to speed up load times, so they are not present in the HTML source.

cp4783325420010230623.jpg
cp5132488402526346910.jpg
cp5169679788335617100.jpg
cp5730766289034617654.jpg
cp4893795275655038582.jpg
cp5412022845654217439.jpg

carsales.txt

Posted


I wrote something similar for ozdat. It can scan a forum thread looking for off-site image links and cache local copies of the images in case they go missing at a later date. It can also try to retrieve missing images from archive.org.
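The scanning step might look something like the sketch below. This is not the ozdat code, just a guess at the idea: it assumes posts are stored as BBcode with [img] tags, and the function name offsite_images and the forum_host parameter are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches [img]...[/img] in either case; the URL is captured lazily
# so that several tags on one line are treated separately.
IMG_TAG = re.compile(r"\[img\](.+?)\[/img\]", re.IGNORECASE)


def offsite_images(post_text, forum_host):
    """Return image URLs in a post that are not hosted on forum_host."""
    urls = [u.strip() for u in IMG_TAG.findall(post_text)]
    return [u for u in urls if urlsplit(u).hostname != forum_host]
```

Each URL returned is a candidate for caching locally before the remote copy disappears.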

  • Administrators
Posted

Yeah, I was thinking about doing something similar a while ago. There's nothing more frustrating than broken image references in a forum thread.

 

Is your script based in Python or something else? Do you have to manually invoke it, or does it automatically scan new threads and index/cache images that are hosted remotely?

Posted

Can you run python with your current hosting plan anyway?

 

The problem with the carsales script is that the other photos are stored on a separate gallery page. I've added some quick hacks to your script to make it work for all images.

 

I only have Python 3 installed though, so it might need some changes to work with Python 2.

carsales2.txt

  • Administrators
Posted

Thanks for that, I'll have to double-check what you've changed to see how the CarSales site works.

 

I think the server might support Python scripts, but I haven't looked that far into it.

Posted (edited)

Basically all it does is look for the link to the gallery (which all images have, so we just need to get the first one) and then request that page and parse the images on there.

 

But yeah, you're right about it dynamically pulling in/changing stuff - they are using AngularJS and the pages seem to be partially (but not entirely) rendered on the client side. The gallery source doesn't seem to have HTML "img" elements when you download it using "requests" (probably because it's not executing the JS), so I needed to change the parsing a little bit too - but the regex was the same.
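To illustrate the parsing change: when the images aren't in "img" elements, a regex can be run over the raw page source instead of over parsed tags. The sketch below is not the code in carsales2.txt. The pattern is an assumption based on the cp<digits>.jpg file names listed earlier in this thread, and the function name image_names is made up.

```python
import re

# Assumption: image file names look like cp<digits>.jpg, as in the
# file names posted earlier in this thread.
IMAGE_NAME = re.compile(r"cp\d+\.jpg")


def image_names(page_source):
    """Return unique image file names in order of first appearance."""
    seen = []
    for name in IMAGE_NAME.findall(page_source):
        # The same image is often referenced more than once in the source.
        if name not in seen:
            seen.append(name)
    return seen
```

Because it searches plain text, this works whether the file names sit in tags, inline JSON or script blocks.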

Edited by brent012