Re: Re: [Rails] browser simulator independent of web framework

Luma-2
I'm extracting content from some websites. Currently I parse the HTML with Nokogiri. But the relevant content is not contained in the body of the response to the HTTP GET request. This is because there is some JavaScript code, like $(window).load() or $(document).ready(), that sends Ajax requests and fills in the original HTML.

So I'm looking for a library that automatically executes JavaScript code and Ajax requests, just like a normal browser.
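
To make the problem concrete, here is a minimal sketch, assuming a hypothetical page whose `#results` container is filled by a jQuery `$(document).ready()` handler. A static look at the raw GET response finds the container empty. A naive regex is used only to stay stdlib-only; Nokogiri parsing the same response would see the same empty element, since neither runs the page's scripts. The markup, the `/api/results` endpoint, and the helpers `static_inner_text` and `likely_js_rendered?` are illustrative assumptions, not taken from any real site.

```ruby
# Hypothetical raw response body (not from any real site): the #results
# container is empty until a jQuery ready handler fills it via Ajax.
RAW_HTML = <<~'HTML'
  <html>
    <body>
      <div id="results"></div>
      <script>
        $(document).ready(function () {
          $.get("/api/results", function (data) { $("#results").html(data); });
        });
      </script>
    </body>
  </html>
HTML

# Returns the text inside the element with the given id, or nil if there is
# no such element. Deliberately naive: a regex that only copes with a
# non-nested element written as <tag ... id="..." ...>...</tag>.
def static_inner_text(html, id)
  match = html.match(%r{<(\w+)[^>]*\bid="#{Regexp.escape(id)}"[^>]*>(.*?)</\1>}m)
  match && match[2].gsub(/<[^>]+>/, "").strip
end

# Heuristic: the content is probably injected client-side if the container
# is empty in the served HTML and the page registers a jQuery ready/load
# handler.
def likely_js_rendered?(html, id)
  text = static_inner_text(html, id)
  (text.nil? || text.empty?) && html.match?(/\$\((document|window)\)\.(ready|load)\(/)
end

p static_inner_text(RAW_HTML, "results")   # => ""
p likely_js_rendered?(RAW_HTML, "results") # => true
```

A heuristic like this can only tell you that a browser-like tool is needed; it cannot recover the Ajax-loaded content itself.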

Martin

Sent from my Samsung device.


-------- Original message --------
From: Colin Law <[hidden email]>
Date: 18.06.17 09:42 (GMT+01:00)
To: "Ruby on Rails: Talk" <[hidden email]>
Subject: Re: [Rails] browser simulator independent of web framework

On 17 June 2017 at 22:58, Martin L. <[hidden email]> wrote:

> Hi all,
>
> Is there any browser simulator that fulfills these requirements:
>
> - gem written in Ruby
> - automatically executes JavaScript code and performs Ajax requests (XSS is
> not an issue in my case)
> - independent of the frameworks used by the website (Rails, JavaEE, ASP.NET,
> ...)
> - only client-side
> - no testing
> - no browser dependency

What do you mean 'no testing'? If not for testing, then what is it for?

Colin

--
You received this message because you are subscribed to a topic in the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rubyonrails-talk/H_YImOIzNNo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/CAL%3D0gLvqHArcWpbZ5gsfCiPg0EF%3D4kD8QzZbC3KnicN58uAZ8A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Re: [Rails] browser simulator independent of web framework

Colin Law
On 18 June 2017 at 12:21, Martin Luy <[hidden email]> wrote:
> I'm extracting content from some websites. Currently I parse the HTML
> with Nokogiri. But the relevant content is not contained in the body of
> the response to the HTTP GET request. This is because there is some
> JavaScript code, like $(window).load() or $(document).ready(), that sends
> Ajax requests and fills in the original HTML.
>
> So I'm looking for a library that automatically executes JavaScript
> code and Ajax requests, just like a normal browser.

Understood. I'm afraid I don't think I can help. Does the site not work
with JS disabled in the browser?

Colin

Re: Re: [Rails] browser simulator independent of web framework

Luma-2
On Sunday, 18 June 2017 14:37:12 UTC+2, Colin Law wrote:
> Does the site not work with js disabled in the browser?

Unfortunately, the sites rely completely on JS; nothing works without it.

Is there a tool that comes close to my use case, or a testing tool that I could use for this purpose without writing tests?

Martin

Re: browser simulator independent of web framework

Jason Fleetwood-Boldt
In reply to this post by Colin Law


I think he's scraping someone else's site. 

You obviously can't do this with Ruby alone, as there is no headless web browser written entirely in Ruby (that's just nonsense).

If you can get PhantomJS working on your production site, that's probably the way to go. Look deep into the internals of Capybara to understand how it drives PhantomJS (typically through the Poltergeist driver). With PhantomJS, you basically have a headless web browser, and you can use Capybara's DSL to access parts of the page, including evaluating scripts and parsing the DOM.
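
A minimal sketch of the approach described above: Capybara driving PhantomJS through Poltergeist outside any test suite, pointed at someone else's site. It assumes the `capybara` and `poltergeist` gems plus a `phantomjs` binary on the PATH. `ScrapeJob`, `build_job`, `scrape_rendered` and the 10-second default wait are hypothetical names and values, not part of either gem. The gems are required lazily inside `scrape_rendered`, so the file loads without them.

```ruby
# Hypothetical description of one scrape (not part of Capybara).
ScrapeJob = Struct.new(:url, :selector, :wait_seconds)

# visit needs an absolute URL here, because no local app is being served.
def build_job(url, selector, wait_seconds: 10)
  unless url.start_with?("http://", "https://")
    raise ArgumentError, "absolute http(s) URL required: #{url}"
  end
  ScrapeJob.new(url, selector, wait_seconds)
end

# Loads the page in PhantomJS via Poltergeist, waits until the page's
# scripts have put some text into the selector, then returns that text,
# the document title (evaluated inside the page) and the rendered HTML.
def scrape_rendered(job)
  # Required lazily so this file can be loaded without the gems installed.
  require "capybara"
  require "capybara/poltergeist"

  Capybara.run_server = false # remote site: don't boot a local Rack app
  Capybara.register_driver :poltergeist do |app|
    # js_errors: false keeps errors thrown by the site's own scripts from
    # aborting the scrape.
    Capybara::Poltergeist::Driver.new(app, js_errors: false)
  end

  session = Capybara::Session.new(:poltergeist)
  session.visit(job.url)
  # find retries until an element matching the selector contains
  # non-whitespace text, and raises once wait_seconds have passed.
  node = session.find(job.selector, text: /\S/, wait: job.wait_seconds)
  {
    text: node.text,
    title: session.evaluate_script("document.title"),
    html: session.html # could be handed to Nokogiri::HTML for more parsing
  }
ensure
  session&.driver&.quit # stop the phantomjs process
end
```

Something like `scrape_rendered(build_job("https://example.com/listing", "#results"))[:text]` would then return the text the page's scripts inserted; the URL and selector are placeholders.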

Just keep in mind that PhantomJS is an actual executable, so it has to be built explicitly for your production environment, which might be a little tricky depending on where your site is hosted.

But a little birdie told me a few months ago that the PhantomJS team has decided to abandon PhantomJS in favor of Chrome's headless mode once Chrome has one, which I believe is forthcoming. I'm not sure whether that's really true or when it will happen.

-Jason

----

Jason Fleetwood-Boldt
[hidden email]
http://www.jasonfleetwoodboldt.com/writing

If you'd like to reply by encrypted email, you can find my public key on jasonfleetwoodboldt.com (more about setting up GPG: https://gpgtools.org)

Re: browser simulator independent of web framework

Walter Lee Davis
Also look at Mechanize, which I believe can do a headless JS scrape of a site. It's purely a scraper, so it's less likely to be test-centric.
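
One caveat on Mechanize: it fetches pages over HTTP and parses them with Nokogiri, but it does not execute JavaScript, so on the kind of page Martin describes it sees the same empty container as a plain GET. The sketch below assumes the `mechanize` gem; `fetch_static_text` and its `agent:` parameter (which lets a stand-in object replace `Mechanize.new`) are hypothetical.

```ruby
# Fetches url and returns the stripped text of the first element matching
# selector, or nil if there is none. Mechanize only downloads and parses the
# served HTML; the page's JavaScript never runs, so Ajax-filled content is
# missing from what this returns.
def fetch_static_text(url, selector, agent: nil)
  agent ||= begin
    require "mechanize" # loaded lazily; tests can inject a stand-in agent
    Mechanize.new
  end
  page = agent.get(url)     # plain HTTP GET, no script execution
  node = page.at(selector)  # first match, via the Nokogiri parser
  node && node.text.strip
end
```

For a page whose content arrives via Ajax, this typically returns an empty string (the container exists but is empty) or nil.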

Walter

Re: browser simulator independent of web framework

Luma-2
In reply to this post by Jason Fleetwood-Boldt
Thanks, guys. I'm trying Capybara with Poltergeist/PhantomJS, following the hints at https://github.com/teamcapybara/capybara#calling-remote-servers. I'll report back here with my experiences.
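
The "Calling remote servers" section linked above boils down to setting `Capybara.app_host`, switching off the local server with `Capybara.run_server = false`, and using a driver other than the default `:rack_test`, which cannot reach remote servers; `visit` then takes paths on that host. A sketch combining this with Poltergeist, assuming the `capybara` and `poltergeist` gems and a `phantomjs` binary on the PATH. `split_target`, `configure_remote_capybara!` and `rendered_text` are hypothetical helpers.

```ruby
require "uri"

# Hypothetical helper: splits an absolute URL into the part Capybara wants
# as app_host and the path (plus query) to pass to visit.
def split_target(url)
  uri = URI.parse(url)
  raise ArgumentError, "absolute http(s) URL required: #{url}" unless uri.is_a?(URI::HTTP)
  host = "#{uri.scheme}://#{uri.host}"
  host += ":#{uri.port}" unless uri.port == uri.default_port
  path = uri.path.empty? ? "/" : uri.path
  path += "?#{uri.query}" if uri.query
  [host, path]
end

# Global configuration for scraping a remote host with a JS-capable driver.
def configure_remote_capybara!(app_host)
  require "capybara"
  require "capybara/poltergeist" # registers the :poltergeist driver
  Capybara.run_server = false
  Capybara.current_driver = :poltergeist
  Capybara.app_host = app_host
end

# Visits the page and waits (up to Capybara's default wait time) until the
# selector contains some text, then returns that text.
def rendered_text(url, selector)
  host, path = split_target(url)
  configure_remote_capybara!(host)
  session = Capybara.current_session
  session.visit(path)
  session.find(selector, text: /\S/).text
end
```

Splitting the URL keeps `app_host` and `visit` consistent; with a driver like Poltergeist you could also pass the full URL to `visit` directly, as the README notes.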

Martin