Re: How Do I Learn About [n]?

Posted: Sat Sep 01, 2012 5:35 pm UTC
by Ben-oni
Shivahn wrote:Ok, so from time to time a task comes up at work that is mind-numbing and repetitive, and I am lazy, so as soon as something like that shows up I try to find a way to make the computer do that for me. Well, one of those is coming up, but fixing it requires knowledge in an arena I have not coded in: internet interfaces. I need to basically take a massive file with information and format it (no big deal), then do stuff to it. I guess I'll describe what I do now. I log into a website (which creates a pop-up window that's the one I actually communicate with), use one of the fields on the site to search and see if the entity I'm working on is in the system, if not, click a link which brings me to a registration page and then fill that page in with data about the entity, click another hyperlink and mess with a couple of drop-down menus, click (I THINK it's a hyperlink) in a calendar-type thing, then click a checkbox next to a specific time, then click a save button.

I have a big file from which I can get all the data I need for the forms, but I don't know how to write a program that communicates with a website like this. I'd probably write the thing in Python, but a language-agnostic tutorial would be excellent. Does anyone have any suggestions? I know very basic network theory, but I wouldn't know how to begin either logging in with a console-based program or navigating menus, boxes, search fields, and so on with one.

There are two aspects to what you need to deal with. The first part is HTTP, which is pretty straightforward. You won't have to know the specifics of the protocol, just the nature of the GET and POST messages. You could write code to handle that (it's not hard), but every language already has an API written for doing so.

The second part is navigating the HTML DOM. This could be a bit finicky. If the web pages are poorly written and don't conform to standards, some HTML parsers could fail miserably. Anyways, you'll have to understand the HTML tree so you can figure out whether your entity is present. The code will look something like this:

Code:

for entity in database:
   dom = httpGET("http://domain.name/path/page?entity=" + entity.id)
   if not entityExists(dom):
      httpPOST("http://domain.name/path/save", "field1=" + entity.field1 + "&field2=" + entity.field2)


That's very rough and full of bad practices, but it should get you started. Of course, ideally you'd just use SQL and be done, but that assumes you can access the database backend...
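
For what it's worth, here is a slightly less rough Python version of the same loop, as a sketch only. It uses the standard library to URL-encode the values instead of gluing strings together. The URLs and field names are the placeholders from the pseudocode above, and `Entity`, `sync`, and the `http_get` / `http_post` / `entity_exists` callables are hypothetical stand-ins for whatever the real site needs.

```python
from collections import namedtuple
from urllib.parse import urlencode

# Hypothetical record type; the real fields come from your data file.
Entity = namedtuple("Entity", "id field1 field2")

BASE = "http://domain.name/path"  # placeholder host from the pseudocode


def search_url(entity):
    """Build the GET URL, URL-encoding the query string."""
    return BASE + "/page?" + urlencode({"entity": entity.id})


def save_body(entity):
    """Build the URL-encoded POST body for the save request."""
    return urlencode({"field1": entity.field1, "field2": entity.field2})


def sync(entities, http_get, http_post, entity_exists):
    """Register every entity the site does not already know about.

    http_get(url) -> page text, http_post(url, body) and
    entity_exists(page_text) -> bool are supplied by the caller.
    """
    created = []
    for entity in entities:
        page = http_get(search_url(entity))
        if not entity_exists(page):
            http_post(BASE + "/save", save_body(entity))
            created.append(entity.id)
    return created
```

Passing the HTTP functions in as arguments keeps the loop testable without touching the real site; in practice they would wrap urllib, pycurl, or whichever library you end up using.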

Re: How Do I Learn About [n]?

Posted: Sun Sep 02, 2012 4:36 pm UTC
by Shivahn
Hmm, I see. I'll need to look into those. I really don't know much about HTTP so there is quite a bit I have to learn.

Re: How Do I Learn About [n]?

Posted: Sun Sep 02, 2012 10:02 pm UTC
by bittyx
@Shivahn:

I'm a PHP dev, and haven't had to do a task like this, but if I were to do it, I'd likely use Python as well (the basics of what I'd do are pretty much the same, just using different libraries/syntax) - a fine opportunity to learn a bit of Python as well :D

First off, you should check whether your pop-up is loaded via AJAX or as a separate page or whatever. Now, assuming the popup has a <form> element (use Firebug in FF or Chrome developer tools or whatever, to find this out), it's probably POSTing your search query to some page (I have no idea how experienced you are with web-stuff - <form action="page.php" method="POST"> means that the form data is being sent via the POST method to page.php). Basically, you need to find out the exact URL that the form is posting to, and send data there from your script - it might also be that this is done via javascript, so you would have to dig into javascript and check where the search query is being sent to.

I'd post my actual search query using cURL (I see that Python has pycurl - I'd either try that, since I'm somewhat familiar with cURL, or play around with urllib2, as demonstrated here). You don't really need to know much about HTTP for this, since the libraries you use will take care of most of that (though if you can spare the time, lightweight projects like this one are the ideal place to learn something new!)

Let's say you find out that this is http://www.site.com/search.php. Since this is behind a login, your script has to log in as well. As a quick hack, I'd likely just log in with my browser and check the cookies: there will be some kind of session ID in there. I'd probably just copy all the cookies and send them from my script, along with the user-agent string my browser sends (a lot of sites try to match user-agents within a single session, to make session hijacking harder). Another important thing is to check whether (and how often) the site regenerates session IDs, i.e. how often it sends the session cookie with a different value, so I'd know whether my script also has to accept and set cookies. I can't imagine this would be hard to implement anyway (cookies are basically just an associative array you keep in memory).
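
A rough sketch of that quick hack in Python, using only the standard library. The cookie name and value and the user-agent string are made-up placeholders (you'd paste in whatever your browser actually shows), `build_search_request` is a hypothetical helper, and `query` is the field name from the example form further down.

```python
import urllib.request
from urllib.parse import urlencode

# Values copied by hand from the browser; both are made-up placeholders.
BROWSER_COOKIES = {"PHPSESSID": "abc123"}
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_search_request(query, cookies=BROWSER_COOKIES, user_agent=USER_AGENT):
    """Prepare (but do not send) a POST carrying the browser's session."""
    body = urlencode({"query": query}).encode("ascii")
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(
        "http://www.site.com/search.php",
        data=body,
        headers={"Cookie": cookie_header, "User-Agent": user_agent},
    )


request = build_search_request("some entity")
# Sending it would be: urllib.request.urlopen(request).read()
```

If the site turns out to regenerate session IDs, the hand-copied cookie will go stale. In that case an `http.cookiejar.CookieJar` wrapped in `urllib.request.HTTPCookieProcessor` (passed to `urllib.request.build_opener`) can accept and resend cookies for you.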

Now the form that posts the data is likely something like

Code:

<form action="search.php" method="POST">
<input type="text" name="query" value="" />
<input type="submit" name="submit" value="Submit" />
</form>

That means that the POSTed request will be something like

Code:

query=somethingsomething
(don't forget to urlencode somethingsomething), which is what you need to send to the search script - it's very likely though that your library (pycurl/urllib2/whatever) already deals with this, and all you need to provide is the raw POST array (PHP's cURL library has this so I assume others do as well).
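
To make the urlencoding point concrete, here is a small Python sketch. The search string is an invented example; `query` is the field name from the form above.

```python
from urllib.parse import parse_qs, urlencode

# An invented search term containing characters that must be escaped.
form_data = {"query": "Smith & Sons, Ltd."}

# urlencode escapes '&' and ',' and turns spaces into '+',
# producing the body of the POST request.
body = urlencode(form_data)
print(body)

# Decoding the body again gives back the original value.
decoded = parse_qs(body)
print(decoded["query"][0])
```

Without the encoding step, the raw `&` would be read by the server as the start of a second field.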

Okay, so now we can fetch the resources we're interested in, and send data - but what do we do with them?

You probably want to use a DOM parser, because that's the easiest (and the correct) way to do stuff like this. Malformed HTML on their pages would be a problem for a strict parser, but luckily Python has Beautiful Soup, which copes incredibly well with bad HTML.

At this point, you should manually inspect the HTML structure of all the relevant pages (ie. the link that brings you to the registration page, the fields in the registration page, etc.) and from then on it's just a matter of POSTing the correct data once again to the correct link (all of which you can find out with a quick inspection from your browser). Of course, I assume it's easy enough to retrieve your data from your big file, so that's mostly that...

Helpful hint: when traversing the DOM, you are pretty much only interested in how to get to the element you want in the easiest possible way, while identifying it uniquely. To be more specific, say you have something like this:

Code:

<html>
<head>
<title>Page</title>
</head>
<body>
<div id="container">
  <form action="register.php" method="POST">
    <input type="text" name="item_name" value="Default item name" />
    <input type="text" name="item_qty" value="0" />
    <input type="submit" name="submit" value="Submit" />
  </form>
  <a id="go-to-drop-down-menus" href="some_other_script.php">Linky!</a>
</div>
</body>
</html>

Here, one way to reach the link (assuming you want to, say, find out the href the link is pointing to) is "html > body > div#container > a#go-to-drop-down-menus", which would be traversing the tree from the root - but since you know element IDs are unique within a document (assuming, again, that the HTML isn't awful), you can just find it via its id - "a#go-to-drop-down-menus" - in Beautiful Soup, you'd do something like soup.find(id="go-to-drop-down-menus").get('href') and that's it. Of course, I'm mostly just rambling here about stuff that comes to mind - this is really where you need to figure out the document structure on your own and improvise from there. Also, if you're familiar with CSS selectors (or jQuery selectors which mostly just augment those in CSS), you could maybe use pyquery, for a more familiar syntax.
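
If you'd rather not install anything, roughly the same lookup can be sketched with Python's built-in `html.parser`. This is a minimal sketch, not a replacement for Beautiful Soup: `PageScanner` is a hypothetical helper, it only understands the three tags from the example page above, and it does no repair of broken markup.

```python
from html.parser import HTMLParser

# The example page from above, trimmed to a string.
PAGE = """<html>
<head><title>Page</title></head>
<body>
<div id="container">
  <form action="register.php" method="POST">
    <input type="text" name="item_name" value="Default item name" />
    <input type="text" name="item_qty" value="0" />
    <input type="submit" name="submit" value="Submit" />
  </form>
  <a id="go-to-drop-down-menus" href="some_other_script.php">Linky!</a>
</div>
</body>
</html>"""


class PageScanner(HTMLParser):
    """Collect the form target, its input defaults, and links keyed by id."""

    def __init__(self):
        super().__init__()
        self.form_action = None
        self.fields = {}
        self.links_by_id = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form_action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value", "")
        elif tag == "a" and "id" in attrs:
            self.links_by_id[attrs["id"]] = attrs.get("href")


scanner = PageScanner()
scanner.feed(PAGE)
print(scanner.links_by_id["go-to-drop-down-menus"])
print(scanner.form_action, scanner.fields)
```

The collected `fields` dictionary doubles as a template for the POST data: overwrite the default values with your own and send the result to `form_action`.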

Mostly, all that this mini-project comes down to is sending appropriate data to appropriate URLs (implementing cookies as well, since you need a login), and doing some light DOM traversal along the way to control some of the flow (ie. whether or not an entity should be registered), and find URLs that your hyperlinks lead to (assuming they even change at all - you should investigate to find this out).

I'm not very familiar with Python (I've played with it a bit for Project Euler purposes, but not much beyond that), but I'd probably rate this at about a few hours of work (including researching all the libraries I need, as well as how to use them). Ideally, if everything works on the first try, I'd do it in, say, an hour (maybe half an hour in PHP, but PHP isn't really well-suited to tasks like this), but since you're dealing with a third-party system you have no control over, unforeseen problems are likely to come up, and that's where most of the time would be lost. Well, ideally, you could ask the website-owners to just grant you direct access to their database, and do everything in a few minutes, but it seems that this is not an option :P

Of course, if you have any other questions, feel free to ask - I've tried to be language-agnostic, but since you've already mentioned Python, I've done some quick google searches to find some suitable libraries for this, so I hope I've helped you at least a bit.

Re: How Do I Learn About [n]?

Posted: Sun Sep 02, 2012 11:36 pm UTC
by thoughtfully
Another relevant Python library is mechanize, which is based on a Perl module of the same name. I'm sure cURL is available for Perl as well, or any number of languages. I haven't done much work of this sort, but you might find one of them is a better fit for you than the other, or they might both be useful for distinct subtasks, etc.

Re: How Do I Learn About [n]?

Posted: Tue Sep 04, 2012 6:13 pm UTC
by Shivahn
Thanks! There looks to be a lot I can look into. I appreciate the suggestions/ideas.

Re: How Do I Learn About [n]?

Posted: Tue Apr 23, 2013 4:59 pm UTC
by styrofoam
I'd recommend, instead of using Python or Perl, using Node.js with jsdom. Since it's just JavaScript and a DOM, you can use all the knowledge, documentation, and muscle memory of writing web apps to write your scraper, or even execute code from the page itself, if it relies on JS.

That's just me, though.

Re: How Do I Learn About [n]?

Posted: Mon Jul 29, 2013 4:40 pm UTC
by wolf99
Any good tutorial for VB.Net, slightly (but not too much) beyond the basics?

I'm pretty swish at C with embedded systems, understand the basics of the concepts involved in OOP, and threw some programs together in VB6 some time ago.
So the "introduction" or "hello world" style tutorials for VB don't normally go far enough. Other stuff I've come across skips a giant gap of material above that level, though...
I have tried working my way through MSDN, but it seems to be organised for reference rather than for linear learning.
Cheers

Re: How Do I Learn About [n]?

Posted: Tue Oct 01, 2013 4:54 am UTC
by Carnildo
Any suggestions for "Javascript as a 25th language" resources?

Re: How Do I Learn About [n]?

Posted: Wed Oct 02, 2013 8:50 am UTC
by Jplus
I think the sections "Features" and "Syntax" of https://en.wikipedia.org/wiki/Javascript fit the bill.

Re: How Do I Learn About [n]?

Posted: Wed May 21, 2014 3:43 pm UTC
by Diadem
Does anybody have some good resources on WPF / Xaml with C#?

I have been trying to learn this now for a few weeks, but it has the steepest learning curve in the history of learning curves. Most resources out there seem to assume you already know everything there is to know about C# or .net, both of which I have 0 experience with. And with xaml everything seems to depend on everything else, so I don't know where to begin.

Re: How Do I Learn About [n]?

Posted: Thu May 22, 2014 6:39 pm UTC
by Yakk
wolf99 wrote:Any good tutorial for VB.Net, slightly (but not too much) beyond the basics?

VB.Net basically is a thin skin on C#.net. The difference between the two languages is extremely small, with VB.net looking more like VB, and C#.net looking more like C++.

So I'd advise looking at both VB.net and C#.net sources. Find some interesting C# code, and learn how to transcribe it to VB.net.

Re: How Do I Learn About [n]?

Posted: Mon Jun 30, 2014 2:06 pm UTC
by rath358
So, I need to build a windows app in c++ that has a nice interface and such, instead of just taking arguments as I am used to doing for my coursework. I have been told that .NET is the way to go for this. (Qt appears to be out for legal reasons, and I have to stick with my c++ code because of another library I use)
Advice on where to learn the basics of .NET and how to integrate it with a c++ program?

Edit: upon further reading, it appears that although using c++ with winforms or wpf is "supported", it is not the suggested route to go down, the c++ winforms editor has been deprecated since VS2010, and there are few tutorials or useful repositories of knowledge for me to reference. So it looks like the best way forward is to design the interface in c#, and make a wrapper around the underlying CPP functionality. Any advice on how to get started with that?

Re: How Do I Learn About [n]?

Posted: Mon Jul 07, 2014 8:18 pm UTC
by FLHerne
rath358 wrote:Qt appears to be out for legal reasons.

It's under LGPL (among other things), so you can dynamically-link to it (as a DLL on Windows, or .so on Linux) from your program regardless of the license used for the rest of the program. :wink:

If you really need both static linking and a non-GPL license, Digia sell commercial licenses. No idea what the terms and pricing are, because my stuff is GPL. :P

Re: How Do I Learn About [n]?

Posted: Mon Jul 14, 2014 2:03 pm UTC
by rath358
I am developing this for a commercial software product and our legal team is really picky about third party software, so I am afraid that it is still out. Thank you for that bit of knowledge, though.

Re: How Do I Learn About [n]?

Posted: Mon Jul 14, 2014 2:29 pm UTC
by Yakk
Is it a windows "modern" app?

XAML mediated by C++/CLI?

It is a good habit to split your UI from your Engine anyhow. Tangling the two is code smell.

Re: How Do I Learn About [n]?

Posted: Mon Jul 14, 2014 2:45 pm UTC
by rath358
The c++ side is written as a win32 console application. I wrote a basic wrapper around it last week, but have been away since.
I am working on learning c# and putting together a simple UI in c# with WPF, but I am at a loss on how to translate my c++ code into the c++/CLI format that seems to be required to put the pieces together.

Edit (7-16-14): Once I found the right resource, I was able to compile the c++ code into a DLL, after only a little hair pulling. I haven't successfully called it from C#, but that is only because I haven't learned enough c# to hack a test together yet.
For future googlers, this tutorial got me on the right path. I had to do a little additional searching and a fair bit of debugging to get my code to compile properly, but it only ended up being a few lines of source code modification for the DLL itself, another dozen or two to test it, and of course setting it all up as a new project in VS 2013 and fiddling with the properties.
Thanks for all of the help! It might not seem like much, but your comments fixed a couple of bits of incomplete/incorrect knowledge, and really set me on the right path.

Re: How Do I Learn About [n]?

Posted: Mon Jul 14, 2014 5:54 pm UTC
by Yakk
Use C++/CLI as a layer between your C++ code and the C# code, don't turn your code into a bunch of C++/CLI.

Re: How Do I Learn About [n]?

Posted: Mon Jul 14, 2014 6:06 pm UTC
by EvanED
Another option would be to write pure C++ and compile to a DLL; then you can make native calls from C# to that DLL. I don't know how ugly this is and you might have to write a bit of glue code, but it may or may not be nicer than C++/CLI stuff.

Re: How Do I Learn About [n]?

Posted: Tue Jul 29, 2014 6:00 pm UTC
by Sizik
Any good resources for learning modern C++, given the fact that I know Java and am familiar with C?

Re: How Do I Learn About [n]?

Posted: Tue Jul 29, 2014 7:16 pm UTC
by Yakk
By "know Java", I assume you don't consider generics complicated (i.e., you have a reasonable level of expertise). Could you implement your own class-instance OO framework in Java or C? (A "class" is plain old data that describes the layout of data in instances of that class, plus methods to operate on said instances, including creation and destruction. A reasonable framework also handles inheritance, virtual and non-virtual functions, and optional instance-local method overrides.)

http://isocpp.org/blog/2014/03/effectiv ... ott-meyers looks good from the index.

If you are a novice at Java and C, that is probably the wrong spot to start.

Re: How Do I Learn About [n]?

Posted: Wed Oct 28, 2015 4:11 am UTC
by vodka.cobra
If you feel that it would be appropriate to add "application security" to the list in OP, there's a curated list on Github for that: https://github.com/paragonie/awesome-appsec

Re: How Do I Learn About [n]?

Posted: Thu Jun 30, 2016 6:19 pm UTC
by Quizatzhaderac
Can anyone recommend any good in-depth resources on Spring and/or Hibernate? I could throw together a pet shop app easily enough, but I'm responsible for a few legacy apps, so I'm actually much more interested in how things don't work than how they do, if that makes any sense.

Re: How Do I Learn About [n]?

Posted: Mon Jun 10, 2019 8:39 am UTC
by Ereyokax
I have a friend who wants help with a "game maker" program he is trying to create. He wants me to learn a new programming language, and he suggests Java or Python.
Which should I use for this purpose? Or should I go with a completely different programming language? Keep in mind that I have used Alice2 for 2 years and I have no other programming experience.
And I’d like some advice on DSLRs as I’d like to buy one for my wife.

I’m looking for something that has both great ease of use and auto settings for a novice and also the ability to learn, experiment and take great pictures. I’d also like full HD video as well.

So far I’ve been looking at the Canon E650 and Nikon D3200. What is the STW weapon of choice and what should I be looking out for?

I’m looking at midrange (ish) as I don’t really want to have to upgrade in a few years. I’ve no existing lenses but hope to develop an assortment over the years.

Re: How Do I Learn About [n]?

Posted: Mon Jun 10, 2019 2:44 pm UTC
by Yakk
Python.

Java still has too much baggage from the designers who were traumatized by bad C++ code in the 1990s.