Native webrequests with WS2_32

I've been trying to create a back end for an RSS client I'm working on. I took a snippet of code as the basic architecture (the same code is also available on the Winsock2 documentation page).
My question is less about that basis itself and more about an HTTP 301 Moved Permanently response, which I had not run into before.

I now know that the Location header gives the new URI the resource has moved to, but even making a GET request to that URI doesn't do anything.

So I started to think it has something to do with the S in HTTPS. Is there some kind of token or extra code I need to include in my snippet (not really mine, lol) so I can make a proper "HTTPS GET request" (if that is even a thing)?

I've been experimenting with C++ recently. I have quite a bit of prior experience with Python, but most of Python is wrapped around libraries, and C++ was kind of my escape from that. So I'm trying to do this at as low a level of abstraction as possible (though ws2_32 is also a library).

#include <string.h>
#include <winsock2.h>
#include <windows.h>
#include <iostream>
#include <vector>
#include <locale>
#include <sstream>
using namespace std;
#pragma comment(lib,"ws2_32.lib")

int main(void){

WSADATA wsaData;
SOCKET Socket;
SOCKADDR_IN SockAddr;
int lineCount=0;
int rowCount=0;
struct hostent *host;
locale local;
char buffer[10000];
int i = 0 ;
int nDataLength;
string website_HTML;

// website url
string url = "www.facebook.com";

//HTTP GET
string get_http = "GET / HTTP/1.1\r\nHost: " + url + "\r\nConnection: close\r\n\r\n";


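    if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0){
        cout << "WSAStartup failed.\n";
        system("pause");
        return 1;
    }

    Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
    host = gethostbyname(url.c_str());
    // gethostbyname returns NULL when the name cannot be resolved
    if (host == NULL){
        cout << "Could not resolve host.\n";
        system("pause");
        WSACleanup();
        return 1;
    }

    SockAddr.sin_port=htons(80);   // port 80 is plain HTTP, not HTTPS
    SockAddr.sin_family=AF_INET;
    SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);

    if(connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr)) != 0){
        cout << "Could not connect";
        system("pause");
        closesocket(Socket);
        WSACleanup();
        return 1;
    }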

    // send GET / HTTP
    send(Socket,get_http.c_str(), strlen(get_http.c_str()),0 );

    // receive html
    while ((nDataLength = recv(Socket,buffer,10000,0)) > 0){        
        int i = 0;
        while (buffer[i] >= 32 || buffer[i] == '\n' || buffer[i] == '\r'){

            website_HTML+=buffer[i];
            i += 1;
        }               
    }

    closesocket(Socket);
    WSACleanup();

    // Display HTML source 
    cout<<website_HTML;

    // pause
    cout<<"\n\nPress ANY key to close.\n\n";
    cin.ignore(); cin.get(); 


 return 0;
}


Any help given is highly appreciated!
Sending:
You're sending an HTTP request, but you'll need HTTPS these days; see the comment above.
 
send(Socket,get_http.c_str(), strlen(get_http.c_str()),0 );

You already have the length, right?
 
send(Socket,get_http.c_str(), get_http.size(), 0);
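
As for the 301 itself: a rough sketch of pulling the status code and the Location header out of the raw reply could look like the code below. The names `Redirect` and `parse_redirect` are made up for this example, and the header handling is simplified (no folded headers, no trimming of trailing spaces). If the target starts with https://, a plain socket on port 80 cannot follow it; you would need a TLS layer on top of the socket.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper type: what we learned from the response head.
struct Redirect {
    int status = 0;          // e.g. 301
    std::string location;    // value of the Location header, if any
    bool needs_tls = false;  // true when the target is an https:// URL
};

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

Redirect parse_redirect(const std::string& response) {
    Redirect r;
    std::istringstream in(response);
    std::string line;

    // Status line, for example "HTTP/1.1 301 Moved Permanently".
    if (std::getline(in, line)) {
        std::istringstream status_line(line);
        std::string version;
        status_line >> version >> r.status;
    }

    // Header lines, up to the empty line that ends the head.
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        // Header names are case-insensitive.
        if (to_lower(line.substr(0, colon)) != "location") continue;
        const std::size_t start = line.find_first_not_of(" \t", colon + 1);
        r.location = (start == std::string::npos) ? "" : line.substr(start);
    }

    // rfind(prefix, 0) == 0 is a "starts with" check.
    r.needs_tls = to_lower(r.location).rfind("https://", 0) == 0;
    return r;
}
```

You would call `parse_redirect(website_HTML)` after the receive loop and look at `status`, `location` and `needs_tls` before deciding what to do next.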


Receiving:
You're only getting the reply payload, you're not processing it. Most websites will send JavaScript to be run by the browser, and you're a long way from handling that.

You need to read everything, because you don't know what's coming back. You do that by reading in a loop and stopping when recv() returns zero (the server closes the connection after replying, since you sent Connection: close in the request) or -1 (an error).

Don't check each byte, just append the whole block to your reply string, something like:
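std::string reply;
int nbytes = 0;   // Winsock's recv() returns int; ssize_t is POSIX-only
for (;;) {
    char buffer[4*1024];
    nbytes = recv(Socket, buffer, sizeof(buffer), 0);
    if (nbytes <= 0)   // 0: connection closed, SOCKET_ERROR (-1): failure
        break;
    reply.append(buffer, std::size_t(nbytes));
}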
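
Once the whole reply is in one string, the first processing step is usually to separate the headers from the body. A minimal sketch, assuming the complete response has been read; `split_response` is a name invented for this example:

```cpp
#include <string>
#include <utility>

// Splits a complete HTTP response into {head, body} at the first blank line.
// If no blank line is found, everything is treated as head and the body is empty.
std::pair<std::string, std::string> split_response(const std::string& reply) {
    const std::string separator = "\r\n\r\n";
    const std::size_t pos = reply.find(separator);
    if (pos == std::string::npos)
        return {reply, std::string()};
    return {reply.substr(0, pos), reply.substr(pos + separator.size())};
}
```

Keep in mind that an HTTP/1.1 server may send the body with chunked transfer encoding, in which case the body part still needs decoding before it is usable XML or HTML.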


It's hard, that's why apps used to use something like WebKit, but now just run a headless browser to talk to websites properly (see SeleniumHQ).
Great response, is there any resource on where to start for handling JavaScript output properly? I'm keen on learning it since I really like the idea of making the best use out of memory and processing power.

Many thanks!
Oh, if you are getting code back, it's going to be a bigger project. Is it code or data that comes back?
Yes! I know maybe my reply was too naive. I meant: is there any learning resource to get to the level of implementing parsers correctly? I'm truly bored of just coding with "out of the box" libraries. Not to say that they're not useful, on the contrary. But I've been struggling to get to a certain point of knowledge where I can feel comfortable approaching problems by myself, like handling XML with a self-written parser, and so on. I study a lot, but there are just huge gaps between some of my abilities, and trying to tie them all together is going to take a long time.
I'm trying to write an RSS feed reader for Vim.

It would be data, since I'd have to parse just XML responses. But I'd love to implement some kind of JavaScript parsing to also be able to display images.

Thanks for taking your time responding to such a basic inquiry :).
Code parsers are rather advanced.
Data parsing can be exceedingly simple and gets worse from there.

Let's take simple JSON-type responses.
- it's plain text (JSON has no native binary type; binary data has to be encoded as text, for example base64)
- its format is usually obnoxiously simple: the name of what it is followed by the value it has. Sometimes you have some extras like {} symbols that, for flat data, you can ignore :P

You can make a JSON parser, then, with little more than a C++ string and a <map> and some basic C++ toys like ASCII-to-double conversion (stod, atof, stringstream, or whatever else you like) or integer, a little validation, and some glue. Start with something like that.
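
To give an idea of the scale, here is a rough sketch of such a string-and-map parser. It is not a real JSON parser: `parse_flat_json` is a made-up name, and it assumes a flat object with no nesting, no arrays and no escaped quotes inside strings.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: parses a *flat* JSON object such as
// {"title": "News", "count": 3} into name -> raw value text.
// Assumes no nesting, no arrays, and no escaped quotes inside strings.
std::map<std::string, std::string> parse_flat_json(const std::string& text) {
    std::map<std::string, std::string> fields;
    std::size_t i = 0;

    auto skip_space = [&] {
        while (i < text.size() && std::isspace(static_cast<unsigned char>(text[i]))) ++i;
    };
    // Precondition: text[i] is an opening quote. Returns the text between the quotes.
    auto read_quoted = [&]() -> std::string {
        const std::size_t end = text.find('"', i + 1);
        if (end == std::string::npos) {          // unterminated string: take the rest
            std::string rest = text.substr(i + 1);
            i = text.size();
            return rest;
        }
        std::string inner = text.substr(i + 1, end - i - 1);
        i = end + 1;
        return inner;
    };

    skip_space();
    if (i < text.size() && text[i] == '{') ++i;

    while (true) {
        skip_space();
        if (i >= text.size() || text[i] != '"') break;   // '}' or malformed input: stop
        const std::string name = read_quoted();
        skip_space();
        if (i >= text.size() || text[i] != ':') break;
        ++i;
        skip_space();

        std::string value;
        if (i < text.size() && text[i] == '"') {
            value = read_quoted();
        } else {
            // Bare value (number, true, false, null): read up to ',' or '}'.
            const std::size_t end = text.find_first_of(",}", i);
            value = text.substr(i, end == std::string::npos ? std::string::npos : end - i);
            i = (end == std::string::npos) ? text.size() : end;
            while (!value.empty() && std::isspace(static_cast<unsigned char>(value.back())))
                value.pop_back();
        }
        fields[name] = value;

        skip_space();
        if (i < text.size() && text[i] == ',') ++i; else break;
    }
    return fields;
}
```

Values come back as text, so a numeric field is converted afterwards with something like `std::stod(fields.at("ratio"))`.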

Once that seems as simple as it is, you can move up to something like simple XML that may have a couple of nested fields. Then you have to figure out how to handle the recursive or tree like structure of the data where a this has a that which in turn has another inside it... and the nested items may or may not exist on every record ... and so on. Even so such a format can be simple enough that a DIY solution gets you what you need (I did any number of these fighting with databases, where I was mostly scanning for problems not actually extracting much of the data) without a big heavy library that needs the XSD input and all kinds of aggravations to get it going. You do NOT want to spend the time to do a real full bore XML parser, that is why we have libraries, but you can mess around with a simple format that is a little richer than JSON and not complicated enough to have to do all the bells and whistles that 'could possibly come up' in some formats.
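
The nesting part can be sketched with a small recursive function. This is only an illustration under heavy assumptions: elements have no attributes, there are no comments, declarations, CDATA sections or self-closing tags, and closing tags are not checked against opening tags. `Node` and `parse_element` are names made up for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical tree node: an element name, its text, and its nested elements.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Parses one element starting at text[pos] == '<' and advances pos past its
// closing tag. Nested elements are handled by calling this function again.
Node parse_element(const std::string& text, std::size_t& pos) {
    Node node;
    const std::size_t name_end = text.find('>', pos);
    if (name_end == std::string::npos) {      // malformed input: give up
        pos = text.size();
        return node;
    }
    node.name = text.substr(pos + 1, name_end - pos - 1);
    pos = name_end + 1;

    while (pos < text.size()) {
        if (text.compare(pos, 2, "</") == 0) {            // closing tag of this element
            const std::size_t close = text.find('>', pos);
            pos = (close == std::string::npos) ? text.size() : close + 1;
            break;
        }
        if (text[pos] == '<') {                           // nested element: recurse
            node.children.push_back(parse_element(text, pos));
        } else {                                          // plain text up to the next tag
            const std::size_t next = text.find('<', pos);
            node.text += text.substr(pos, next - pos);
            pos = (next == std::string::npos) ? text.size() : next;
        }
    }
    return node;
}
```

Called with `pos = 0` on something like `<item><title>Hello</title></item>`, it returns an `item` node with one `title` child whose text is `Hello`. Whitespace between tags ends up in `text`, which a real parser would have to deal with.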

I also advise that you parse a binary file into a data structure for practice. Here again this can be simple (much like the json, and actually can be read with a single .read() command into an appropriate struct that you built or vector of those) or complex (eg multiple record types, headers and footers, the next record type repeats 1000 times markers, recurrence patterns and other aggravations).

When you can do all that, THEN you may want to start with some simple code parsers; there are academic standard fake languages designed for your first crack at this with simple code language rules that you can get your feet wet on. If you still want to proceed doing that kind of work, you will need to study formal grammars and language theory a bit on top of the mechanics involved in coding it up. You can't code it if you can't read the specs, and the format they are presented in is rather challenging for a while.
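
As a taste of what that looks like, here is a sketch of a recursive-descent evaluator for arithmetic expressions, which is a common first exercise. The grammar in the comment is one I picked for illustration, each grammar rule becomes one function, and error handling is left out.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Grammar (an assumption chosen for illustration):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | '(' expr ')'
class Parser {
public:
    explicit Parser(const std::string& text) : text_(text) {}
    double parse() { return expr(); }

private:
    // Skips whitespace and returns the next character without consuming it.
    char peek() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }
    double expr() {
        double value = term();
        for (char op = peek(); op == '+' || op == '-'; op = peek()) {
            ++pos_;
            const double rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }
    double term() {
        double value = factor();
        for (char op = peek(); op == '*' || op == '/'; op = peek()) {
            ++pos_;
            const double rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }
    double factor() {
        if (peek() == '(') {
            ++pos_;
            const double value = expr();
            if (peek() == ')') ++pos_;
            return value;
        }
        // Number: let strtod work out where it ends.
        const char* start = text_.c_str() + pos_;
        char* end = nullptr;
        const double value = std::strtod(start, &end);
        pos_ += static_cast<std::size_t>(end - start);
        return value;
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

`Parser("1 + 2 * (3 - 1)").parse()` evaluates to 5, because the split between `expr` and `term` is what encodes operator precedence.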

Hopefully this helps set you a path if you want to study off on this tangent. I am no expert on it; I've done a lot of really simple parsers, the bulk of which took spew from a machine (no human error possible, consistent formatting) and got it into a usable format. I haven't done one for a language since college, and as mentioned above, the one I did then was a fake language and created a fake assembly language output.
I can imagine. I once built a parser that recognized spaces between integers in Golang. Was quite an adventure.

I've parsed binary files while building parallel databases, and ordered them in arrays. I struggled quite a bit with that too, figuring out how to classify input :p, so the road does look tough haha.
I want to take this path so I can get a better understanding of systems architecture. After all these years reading about it, it just seems like 90% of the hard work is figuring out how to parse data, at all levels. Which is remarkable, considering that parsing, as you mentioned, can be as easy as it can be hard. I'll look into the examples you gave me to get a better grasp of the concept and then (hopefully soon enough) revisit this XML parser I haven't even begun to conceptualize.

The stuff with the sockets I'll figure out. I'm already reading the resources and https://www.rfc-editor.org/rfc/rfc2818. In Python such problems are so easy to ignore. Now I can see why it is a scripting language rather than hard embedded programming (don't even get me started on the GIL).

Many thanks for the patience and proper answer. Have a nice day! :^)