How to use undocumented web APIs

Hello! A couple of days ago, I wrote about tiny personal programs, and I mentioned that it can be fun to use “secret” undocumented APIs where you need to copy your cookies out of the browser to get access to them.

A couple of people asked how to do this, so I wanted to explain how, because it's pretty straightforward. We'll also talk a tiny bit about what can go wrong, ethical issues, and how this applies to your own undocumented APIs.

As an example, let's use Google Hangouts. I'm picking this not because it's the most useful example (I think there's an official API, which would be much more practical to use), but because many sites where this is actually useful are smaller sites that are more vulnerable to abuse. So we're just going to use Google Hangouts, because I'm 100% sure that the Google Hangouts backend is designed to be resilient to this kind of poking around.

Let's get started!

step 1: look in developer tools for a promising JSON response

I start out by going to https://hangouts.google.com, opening the network tab in Firefox developer tools and looking for JSON responses. You can use Chrome developer tools too.

Here's what that looks like:

The request is a good candidate if it says “json” in the “Type” column.

I had to look around for a while until I found something interesting, but eventually I found a “people” endpoint that seems to return information about my contacts. Sounds fun, let's take a look at that.

step 2: copy as cURL

Next, I right-click on the request I'm interested in and click “Copy” -> “Copy as cURL”.

Then I paste the curl command into my terminal and run it. Here's what happens:


    $ curl 'https://people-pa.clients6.google.com/v2/people/?key=REDACTED' -X POST ........ (a bunch of headers removed)
    Warning: Binary output can mess up your terminal. Use "--output -" to tell
    Warning: curl to output it to your terminal anyway, or consider "--output
    Warning: <FILE>" to save to a file.

You might be thinking: that's weird, what's this “binary output can mess up your terminal” warning? That's because, by default, browsers send an Accept-Encoding: gzip, deflate header to the server to get compressed output.

We could decompress it by piping the output to gunzip, but I find it simpler to just not send that header. So let's remove some irrelevant headers.
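Incidentally, the compressed body isn't anything exotic: it's ordinary gzip data. Here's a quick stand-alone Python sketch of what's going on:

```python
import gzip

# What a server sends when the client advertises "Accept-Encoding: gzip":
# the response body, gzip-compressed.
body = b'{"people": []}'
compressed = gzip.compress(body)

# Printed raw, these bytes are terminal-mangling garbage (note the gzip
# magic number 0x1f 0x8b at the front)...
print(compressed[:2])

# ...but decompressing recovers the original JSON, which is all that
# piping curl's output through gunzip would do.
print(gzip.decompress(compressed))
```

Dropping the Accept-Encoding header just asks the server to skip the compression step entirely, so there's nothing to decompress.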

step 3: remove irrelevant headers

Here's the full curl command line that I got from the browser. There's a lot here! I start out by splitting up the request with backslashes (\) so that each header is on its own line, to make it easier to work with:


    curl 'https://people-pa.clients6.google.com/v2/people/?key=REDACTED' \
    -X POST \
    -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0' \
    -H 'Accept: */*' \
    -H 'Accept-Language: en' \
    -H 'Accept-Encoding: gzip, deflate' \
    -H 'X-HTTP-Method-Override: GET' \
    -H 'Authorization: SAPISIDHASH REDACTED' \
    -H 'Cookie: REDACTED' \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    -H 'X-Goog-AuthUser: 0' \
    -H 'Origin: https://hangouts.google.com' \
    -H 'Connection: keep-alive' \
    -H 'Referer: https://hangouts.google.com/' \
    -H 'Sec-Fetch-Dest: empty' \
    -H 'Sec-Fetch-Mode: cors' \
    -H 'Sec-Fetch-Site: same-site' \
    -H 'Sec-GPC: 1' \
    -H 'DNT: 1' \
    -H 'Pragma: no-cache' \
    -H 'Cache-Control: no-cache' \
    -H 'TE: trailers' \
    --data-raw 'personId=101777723309&personId=1175339043204&personId=1115266537043&personId=116731406166&extensionSet.extensionNames=HANGOUTS_ADDITIONAL_DATA&extensionSet.extensionNames=HANGOUTS_OFF_NETWORK_GAIA_GET&extensionSet.extensionNames=HANGOUTS_PHONE_DATA&includedProfileStates=ADMIN_BLOCKED&includedProfileStates=DELETED&includedProfileStates=PRIVATE_PROFILE&mergedPersonSourceOptions.includeAffinity=CHAT_AUTOCOMPLETE&coreIdParams.useRealtimeNotificationExpandedAcls=true&requestMask.includeField.paths=person.email&requestMask.includeField.paths=person.gender&requestMask.includeField.paths=person.in_app_reachability&requestMask.includeField.paths=person.metadata&requestMask.includeField.paths=person.name&requestMask.includeField.paths=person.phone&requestMask.includeField.paths=person.photo&requestMask.includeField.paths=person.read_only_profile_info&requestMask.includeField.paths=person.organization&requestMask.includeField.paths=person.location&requestMask.includeField.paths=person.cover_photo&requestMask.includeContainer=PROFILE&requestMask.includeContainer=DOMAIN_PROFILE&requestMask.includeContainer=CONTACT&key=REDACTED'

This can seem like an overwhelming amount of stuff at first, but you don't need to think about what any of it means at this stage. You just need to delete irrelevant lines.

I usually just figure out which headers I can delete with trial and error: I keep removing headers until the request starts failing. In general, you probably don't need the Accept*, Referer, Sec-*, DNT, User-Agent, and caching headers, though.
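If you wanted to automate that trial and error, a rough sketch of the idea in Python might look like this. Everything here is hypothetical: `still_works` stands in for whatever re-sends the request (e.g. with `requests.post`) and checks that the response still looks right.

```python
def minimal_headers(headers, still_works):
    """Greedily try dropping each header; keep it only if the request
    fails without it. `still_works` should re-send the request with the
    trimmed headers and return True if the response is still good."""
    needed = dict(headers)
    for name in list(needed):
        trimmed = {k: v for k, v in needed.items() if k != name}
        if still_works(trimmed):
            needed = trimmed
    return needed

# Demo with a fake "server" that only actually checks two headers:
required = {"Authorization", "Cookie"}
copied = {"Authorization": "x", "Cookie": "y", "DNT": "1", "User-Agent": "z"}
print(minimal_headers(copied, lambda h: required <= set(h)))
# -> {'Authorization': 'x', 'Cookie': 'y'}
```

Real servers aren't always this deterministic, so treat the result as a starting point, not a guarantee.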

In this example, I was able to cut the request down to this:


    curl 'https://people-pa.clients6.google.com/v2/people/?key=REDACTED' \
    -X POST \
    -H 'Authorization: SAPISIDHASH REDACTED' \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    -H 'Origin: https://hangouts.google.com' \
    -H 'Cookie: REDACTED' \
    --data-raw 'personId=101777723309&personId=1175339043204&personId=1115266537043&personId=116731406166&extensionSet.extensionNames=HANGOUTS_ADDITIONAL_DATA&extensionSet.extensionNames=HANGOUTS_OFF_NETWORK_GAIA_GET&extensionSet.extensionNames=HANGOUTS_PHONE_DATA&includedProfileStates=ADMIN_BLOCKED&includedProfileStates=DELETED&includedProfileStates=PRIVATE_PROFILE&mergedPersonSourceOptions.includeAffinity=CHAT_AUTOCOMPLETE&coreIdParams.useRealtimeNotificationExpandedAcls=true&requestMask.includeField.paths=person.email&requestMask.includeField.paths=person.gender&requestMask.includeField.paths=person.in_app_reachability&requestMask.includeField.paths=person.metadata&requestMask.includeField.paths=person.name&requestMask.includeField.paths=person.phone&requestMask.includeField.paths=person.photo&requestMask.includeField.paths=person.read_only_profile_info&requestMask.includeField.paths=person.organization&requestMask.includeField.paths=person.location&requestMask.includeField.paths=person.cover_photo&requestMask.includeContainer=PROFILE&requestMask.includeContainer=DOMAIN_PROFILE&requestMask.includeContainer=CONTACT&key=REDACTED'

So I just need 4 headers: Authorization, Content-Type, Origin, and Cookie. That's a lot more manageable.

step 4: translate it into Python

Now that we know what headers we need, we can translate our curl command into a Python program! This part is also a pretty mechanical process: the goal is just to send exactly the same data with Python as we were sending with curl.

Here's what that looks like. This is exactly the same as the previous curl command, but using Python's requests library. I also broke up the very long request body string into a list of tuples to make it easier to work with programmatically.


    import requests
    import urllib.parse

    data = [
        ('personId','101777723'), # I redacted these IDs a bit too
        ('personId','117533904'),
        ('personId','111526653'),
        ('personId','116731406'),
        ('extensionSet.extensionNames','HANGOUTS_ADDITIONAL_DATA'),
        ('extensionSet.extensionNames','HANGOUTS_OFF_NETWORK_GAIA_GET'),
        ('extensionSet.extensionNames','HANGOUTS_PHONE_DATA'),
        ('includedProfileStates','ADMIN_BLOCKED'),
        ('includedProfileStates','DELETED'),
        ('includedProfileStates','PRIVATE_PROFILE'),
        ('mergedPersonSourceOptions.includeAffinity','CHAT_AUTOCOMPLETE'),
        ('coreIdParams.useRealtimeNotificationExpandedAcls','true'),
        ('requestMask.includeField.paths','person.email'),
        ('requestMask.includeField.paths','person.gender'),
        ('requestMask.includeField.paths','person.in_app_reachability'),
        ('requestMask.includeField.paths','person.metadata'),
        ('requestMask.includeField.paths','person.name'),
        ('requestMask.includeField.paths','person.phone'),
        ('requestMask.includeField.paths','person.photo'),
        ('requestMask.includeField.paths','person.read_only_profile_info'),
        ('requestMask.includeField.paths','person.organization'),
        ('requestMask.includeField.paths','person.location'),
        ('requestMask.includeField.paths','person.cover_photo'),
        ('requestMask.includeContainer','PROFILE'),
        ('requestMask.includeContainer','DOMAIN_PROFILE'),
        ('requestMask.includeContainer','CONTACT'),
        ('key','REDACTED')
    ]
    response = requests.post('https://people-pa.clients6.google.com/v2/people/?key=REDACTED',
        headers={
            'X-HTTP-Method-Override': 'GET',
            'Authorization': 'SAPISIDHASH REDACTED',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Origin': 'https://hangouts.google.com',
            'Cookie': 'REDACTED',
        },
        data=urllib.parse.urlencode(data),
    )

    print(response.text)

I ran this program and it works: it prints out a bunch of JSON! Hooray!

You'll notice that I replaced a bunch of things with REDACTED; that's because if I included those values, you could access the Google Hangouts API for my account, which would be no good.

and we're done!

Now I can modify the Python program to do whatever I want, like passing different parameters or parsing the output.

I'm not going to do anything interesting with it, because I'm not actually interested in using this API at all; I just wanted to show what the process looks like.

But we get back a bunch of JSON that you could definitely do something with.
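For instance, pulling fields out of that JSON might look something like this. The response shape below is entirely made up for illustration; the real structure would need to be inspected by hand.

```python
import json

# A hypothetical response shape, just for illustration -- the real
# People API response keys are not documented anywhere.
raw = '''
{
  "personResponse": [
    {"person": {"name": [{"displayName": "Ada Lovelace"}],
                "email": [{"value": "ada@example.com"}]}}
  ]
}
'''

data = json.loads(raw)
for entry in data["personResponse"]:
    person = entry["person"]
    names = [n["displayName"] for n in person["name"]]
    emails = [e["value"] for e in person["email"]]
    print(names, emails)
```

In the real script, `raw` would be `response.text` from the `requests.post` call above.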

curlconverter looks great

Someone commented that you can translate curl commands into Python (and a bunch of other languages!) automatically with https://curlconverter.com/, which looks amazing; I've always done it manually. I tried it out on this example and it seems to work great.

figuring out how the API works is nontrivial

I don't want to undersell how difficult it can be to figure out how an unknown API works: it's not obvious! I have no idea what a lot of the parameters to this Google Hangouts API do!

But a lot of the time there are some parameters that seem pretty straightforward, like requestMask.includeField.paths=person.email, which probably means “include each person's email address”. So I try to focus on the parameters I do understand more than the ones I don't.

this always works (in theory)

Some of you might be wondering: can you always do this?

The answer is sort of yes: browsers aren't magic! All the information browsers send to your backend is just HTTP requests. So if I copy all of the HTTP headers that my browser is sending, I think there's literally no way for the backend to tell that the request isn't being sent by my browser and is actually coming from a random Python program.

Of course, we removed a bunch of the headers the browser sent, so theoretically the backend could tell, but usually they won't check.

There are some caveats, though. For example, a lot of Google services have backends that communicate with the frontend in a totally inscrutable (to me) way, so even though in theory you could mimic what they're doing, in practice it might be almost impossible. And bigger APIs that encounter more abuse will have more protections.

Now that we've seen how to use undocumented APIs like this, let's talk about some things that can go wrong.

problem 1: expiring session cookies

One big problem here is that I'm using my Google session cookie for authentication, so this script will stop working whenever my browser session expires.

That means this approach wouldn't work for a long-running program (I'd want to use a real API for that), but if I just need to quickly grab a little bit of data as a one-time thing, it can work great!
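One small mitigation is to make the script fail loudly when the session dies, instead of silently misbehaving. A sketch of the idea; the 401/403 convention varies by site, so treat it as a guess to verify against whatever your site actually returns:

```python
def session_expired(status_code):
    """Many sites answer 401 or 403 once a copied session cookie
    expires; the exact status code varies from site to site."""
    return status_code in (401, 403)

# In the script above, right after the requests.post call, you might do:
#   if session_expired(response.status_code):
#       raise SystemExit("session expired -- re-copy the cookie from the browser")
print(session_expired(401), session_expired(200))  # -> True False
```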

problem 2: abuse

If I'm using a small website, there's a chance that my little Python script could take down their service, because it's making way more requests than they're able to handle. So when I'm doing this, I try to be respectful and not make too many requests too quickly.
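In practice that can be as simple as sleeping between requests. A minimal sketch, where `session` stands in for something like `requests.Session()`:

```python
import time

def polite_fetch(session, urls, delay_seconds=1.0):
    """Fetch URLs one at a time, pausing between requests so a small
    site isn't flooded with traffic it can't handle."""
    responses = []
    for url in urls:
        responses.append(session.get(url))
        time.sleep(delay_seconds)
    return responses
```

A second or so between requests is a reasonable default; back off further if you see errors or slow responses.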

This is especially important because a lot of the sites that don't have official APIs are smaller sites with fewer resources.

In this example this obviously isn't a problem: I think I made 20 requests total to the Google Hangouts backend while writing this blog post, which they can definitely handle.

Also, if you're using your account credentials to access the API in an excessive way and you cause problems, you might (very reasonably) get your account suspended.

I also stick to downloading data that's either mine or intended to be publicly accessible; I'm not searching for vulnerabilities.

remember that anyone can use your undocumented APIs

I think the most important thing to know here isn't actually how to use other people's undocumented APIs. It's fun to do, but it has a lot of limitations, and I don't actually do it that often.

It's much more important to understand that anyone can do this to your backend API! Everyone has developer tools and the network tab, and it's pretty easy to see which parameters you're passing to the backend and to change them.

So if anyone can just change some parameters to get another user's information, that's no good. I think most developers building publicly available APIs know this, but I'm mentioning it because everyone needs to learn it for the first time at some point :)
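Concretely, the fix is a server-side ownership check on every parameter the client sends. A toy sketch of the idea (all the names here are made up for illustration):

```python
def handle_get_person(logged_in_user, person_id, contacts_db):
    """Never trust the personId the client sent: check that the
    authenticated user is actually allowed to see that person.
    contacts_db maps each user to the set of ids they may view."""
    allowed = contacts_db.get(logged_in_user, set())
    if person_id not in allowed:
        return {"status": 403, "error": "not allowed"}
    return {"status": 200, "personId": person_id}

contacts_db = {"julia": {"101777", "117533"}}
print(handle_get_person("julia", "101777", contacts_db)["status"])  # 200
print(handle_get_person("julia", "999999", contacts_db)["status"])  # 403
```

The key point is that the check uses the server's own record of who is logged in, never anything the client claims about itself.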


via: https://jvns.ca/blog/2022/03/10/how-to-use-undocumented-web-apis/

Author: Julia Evans · Topic selection: lujun9972 · Translator: 译者ID · Proofreader: 校对者ID

This article was translated by LCTT and proudly presented by Linux中国 (Linux.cn).