How to Analyze Your Log Data Using the Log Search API in InsightIDR

InsightIDR’s Log Search interface allows you to easily query and visualize your log data from within the product, but sometimes you may want to query your log data from outside the application.

For example, you could use InsightConnect, Rapid7’s security orchestration and automation tool, to build a workflow that queries your log data and runs automation scripts based on the results. Or, you could run a script locally in your environment to retrieve a daily total of invalid logons. This is where the Log Search REST API comes in.

The example below shows you how to:

  1. Get a list of logs in your account
  2. Build a request to run a query against a specific log within your account
  3. Execute the log search query and extract the results

1. Getting a platform API key

First, you will need an API key to authenticate the requests you make. After you log in, you can generate one from the API keys page of the Rapid7 Platform home page. You’ll need to be a Platform Admin to generate an API key, so if you have not been assigned this role, ask a Platform Admin in your organization to create one for you.
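Rather than pasting the key directly into a script, you may prefer to read it from an environment variable so it stays out of source control. The sketch below assumes a variable named RAPID7_API_KEY; that name is our own hypothetical choice, not something the platform defines.

```python
import os


def load_api_key(var_name="RAPID7_API_KEY"):
    """Read the platform API key from an environment variable.

    RAPID7_API_KEY is a hypothetical variable name chosen for this example.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your platform API key")
    return key
```

In a script, API_KEY = load_api_key() would then replace a hardcoded value.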

2. Getting a list of your logs

To query your log data, you will need the ID of the log(s) you want to search, because it is one of the parameters of the log search query.

The API endpoint for the logs resource is: https://us.api.insight.rapid7.com/log_search/management/logs.

Remember to change “us” in the URL to the region that your InsightIDR account is in (e.g., us, eu, ca, ap, au).
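Because only the region prefix changes, it can help to build the base URL in one place. This is a minimal sketch; the set of accepted codes is limited to the ones listed above, so extend it if your account is hosted elsewhere.

```python
# Region codes mentioned in this article; extend if needed
REGIONS = {"us", "eu", "ca", "ap", "au"}


def log_search_base_url(region):
    """Build the Log Search base URL for a region code such as 'eu'."""
    region = region.lower()
    if region not in REGIONS:
        raise ValueError(f"Unknown region {region!r}; expected one of {sorted(REGIONS)}")
    return f"https://{region}.api.insight.rapid7.com/log_search"
```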

You’ll need to add a header to the request in order to authenticate. The header name is x-api-key, and the value will be the API key that was generated earlier.

Then, make a GET request to the endpoint. The response is a JSON document that lists all the logs in your account.
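As a sketch of this request using only Python’s standard library (the full script later in this post uses urllib3 instead), the function below prepares the GET request with the x-api-key header, and a second function sends it and returns the "logs" array. The function names are our own.

```python
import json
import urllib.request


def build_logs_request(api_key, region="us"):
    """Prepare (but do not send) a GET request for the logs resource."""
    url = f"https://{region}.api.insight.rapid7.com/log_search/management/logs"
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def list_logs(api_key, region="us"):
    """Send the request and return the parsed 'logs' array from the JSON body."""
    with urllib.request.urlopen(build_logs_request(api_key, region)) as response:
        return json.load(response)["logs"]
```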

Here is a snippet from the response:

{
    "logs": [
        {
            "id": "d9a0ef9f-2fc6-4496-bd76-598f327588a2",
            "name": "AD Account Logins",
            "tokens": [
                "6d8e1728-82d1-4431-99e6-aaec34f314dc"
            ],
            "structures": [],
            "user_data": {
                "le_agent_filename": "",
                "le_agent_follow": "false"
            },
            "source_type": "token",
            "token_seed": null,
            "retention_period": "default",
            "links": [
                {
                    "rel": "Related",
                    "href": "https://us.api.insight.rapid7.com/log_search/management/logs/d9a0ef9f-2fc6-4496-bd76-598f327588a2/topkeys"
                }
            ],
            "logsets_info": [               
                {
                    "id": "90db93bb-283f-47ed-9a6f-a9ac9481661f",
                    "name": "Active Directory",
                    "links": [
                        {
                            "rel": "Self",
                            "href": "https://us.api.insight.rapid7.com/log_search/management/logsets/90db93bb-283f-47ed-9a6f-a9ac9481661f"
                        }
                    ]
                }
            ]
        },
        ...
    ]
}

The log you are interested in is called “AD Account Logins.” The response includes a lot of information about each log, including the log set it belongs to. You only need the “id” field, as this is the unique identifier that will be used to query the log’s data. If you want to query multiple logs, you would need to build a comma-separated list of their IDs.
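Picking the ID out of the response and joining several IDs into a comma-separated list takes only a couple of lines. In this sketch, logs_response is the parsed JSON document shown above, and the helper names are our own.

```python
def find_log_ids(logs_response, wanted_names):
    """Return the IDs of the logs whose name is in wanted_names, in response order."""
    return [log["id"] for log in logs_response["logs"] if log["name"] in wanted_names]


def join_log_ids(ids):
    """Build a comma-separated list of IDs for querying several logs at once."""
    return ",".join(ids)
```

For the snippet above, find_log_ids(response, {"AD Account Logins"}) would return ["d9a0ef9f-2fc6-4496-bd76-598f327588a2"].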

Now that you have the ID, you are ready to start building the request that you’ll use to query the log data that you are interested in.

3. Choosing the query

The log’s ID is one of just a few parameters that are needed. Next up is the LEQL query that you want to run. For this example, you want a count of the invalid logons in your Active Directory log on an hourly basis.

where(FAILED_BAD_LOGIN) calculate(count) timeslice(1h)

As a quick refresher, this query filters your log data to return only the log entries that contain the string “FAILED_BAD_LOGIN”. It then counts the matching entries. Because the results are returned as both an overall count and a time series, the final part of the query, timeslice(1h), splits the time series into hourly sections.
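To make the three clauses concrete, here is a rough local analogue written in plain Python. It is only an illustration of the idea on in-memory (timestamp, message) pairs, not how Log Search evaluates LEQL: the service applies its own matching rules and chooses the exact slice boundaries itself (the response in step 6 reports a granularity of 3,335,708 ms rather than exactly one hour).

```python
from collections import Counter


def count_per_hour(entries, needle="FAILED_BAD_LOGIN"):
    """Illustrative analogue of where(needle) calculate(count) timeslice(1h).

    entries is a list of (epoch_millis, message) pairs.
    Returns (total, {hour_start_millis: count}).
    """
    hour_ms = 60 * 60 * 1000
    matching = [ts for ts, msg in entries if needle in msg]   # where(...)
    buckets = Counter(ts - ts % hour_ms for ts in matching)   # timeslice(1h)
    return len(matching), dict(sorted(buckets.items()))       # calculate(count)
```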

Now that you have the log’s ID and the query, the last parameter is the time range. You can query from one specific timestamp to another, or you can use a relative time range. This example uses the relative range “Today” to look at today’s invalid logons, so you don’t need to work out any timestamps.
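If you do want an explicit range, you will need timestamps in epoch milliseconds, which is the format the response in step 6 uses for its "from" and "to" values. The sketch below computes “midnight UTC today until now” with exact integer arithmetic. Whether the API accepts these as from and to query parameters in place of time_range is an assumption here, so check the Log Search API documentation before relying on it. The helper names are our own.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt):
    """Exact epoch milliseconds for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def today_so_far_millis(now=None):
    """Return (from, to) in epoch ms: midnight UTC today up to now."""
    now = now or datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return to_millis(midnight), to_millis(now)
```

For a moment of 12:02:44.214 UTC on 20 November 2019, this returns (1574208000000, 1574251364214), the same values that the “Today” query in step 6 reports.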

4. Building the query request

You now have the three parameters required for the query, so you can build the URL. This will start with the endpoint’s URL:

https://us.api.insight.rapid7.com/log_search/query/logs/

Now, you will add the log ID that you want to query:

https://us.api.insight.rapid7.com/log_search/query/logs/d9a0ef9f-2fc6-4496-bd76-598f327588a2

Next is the time range. Note the question mark after the log ID:

https://us.api.insight.rapid7.com/log_search/query/logs/d9a0ef9f-2fc6-4496-bd76-598f327588a2?time_range=Today

Finally, add the LEQL query. It is good practice to URL-encode the query. Note the ampersand between the time range and the query.

https://us.api.insight.rapid7.com/log_search/query/logs/d9a0ef9f-2fc6-4496-bd76-598f327588a2?time_range=Today&query=where%28FAILED_BAD_LOGIN%29%20calculate%28count%29%20timeslice%281h%29
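Rather than encoding the query by hand, you can let Python’s urllib.parse do it. This sketch reproduces the URL above exactly; passing quote_via=quote makes spaces come out as %20 (the default, quote_plus, would turn them into +). The function name is our own.

```python
from urllib.parse import quote, urlencode


def build_query_url(log_id, leql, time_range, region="us"):
    """Build the GET URL for a Log Search query against a single log."""
    base = f"https://{region}.api.insight.rapid7.com/log_search/query/logs/{log_id}"
    # quote_via=quote percent-encodes spaces and brackets, e.g. ' ' -> %20, '(' -> %28
    params = urlencode({"time_range": time_range, "query": leql}, quote_via=quote)
    return f"{base}?{params}"
```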

5. Making the request

Again, you need to add the x-api-key header to your request so it can be authenticated. A JSON document will be returned, but there’s a difference from the first call that you made to get the list of logs.

Instead of receiving an HTTP 200 response code to indicate a successful request, you’ll receive an HTTP 202 response code, and the results of your query are not in the document body. Instead, the body contains a URL. Here is a snippet from the response:

{
    ...
    "progress": 0,
    "events": [],
    "links": [
        {
            "rel": "Self",
            "href": "https://us.api.insight.rapid7.com/log_search/query/b0bb0930-ab27-4698-88f5-edd909a41403:0:f114c54370bd54ed04948ba4f88ce4b6d8a0b6aa:50:17ac4071531d54f24d5cda7d194815d26007c185:?log_keys=d9a0ef9f-2fc6-4496-bd76-598f327588a2&time_range=Today"
        }
    ],
    ...
}

This is because a query may not return results immediately, especially when it runs against a lot of data. Rather than holding the request open until the results are ready, the API returns a URL that you can keep polling until they are. The progress field provides an estimate of how far along the query is.

As with the previous two requests, you need to add the x-api-key header before you request the URL that was returned.

If the query has not yet completed, you will again receive an HTTP 202 response with a URL. If it has completed, you will get an HTTP 200 response, and this time the query’s results will be in the document body.
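The polling logic can be sketched independently of any HTTP library. In this sketch, fetch is a hypothetical callable that performs the authenticated GET and returns (status_code, parsed_json). Following the full script later in this post, a 200 response that still carries links is treated as partial results, so polling continues; the polling interval and maximum number of polls are arbitrary choices.

```python
import time


def poll_query(fetch, url, interval=1.0, max_polls=60):
    """Follow 'continue' responses until the final results arrive.

    fetch(url) must return (status_code, parsed_json) and is assumed to
    send the x-api-key header itself.
    """
    for _ in range(max_polls):
        status, body = fetch(url)
        if status == 200 and "links" not in body:
            return body                  # finished: results are in the body
        if status not in (200, 202):
            raise RuntimeError(f"Query failed with HTTP {status}: {body}")
        url = body["links"][0]["href"]   # still running: poll the Self link
        time.sleep(interval)
    raise TimeoutError("Query did not finish within max_polls attempts")
```

Injecting fetch keeps the loop easy to test with canned responses and lets you plug in urllib3, urllib.request, or any other client.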

6. Getting the results back

Here is the result of the successful query:

{
    "logs": [
        "d9a0ef9f-2fc6-4496-bd76-598f327588a2"
    ],
    "statistics": {
        "cardinality": 0,
        "granularity": 3335708,
        "from": 1574208000000,
        "to": 1574251364214,
        "type": "count",
        "stats": {
            "global_timeseries": {
                "count": 1803
            }
        },
        "groups": [],
        "status": 200,
        "timeseries": {
            "global_timeseries": [
                {
                    "count": 128
                },
                {
                    "count": 133
                },
                {
                    "count": 145
                },
                {
                    "count": 153
                },
                {
                    "count": 146
                },
                {
                    "count": 146
                },
                {
                    "count": 166
                },
                {
                    "count": 147
                },
                {
                    "count": 144
                },
                {
                    "count": 163
                },
                {
                    "count": 76
                },
                {
                    "count": 86
                },
                {
                    "count": 170
                }
            ]
        },
        "count": 1803
    },
    "leql": {
        "statement": "where(FAILED_BAD_LOGIN) calculate(count) timeslice(1h)",
        "during": {
            "from": 1574208000000,
            "to": 1574251364214,
            "time_range": "Today"
        }
    }
}

7. Understanding the results

The statistics section of the response contains the overall calculation for the time range that you selected. In this example, you can see that there were 1,803 failed logins in total today.

The timeseries section returns the breakdown of failed logins per hour, so you can see the distribution over time.
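Extracting those two parts from the parsed response takes a few lookups. The first helper reads the calculation name from statistics.type and uses it as the key for both the total and each slice. The second helper assumes slices are equal-width blocks of granularity milliseconds starting at statistics.from; that is our reading of the example response, not documented behavior. Both names are our own.

```python
def summarize_counts(result):
    """Return (overall value, per-slice values) from a finished query."""
    stats = result["statistics"]
    fn_name = stats["type"]  # e.g. "count"
    total = stats["stats"]["global_timeseries"][fn_name]
    per_slice = [bucket[fn_name] for bucket in stats["timeseries"]["global_timeseries"]]
    return total, per_slice


def slice_starts(result):
    """Start time (epoch ms) of each slice, assuming equal-width slices."""
    stats = result["statistics"]
    n = len(stats["timeseries"]["global_timeseries"])
    return [stats["from"] + i * stats["granularity"] for i in range(n)]
```

With the response from step 6, the total is 1803 and the 13 per-slice counts add up to the same number.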

You can run any query through the API that you can run in the UI, and the structure of the results will vary depending on the type of query. Groupby queries return a structure similar to the response shown in step 6, while a query with no calculation returns an array of log entries.
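If a script needs to handle several kinds of query, it can branch on the fields present in the response. This sketch makes two assumptions that you should confirm against real responses: groupby results populate the statistics.groups array (which is empty in the example above), and plain queries return their log entries in the events array (which also appears, empty, in the 202 response in step 5).

```python
def describe_result(result):
    """Roughly classify a finished query response by the fields it contains.

    Assumes calculation queries fill 'statistics' (with 'groups' for groupby)
    and queries without a calculation return log entries in 'events'.
    """
    stats = result.get("statistics")
    if stats is None:
        return f"{len(result.get('events', []))} log entries"
    if stats.get("groups"):
        return f"{len(stats['groups'])} groups"
    fn_name = stats["type"]
    return f"{fn_name} = {stats['stats']['global_timeseries'][fn_name]}"
```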

Putting it all together

Now, let’s put everything together. This example uses Python, but one of the many benefits of a REST API is that there are many different ways to interact with it.

Want to play around with the API without any coding? Check out Postman.

This Python script combines the steps listed above. Replace the values near the top of the script with the correct ones for your account.

import json
import time
from urllib.parse import quote, urlencode

import urllib3

# Your InsightIDR region (us/eu/ca/au/ap)
REGION = "us"

# Enter your API key here
API_KEY = "XXX"

# No need to change this
BASE_URL = "https://" + REGION + ".api.insight.rapid7.com/log_search"

# Seconds to wait between polls while a query is still running
POLL_INTERVAL = 1

http = urllib3.PoolManager()


def get_logs():
    url = BASE_URL + "/management/logs/"
    response = http.request("GET", url, headers={"x-api-key": API_KEY})

    json_object = json.loads(response.data)
    for log in json_object['logs']:
        # Change this to define how you choose which log(s) you want to get the ID for
        if log['name'] == "AD Account Logins":
            return log['id']

    # No log matched the name above
    return None


def run_query(url):
    while True:
        response = http.request("GET", url, headers={"x-api-key": API_KEY})

        if response.status == 202:
            # We got a 'continue' response, which means we need to poll until we get the results
            print("Received 'continue' response, polling again")
            json_object = json.loads(response.data)
            url = json_object['links'][0]['href']
        elif response.status == 200:
            json_object = json.loads(response.data)

            if 'links' not in json_object:
                # The response no longer has a link for us to poll, so we're all done.
                # Get the name of the calculation that was done (e.g. "count")
                fn_name = str(json_object['statistics']['type'])
                print("Query is complete, final calculation result: "
                      + str(json_object['statistics']['stats']['global_timeseries'][fn_name]))
                return json_object

            # We got a successful response, but it's just partial results.
            # We can show a running total at this stage, but we need to keep polling for the final result.
            if 'partial' in json_object:
                fn_name = str(json_object['partial']['type'])
                print("Partial results returned ("
                      + str(json_object['partial']['stats']['global_timeseries'][fn_name])
                      + "), polling for final results")
            url = json_object['links'][0]['href']
        else:
            # Uh oh
            print("Looks like we got an error")
            print(response.data)
            return None

        # Pause briefly between polls so we don't hammer the API
        time.sleep(POLL_INTERVAL)


if __name__ == '__main__':
    print("Getting a list of all the logs")
    log_id = get_logs()

    if log_id is None:
        print("Could not find the log you were looking for")
    else:
        print("Filtering to get the log ID:" + str(log_id))
        leql = "where(FAILED_BAD_LOGIN) calculate(count) timeslice(1h)"
        timerange = "Last 30 days"

        # URL-encode the time range and query (they contain spaces and brackets)
        params = urlencode({"time_range": timerange, "query": leql}, quote_via=quote)
        url = BASE_URL + "/query/logs/" + log_id + "?" + params

        print("Starting my LEQL query")
        run_query(url)

Save the script (for example, as queryMyLog.py) and run it from a command prompt or terminal window with: python3 queryMyLog.py. Here’s the output of the script:

Getting a list of all the logs
Filtering to get the log ID:d9a0ef9f-2fc6-4496-bd76-598f327588a2
Starting my LEQL query
Received 'continue' response, polling again
Partial results returned (28631.0), polling for final results
Partial results returned (78296.0), polling for final results
Query is complete, final calculation result: 105615.0

In conclusion

Now you know how to interact with your log data using the REST API, which lets you run queries against it outside of the InsightIDR user interface.

By combining scripts with tools such as InsightConnect, you can open up a whole new set of possibilities to automate your environment, or embed data from InsightIDR in your local environment.

You can read more about the InsightIDR API in our documentation.
