As a Splunk Architect, I get quite a few questions about what Splunk actually is. While Splunk is well documented, it is also quite complex, and that can leave people confused about where to start.
What is Splunk Enterprise? Splunk is an absolutely beautiful, fully customisable piece of logging and monitoring software. There are many built-in features, along with apps built by Splunk and the community.
Example Splunk Enterprise Infrastructure:
A) Search Head: This is the server that displays results on screen once data has been received, processed and indexed. These results come from manual searches, saved reports and/or dashboards. Some apps, such as Enterprise Security, may need to be installed on a dedicated Search Head.
B) Indexer: This is the server that receives data, indexes the information you tell it to and, when searches are run, correlates the matching information and sends it back to the Search Head to be displayed.
C) License Master: Licensing is based solely on the amount of data indexed, not on the number of servers or the amount of network traffic sent. You can set a server to be just the License Master, but in smaller environments that isn't necessary. You can, for example, run the License Master on the same server as a Search Head. In Splunk Enterprise version 6.0 and newer, license violations are no longer enforced. Read more here.
D) Deployment Server: This magical thing is a necessity, even in small deployments. Imagine you have 60 servers in production and you make a change that you want to roll out to each system. Without a Deployment Server you would need to log in to each and every one of the 60 servers and make the configuration change by hand. Now imagine this happens twice, three times, four times; that sounds horrible, so don't do it. A Deployment Server allows for automated deployment of both apps and their associated configurations. For example, it tells the Universal Forwarder what data needs to be sent to the Indexer.
E) Universal Forwarder: This is software, installed on supported devices, that tells the device what data to send. It does not pre-index data and it does not remove anything from the data being sent. It cannot strip out PII, for example; if that is an issue, you may need to consider using a Heavy Forwarder.
F) Heavy Forwarder: This is a full installation of Splunk. It allows for removing data that may not be allowed to be transmitted over a network, such as credit card information. It is rare to require a Heavy Forwarder. As an aside, the data it sends is already tagged and configured to your requirements, so technically more data is being sent. You would not use a Heavy Forwarder where bandwidth or data limits are a concern. I will not be covering Heavy Forwarders further; read more here.
G) Syslog Server: Some devices will not support the Universal Forwarder; in that case you will likely send data via syslog. It is best practice to have that data sent to a syslog server first and then passed along to the Splunk Indexer. Remember, Splunk is mainly a passive server*: it listens while servers send data to it, without confirmation. To reduce the amount of data lost, it is a good idea to have a syslog server doing the logging, so that if Splunk needs to restart for any reason it can pick up from there.
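To make the "without confirmation" point a little more concrete, here is a minimal Python sketch of what a device does when it sends a syslog message over UDP. It is an illustration only, not how any particular device or Splunk itself is implemented. The helper names (syslog_priority, build_syslog_message, send_udp) are my own, and I am assuming the classic BSD-style syslog line layout and UDP transport.

```python
import socket
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def syslog_priority(facility: int, severity: int) -> int:
    """Syslog PRI value: facility * 8 + severity."""
    return facility * 8 + severity


def build_syslog_message(facility: int, severity: int, when: datetime,
                         hostname: str, tag: str, text: str) -> str:
    """Build a BSD-style syslog line (hypothetical helper for illustration)."""
    # The day of month is space-padded to two characters, e.g. "Jan  5".
    timestamp = f"{MONTHS[when.month - 1]} {when.day:>2} {when:%H:%M:%S}"
    return f"<{syslog_priority(facility, severity)}>{timestamp} {hostname} {tag}: {text}"


def send_udp(message: str, host: str, port: int) -> int:
    """Send one datagram and return the number of bytes handed to the OS.

    UDP gives no acknowledgement: the sender cannot tell from this call
    whether anything on the other side actually received the message.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))
```

Calling send_udp(message, "127.0.0.1", 514) returns as soon as the datagram has been handed to the operating system, whether or not a listener is running on that port (the address and port here are just placeholders). That is the gap a syslog server in front of the Indexer is meant to cover.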
Note: I will not be covering Clustering, or Cluster Masters.
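Going back to the License Master (C) for a moment: since licensing is based on indexed volume rather than server count, a rough way to reason about it is to add up what each source indexes per day and compare that with your licensed quota. The Python sketch below does only that arithmetic. The function names, the host names and the 1024-based GB conversion are my own assumptions for illustration; this is not Splunk's licensing logic.

```python
from typing import Dict, Mapping

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def daily_indexed_gb(indexed_bytes_by_source: Mapping[str, int]) -> float:
    """Total indexed volume for one day, in GB, across all sources."""
    return sum(indexed_bytes_by_source.values()) / BYTES_PER_GB


def license_usage(indexed_bytes_by_source: Mapping[str, int],
                  quota_gb: float) -> Dict[str, object]:
    """Compare one day's indexed volume against a daily quota (hypothetical helper)."""
    used = daily_indexed_gb(indexed_bytes_by_source)
    return {
        "used_gb": used,
        "quota_gb": quota_gb,
        "percent_used": used / quota_gb * 100,
        # How many sources sent data plays no part in the comparison.
        "over_quota": used > quota_gb,
    }
```

For example, license_usage({"web01": 2 * BYTES_PER_GB, "fw01": 9 * BYTES_PER_GB}, quota_gb=10) reports the day as over quota. Two sources or two hundred, only the total indexed volume counts.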
As you can see from the diagram above, the Splunk infrastructure isn't small. Additionally, virtualisation isn't recommended, especially for Indexers. Read the Splunk Docs for System Requirements. The Universal Forwarder is a different installation from the others; read about the Universal Forwarder requirements here.
That completes my first Splunk Enterprise fundamentals post. Be aware that this series only covers on-premises, typically smaller deployments. If you are looking at larger deployments and/or high availability requirements, you may need to investigate Clustering. I believe the approximate maximum one Indexer can handle is up to 250 GB/day of data, but don't quote me on that one... Please realise: more data, more concurrent searches and more users all mean the more power your servers are going to need. It's vital that you review the current size and possible growth of your environment before deploying!
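If you want to turn that into a back-of-the-envelope number, the Python sketch below estimates how many Indexers a given daily volume might need once growth is factored in. It simply takes my unverified 250 GB/day figure as a default; the function name, the growth model (a fixed yearly rate, compounded) and the parameters are assumptions for illustration. Treat the result as a starting point for a sizing conversation, not as a recommendation.

```python
import math


def indexers_needed(daily_gb: float,
                    per_indexer_gb: float = 250.0,  # rough figure from this post, unverified
                    yearly_growth: float = 0.0,     # e.g. 0.5 means 50% growth per year
                    years: int = 0) -> int:
    """Estimate the Indexer count for a projected daily volume (hypothetical helper)."""
    projected_gb = daily_gb * (1 + yearly_growth) ** years
    # Round up: a fraction of an Indexer still means one more server.
    return max(1, math.ceil(projected_gb / per_indexer_gb))
```

With these assumptions, 400 GB/day today works out to two Indexers, while the same environment growing 50% a year for two years (900 GB/day) works out to four. Concurrent searches and user counts are not modelled here at all, and they matter just as much.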
*Splunk Indexers can be configured to download files one time, as well as check for updates to files on remote servers. You can also have tasks which run a script.
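As a rough idea of what a script run by such a task could look like, here is a minimal Python sketch that builds one timestamped key=value line describing disk usage. I am assuming the script's standard output is what ends up being indexed; the function names and field names are made up for illustration.

```python
import shutil
from datetime import datetime, timezone
from typing import Mapping


def format_event(when: datetime, fields: Mapping[str, object]) -> str:
    """Render one event as 'timestamp key=value key=value ...'."""
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{when:%Y-%m-%dT%H:%M:%S%z} {pairs}"


def disk_usage_event(path: str = "/") -> str:
    """Build an event describing how full the filesystem at `path` is."""
    usage = shutil.disk_usage(path)
    used_pct = round(usage.used / usage.total * 100, 1)
    return format_event(datetime.now(timezone.utc),
                        {"metric": "disk_usage", "path": path, "used_pct": used_pct})


if __name__ == "__main__":
    # One line per run; whatever runs the script decides how often that is.
    print(disk_usage_event())
```

Keeping each run to a single self-describing line makes the output easy to search later, whatever field names you settle on.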