Discussion:
[ansible-project] ansible ec2_facts returns false data (if there is NAT on the system level; This is ok if You use AWS router interface gateway)
sirkubax
2015-07-14 12:16:49 UTC
*THE PROBLEM:*
I've just realised why my playbook sometimes fills templates with false
data.

This happens when the instance is in my VPC subnet (with an internet
gateway) but has a *NAT route configured at the system level*. Then
*requests that follow the default route go through the NAT instance* and
the AWS response is *masked*.
The *NAT instance's facts* are *returned*, NOT the facts about the
current instance.


*THE DEBUGGING:*

If You look into the code, ec2_facts sends a series of requests to

'http://169.254.169.254/latest/meta-data'


For example:

curl http://169.254.169.254/latest/meta-data/local-ipv4
*172.16.0.200*


while the *real data* is

eth0: ***
inet *172.16.0.110*/24 brd 172.16.0.255 scope global eth0
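
The comparison above can be automated. What follows is only a sketch, not what ec2_facts does: the function names are mine, and the plain GET assumes the metadata endpoint answers unauthenticated requests as in the curl call (instances that enforce IMDSv2 would need a session token first).

```python
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def fetch_metadata_ipv4(url=METADATA_URL, timeout=2):
    """Return the local-ipv4 value served by the metadata endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode().strip()


def parse_inet_addresses(ip_addr_output):
    """Collect IPv4 addresses from the `inet` lines of `ip a` output."""
    addresses = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet":
            # fields[1] looks like "172.16.0.110/24"; drop the prefix length.
            addresses.append(fields[1].split("/")[0])
    return addresses


def metadata_is_consistent(metadata_ip, ip_addr_output):
    """True when the metadata address is configured on a local interface."""
    return metadata_ip in parse_inet_addresses(ip_addr_output)
```

On the instance, metadata_is_consistent(fetch_metadata_ipv4(), <output of "ip a">) would come back False in the situation described here, because 172.16.0.200 is not configured on eth0.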


*THE INSTANCE CONFIGURATION:*

$ ip r
default via 172.16.0.200 dev eth0
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.110
172.16.0.0/16 via 172.16.0.1 dev eth0

$ ip a

eth0: ***
inet *172.16.0.110*/24 brd 172.16.0.255 scope global eth0
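
To see why the metadata request ends up at the NAT instance, it helps to run a longest-prefix match against the routing table shown above. This is a simplified sketch (the parser is my own and only understands "default", "via" and plain prefixes, not every form of `ip r` output):

```python
import ipaddress


def parse_routes(ip_route_output):
    """Turn `ip r` output into (network, gateway) pairs.

    The gateway is None for on-link routes (no "via" keyword).
    """
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        dest = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        gateway = fields[fields.index("via") + 1] if "via" in fields else None
        routes.append((ipaddress.ip_network(dest), gateway))
    return routes


def next_hop(routes, destination):
    """Return the gateway of the most specific route matching destination."""
    address = ipaddress.ip_address(destination)
    matching = [(net, gw) for net, gw in routes if address in net]
    if not matching:
        return None
    return max(matching, key=lambda item: item[0].prefixlen)[1]


ROUTES = """\
default via 172.16.0.200 dev eth0
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.110
172.16.0.0/16 via 172.16.0.1 dev eth0
"""

print(next_hop(parse_routes(ROUTES), "169.254.169.254"))
# prints 172.16.0.200
```

Only the default route matches 169.254.169.254, so the request is handed to 172.16.0.200, the NAT instance, and not to the VPC router at 172.16.0.1.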



If You keep the remote files, You can check it Yourself:

export ANSIBLE_KEEP_REMOTE_FILES=1

and then

python /home/ubuntu/.ansible/tmp/ansible-tmp-1436872330.49-72199016469620/ec2_facts

will return as one of the facts:
"ansible_ec2_local_ipv4": "172.16.0.200",
(or run curl)

curl http://169.254.169.254/latest/meta-data/local-ipv4


*THE CURRENT WORKAROUND:*

1. Do NOT use ec2_facts (in *roles* or *tasks*):
   - action: ec2_facts
   DRAWBACKS:
   1. Some variables will not be available (the ansible_ec2_* facts
      will be missing).
   2. You will have only the ec2_* facts from Your LOCAL inventory
      cache (ec2.py, if I'm correct).
   3. If You add "gather_facts: True" to the playbook, You can also
      use the ansible_* facts gathered by the *setup* module,
      so instead of *ansible_ec2_local_ipv4* You can use
      *ansible_eth0['ipv4']['address']*
   4. *BUT* this can cause problems when a role expects some variable
      (example: ansible_hostname) while the playbook has system fact
      gathering disabled ("gather_facts: False"). You will have to be
      careful.
   5. *OR* You may want to access some AWS variable independently of
      Your LOCAL cache.
2. Configure Your VPC routing tables to point to the NAT instance
   interface rather than to an IP address:
      0.0.0.0/0 eni-xxx / i-xxx
   instead of:
      0.0.0.0/0 igw-zzzzz + system routing tables
   1. Then You do not have to override the routing table at the system
      level.
   2. You rely on the AWS router.
   DRAWBACKS:
   1. When Your NAT instance shuts down, You will have to change the
      routing table in the VPC to point to another physical interface,
      vs
   2. if You keep the system routing table, You can launch a new
      NAT instance with the "old IP address" attached.
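
The substitution from point 1.3 can be pictured as a lookup with a fallback. This is just an illustration in plain Python over a facts dictionary (the helper name local_ipv4 is made up, and in a real playbook You would do this in a template or with a default filter instead):

```python
def local_ipv4(facts, interface="eth0"):
    """Pick the instance's own IPv4 address from gathered facts.

    Prefers the address reported by the setup module for the given
    interface and falls back to the ec2_facts value. Returns a tuple
    (address, source) so the caller can see where the value came from.
    """
    iface = facts.get("ansible_" + interface, {})
    address = iface.get("ipv4", {}).get("address")
    if address:
        return address, "setup"
    return facts.get("ansible_ec2_local_ipv4"), "ec2_facts"


# Facts as they would look on the faulty instance from this thread.
facts = {
    "ansible_eth0": {"ipv4": {"address": "172.16.0.110"}},
    "ansible_ec2_local_ipv4": "172.16.0.200",
}
print(local_ipv4(facts))
```

With "gather_facts: False" the ansible_eth0 key is missing and the function falls back to the ec2_facts value, which is exactly the case drawback 4 warns about.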

*QUESTIONS / CONCLUSION:*

1. Be aware of the ec2_facts limitation.
2. If possible, rely on the Amazon routing table.
   1. How do You prevent a SPOF in Your VPC subnets?
   2. What is Your best practice for configuring VPC subnets (private and
      public) so that they have outbound internet access (for github, apt)
      and are still safe, without the SPOF that is the NAT instance?
--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-project+***@googlegroups.com.
To post to this group, send email to ansible-***@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/e901c654-1d06-46c2-8c7b-09253c96d235%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Igor Cicimov
2015-07-15 00:21:37 UTC
I'm using Ansible with AWS VPCs, most of which have public and
private subnets, and have never had the problem you are seeing. This is
definitely a misconfiguration on your side and has nothing to do with Ansible.
ec2_facts is doing the right thing: there is no other way of collecting
the data except querying the meta-data repository, which is what the AWS CLI
tools do anyway. That means you will get wrong data using the AWS CLI as well.
Don't forget you are in the cloud and your networking is configured at the
hypervisor/SDN level and NOT at the instance level. You can create as
many network interfaces as you want at the instance level and set IPs on them,
but none of them will work, since you have bypassed the SDN and there is no
record of them in the meta-data repository. Which finally means that
facts collected locally on the instance mean nothing if those
values don't match what is in the meta-data repository.

Now that we have that cleared up, let's move to your problem, which looks to me
like AWS routing tables. Or, more specifically, the lack of them. For an instance
to be in a private subnet it needs a routing table separate from the VPC's
default one (which has the IGW created for you when the VPC was created), one
that has the NAT instance as its internet gateway. And that is all you need;
you don't have to set any routing tables at the system level, the SDN will
route the traffic for you.

Hope this makes sense. Since you haven't provided any info about your
subnets, routing tables, ACLs etc., this is more of a guess at what's going on,
so please correct my assumptions if needed.

Thanks,
Igor
Igor Cicimov
2015-07-15 00:52:01 UTC
Have to correct myself: you do provide the subnet information. So, in answer
to your questions/conclusions, the way I do it is:

- Use a private routing table for the private subnets, pointing to the NAT as
the IGW
- Use 2 x NAT instances and a NAT takeover script that modifies the
private subnets' routing table and points the IGW to itself in case the
other NAT instance has failed
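
The decision logic of such a takeover script could look roughly like the sketch below. It is not the script referred to above; the function and its two callable parameters are hypothetical, and the health check and the actual route change (for example through the EC2 ReplaceRoute API action) are left to the caller:

```python
def takeover_if_peer_down(peer_is_healthy, point_default_route_at,
                          my_instance_id, private_route_table_ids):
    """Repoint the private route tables at this NAT instance if the peer is down.

    peer_is_healthy: callable returning True while the other NAT instance
        responds (hypothetical, e.g. a ping or TCP check).
    point_default_route_at: callable(route_table_id, instance_id) that
        replaces the 0.0.0.0/0 route (hypothetical wrapper around the
        EC2 API).
    Returns the list of route tables that were changed.
    """
    if peer_is_healthy():
        return []
    changed = []
    for table_id in private_route_table_ids:
        point_default_route_at(table_id, my_instance_id)
        changed.append(table_id)
    return changed


# Dry run with stand-in callables and made-up identifiers.
calls = []
result = takeover_if_peer_down(
    peer_is_healthy=lambda: False,
    point_default_route_at=lambda table, instance: calls.append((table, instance)),
    my_instance_id="i-nat-b",
    private_route_table_ids=["rtb-private-1"],
)
print(result, calls)
```

Each NAT instance would run this periodically against the other one, so whichever survives ends up owning the default route of the private subnets.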
'Jakub Muszynski' via Ansible Project
2015-07-15 11:14:22 UTC
Thanks Igor.

You are right, it is not an Ansible "bug" but a configuration "feature", though
a bad one, since it silently provides false data. I had to dig
into the source code to track it down.
ec2_facts could emit a warning after inspecting the default route, but that
would take some work :/

---------------------
To sum up my situation: I've worked out a solution that is almost the same as
the one You have provided :)
I will describe it in my own words:

I did not provide enough data about my subnets.
I have a public subnet and a private one. The faulty instances were in the
public subnet, with their system-local routing table containing "default
via 172.16.0.200 dev eth0". I have moved those instances to the private subnet
and set its routing table so that the default traffic goes via the
NAT instance in the public subnet:

Destination      Target                 Status   Propagated
172.16.0.0/16    local                  Active   No
0.0.0.0/0        eni-ezzzzb / i-2xxxx   Active   No


So that's exactly what You stated :)

To fix the issue in the public subnet (with "default via 172.16.0.200 dev
eth0"), it would be enough to add

ip r a 169.254.169.254 via 172.16.0.1

verification:

curl http://169.254.169.254/latest/meta-data/local-ipv4


since

modules/core/cloud/amazon/ec2_facts.py

defines the query URI as:

ec2_metadata_uri = 'http://169.254.169.254/latest/meta-data/'
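
The gateway in that route does not have to be hard-coded. A small sketch, assuming the subnet CIDR is known and relying on AWS reserving the first host address of each subnet for the VPC router (the helper names are mine):

```python
import ipaddress

METADATA_IP = "169.254.169.254"


def vpc_router_address(subnet_cidr):
    """First host address of the subnet, which AWS reserves for the VPC router."""
    return str(ipaddress.ip_network(subnet_cidr)[1])


def metadata_route_command(subnet_cidr):
    """Build the `ip route add` arguments that send metadata traffic to the VPC router."""
    return ["ip", "route", "add", METADATA_IP, "via", vpc_router_address(subnet_cidr)]


print(" ".join(metadata_route_command("172.16.0.0/24")))
# prints: ip route add 169.254.169.254 via 172.16.0.1
```

The resulting list can be passed to subprocess.check_call as root. Because the host route is more specific than the default route, it wins regardless of which NAT instance the default route points at.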



So I'll have to add 2xNAT and I'll be happy :)



